Reflections on the 20th Anniversary of the First Publication of the Human Genome

A new wave of research is needed to make ample use of humanity’s “most wondrous map”

On June 26, 2000, in the East Room of the White House, I stood at the podium and announced the sequencing of the human genome, a project accomplished by the relatively small team at my company in only nine months. Seated behind me was President Clinton, and on a giant screen was U.K. Prime Minister Tony Blair. Francis Collins was on stage as the head of the National Institutes of Health human genome team. Seated in front of me were some of the senior scientists associated with human genome sequencing as well as top government officials and ambassadors from around the world. Behind the guests were 50 or so TV cameras and photographers. The entire event was being broadcast live around the world.

After years of never-ending work, criticism (from the outside world and even internally at my company), and intervention by top science journal editors and even President Clinton, standing where history was being made that day was a very emotional and fulfilling experience. It was hard to believe we had made it to this point, though, and there was drama leading up to the event and even into the early hours of that morning. We all had to share drafts of our speeches the day before the event, and when I saw the speech from Prime Minister Blair, I told the head of the Office of Science and Technology Policy that I would not attend unless his speech was changed. I thought it was one-sided and contained disparaging remarks about me and my team. The White House science adviser said that they could not change a foreign head of government’s speech. I said that if they wanted me to attend, they needed to do something. I received a call at 2 A.M. indicating that I would be very pleased with his speech, and indeed I was.

How did we get to this historic place? Discussions about genome sequencing began in the mid-1980s and led to an NIH/Department of Energy genome effort that was funded with billions of dollars but proceeded slowly because it spread the genome fragments over multiple labs around the world. My team at my first not-for-profit research institute, The Institute for Genomic Research or TIGR, was funded to do a small segment, and we assumed we would sit out the genome project.

In 1995, we published the first genome of a free-living organism, H. influenzae, in Science. We sequenced it using our new algorithm and automation, treating the genome as a single project completed in months rather than years. I was certain that this approach would work with the human genome, but I was one of only a few who believed this. In 1998, my world changed with a call from Applied Biosystems (ABI) and their parent company offering $300 million for me to set up a new company to sequence the human genome with my technique and their new machine. I flew out to their headquarters in Foster City, Calif., to look at the prototype version of their new machine and was convinced it would work. We calculated that we would need 300 machines. On returning to TIGR, I told Ham Smith, Nobel laureate and my friend and colleague, what I saw and said that I had to go do this. His reply was: “I don’t think it will work, but I am going with you.”

We started a new company called Celera Genomics with the goal of sequencing the first human genome in three years or less. The company’s tagline was “Speed Matters, Discovery Can’t Wait.” This announcement was not met with open arms by the NIH-led sequencing community, who said Celera’s sequencing plans would end up with the “swiss cheese,” “CliffsNotes,” “Reader’s Digest” or even “Mad Magazine” version of the genome. I guess I can understand why they were not thrilled to have a newcomer to the game. Thus began what the press dubbed a race to sequence the human genome, pitting Celera against the NIH and international genome effort.

We knew the algorithm that we were using for bacterial genomes would not work for humans, nor would any of the existing computers. We had thousands of resumes sent to us, and fortunately one was from Eugene Myers, who ended up being one of the key heroes of the human genome. Gene, who was then a faculty member at the University of Arizona and had been the key developer of the BLAST tool for sequence analysis, had been thinking about larger genome assembly and was encouraged by our success with bacterial genomes. Gene and a small team wrote 500,000 lines of computer code in a few months to create the Celera Assembler. Nine months later we had a complete human genome sequence and set out to annotate it to see what it said about us. We published our analysis in Science only after the late Don Kennedy, Science editor, stepped in to override leaders of the public project who had attempted to block our publication. The NIH effort published their data in Nature on the same day.

So, at the 20th anniversary of the first publication of the human genome sequence on February 16, 2001, what do we have to show for the past two decades? The first decade after publication saw steady progress in sequencing technology, enabling more and more genomes from every class of life to be sequenced, but unfortunately little effort has gone into generating knowledge and understanding about the human genome. This is due in part to the fact that significant funding in the United States at the government level has dwindled, while in other countries funding has increased. The good news is that essentially every new drug and vaccine is now based on genomics, and basic research has changed from sequencing genes to more function-based research.

Many thought that just by sequencing large numbers of genomes, understanding and new knowledge would fall into place. While that has helped with ancestry tracing and genome variation, there is still so much for us to learn and understand about how the genome codes for us humans.

Five years ago, I formulated a new approach combining comprehensive phenotyping with deep genome analysis using machine learning/artificial intelligence algorithms and other tools. The new approach came about because my genome showed that I was a heterozygote for the APOE gene, which confers a substantially increased risk for Alzheimer’s disease. I convinced some neurologists at the University of California, San Diego, to do an MRI brain scan and an MRI/PET scan for amyloid, thought to be a key marker for the disease. The good news for me personally was that both tests came back negative, but it showed me that I needed to combine clinical phenotype tests like the MRI with the genome to understand the predictive risk. This led to the formation of a new company called Human Longevity, Inc. (HLI).

The goal of HLI was to offer the most comprehensive set of clinical tests for self-described healthy individuals that we could do in one day, such as whole-body MRI, cardiac CT scans, bone density, 4-D echo cardiac test and remote cardiac monitoring. We included a large array of chemical tests including the complete metabolome screening. The results of these integrated tests on so-called healthy people have been truly stunning. About 40 to 50 percent of people tested had significant disease of which they were unaware. Approximately 5 percent over 50 had a major tumor. The good news is they were almost all at early stages and could be removed or treated with radiation. About 1 percent of all tested had a brain aneurysm. Machine learning is providing new genome loci that correlate with diseases discovered. We are also looking for protective genetic markers for those like me with APOE changes but no Alzheimer’s, or women with BRCA mutations but no breast or ovarian cancers.

This notion of testing seemingly healthy people is not without critics. Some argue that if you look you will find something, and we might not have a cure or treatment for that disease, thus creating unnecessary distress. Or they say some tumors might be so slow-growing that treating them leads to unnecessary side effects; a “wait and see” approach is thus better. As I have tried to show with my career, I’m not satisfied with this. I believe that we have an obligation to utilize all the tools and knowledge we fought so hard to develop and uncover, including the one with the most potential, our human genome.

Overall, the practice of medicine needs to drastically change. We can prevent and predict diseases if we combine genomics on a grand scale with clinical phenotyping and machine learning. One factor impeding this progress is that the health care system is incentivized to offer treatments but not prevention. With new clinical tools, cancers and other diseases can be detected at the earliest stages, when treatments and potential cures are minimally invasive.

The genome will play a key role in the future bioeconomy, but the U.S. is already way behind. We are 54th in the world for sequence screening of new COVID-19 virus strains. And except for cancer, the genome is not a part of the practice of medicine. We all thought the genome sequence would allow us to understand ourselves and change medicine. That is happening too slowly, costing tens of millions of lives that could have been saved if we made it a national priority. One example is that it could be relatively easy to know who would be most susceptible to death from COVID-19 and flu. With the specter of more emerging infectious diseases, we need to act sooner rather than later.

When I stood at the podium at the White House press conference to announce the genome I said, “The method used by Celera has determined the genetic code of five individuals. We have sequenced the genome of three females and two males, who have identified themselves as Hispanic, Asian, Caucasian or African American. We did this sampling not in an exclusionary way, but out of respect for the diversity that is America, and to help illustrate that the concept of race has no genetic or scientific basis. In the five Celera genomes, there is no way to tell one ethnicity from another. Society and medicine treats us all as members of populations, where as individuals we are all unique, and population statistics do not apply.” I still stand by this statement. In fact, what we find today is that socioeconomic background contributes more to health access and outcomes than any other factor, biological or otherwise. The COVID-19 pandemic is a real-world, real-time example of this.

Progress is only made by daring to go where no roads currently exist. As President Clinton said at the White House event in 2000 to unveil the first survey of the human genome, “this is the most important, most wondrous map ever produced by humankind.” We need more explorers and more funding to fully utilize this map to uncover the new “lands” yet to be discovered in the human genome.