Large-Scale Whole-Genome Sequencing Efforts

By Steven Monroe Lipkin

At the age of twenty-seven, Ansel K. Paulson was a happy, healthy, and successful young man living in Manhattan and working at a financial company that managed back-office operations for mutual and hedge funds. Full of energy, he performed stand-up comedy in nightclubs as a hobby and also belonged to an improvisational comedy troupe that performed in a Hell’s Kitchen nightclub. Ansel K. had a serious girlfriend and was considering settling down, getting married, and starting a family, at least in the next few years. As a healthy young guy in his twenties enjoying himself, Ansel K. did not go to the doctor very much. In fact, he did not have a primary care doctor following him, and I was the first doctor he had seen in about six years.

Michael Mapes, Specimen No. 41 (detail) (2007). Photographic prints, various printed matter, entomology pins, poster board, test tubes, magnifying glasses, gelatin capsules, petri dish, hair, plastic bags, cotton thread, rubber bands, resins, modelling clay, 48 x 64 x 8 cm.

His first questions to me one rainy day during his first visit to the Weill-Cornell Medicine and New York Presbyterian Hospital Adult Genetics clinic in New York City were: “Am I going to need a $400,000-a-year medicine for the rest of my life, and am I going to lose my insurance?”

Ansel’s family was also generally healthy, and fortunately for him he did not have a notable family history of serious medical problems. However, several months earlier, Ansel’s mother had ordered through the mail a $199 DNA test on her saliva from the company 23andMe. She was curious about her family’s genetic ancestry and enjoyed following the many exciting new developments in genetic medicine through Scientific American, the “Science Times” section of The New York Times, and the “Nova” television series.

Of the three billion or so bases of DNA in a person’s genome, 23andMe analyzes a curated selection of about a million. These have been associated with a person’s ancestry and, up until the United States Food and Drug Administration limited the company’s testing several years ago, with genetic variants linked to a wide range of genetic diseases. 23andMe has tested DNA from more than a million people, and currently has contract collaborations with drug companies such as Roche, Genentech, and Pfizer to discover new pharmaceutical targets for genetic diseases.

Ansel’s mother’s test results revealed that she carried a genetic mutation, previously identified in dozens of people, that causes a well-studied and concerning malady for which there are several effective (although expensive) therapies: Gaucher disease. Gaucher disease is caused by mutations in the GBA gene. In people with Gaucher disease, there is a buildup of a particular type of fatty cell debris whose proper scientific name is glucosylceramide.

In Gaucher disease patients, the spleen can swell up to ten times its normal size. Depending on the particular patient and mutation, Gaucher disease can also cause enlargement of the liver, anemia, the blood cancer multiple myeloma, and the neurodegenerative condition Parkinson’s disease, among others. However, Ansel’s mother apparently had not read the textbooks. She was in her late fifties, but she had not had any clear medical problems linked to the disease, even though other people with the same mutation clearly do develop severe medical problems. Gaucher disease is treated by drugs that replace the protein produced by the defective GBA gene; in the United States this can cost more than $400,000 each year. Yet, curiously, neither Ansel’s mother nor Ansel appeared to have any symptoms of Gaucher disease or its sequelae.

In part, this curious situation reflects a common scenario for most genetic diseases. When a disease is originally identified, researchers focus on studying the most severely affected patients in order to be sure that they have correctly defined the symptoms associated with a particular gene. Then, with time, as more people are tested for mutations in the gene, it most often turns out that some people can have the gene mutation but have less severe symptoms, or perhaps not develop the disease symptoms at all. In general, for most genetic conditions, carrying a particular mutation increases the statistical risk of developing the disease, but does not mean a person is destined to suffer or die from that disease. Other factors are at play.

How can this be? In 1624, before we ever knew about DNA or genes, the English poet John Donne wrote: “No man is an island.” The same is often true for genes and gene mutations, and is why today we are entering the era of population genomics. There are a large number of other factors that can influence whether a gene mutation in fact causes a person to develop severe enough disease to bring them out of their blissful state of being healthy and into the medical system as a patient with a diagnosis. These factors are often called disease modifiers. Disease modifiers can be environmental exposures (for example, the amount of sugar in the diet in individuals predisposed to become diabetic, or the level of daily exercise that influences weight) or include variants in genes that modify the impact of other genes, called gene-gene interactions.

Thus, for example, it is possible that Ansel K.’s family carries other genetic mutations that ameliorate the symptoms of Gaucher disease in people with their particular GBA mutation, and which could in principle lead to new drug targets to treat the disease that are better than the existing (quite costly) medications. While we know there are many disease modifiers, the critical issue for a patient’s treating physician is whether, for a given disorder, there are any individual modifiers that have a large effect size and are potentially actionable.

The number of variables that can possibly modify genetic mutations is extremely large; genetic modifiers alone easily reach into the millions. The rapid decline in the cost of DNA sequencing, the interest in determining which diseases have a genetic component, and the strong incentive to identify drug targets and the genetic and environmental modifiers of disease have together driven efforts in Europe, the United States, and Asia to sequence the whole genomes of hundreds, then thousands, and now millions of people. These efforts have taken several forms.

Clockwise, from top left: breast-cancer cells; HeLa cells; colon-cancer cells; and blood vessels in a melanoma.

Many people ask why we need to do so much DNA sequencing. Don’t we already have a good understanding of genetics in human beings? One of the great surprises from the Human Genome Project, which completed the first human genome in 2003, was that there were only about 20,000 genes, about as many as are in mealworms, fruit flies, or mice. That is the basic blueprint for life. What is particularly notable are the genetic differences between species, and the focus has now homed in on the genetic differences between individuals. In order to understand the role of individual mutations, we now recognize that we must understand the context. This context includes the other genetic variants inherited with a mutation and, just as importantly, as much detail as possible about the medical and seemingly nonmedical life of the person who carries these mutations. The specific information needed to annotate a genome includes what the person’s symptoms are, the age at which they first appeared, and other data such as blood pressure, height and weight, previous illnesses, medications, vaccinations, and the like.

Finding the specific causes of genetic diseases and what modifies their impact on keeping a person well or ill has significant implications. Most importantly, for public health, this information allows us to prevent diseases, or at least to catch them early, so that we can reduce the large burden of human suffering from disease. Equally important, studying all these genomes and the associated information about the lives of these patients will reveal new targets for the pharmaceutical industry to develop drugs for disease prevention to keep human beings healthier longer into their senior years. This latter point has significant commercial, as well as scientific and medical, value, and in part has driven several national efforts to sequence the genomes of hundreds of thousands to millions of people. Thus, the race is on to identify gene modifiers and combinations of genetic mutations that cause disease, sequencing millions of people to find those combinations that will serve as the foundation for the medicine of the next century.

At present, the project farthest along is in the United Kingdom. The 100,000 Genomes Project was inspired in part by the 2012 Olympic Games in London, which highlighted to the then British Prime Minister David Cameron that great athletic feats could be accomplished by exceptional individuals endowed with the right genetic traits and a strong drive to excel and compete. A company, Genomics England, was formed.

Genomics England is wholly owned and funded by the UK Department of Health with the goal of sequencing 100,000 whole genomes from its National Health Service patients by the end of 2017. Its primary missions are to bring benefit to patients, set up a genomic medicine infrastructure for the National Health Service, enable new scientific discoveries and medical insights, and accelerate the development of the UK genomics industry.

Patients donate their samples, along with their medical and lifestyle information, after signing an informed consent that permits their DNA and information to be used. These consents are similar to, although more substantial than, those signed before a patient has surgery or receives chemotherapy for cancer. Another example, perhaps, is the consent given by individuals who use Facebook or the Apple iPhone operating system; this allows these companies to use the data a person accumulates in their accounts and to profit from it by targeting advertising to an individual or selling aggregated customer data to third-party marketing companies. The great majority of individuals consider the benefits of using these services, often provided at little or no cost, to far outweigh the risks of losing control of their privacy.

These informed consents have been approved by an independent UK National Health Service ethics committee. The consent includes statements explicitly asking if patients are willing for commercial companies to be able to conduct approved research on their genomes and data. Many patients are, indeed, eager to see their genetic, medical, and lifestyle data used to help further research progress into the specific conditions that affect them, a consummate use of crowdsourcing.

Genomics England will also charge for its data services to ensure that the costs to UK taxpayers for maintaining the data are kept reasonably low. Potential users will include pharmaceutical companies, but may, in fact, also include academic researchers and physicians in the United States and Europe. For example, an individual in Germany with a particular genetic mutation, or their physician, may have to pay a fee to access data regarding the symptoms and morbidities of other people who carry the same “one in a million” genetic variant. At this point in time precise details remain to be determined.

Inspired by the same excitement as Genomics England, several other sizable initiatives aim to sequence large numbers of citizens. These include a collaboration between the American health maintenance organization Geisinger Health System in Pennsylvania and the Regeneron Pharmaceutical Corporation to sequence the exomes of 50,000–100,000 enrollees. (The exome is the roughly 1.5 percent of the genome that encodes proteins, and it is thought to be the part of the human genome most important for disease.) Another project is the private start-up Human Longevity, Inc., founded by Craig Venter, who previously commercially sequenced a large portion of the original human genome in 2003. It aims to sequence 100,000 or more patients from the University of California, San Diego Health System, health maintenance organizations in South Africa, and those participating in clinical trials run by Roche and AstraZeneca. In China, Germany, the Netherlands, Israel, Saudi Arabia, Estonia, and Qatar there are similar large-scale sequencing initiatives involving tens to hundreds of thousands of people.

The current leader, Genomics England, includes both adults and children, but is more focused on diseases of adults. In 2013, the United States National Institutes of Health initiated an exciting new $25-million pilot program to sequence the genomes of hundreds of babies in America, with the goals of learning more about how ethically sound, interpretable, and clinically actionable this testing could turn out to be in comparison with biochemical newborn screening tests, such as for preventable genetic conditions like galactosemia or PKU that can cause mental retardation if not caught early and treated. “One can imagine a day when every newborn will have their genome sequenced at birth, and it would become a part of the electronic health record that could be used throughout the rest of the child’s life both to think about better prevention but also to be more alert to early clinical manifestations of a disease,” said Alan Guttmacher, director of the National Institute of Child Health and Human Development, which funded the study. This articulates the vision to shift the balance between disease treatment, early detection, and prevention toward the latter two. For virtually all genetic diseases, patients do better when symptoms are found early and treated aggressively.

For diseases of children, recessive diseases are particularly important. We have two eyes, two ears, two kidneys, and so on, as a backup system in case something goes wrong with one. Many genes are similar. We carry two copies of each chromosome (except for the X and Y chromosomes). For most genes, if one copy is mutated, the other picks up the slack and there are no medical sequelae. Recessive diseases are conditions in which each parent carries one mutated copy of a gene and a child by chance inherits both mutated copies, and consequently has no intact, un-mutated copy as a backup. There are about 4,000 recessive genetic diseases. Each is quite rare, but when you add them all up, they affect approximately two percent of the population. An important rationale for whole-genome screening of every infant at birth is that even though most recessive diseases are individually rare, much of the pain and suffering they cause falls disproportionately on our most valuable and vulnerable members, our children. Thus, lifelong benefit can be gained by treating and curing a childhood disease.

MCADD is one of the diseases for which newborns are screened from biochemical analyses of drops of blood taken from their heels shortly after birth. This disorder affects about one in ten thousand births. However, parents are not screened for it genetically because it is too rare to merit the cost of biochemically screening everyone.
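The one-in-ten-thousand figure can be turned into a rough estimate of how many people silently carry a single mutated copy. The sketch below is a back-of-the-envelope calculation in Python, not a clinical model: it assumes Hardy-Weinberg equilibrium (random mating, a single disease allele), which the article does not state, and the function name is made up for illustration.

```python
import math


def carrier_estimates(incidence):
    """Rough carrier statistics for a recessive disease.

    Assumption (not from the article): Hardy-Weinberg equilibrium,
    so affected births occur at frequency q**2 and carriers at 2*p*q.
    """
    q = math.sqrt(incidence)            # frequency of the mutated gene copy
    p = 1.0 - q                         # frequency of the intact gene copy
    carrier_freq = 2.0 * p * q          # people with exactly one mutated copy
    both_carriers = carrier_freq ** 2   # couples in which both partners are carriers
    return q, carrier_freq, both_carriers


# Incidence quoted in the text: about one in ten thousand births.
q, carrier_freq, both_carriers = carrier_estimates(1 / 10_000)

print(f"mutated-copy frequency: {q:.4f}")
print(f"carriers: about 1 in {1 / carrier_freq:.0f}")
print(f"couples with two carriers: about 1 in {1 / both_carriers:.0f}")
```

Under these assumptions roughly one person in fifty is a carrier, even though the disease itself is rare. Each child of two carriers has a one-in-four chance of inheriting both mutated copies.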

Anne Morris, a colleague in the world of genetics, is one mother whose family was struck by genetic lightning and who could have potentially benefitted from more global pre-conception genetic screening. Anne was one of the one-to-two percent of women in the United States and Europe who use in vitro fertilization to have children. Her son was conceived with her eggs and a sperm donor. Sperm banks typically screen donors for more common genetic diseases, such as cystic fibrosis, but not for rare recessive diseases, again because of cost.

Both Anne and her son’s biological father carried a mutation in the gene ACADM, which causes the recessive disease medium-chain acyl-CoA dehydrogenase deficiency, or MCADD. ACADM is a metabolic gene that helps convert fats into sugar to provide energy for the body. Affected newborns can develop low blood sugar, often after periods of being sick with infections and not drinking enough formula or breast milk. When undiagnosed, MCADD is a significant cause of sudden infant death, or of seizures after a mild illness. However, diet and stress are significant disease modifiers. Consequently, many children with MCADD may remain without symptoms for long periods of time, until an illness or other stress causes them not to eat and their blood sugar gets too low. Children with MCADD can live long and full lives when they are diagnosed and are carefully followed by attentive family members and medical professionals to monitor the children’s diets, catch symptoms early, and begin drug therapy.

Anne was so impressed with the benefits of genetic diagnosis that she started GenePeeks, a genetic testing company in New York and Massachusetts, which focuses on high-quality, comprehensive, genetic testing for more than six hundred pediatric diseases at a reasonable cost for prospective parents.

The “baby genome” study is still ongoing, but with successful preliminary results it has now been expanded to a broader group of Americans through the Precision Medicine Initiative Cohort Program, funded at more than $200 million. President Obama announced his support of the Precision Medicine Initiative in his State of the Union Address on January 20, 2015. The goal of the Precision Medicine Initiative Cohort is “to bring us closer to curing diseases like cancer and diabetes, and to give all of us access to the personalized information we need to keep ourselves and our families healthier.” Similar to Genomics England, the Precision Medicine Initiative Cohort will build a large research cohort of one million or more Americans. Because the population of the United States is ethnically more diverse than that of the United Kingdom, an important goal is to include genetically less well-studied minority groups, such as African Americans, Hispanics, and Native Americans. Stated official goals include developing quantitative estimates of risk for a range of diseases by integrating environmental exposures, genetic mutations, and gene-environment interactions; identifying determinants of individual variation in the efficacy and safety of commonly used therapeutics; and discovering biomarkers that identify people with increased or decreased risk of developing common diseases. Additional goals include using mobile health technologies, such as smartphones and similar devices, to collect large amounts of data correlating physical activity, patient location in different environments (e.g., urban vs. rural), physiological measures, and environmental exposures with health outcomes; developing new disease classifications and relationships; empowering participants with data and information to improve their own health; and creating a platform to identify patients for clinical trials of novel therapies. The Precision Medicine Initiative Cohort will commence recruiting patients in November 2016 from all geographic regions of the United States.

For all of these population genomics programs, there is great potential societal benefit worldwide. Patients can have significant diseases detected before symptoms arise or catastrophic consequences occur, such as blood-vessel rupture in persons who have connective tissue diseases. New medical diagnostic tests and drugs will be developed to diagnose and treat previously undiagnosed genetic conditions that are rationally matched to the specific underlying root biological defect rather than nonspecific later complications. There is the hope that improved knowledge of the genetics will also reduce health-care costs, enabling precision therapies instead of the one-size-fits-all treatments that are today’s norm.

However, for the individuals who enroll in population genomics studies, there are potential concerns. While the specifics depend on an individual’s nationality, a question frequently raised by patients in Europe and North America is whether having their genome sequenced will change the premiums that they pay for insurance. In the UK, health insurance is universal and patients do not pay premiums unless they seek out private networks with desired higher-quality services and shorter wait times. In the United States, health insurance coverage is not universal and people have to choose to enroll in specific plans. Additionally, there are other types of insurance that can be affected. In the UK, the US, and many countries in Europe, these include life insurance, critical illness (in the United States often called long-term care) insurance, and income protection (in the United States often called disability) insurance.

Most of the time, for individuals participating in the Genomics England 100,000 Genomes Project or the Precision Medicine Initiative, having their genomes sequenced will not affect premiums for medical insurance (in the US) or for life, critical illness, and income protection insurance (in the US, the UK, and many European Union countries). For now these are research studies, and there is currently no requirement for patients to tell insurers that they have the results of genetic testing.

Overall, even among healthy people with no obvious personal or family medical history consistent with genetic disease (like Ansel K.), more than 4.5 percent will have a clearly concerning genetic mutation that directly affects their medical care now, and with full genome sequencing this proportion will likely be higher. Thus, at the very least, an estimated 5,000–7,000 people in the Genomics England project, and 50,000–70,000 in the Precision Medicine Initiative, will have a new red-letter diagnosis in their medical record.
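The arithmetic behind these estimates can be sketched in a few lines of Python. The 4.5 percent floor and the two cohort sizes come from the text. The 5 and 7 percent rates are an assumption, inferred from the 5,000–7,000 range rather than stated outright, and the function name is made up for illustration.

```python
def expected_findings(cohort_size, rate):
    """Expected number of participants with a clearly concerning mutation."""
    return round(cohort_size * rate)


# Cohort sizes as described in the article.
cohorts = {
    "100,000 Genomes Project": 100_000,
    "Precision Medicine Initiative Cohort": 1_000_000,
}

# 0.045 is the stated floor; 0.05 and 0.07 are assumed rates that
# reproduce the quoted ranges (an inference, not a figure from the article).
rates = (0.045, 0.05, 0.07)

for name, size in cohorts.items():
    counts = [expected_findings(size, rate) for rate in rates]
    print(name, counts)
```

At the 4.5 percent floor the UK project alone would yield about 4,500 new diagnoses. The higher quoted ranges reflect the expectation that whole-genome sequencing will find more than the floor suggests.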

While patients do not have to disclose their 100,000 Genomes or Precision Medicine Initiative genome-sequencing test results, participants will have to respond truthfully to questions relating to any screening or preventative treatments. So, for example, if a woman is found to have a mutation in a gene increasing risk of colorectal cancer and is having annual colonoscopy to check for cancers, this will need to be disclosed and could influence coverage decisions. It is, of course, probable that an insurer could be alerted to closely examine an individual’s personal medical and family history for coverage decisions. Presumably, this could be particularly important for critical illness/long-term care and income protection/disability insurance premiums.

In the United States, in May 2008 President George W. Bush signed into law the Genetic Information Non-Discrimination Act, or GINA. The GINA law had been presented as a federal bill to Congress a full twelve years earlier, but it had previously been thwarted in different Congressional committees and sub-committees. The bill passed the US Senate 95–0 and the House of Representatives 414–1. The only nay vote was from Congressman Ron Paul, who is currently a conservative leader in the US Republican Party and a significant recipient of campaign contributions from the United States Insurance Lobby. GINA prohibits genetic discrimination in both employment and health coverage. Specifically, insurers and employers who have more than fifteen employees are not allowed to request genetic information to be used in any of their decisions. However, small businesses are exempt because of concerns over the cost of administration. The Genetic Alliance, a not-for-profit group dedicated to the responsible use of genetic information, called the GINA law a landmark in the history of genetics in the United States and in Civil Rights law. However, GINA has some limitations. It does not protect against discrimination involving life, disability, and long-term care insurance, similar to the situation in the UK.

The situation gets complicated in that at the individual state level, there can be laws that provide additional protections above and beyond those covered by the GINA law. A good resource for people who are considering participating in the Precision Medicine Initiative, or even for those having genetic testing in general, is the Council for Responsible Genetics, which provides state-level information about specific protections and laws. For example, in the State of California, there is a law called Cal-GINA. Cal-GINA provides civil legal protection against genetic discrimination in life, long-term care, and disability insurance, as well as in health insurance, employment, housing, education, mortgage lending, and elections. The states of Vermont (supported by Presidential Democratic contender Bernie Sanders) and Oregon also have strong laws that broadly prohibit genetic diagnoses being used to inform long-term care, life, and disability insurance. However, most states at present have only the protections of the GINA law. This includes the State of New York, where I practice genetic medicine.

This is not a theoretical but a real-life situation, and I have patients who have been denied life insurance because of a genetic diagnosis. For example, Karen Young is one patient in my practice who has Lynch syndrome. This is a genetic cancer susceptibility disease that increases the risk for colorectal, uterine, ovarian, and other cancers and is caused by mutations in MSH2, MLH1, and other genes. Karen is in her fifties and had surgery to remove her colon, uterus, and ovaries. She has never developed any cancer and is diligent about keeping up with cancer screening so that she can continue to work and take care of her family. Despite her meticulous care to stay healthy, Karen was denied life insurance, explicitly because of her genetic diagnosis. The denial letter from her insurer read:

Like all insurance companies, we have guidelines that determine when coverage can or cannot be provided. Unfortunately, after carefully reviewing your application, we regret that we are unable to provide you with coverage because of your positive finding of a mutation in the MSH2 gene, which causes the Lynch syndrome, as noted in your medical records.

If you received any correspondence prior to this letter that you interpret as coverage, please disregard it. You do not have coverage. Also, if you have an existing policy that you were replacing, please continue paying the premiums on that policy.

In summary, the concerns of patients such as Karen or Ansel are just now beginning to be heard. The large-scale population genomics research projects in the UK, the US (both public and private), and other countries have great potential to improve our ability to diagnose and, with time, develop new effective treatments for many individuals with a wide range of genetic diseases. As DNA sequencing costs have fallen dramatically, the number of people who could benefit from whole-genome sequencing has the potential to rise dramatically.

However, what is lagging are the legal policies in the different countries that can protect the public from insurance concerns and genetic discrimination, whether overt or covert. Because these large-scale research projects are progressing rapidly, the United States, the United Kingdom, and other nations performing similar projects urgently need more comprehensive legal protections for people who have had genome sequencing. We are not in a position to say, “It cannot happen here,” because genetic discrimination is in fact already happening around the world. Unless more comprehensive legal safeguards are put into place, there is the risk that a backlash will ensue that can limit the willingness of individuals to participate in these important scientific endeavors, to the detriment of our nations’ entire populations.