Researchers produced the map using next-generation DNA sequencing technologies to systematically characterize human genetic variation in 180 people in three pilot studies. Moreover, the full scale-up from the pilots is already under way, with data collected from more than 1,000 people.
“The pilot studies of the 1000 Genomes Project laid a critical foundation for studying human genetic variation,” said Dr. Richard Durbin, of the Wellcome Trust Sanger Institute and co-chair of the consortium. “These proof-of-principle studies are enabling consortium scientists to create a comprehensive, publicly available map of genetic variation that will ultimately collect sequence from 2,500 people from multiple populations worldwide and underpin future genetics research.”
Genetic variation between people refers to differences in the order of the chemical units, called bases, that make up DNA in the human genome. These differences can be as small as a single base being replaced by a different one, which is called a single nucleotide polymorphism (abbreviated SNP), or as large as whole sections of a chromosome being duplicated or relocated to another place in the genome. Some of these variations are common in the population and some are rare. By comparing many individuals to one another and by comparing one population to other populations, researchers can create a map of all types of genetic variation.
The 1000 Genomes Project’s aim is to provide a comprehensive public resource that supports researchers as they study the genetic variation that might cause human disease. The project’s approach goes beyond previous efforts in capturing and integrating data on all types of variation, and freely releases data generated from numerous human populations with informed consent. Already, these data have been used in studies of the genetic basis for disease.
“By making data from the project freely available to the research community, it is already impacting research for both rare and common diseases,” said Dr. David Altshuler, deputy director of the Broad Institute of Harvard and MIT, and a co-chair of the project. “Biotech companies have developed genotyping products to test common variants from the project for a role in disease. Every published study using next-generation sequencing to find rare disease mutations, and those in cancer, used project data to filter out variants that might obscure their results.”
The project has studied populations with European, West African and East Asian ancestry. Using the newest technologies for sequencing DNA, the project's nine centers sequenced the whole genome of 179 people and the protein-coding genes of 697 people. Each region was sequenced several times, so that more than 4.5 terabases (4.5 million million base letters) of DNA sequence were collected. The work was carried out by an international consortium of academic centers, including Washington University’s Genome Institute, and technology companies that developed the sequencing equipment.
Processing these data required many technical and computational innovations, including standardized ways to organize, store, analyze and share DNA sequencing data. Launched in 2008, the 1000 Genomes Project started with three pilot projects to develop, evaluate and compare strategies for producing a catalog of genetic variations. Funded through numerous mechanisms by foundations and national governments, the 1000 Genomes Project will cost some $120 million over five years, ending in 2012.
When the work began, sequencing was very expensive, so the project looked to alternative approaches such as combining partial data from many people. “We have shown for the first time that a new approach to sequencing — low coverage of many samples — works efficiently and well,” said Dr. Gil McVean, professor of statistical genetics at the University of Oxford. “This proof of principle is now being applied not only in the 1000 Genomes Project, but in disease research, as well.”
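A rough way to see why low coverage of many samples can work is to pool the reads from everyone who carries a variant. The sketch below is an illustration only, not the project's method: it assumes read counts follow a Poisson distribution, that each carrier has the variant on one of two gene copies, and that a variant is called once at least two reads support it. The function names, the coverage value of 4 and the two-read threshold are hypothetical choices made for the example.

```python
import math


def prob_at_least(min_reads, mean_reads):
    """Return P(X >= min_reads) for X ~ Poisson(mean_reads)."""
    below = sum(
        math.exp(-mean_reads) * mean_reads**k / math.factorial(k)
        for k in range(min_reads)
    )
    return 1.0 - below


def detection_probability(coverage, carriers, min_reads=2):
    """Chance of seeing a variant at least min_reads times across all carriers.

    Assumption: each carrier has the variant on one of two copies, so about
    half of that person's reads at the site show it (coverage / 2 on average).
    """
    mean_variant_reads = carriers * coverage / 2.0
    return prob_at_least(min_reads, mean_variant_reads)


# Hypothetical setting: 4 reads per site per person on average.
for carriers in (1, 5, 20):
    print(carriers, round(detection_probability(4.0, carriers), 3))
```

Under these assumptions, a variant carried by a single person is often missed at low coverage, while a variant shared by several people is very likely to be seen, which is the intuition behind combining partial data from many people.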
The project’s database contains more than 95 percent of the currently measurable variants found in any individual, and continuing work will eventually identify more than 99 percent of human variants.
“What really excites me about this project is the focus on identifying variants in the protein-coding genes that have functional consequences. These will be extremely useful for studies of disease and evolution,” said Dr. Richard Gibbs, director of the Human Genome Sequencing Center at the Baylor College of Medicine (another one of the project’s sequencing centers).
The improved map produced some surprises. For example, the researchers discovered that on average, each person carries between 250 and 300 genetic changes that would cause a gene to stop working normally, and that each person also carries between 50 and 100 genetic variations that had previously been associated with an inherited disease. No human carries a perfect set of genes. Fortunately, because each person carries at least two copies of every gene, individuals likely remain healthy, even while carrying these defective genes, if the second copy works normally.
In addition to looking at variants that are shared among many people, the researchers also investigated in detail the genomes of six people: two mother-father-daughter nuclear families. By finding new variants present in the daughter but not the parents, the team was able to observe the precise rate of mutations in humans, showing that each person has approximately 60 new mutations that are not in either parent.
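The idea of finding variants present in the daughter but in neither parent can be sketched as a set difference. This is a toy illustration under stated assumptions, not the consortium's pipeline: the function name `de_novo_variants`, the variant representation (chromosome, position, alternate base) and the sample data are all hypothetical, and a real analysis would also have to rule out sequencing errors before counting a variant as a new mutation.

```python
def de_novo_variants(child, mother, father):
    """Return variants seen in the child but in neither parent."""
    return child - (mother | father)


# Hypothetical toy data: each variant is (chromosome, position, alternate base).
mother = {("chr1", 1050, "A"), ("chr2", 300, "T")}
father = {("chr1", 1050, "A"), ("chr3", 77, "G")}
daughter = {("chr1", 1050, "A"), ("chr3", 77, "G"), ("chr5", 9001, "C")}

new_variants = de_novo_variants(daughter, mother, father)
print(sorted(new_variants))
```

In this example only the variant on chr5 is absent from both parents, so it is the single candidate new mutation.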
With the completion of the pilot phase, the 1000 Genomes Project has moved into full-scale studies in which 2,500 samples from 27 populations will be studied over the next two years. Data from the pilot studies and the full-scale project are freely available on the project website, http://www.1000genomes.org.
Researchers studying specific illnesses, such as heart disease or cancer, use maps of genetic variation to help them identify genetic changes that may contribute to the illnesses. Over the last five years, the first generation of such studies (called genome-wide association studies, or GWAS) has been based on an earlier map of genetic variation called the HapMap. Built using older technology, HapMap lacks the completeness and detail of the 1000 Genomes Project.
“Once a disease-associated region of the genome is identified, experimental studies must be done to identify which variants, genes and regulatory elements cause the increased disease risk,” said Dr. Lisa Brooks, program director for the Genetic Variation Program at the National Human Genome Research Institute, a part of the National Institutes of Health. “With the new map, researchers can just look up all the candidate genes and almost all of the variants in the database, saving them many steps in finding the causes.”
Organizations that committed major support to the project include: 454 Life Sciences, a Roche company, Branford, Conn.; Life Technologies Corporation, Carlsbad, Calif.; BGI-Shenzhen, Shenzhen, China; Illumina Inc., San Diego; the Max Planck Institute for Molecular Genetics, Berlin, Germany; the Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK; and the National Human Genome Research Institute, Bethesda, Md., which supports the work being done by The Genome Institute at Washington University, St. Louis, Missouri; Baylor College of Medicine, Houston, Texas; and the Broad Institute, Cambridge, Mass. Researchers at many other institutions are also participating in the project, including groups in Barbados, Canada, China, Colombia, Finland, the Gambia, India, Malawi, Pakistan, Peru, Puerto Rico, Spain, the UK, the US and Vietnam. Additional information about the project, including a list of all participants and organizations, can be found at http://www.1000genomes.org.