When the Human Genome Project was completed in 2003 (two years ahead of schedule), many assumed that the entire human genome had been fully sequenced. In reality, some sections of the sequence remained unfinished, containing gaps, missing information and misaligned or misrepresented regions. The Genome Reference Consortium (GRC) was created to improve the quality and accuracy of the human assembly. The group, which is made up of the McDonnell Genome Institute and several other international organizations, also took on the task of improving the reference genomes of other model organisms, including the mouse and zebrafish, which are often used as models for human disease.
“Once the human genome sequence was released to the public, researchers began to write in with comments about inaccuracies, gaps and other problems with the assembly,” says Ms. Tina Graves-Lindsay, Group Leader for Reference Genomes at the McDonnell Genome Institute. “The McDonnell Genome Institute and other groups felt there was a need to address these issues by performing the additional experimental and computational work needed to solve these problems.”
Every whole human genome that is sequenced, along with much of the genomic disease research under way today, makes use of this human reference. This is why an accurate reference sequence is so important. For example, an error in the human reference could cause a key change in the genome of a patient with a disease such as cancer to be misinterpreted. “With an incomplete assembly, there is potentially useful data that is lost. We want to make sure we have as complete a reference as possible to avoid any mistakes in interpreting that data,” says Ms. Graves-Lindsay. She adds that it is also important to improve our understanding and representation of the genomic variation present across all populations, as this will greatly enhance genomic analysis and disease research.
Improving the human reference
Problems associated with sequencing the human genome include hard-to-sequence repetitive or variable regions, improperly assembled areas and regions where no sequence exists at all. The GRC’s collective expertise in genome mapping, sequencing and informatics is helping to correct such issues. The GRC is closing the remaining gaps and, where needed, providing alternative assemblies that reflect an increased understanding of genomic structure and variation. The consortium also works to improve the reference so that the shorter reads produced by next-generation sequencing can be aligned to it more easily, which makes the resulting data easier to interpret.
The consortium is very open to collaboration. It encourages scientists worldwide to report the sequencing issues and inaccuracies they encounter, which are then systematically prioritized and reviewed. The GRC has created a centralized database and established standard tools and operating procedures to make this reporting process as efficient as possible. As improvements are made to the sequence, the group periodically releases updated reference assemblies to the public. Between these releases, more frequently issued ‘patches’ give users timely access to improvements without significantly changing the assembly, its annotation or its mapping coordinates.
By improving the accuracy of the human sequence in such a variety of ways, the GRC is providing scientists around the world with the best possible human reference. This, in turn, is helping to advance genomic research as it moves toward the clinic and is leading to a better understanding of human health and disease.
The Genome Reference Consortium is made up of the following members:
- McDonnell Genome Institute
- National Center for Biotechnology Information (NCBI)
- European Bioinformatics Institute (EBI)
- Wellcome Trust Sanger Institute