Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly.

Genome Res. 2017 Apr 10. doi: 10.1101/gr.213611.116.


The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.


Schneider VA(1), Graves-Lindsay T(2), Howe K(3), Bouk N(1), Chen HC(1), Kitts PA(1), Murphy TD(1), Pruitt KD(1), Thibaud-Nissen F(1), Albracht D(2), Fulton RS(2), Kremitzki M(2), Magrini V(2), Markovic C(2), McGrath S(2), Steinberg KM(2), Auger K(3), Chow W(3), Collins J(3), Harden G(3), Hubbard T(3), Pelan S(3), Simpson JT(3), Threadgold G(3), Torrance J(3), Wood JM(3), Clarke L(4), Koren S(5), Boitano M(6), Peluso P(6), Li H(7), Chin CS(6), Phillippy AM(5), Durbin R(3), Wilson RK(2), Flicek P(4), Eichler EE(8,9), Church DM(1).