Background: The chimpanzee (Pan troglodytes) genome was originally sequenced to 4X coverage using a male, captive-born chimp of West African origin known as "Clint" from the Yerkes Primate Research Center (Atlanta, USA). The revised assembly (Pan_troglodytes-2.1) described below adds a further 2X coverage of whole genome shotgun plasmid reads, which were generated as part of our sequence improvement plan for the existing 4X chimp assembly (Pan_troglodytes-1.0). As before, the combined sequence reads were assembled using the PCAP software (Genome Res. 13(9):2164-70 2003) and filtered for all known non-chimp sequence contaminants.

Finished chromosome 21 sequence data was kindly provided by Todd Taylor and the Riken Genome Sciences Center (Watanabe et al., DNA sequence and comparative analysis of chimpanzee chromosome 22. Nature. 2004 May 27;429(6990):382-8. PMID: 15164055) to complete chimp "Clint" chromosome 21. The chromosome Y sequence was finished at the Washington University Genome Sequencing Center (WUGSC) with detailed mapping and extensive collaboration with David Page's group at the Whitehead Institute (The DNA Sequence of Chimpanzee Chromosome Y, unpublished; Hughes et al., Conservation of Y-linked genes during human evolution revealed by comparative sequencing in chimpanzee. Nature, 2005 437:100-3; PMID: 16136134).

Ongoing sequence improvement efforts at WUGSC will further improve this most recent 6X draft assembly, and we expect another improved assembly to be available in the future. For questions regarding this chimp assembly, please visit our existing chimp genome web page and contact the designated person for chimp. Downloads of the sequence data are available via our genome browser FTP server. Funding for the sequence characterization of the chimp genome was provided by the National Human Genome Research Institute (NHGRI), National Institutes of Health (NIH).

Production sequencing statistics:
  project: chimp_051011_2.1
  total reads input: 34014260
  total reads placed: 29898661
  total reads unplaced: 4115599
  chaff rate: 0.12
  total input phred20 bases: 19699021002
  estimated genome size: 3000000000
  total contig length added sum: 3160935199
  phred20 bases sequence redundancy: 6.57 X
  total input reads bases: 25812564886
  average phred20 per read: 579.14
  average read length: 758.87

Sequence assembly statistics:
  total contig number: 505703
  maximum contig length: 288547
  major contig (> 1kb) number: 359644
  total supercontig number: 275931
  maximum supercontig length: 44063386
  major supercontig (> 1kb) number: 146124

  Based on the added contigs sum:
    contig N50 length: 26384
    contig N50 number: 32093
    supercontig N50 length: 7679112
    supercontig N50 number: 100

  total GC counts in the genome: 1287378131 (percentage: 0.41)
  total AT counts in the genome: 1873012645 (percentage: 0.59)
  total NX counts in the genome: 544423

  total mate pairs forward reverse constraints: 16387973
  total unsatisfied constraints, excluding those due to singletons, short supercontigs, and supercontig ends: 239522
  total unsatisfied rate: 1.46 %

  No. of satisfied constraints in contigs: 8810215
  No. of unsatisfied in distance in contigs: 238737
  No. of satisfied links in scaffolds: 3314497
  No. of unsatisfied in dist. in scaffolds: 315304
  No. of unsatisfied due to singlets: 2536945
  No. of unsatisfied due to short scaffolds: 685377
  No. of unsatisfied due to scaffold ends: 48880
  No. of other unsatisfied constraints: 239522
  No. of redundant constraints: 198496
  Total no. of satisfied constraints: 12124712
  Total no. of unsatisfied constraints: 4064765
  Total no. of constraints: 16387973
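
Several figures in the report above are ratios or sums of other figures in the same report. The following minimal Python sketch recomputes them from the raw counts as a consistency check. The formulas are inferred from the reported numbers, not taken from the PCAP documentation, and all dictionary keys and function names are hypothetical. The n50 helper uses the common textbook definition of N50, which may differ in detail from how the assembler computed its "based on the added contigs sum" values.

```python
# Raw counts copied from the statistics report above.
# The key names are illustrative, not the assembler's own field names.
stats = {
    "reads_input": 34_014_260,
    "reads_placed": 29_898_661,
    "reads_unplaced": 4_115_599,
    "phred20_bases": 19_699_021_002,
    "read_bases": 25_812_564_886,
    "genome_size": 3_000_000_000,
    "contig_length_sum": 3_160_935_199,
    "gc": 1_287_378_131,
    "at": 1_873_012_645,
    "nx": 544_423,
}

# Mate-pair constraint counts, grouped as in the report.
satisfied = [8_810_215, 3_314_497]  # in contigs, in scaffolds
unsatisfied = [238_737, 315_304, 2_536_945, 685_377, 48_880, 239_522]
redundant = 198_496
total_constraints = 16_387_973
other_unsatisfied = 239_522


def derived(s):
    """Recompute the ratio-style figures (assumed formulas)."""
    return {
        "chaff_rate": s["reads_unplaced"] / s["reads_input"],
        "redundancy": s["phred20_bases"] / s["genome_size"],
        "avg_phred20_per_read": s["phred20_bases"] / s["reads_input"],
        "avg_read_length": s["read_bases"] / s["reads_input"],
        "gc_fraction": s["gc"] / s["contig_length_sum"],
        "at_fraction": s["at"] / s["contig_length_sum"],
    }


def n50(lengths):
    """Return (N50 length, N50 number) under the textbook definition:
    the length of the shortest sequence in the smallest set of longest
    sequences that together cover at least half of the total length,
    and the number of sequences in that set."""
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:
            return length, count
    raise ValueError("lengths must be a non-empty list of positive numbers")


d = derived(stats)
unsatisfied_rate_pct = 100 * other_unsatisfied / total_constraints

for name, value in d.items():
    print(f"{name}: {value:.4f}")
print(f"unsatisfied_rate_pct: {unsatisfied_rate_pct:.2f}")
```

Rounded to two decimal places, the recomputed values can be compared directly with the report. To apply n50 to real data, pass it the list of contig or supercontig lengths taken from the assembly files; the report itself does not include per-contig lengths, so the N50 figures cannot be rechecked from the summary alone.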