Taeniopygia guttata Sequence Assembly Release Notes

The zebra finch DNA for shotgun sequencing, and for the BAC and cosmid libraries, was derived from a single male (Black 17) domesticated zebra finch from the laboratory of Arthur P. Arnold in the Department of Physiological Science at UCLA, Los Angeles, CA, USA. The parents of this male hatched in the same clutch in an aviary of group-housed zebra finches, and therefore may have been brother and sister. A male BAC library was constructed from the same bird by Barbara Blackmon at the Clemson University Genomics Institute. This library is NOT the same as the BAC library available from the Arizona Genome Institute, which was made from several individual females.

The initial assembly was generated using PCAP (Huang et al., 2006) from ~6X coverage in whole-genome shotgun reads, a combination of plasmid, fosmid and bacterial artificial chromosome (BAC)-end read pairs. The sequences of 35 finished BAC clones were incorporated into the final assembly. The T. guttata physical map contains 108,725 clones, for a ~10X depth of coverage, assembled into 2,724 contigs.

Of the 1.2 Gb genome, 1.0 Gb was ordered and oriented along 33 zebra finch chromosomes and 3 linkage groups. An additional 36 Mb was localized to specific chromosomes or linkage groups, but was not ordered and oriented. For the initial PCAP assembly, there were 92,299 major contigs (126,053 total contigs) with an N50 contig length of 39 kb (n=8,037 contigs). There were 37,252 major supercontigs (37,698 total supercontigs) with an N50 supercontig length of 10.4 Mb (n=29 supercontigs).

The zebra finch chromosomes were named after their homologous chromosomes in Gallus gallus. Where multiple zebra finch chromosomes correspond to a single chicken chromosome, a letter was appended to the chromosome name. The lookup table below cross-references the Gallus gallus homologous names with the current naming convention for the zebra finch.
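The N50 figures quoted above pair a length with a count (for example, 39 kb with n=8,037). The following is a minimal sketch of the usual definition, not the PCAP implementation: sort the sequences from longest to shortest and walk down the list until half of the total bases are covered. The function name and the toy lengths are illustrative assumptions, and the assembler's exact convention (for instance, which contigs are included) may differ.

```python
def n50(lengths):
    """Return (N50 length, N50 number) for a list of sequence lengths.

    N50 length: length of the sequence at which the running total,
    taken from longest to shortest, first reaches half of all bases.
    N50 number: how many sequences were needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("lengths must not be empty")


# Toy data (hypothetical, not from the assembly): total 300, half 150.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # (70, 2)
```

Read this way, "N50 contig length of 39 kb (n=8,037)" means that the 8,037 longest contigs together hold at least half of the contig bases, and the shortest of them is about 39 kb long.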
All unanchored supercontigs have been concatenated into chromosome "chrUn", separated by gaps of 25 bp. On all other chromosomes, unknown gap sizes between supercontigs have been set to 100 bp.

AGP Generation Details

To create chromosomal sequences, data from the Sheffield Linkage Map and the physical map were integrated with the WGS assembly data. Using sequence comparison, T. guttata SNP marker sequences were assigned to contigs (contiguous stretches of DNA) in the WGS assembly. Based on these marker assignments, the supercontigs (sets of ordered/oriented contigs linked by read pairing data) were assigned to a chromosome by majority rule (>50% of markers assigned to the same chromosome). The supercontigs were initially positioned along chromosomes based on their median marker position, and initially oriented based on relative marker order along the supercontig.

The physical map was also linked to the sequence assembly by using BAC end sequence links and in silico digests of the assembly to create "ultracontigs", ordered/oriented lists of supercontigs. Following these initial placements, the WGS assembly read pairing data were used, where possible, to aid in orientation and to confirm order. For the Z chromosome, marker order was also determined by FISH (Art Arnold, personal communication) and integrated again with the linkage map, physical map and assembly.

All discrepancies between the various maps were manually reviewed, and a combined super/ultracontig order was established by reconciling the data from the Sheffield, assembly and physical maps. Available EST data were also used in reviewing the assembly. Alignments with the chicken genome were also examined and used as an aid in orientation, particularly when other available zebra finch-specific data were inconclusive.

The location of the centromere is known only for the Z chromosome. Thus no other centromeres were placed in the current chromosomal assemblies.
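The initial majority-rule placement of supercontigs described above can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline that was actually used: the function name, the input layout (one (chromosome, map position) pair per marker) and the marker values are hypothetical, and the median is taken over the markers on the winning chromosome, which the notes do not specify. Orientation and the later manual review are not covered.

```python
from collections import Counter
from statistics import median


def place_supercontig(markers):
    """Assign one supercontig to a chromosome by majority rule.

    markers: list of (chromosome, map_position) pairs, one per SNP
    marker that was matched to a contig in this supercontig.
    Returns (chromosome, median_position), or None when no chromosome
    holds more than 50% of the markers.
    """
    if not markers:
        return None
    counts = Counter(chrom for chrom, _ in markers)
    chrom, hits = counts.most_common(1)[0]
    if hits * 2 <= len(markers):  # strict majority (>50%) required
        return None
    # Assumption: median over markers on the winning chromosome only.
    positions = [pos for c, pos in markers if c == chrom]
    return chrom, median(positions)


# Hypothetical markers: three agree on Tgu2, one points to Tgu5.
example = [("Tgu2", 10.0), ("Tgu2", 14.0), ("Tgu2", 30.0), ("Tgu5", 3.0)]
print(place_supercontig(example))  # ('Tgu2', 14.0)
```

A supercontig whose markers split evenly between two chromosomes returns None in this sketch; in the process described above such cases would fall to manual review.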
Cross-reference of the zebra finch chromosome names used for this release (TGU), the homologous chicken chromosome names (GGA), and the finch chromosome names suggested by Itoh et al., 2005*.

TGU          GGA                   Itoh et al., 2005
Tgu1         1                     3
Tgu1A        1                     4
Tgu1B        1                     NA
Tgu2         2                     1
Tgu3         3                     2
Tgu4         4                     5
Tgu4A        4                     microchromosome
Tgu5         5                     6
Tgu6         6                     7
Tgu7         7                     8
Tgu8         8                     9
Tgu9         9                     10
Tgu10        10                    NA
Tgu11        11                    NA
Tgu12        12                    NA
Tgu13        13                    NA
Tgu14        14                    NA
Tgu15        15                    NA
Tgu16        16                    NA
Tgu17        17                    NA
Tgu18        18                    NA
Tgu19        19                    NA
Tgu20        20                    NA
Tgu21        21                    NA
Tgu22        22                    NA
Tgu23        23                    NA
Tgu24        24                    NA
Tgu25        25                    NA
Tgu26        26                    NA
Tgu27        27                    NA
Tgu28        28                    NA
TguLGE22     LGE22C19W28_E50C23    NA
TguLGE22A    LGE22C19W28_E50C23    NA
Tgun2        NA                    NA
Tgun5        NA                    NA
TguZ         Z                     Z

***********************************************************************************

Taeniopygia guttata Sequence and Assembly Credits

DNA source - Art Arnold, Department of Physiological Science, UCLA
Genome Sequence - The Genome Center, Washington University School of Medicine
Sequence Assembly and Chromosomal Sequence Construction - The Genome Center, Washington University School of Medicine
Zebra finch linkage map - Jessica Stapley, Tim Birkhead, Terry Burke and Jon Slate, Department of Animal & Plant Sciences, University of Sheffield, Sheffield, UK
Z Map/FISH Mapping - Itoh Yuichiro and Art Arnold, Department of Physiological Science, UCLA
Fingerprint Map - The Genome Center, Washington University School of Medicine
T. guttata Finished Clones - The Genome Center, Washington University School of Medicine
BAC library - Barbara Blackmon, Clemson University Genomics Institute (CUGI)
Fosmid Library - The Genome Center, Washington University School of Medicine and Lucigen Corporation, Middleton, Wisconsin
Finch EST data - David Clayton, W.M. Keck Center for Comparative and Functional Genomics at the University of Illinois at Urbana-Champaign, and The Genome Center, Washington University School of Medicine

Funding for the sequence characterization of the zebra finch genome is being provided by the National Human Genome Research Institute (NHGRI), National Institutes of Health (NIH).

The T. guttata sequence is made freely available to the community by The Genome Center, Washington University School of Medicine, with the following understanding:

1. The data may be freely downloaded, used in analyses, and repackaged in databases.
2. Users are free to use the data in scientific papers analyzing particular genes and regions from these data if the providers of these data (The Genome Center, Washington University School of Medicine, and the International Zebra finch Sequencing and Analysis Consortium) are properly acknowledged. Anyone planning whole-genome analyses of this assembly should consult The Genome Center, Washington University School of Medicine, and the International Zebra finch Sequencing and Analysis Consortium prior to publication.
3. Any redistribution of the data should carry this notice.

*Itoh Y, Arnold AP (2005) Chromosomal polymorphism and comparative painting analysis in the zebra finch. Chromosome Res. 13(1):47-56.
*** SIMPLE READ STATS ***
Total input reads: 12988460
Total input bases: 10657162288 bp
Total Q20 bases: 8429388296 bp
Average Q20 bases per read: 649 bp
Average read length: 821 bp
Placed reads: 12657512
  (reads in scaffold: 11837360)
  (reads in singleton: 820152)
Unplaced reads: 330948
Chaff rate: 2.55%
Q20 base redundancy: 6.3x
Total prefin reads input: 279
Total prefin reads unused: 3

*** Contiguity: Contig ***
Total contig number: 126053
Total contig bases: 1224525252 bp
Total Q20 bases: 1197969478 bp
Q20 bases %: 97.8%
Average contig length: 9714 bp
Average major (> 2000 bp) contig length: 12768 bp
Maximum contig length: 424635 bp
N50 contig length: 38549 bp
N50 contig number: 8037
Major contig (> 2000 bp) number: 92299
Major_contig bases: 1178496641 bp
Major_contig Q20 bases: 1155793110 bp
Major_contig Q20 base percent: 98.1%

Top tier (up to 900000000 bp):
  Contig number: 21151
  Average length: 42551 bp
  Longest length: 424635 bp
  Contig bases in this tier: 900002090 bp
  Q20 bases in this tier: 887183057 bp
  Q20 base percentage: 98.5%
  Top tier N50 contig length: 57749 bp
  Top tier N50 contig number: 4606
Middle tier (900000000 bp -- 1200000000 bp):
  Contig number: 83348
  Average length: 3599 bp
  Longest length: 11441 bp
  Contig bases in this tier: 300001510 bp
  Q20 bases in this tier: 288579601 bp
  Q20 base percentage: 96.1%
  Middle tier N50 contig length: 3941 bp
  Middle tier N50 contig number: 24235
Bottom tier (1200000000 bp -- end):
  Contig number: 21554
  Average length: 1138 bp
  Longest length: 1528 bp
  Contig bases in this tier: 24521652 bp
  Q20 bases in this tier: 22206820 bp
  Q20 base percentage: 90.5%
  Bottom tier N50 contig length: 1228 bp
  Bottom tier N50 contig number: 8904

*** Contiguity: Supercontig ***
Total supercontig number: 37698
Average supercontig length: 32482 bp
Maximum supercontig length: 56620707 bp
N50 supercontig length: 10409499 bp
N50 supercontig number: 29
Major supercontig (> 2000 bp) number: 37252
Major_supercontig bases: 1223725179 bp
Major_supercontig Q20 bases: 1197282719 bp
Major_supercontig Q20 base percent: 97.8%

Scaffolds > 1M: 132
Scaffold 250K--1M: 49
Scaffold 100K--250K: 64
Scaffold 10--100K: 3100
Scaffold 5--10K: 8699
Scaffold 2--5K: 25182
Scaffold 0--2K: 472

Top tier (up to 900000000 bp):
  Supercontig number: 78
  Average length: 11544469 bp
  Longest length: 56620707 bp
  Contig bases in this tier: 900468551 bp
  Q20 bases in this tier: 884996675 bp
  Q20 base percentage: 98.2%
  Top tier N50 supercontig length: 16225236 bp
  Top tier N50 supercontig number: 16
Middle tier (900000000 bp -- 1200000000 bp):
  Supercontig number: 26792
  Average length: 11197 bp
  Longest length: 3213145 bp
  Contig bases in this tier: 300000509 bp
  Q20 bases in this tier: 290386571 bp
  Q20 base percentage: 96.7%
  Middle tier N50 supercontig length: 26915 bp
  Middle tier N50 supercontig number: 407
Bottom tier (1200000000 bp -- end):
  Supercontig number: 10828
  Average length: 2222 bp
  Longest length: 2551 bp
  Contig bases in this tier: 24056192 bp
  Q20 bases in this tier: 22586232 bp
  Q20 base percentage: 93.8%
  Bottom tier N50 supercontig length: 2242 bp
  Bottom tier N50 supercontig number: 4940

*** Fosmid Coverage ***
Total fosmids assembled in scaffold: 702580
Total length of these reads: 646072624
Paired reads: 555942 (79.1%)
Estimated fosmid coverage (total fosmids Q20 bases over genome): 0.41%

*** BES Coverage ***
Total BES assembled in scaffold: 396722
Total length of these reads: 357829671
Paired reads: 338936 (85.4%)
Estimated BES coverage (total fosmids Q20 bases over genome): 0.41%
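
Several of the figures in the report above are simple ratios of other reported values. The sketch below recomputes a few of them as a consistency check. The formulas are assumptions inferred from the labels (for example, chaff rate taken as unplaced reads over total input reads); the script that produced the report is not part of these notes, and the variable names are illustrative.

```python
# Values copied from the report above.
total_reads = 12_988_460
total_bases = 10_657_162_288
total_q20_bases = 8_429_388_296
placed_reads = 12_657_512
reads_in_scaffold = 11_837_360
reads_in_singleton = 820_152
unplaced_reads = 330_948
total_contigs = 126_053
total_contig_bases = 1_224_525_252
fosmid_reads, fosmid_paired = 702_580, 555_942
bes_reads, bes_paired = 396_722, 338_936

# Assumed formulas, inferred from the report labels.
derived = {
    "average_read_length_bp": round(total_bases / total_reads),
    "average_q20_per_read_bp": round(total_q20_bases / total_reads),
    "chaff_rate_pct": round(100 * unplaced_reads / total_reads, 2),
    "average_contig_length_bp": round(total_contig_bases / total_contigs),
    "fosmid_paired_pct": round(100 * fosmid_paired / fosmid_reads, 1),
    "bes_paired_pct": round(100 * bes_paired / bes_reads, 1),
}

# Read counts should partition cleanly.
reads_balance = placed_reads + unplaced_reads == total_reads
placed_balance = reads_in_scaffold + reads_in_singleton == placed_reads

for name, value in derived.items():
    print(f"{name}: {value}")
print("placed + unplaced == total:", reads_balance)
print("scaffold + singleton == placed:", placed_balance)
```

Under these assumed formulas the recomputed values agree with the reported 821 bp, 649 bp, 2.55%, 9714 bp, 79.1% and 85.4%, and both read-count sums balance.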