The Drosophila simulans genome project involved light whole-genome shotgun (WGS) sequencing of seven highly inbred strains: white501 (w501), C167.4, New Caledonia, Inbred4, Inbred6, MD199S and MD106TS, with one strain, w501, receiving higher coverage than the others.

Embryo DNA was isolated from the w501 strain and sequenced to a depth of 3.8X (3.1X in plasmids plus 0.7X in fosmids). The reads were assembled using PCAP (Genome Res. 13(9):2164-70 2003) for an assembled coverage of 2.43X in plasmids and 0.48X in fosmids, for a total coverage of 2.91X. Chromosomal assignments were made by alignment to the D. melanogaster genome (release 4.0), incorporating inversions defined by the assembly and confirmed by comparison to polytene chromosome band location. This assembly can be retrieved from GenBank (http://www.ncbi.nlm.nih.gov/) under the accession AAGH01000000.

DNA was isolated from adults for five of the other six strains. For one strain, MD199S, DNA was isolated only from adult females to facilitate assembly of the D. simulans Y chromosome. Each strain was sequenced to a depth of ~1X and assembled using PCAP.

A D. simulans assembly representing a mosaic of several different D. simulans lines was constructed. The assembly process began with a ~3X WGS assembly of the D. simulans w501 line. The w501 contigs were initially anchored, ordered and oriented by alignment with the D. melanogaster genome. The assembly was then examined for places where the w501 assembly suggested inversions with respect to the D. melanogaster assembly. One major inversion was found, confirming the inversion already documented by Lemeunier and Ashburner (Proc R Soc Lond B Biol Sci. 1976). Six other ~1X coverage D. simulans lines (c167.4, md106ts, md199s, nc48s, sim4, and sim6) were assembled. Using the 4X WGS assembly of the w501 genome as a scaffold, contigs and unplaced reads from the 1X assemblies of the other individual strains were used to cover gaps in the w501 assembly where possible.
Thus the resulting assembly is a mosaic containing the w501 contigs as the primary scaffolding, with contigs and unplaced reads from the other lines filling gaps in the w501 assembly.

For answers to questions about this assembly or project, or any other GSC genome project, please visit our Genome Groups web page (http://genome.wustl.edu/genome_group_index.cgi) and email the designated contact person.

Funding for the sequence characterization of the Drosophila simulans genome was provided by the National Human Genome Research Institute (NHGRI), National Institutes of Health (NIH).

Production Sequencing statistics:

project: Drosophila_simulans_w501-1.1

*** SIMPLE READ STATS ***
total reads input: 637022
total reads placed: 576740
total reads unplaced: 60282
chaff rate: 0.09
total input phred20 bases = 437057720
total contig length added sum (estimated genome size) = 125683891
phred20 sequence redundancy: 3.48 X

total contig number: 31198
maximum contig length: 66623
major contig (> 1kb) number: 27815
total supercontig number: 10765
maximum supercontig length: 4729806
major supercontig (> 1kb) number: 8242
contig N50 length: 7074, contig N50 number: 4981
supercontig N50 length: 793303, supercontig N50 number: 35

total GC counts in the genome: 53364753
total AT counts in the genome: 72313667
total NX counts in the genome: 5471

total mate pairs forward reverse constraints: 310177
total unsatisfied constraints excluding due to singleton, short supercontig, and supercontigs end: 7360
total unsatisfied rate: 2.37 %

No. of satisfied constraints in contigs: 134487
No. of unsatisfied in distance in contigs: 2671
No. of satisfied links in scaffolds: 93894
No. of unsatisfied in dist. in scaffolds: 5304
No. of unsatisfied due to singlets: 41049
No. of unsatisfied due to short scaffolds: 23762
No. of unsatisfied due to scaffold ends: 1650
No. of other unsatisfied constraints: 7360
No. of redundant constraints: 0
Total no. of satisfied constraints: 228381
Total no. of unsatisfied constraints: 81796
Total no. of constraints: 310177
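
The ratios and totals in the statistics above can be recomputed from the raw counts as a sanity check. The following is a minimal Python sketch, not part of PCAP or of the original record. The variable and key names are illustrative, and the formulas are assumptions inferred from the reported figures: chaff rate is taken as unplaced reads over input reads, redundancy as phred20 bases over summed contig length, and unsatisfied rate as the "other" unsatisfied constraints over all constraints.

```python
# Figures copied from the statistics listing above. The names are
# illustrative labels chosen for this sketch, not PCAP identifiers.
READS = {"input": 637022, "placed": 576740, "unplaced": 60282}
PHRED20_BASES = 437057720
CONTIG_LENGTH_SUM = 125683891  # reported as the estimated genome size
BASE_COUNTS = {"GC": 53364753, "AT": 72313667, "NX": 5471}

SATISFIED = {"in contigs": 134487, "links in scaffolds": 93894}
UNSATISFIED = {
    "distance in contigs": 2671,
    "distance in scaffolds": 5304,
    "singlets": 41049,
    "short scaffolds": 23762,
    "scaffold ends": 1650,
    "other": 7360,
}
TOTAL_CONSTRAINTS = 310177


def derived_stats():
    """Recompute the values that the listing reports as rounded ratios or totals.

    The formulas are assumptions inferred from the reported numbers.
    """
    return {
        "chaff_rate": round(READS["unplaced"] / READS["input"], 2),
        "phred20_redundancy": round(PHRED20_BASES / CONTIG_LENGTH_SUM, 2),
        "unsatisfied_rate_pct": round(
            100 * UNSATISFIED["other"] / TOTAL_CONSTRAINTS, 2
        ),
        "satisfied_total": sum(SATISFIED.values()),
        "unsatisfied_total": sum(UNSATISFIED.values()),
    }


def totals_are_consistent():
    """Check that the component counts add up to the reported totals."""
    d = derived_stats()
    return (
        READS["placed"] + READS["unplaced"] == READS["input"]
        and sum(BASE_COUNTS.values()) == CONTIG_LENGTH_SUM
        and d["satisfied_total"] + d["unsatisfied_total"] == TOTAL_CONSTRAINTS
    )


if __name__ == "__main__":
    for name, value in derived_stats().items():
        print(f"{name}: {value}")
    print("totals consistent:", totals_are_consistent())
```

Under these assumed formulas the recomputed values agree with the listing: chaff rate 0.09, phred20 redundancy 3.48, unsatisfied rate 2.37 %, and the GC, AT and NX counts sum to the reported total contig length.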