This directory contains assemblies and other organism-specific data. This data
is freely available, but please observe the WU GSC data policy (contained in
the file DATA_POLICY in this directory) if you download, use, or publish on
the data.

DIRECTORY HIERARCHY
-------------------

1. Organism Class

The top level of this directory structure contains a directory for each
organism class or group the WU GSC has sequenced. The following
classifications are present:

    Primates                        Primate genomic data
    Other Vertebrates               Non-primate vertebrate genomic data
    Invertebrates                   Invertebrate genomic data
    Plants                          Plant genomic data
    Fungi                           Fungal genomic data
    Microbes                        Microbial genomic data
    Other Single Celled Organisms   Other single celled genomic data

2. Organism

Each of the above directories contains a subdirectory for each organism of
that type. The name of each organism directory follows the format
Genus_species.

3. Genomic Data

Each of the organism directories contains the following subdirectories (if
data of the given type is available for that organism):

    assembly        assembly data
    end_sequences   paired-end sequence data
    genes           predicted gene sequences
    map             fingerprint map data

3.1 Assembly Data

The assembly directory contains a directory for each assembly available for
the organism. Each assembly has its own version number of the format M.N, and
the directory name has the format Genus_species-M.N. All assemblies with the
same value of M were created from the same set of genomic data, but with
different assembly parameters or pre- and/or post-processing.

Each assembly directory contains an ASSEMBLY file which describes the assembly
in detail. For more information on the contents of an assembly directory, see
the README_ASSEMBLY file.

Assemblies that have been submitted to the NCBI can be downloaded from GenBank
using their accession numbers.

3.2 End Sequences

This directory contains FASTA files of the BAC and/or fosmid-end sequences.

3.3 Genes (Predicted Gene Sequences)

This directory contains CDS sequences, gene peptide sequences, or both,
depending on availability. It is provided for the Human Gut Microbiome project
only.

3.4 Map Data

This directory contains the FPC files for the organism. Only the most current
version is available.
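
EXAMPLE: PARSING ASSEMBLY DIRECTORY NAMES
-----------------------------------------

The Genus_species-M.N naming convention described in section 3.1 can be
handled programmatically. The following is a minimal Python sketch, not an
official WU GSC tool. It assumes that M and N are non-negative integers and
that the organism part of the name contains no hyphen; this README states
neither explicitly. The function names and the placeholder directory names
are hypothetical.

```python
import re
from collections import defaultdict

# Pattern for "Genus_species-M.N" as described in section 3.1.
# Assumption: M and N are integers and the organism part has no hyphen.
ASSEMBLY_DIR = re.compile(
    r"^(?P<organism>[^-]+)-(?P<major>\d+)\.(?P<minor>\d+)$"
)


def parse_assembly_dir(name):
    """Return (organism, M, N) for an assembly directory name, or None."""
    match = ASSEMBLY_DIR.match(name)
    if match is None:
        return None
    return match["organism"], int(match["major"]), int(match["minor"])


def group_by_data_set(names):
    """Group assembly directory names by (organism, M).

    Assemblies sharing the same M were built from the same genomic data,
    so each group lists the alternative assemblies of one data set,
    ordered by N.
    """
    groups = defaultdict(list)
    for name in names:
        parsed = parse_assembly_dir(name)
        if parsed is None:
            continue  # skip names that do not follow the convention
        organism, major, minor = parsed
        groups[(organism, major)].append((minor, name))
    return {key: [n for _, n in sorted(vals)] for key, vals in groups.items()}


if __name__ == "__main__":
    # Placeholder names only; substitute the real directory listing.
    listing = ["Genus_species-1.1", "Genus_species-2.0", "Genus_species-1.0"]
    for key, dirs in sorted(group_by_data_set(listing).items()):
        print(key, dirs)
```

Grouping on M rather than on the full version number follows the statement
in section 3.1 that assemblies with the same M share their input data and
differ only in parameters or processing.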