The Advances in Genome Biology and Technology (AGBT) meeting is an annual scientific forum for acquiring information about the latest advances in DNA sequencing technologies and applications.
Below is a list of selected posters from The Genome Institute presented at this year's meeting.
David J. Dooling, Scott Smith, Ben Oberkfell, Justin Lolofie, Matt Callaway, Nathan Nutter, Brian Derickson, Tom Mooney, Joshua McMichael, James Eldred, Jason Walker, David Larson, Nathan Dees, Chris Harris, Dan Koboldt, William Schierding, Chris Miller, Cyriac Kandoth, George M. Weinstock, Elaine R. Mardis, and Richard K. Wilson.
As the number and complexity of subjects in medical genomics projects continue
to increase, managing sample and project information is becoming just as crucial
as executing the analysis pipeline. We have developed an integrated analysis information
management system, called the Genome Modeling System (GMS), for
managing subject data, analysis execution, and results visualization for genomics
research projects. Using web-based entry methods, investigators can enter and
track individuals and their associated tissue samples, sequencing libraries, sequencing
instrument data, and analysis progress and results. All data, including ad
hoc user annotation, is indexed into a full-text search engine for easy lookup and
retrieval. The system also supports viewing the data in tabular format on the web
and exporting to spreadsheet formats for sharing internally or with collaborators.
GMS is distributed as a virtual machine image based on Ubuntu Linux, allowing an investigator to immediately begin working with the tools with minimal system administration and bioinformatics expertise. The virtual machine image includes popular genomics software, much of it never officially packaged for the Ubuntu platform, including BWA, VarScan, SAMtools, Picard, Bio::DB::Sam, BreakDancer, and MuSiC. All software is packaged using the native Ubuntu package management system and is served from The Genome Institute’s package repository, allowing facile, efficient upgrades as new versions of tools and the framework are released. Documentation and installation instructions for GMS are available at http://gmt.genome.wustl.edu/. View the poster (pdf, 2.9 Mb).
Robert Fulton, Ryan Demeter, Vincent Magrini, Michael McLellan, Daniel Koboldt, Li Ding, Todd Wylie, Michelle O’Laughlin, Rachel Maupin, Elaine R. Mardis, and Richard K. Wilson.
With the ever-increasing throughput of next generation sequencing, variant
validation is increasingly critical to understanding the mutational
spectrum of the sequenced genomes. Validation confirms putative variant
calls, which in turn helps to improve variant calling algorithms. It also
provides a deeper understanding of variant frequency and helps with
interpreting the impact of the variation. For somatic mutations, variant
frequencies provide clues to tumor purity and clonality, and help identify
likely driver events, that is, variants critical to the progression or
metastasis of the disease.
These methods not only provide validation, but also can be used to extend putative variants across other samples, to identify commonly mutated genes across sample panels. This presentation will outline validation/extension methods and decision processes utilized for large and small-scale variant confirmation. View the poster (pdf, 2.8 Mb).
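As a rough illustration of how a variant frequency can hint at tumor purity, the sketch below computes a variant allele frequency (VAF) from validation read counts. It is not the poster's method: the function names are hypothetical, and the purity estimate holds only under the stated assumption of a clonal, heterozygous somatic mutation in a copy-neutral diploid region, where the expected VAF is roughly half the tumor purity.

```python
def variant_allele_frequency(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a site that support the variant allele."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / depth


def purity_from_clonal_het_vaf(vaf: float) -> float:
    """Crude purity estimate (hypothetical helper).

    Assumes the variant is clonal, heterozygous, and lies in a
    copy-neutral diploid region, so expected VAF = purity / 2.
    The result is capped at 1.0 because sampling noise can push
    an observed VAF above 0.5.
    """
    return min(1.0, 2.0 * vaf)


# Example: 60 reference reads and 40 variant reads at a validated site.
vaf = variant_allele_frequency(ref_reads=60, alt_reads=40)
purity = purity_from_clonal_het_vaf(vaf)
print(f"VAF = {vaf:.2f}, estimated purity = {purity:.2f}")
```

Variants whose VAF falls well below the value expected for the estimated purity would, under the same assumptions, be candidates for subclonal events.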
Jasreet Hundal¹,³, Todd Wylie¹,³, Vincent Magrini¹, Jason Walker¹, Maria Trissal², Sean D. McGrath¹, Jessica Silva¹, Giridharan Ramsingh², Todd A. Fehniger², Daniel Link², Timothy J. Ley¹,², Richard K. Wilson¹, and Elaine R. Mardis¹.
¹The Genome Institute, ²Department of Medicine, Division of Oncology, Washington University School of Medicine, St. Louis, MO 63108, USA. ³These authors contributed equally to this work.
Small non-coding RNAs (sncRNAs), such as miRNAs, snoRNAs, and piRNAs, can have
large-scale and diverse effects on cellular processes by regulating gene
expression, protein translation, and genomic organization. There is
accumulating evidence that alterations in expression of sncRNAs
contribute to human disease. Next generation sequencing (NGS) provides a
high-throughput platform for exploring sncRNA populations in samples
derived from healthy and diseased individuals.
We have developed an in-house automated pipeline designed to profile and compare reads derived from NGS sncRNA libraries. Our pipeline focuses on three main areas: 1) identification/abundance of previously known sncRNAs; 2) discovery/abundance of putatively novel sncRNAs; 3) tracking of differential expression of sncRNAs between multiple library types, tissues, and/or states. The pipeline locates areas of contiguous alignment in the genome, forming ab initio "clusters" representing sncRNAs. Cluster candidates undergo adaptor trimming, quality filtering, annotation interrogation, coverage modeling, normalized expression calculation, and sncRNA species fractionation into bins based on associated read lengths.
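The clustering, length-binning, and normalization steps can be pictured with a small sketch. This is an illustration under assumptions, not the in-house pipeline itself: the `Read` and `Cluster` types, the `max_gap` parameter, the bin edges, and the use of reads per million as the normalization are all hypothetical choices made for the example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Read:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive

    @property
    def length(self) -> int:
        return self.end - self.start


@dataclass
class Cluster:
    chrom: str
    start: int
    end: int
    reads: list = field(default_factory=list)


def build_clusters(reads, max_gap=0):
    """Merge aligned reads into clusters of contiguous alignment.

    A read joins the current cluster when it is on the same chromosome
    and starts no more than max_gap bases past the cluster's end
    (overlapping or directly adjacent when max_gap is 0).
    """
    clusters = []
    for read in sorted(reads, key=lambda r: (r.chrom, r.start, r.end)):
        last = clusters[-1] if clusters else None
        if (last is not None and last.chrom == read.chrom
                and read.start <= last.end + max_gap):
            last.end = max(last.end, read.end)
            last.reads.append(read)
        else:
            clusters.append(Cluster(read.chrom, read.start, read.end, [read]))
    return clusters


def length_bins(cluster, edges=(17, 26, 36, 76)):
    """Count a cluster's reads per read-length bin (edges are illustrative)."""
    counts = Counter()
    for read in cluster.reads:
        for lo, hi in zip(edges, edges[1:]):
            if lo <= read.length < hi:
                counts[f"{lo}-{hi - 1}"] += 1
                break
    return dict(counts)


def reads_per_million(cluster, total_reads):
    """Assumed normalization: cluster read count scaled to one million reads."""
    return len(cluster.reads) / total_reads * 1e6


reads = [Read("chr1", 110, 132), Read("chr1", 100, 122),
         Read("chr2", 100, 121), Read("chr1", 500, 560)]
for c in build_clusters(reads):
    print(c.chrom, c.start, c.end, length_bins(c),
          reads_per_million(c, total_reads=len(reads)))
```

Sorting by chromosome and start position lets the clusters be built in a single pass, and keeping the member reads on each cluster makes the later length fractionation a simple count.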
As trial applications of our pipeline, we defined the microRNAomes in a patient with acute myeloid leukemia (AML) and in natural killer (NK) cells of Mus musculus. Our current focus expands assessment of sncRNAs beyond, but inclusive of, miRNA lengths in healthy and neoplastic human tissues. As such, we characterized the small RNA transcriptome in leukemic blasts from 22 patients with AML and in CD34+ bone marrow cells from 8 healthy individuals [updated]. RNA species ranging from 17 to 75 nt were identified. In both AML and normal CD34+ cells, snoRNAs were the most abundant sncRNAs identified, followed by miRNAs. However, a large fraction of sequence reads (30%) mapped to unannotated regions of the genome; size fractionation of these reads suggests that most of the novel sncRNAs are not miRNAs. We further identified 16 significantly expressed, differentially regulated miRNAs and 38 differentially regulated snoRNAs when comparing control CD34+ cells to AML samples [current as of 02/08/2012]. View the poster (pdf, 3.3 Mb).
Daniel C. Koboldt, Dong Shen, Mike McLellan, Li Ding, Elaine R. Mardis, Richard K. Wilson, and The Cancer Genome Atlas Network.
Identification of recurrent genetic events driving tumor development and progression
is a key goal of cancer genomics. We have developed robust methods for
the detection of somatic mutations, germline variants, copy number
alterations, and loss of heterozygosity (LOH) events in whole-genome sequencing (WGS) and exome
data. Here, we apply our methods to 507 invasive breast carcinomas that we have characterized as part of The Cancer Genome Atlas (TCGA). We
detected over 30,000 somatic coding mutations (~60 per
tumor), as well as extensive LOH and copy number changes. Integrating
mutation, copy number and clinical data revealed striking differences in
the landscape of somatic alterations between the five major expression
subtypes of breast cancer. View the poster (pdf, 2.2 Mb).
Vincent Magrini, Jason Walker, Todd Wylie, Sean McGrath, Amy Ly, Jasreet Hundal, Ryan Demeter, Laura Gottschalk, Khaing Soe, Nathan Sander, Lisa Cook, Erica Sodergren, Wes Warren, George Weinstock, Richard K. Wilson, and Elaine R. Mardis.
To explore the benefits of combining next generation sequencing data sets, we constructed various libraries from the bacterium Enterococcus faecalis str. TX0309B and generated a suite of read types. In particular, we
used Illumina-based paired-end, mate-pair, and overlapping reads, Ion
Torrent and 454 FLX+ WGS reads, and Pacific Biosciences circular
consensus reads (CCS) to support error correction of Pacific Biosciences
Continuous Long Reads (CLRs). In combination, CLRs provide long-range
linking information for an E. faecalis assembly generated from fragment and/or paired-end reads. View the poster (pdf, 3.7 Mb).