Advances in Genome Biology and Technology Meeting 2012

February 15-18, 2012
Marco Island, Florida

The Advances in Genome Biology and Technology (AGBT) meeting is an annual scientific forum for acquiring information about the latest advances in DNA sequencing technologies and applications.

Below is a list of selected posters from The Genome Institute presented at this year's meeting.

Selected Posters:

The Genome Modeling System: A Turnkey Genomics Analysis Platform

David J. Dooling, Scott Smith, Ben Oberkfell, Justin Lolofie, Matt Callaway, Nathan Nutter, Brian Derickson, Tom Mooney, Joshua McMichael, James Eldred, Jason Walker, David Larson, Nathan Dees, Chris Harris, Dan Koboldt, William Schierding, Chris Miller, Cyriac Kandoth, George M. Weinstock, Elaine R. Mardis, and Richard K. Wilson.

As the number and complexity of subjects in medical genomics projects continue to increase, managing sample and project information is becoming just as crucial as executing the analysis pipeline. We have developed an integrated analysis information management system, called the Genome Modeling System (GMS), for managing subject data, analysis execution, and results visualization for genomics research projects. Using web-based entry methods, investigators can enter and track individuals and their associated tissue samples, sequencing libraries, sequencing instrument data, and analysis progress and results. All data, including ad hoc user annotation, is indexed into a full-text search engine for easy lookup and retrieval. The system also supports viewing the data in tabular format on the web and exporting to spreadsheet formats for sharing internally or with collaborators.

GMS is distributed as a virtual machine image based on Ubuntu Linux, allowing an investigator to immediately begin working with the tools with minimal system administration and bioinformatics expertise. The virtual machine image includes popular genomics software, much of it never officially packaged for the Ubuntu platform, including BWA, VarScan, SAMtools, Picard, Bio::DB::Sam, BreakDancer, and MuSiC. All software is packaged using the native Ubuntu package management system and is served from The Genome Institute’s package repository, allowing facile, efficient upgrades as new versions of tools and the framework are released. Documentation and installation instructions for GMS are available at http://gmt.genome.wustl.edu/. View the poster (pdf, 2.9 Mb).

Variant Validation, Extension, and Interpretation Methods at The Genome Institute at Washington University

Robert Fulton, Ryan Demeter, Vincent Magrini, Michael McLellan, Daniel Koboldt, Li Ding, Todd Wylie, Michelle O’Laughlin, Rachel Maupin, Elaine R. Mardis, and Richard K. Wilson.

With the ever-increasing throughput of next generation sequencing, variant validation is increasingly critical to understanding the mutational spectrum of the sequenced genomes. Validation provides confirmation of putative variant calls, thus helping to improve variant calling algorithms. In addition to confirming putative calls, the validation process provides a deeper understanding of variant frequency and helps with the interpretation of the impact of the variation. For somatic mutations, variant frequencies provide clues to tumor purity and clonality and help identify likely driver events, that is, variants critical to the progression or metastasis of the disease.

These methods not only provide validation, but also can be used to extend putative variants across other samples, to identify commonly mutated genes across sample panels. This presentation will outline validation/extension methods and decision processes utilized for large and small-scale variant confirmation. View the poster (pdf, 2.8 Mb).
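
As a rough illustration of how variant frequencies can give clues to tumor purity, the sketch below computes a variant allele frequency (VAF) from read counts and converts it to a purity estimate. It is not the validation pipeline described in the poster. The function names are hypothetical, and the purity formula assumes a clonal, heterozygous somatic mutation in a copy-neutral diploid region, where the expected VAF is half the tumor purity.

```python
def variant_allele_frequency(ref_reads, var_reads):
    """Fraction of reads at a site that support the variant allele."""
    depth = ref_reads + var_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return var_reads / depth


def purity_from_clonal_het_vaf(vaf):
    """Purity estimate under the stated assumption (hypothetical helper).

    Assumes a clonal, heterozygous somatic mutation in a diploid,
    copy-neutral region, so expected VAF = purity / 2.
    The result is capped at 1.0 because sampling noise can push VAF above 0.5.
    """
    return min(1.0, 2.0 * vaf)


if __name__ == "__main__":
    # Made-up read counts, for illustration only.
    vaf = variant_allele_frequency(ref_reads=60, var_reads=40)
    print(f"VAF = {vaf:.2f}, estimated purity = {purity_from_clonal_het_vaf(vaf):.2f}")
```

Mutations whose VAF falls well below the value expected from the purity estimate are candidates for subclonal events, which is one way variant frequency informs clonality.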

Automated Profiling of Small RNA Molecules in Acute Myeloid Leukemia Using High-Throughput Next Generation Sequencing

Jasreet Hundal (1,3), Todd Wylie (1,3), Vincent Magrini (1), Jason Walker (1), Maria Trissal (2), Sean D. McGrath (1), Jessica Silva (1), Giridharan Ramsingh (2), Todd A. Fehniger (2), Daniel Link (2), Timothy J. Ley (1,2), Richard K. Wilson (1), and Elaine R. Mardis (1).

(1) The Genome Institute; (2) Department of Medicine, Division of Oncology, Washington University School of Medicine, St. Louis, MO 63108, USA; (3) These authors contributed equally to this work.

Small non-coding RNAs (sncRNAs), e.g., miRNAs, snoRNAs, and piRNAs, can have large-scale and diverse effects on cellular processes by regulating gene expression, protein translation, and genomic organization. There is accumulating evidence that alterations in expression of sncRNAs contribute to human disease. Next generation sequencing (NGS) provides a high-throughput platform for exploring sncRNA populations in samples derived from healthy and diseased individuals.

We have developed an in-house automated pipeline designed to profile and compare reads derived from NGS sncRNA libraries. Our pipeline focuses on three main areas: 1) identification/abundance of previously known sncRNAs; 2) discovery/abundance of putatively novel sncRNAs; 3) tracking of differential expression of sncRNAs between multiple library types, tissues, and/or states. The pipeline locates areas of contiguous alignment in the genome, forming ab initio "clusters" representing sncRNAs. Cluster candidates undergo adaptor trimming, quality filtering, annotation interrogation, coverage modeling, normalized expression calculation, and sncRNA species fractionation into bins based on associated read lengths.
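
The cluster-forming step can be pictured with a minimal sketch, which is not the in-house pipeline itself. It merges overlapping or abutting alignments on the same chromosome into clusters and reports a reads-per-million value as one common normalization. The `Alignment` record, the `max_gap` parameter, the function names, and the choice of reads per million are all assumptions made for illustration; strand is ignored.

```python
from collections import namedtuple

# Hypothetical alignment record: half-open interval [start, end) on a chromosome.
Alignment = namedtuple("Alignment", ["chrom", "start", "end"])


def cluster_alignments(alignments, max_gap=0):
    """Merge alignments that overlap or lie within max_gap bases of each other.

    Returns a list of dicts with the cluster coordinates and read count.
    """
    clusters = []
    for aln in sorted(alignments, key=lambda a: (a.chrom, a.start, a.end)):
        last = clusters[-1] if clusters else None
        if (last is not None
                and last["chrom"] == aln.chrom
                and aln.start <= last["end"] + max_gap):
            # Contiguous with the current cluster: extend it.
            last["end"] = max(last["end"], aln.end)
            last["reads"] += 1
        else:
            # Gap or new chromosome: start a new cluster.
            clusters.append({"chrom": aln.chrom, "start": aln.start,
                             "end": aln.end, "reads": 1})
    return clusters


def reads_per_million(cluster_reads, total_mapped_reads):
    """Scale a cluster's read count by library size (assumed normalization)."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return cluster_reads * 1_000_000 / total_mapped_reads


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    reads = [
        Alignment("chr1", 100, 122),
        Alignment("chr1", 110, 132),
        Alignment("chr1", 500, 522),
        Alignment("chr2", 100, 122),
    ]
    for c in cluster_alignments(reads):
        rpm = reads_per_million(c["reads"], total_mapped_reads=len(reads))
        print(c["chrom"], c["start"], c["end"], c["reads"], rpm)
```

Sorting first means each alignment only needs to be compared with the most recent cluster, so the merge is a single pass over the data.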

As trial applications of our pipeline, we defined the microRNAomes in a patient with acute myeloid leukemia (AML) and also in natural killer (NK) cells of Mus musculus. Our current focus expands assessment of sncRNAs beyond, but inclusive of, miRNA lengths in healthy and neoplastic human tissues. As such, we characterized the small RNA transcriptome in leukemic blasts from 22 patients with AML and CD34+ bone marrow cells from 8 healthy individuals [updated]. RNA species ranging from 17 to 75 nt were identified. In both AML and normal CD34+ cells, snoRNAs were the most abundant sncRNAs identified, followed by miRNAs. However, a large fraction of sequence reads (30%) mapped to unannotated regions of the genome; size fractionation of these reads suggests most of the novel sncRNAs are not miRNAs. We further identified 16 significantly expressed differentially regulated miRNAs and 38 differentially regulated snoRNAs when comparing control CD34+ cells with AML samples [current as of 02/08/2012]. View the poster (pdf, 3.3 Mb).

Integrative Genomic Analysis Methods for Large-Scale Cancer Sequencing Studies

Daniel C. Koboldt, Dong Shen, Mike McLellan, Li Ding, Elaine R. Mardis, Richard K. Wilson, and The Cancer Genome Atlas Network.

Identification of recurrent genetic events driving tumor development and progression is a key goal of cancer genomics. We have developed robust methods for the detection of somatic mutations, germline variants, copy number alterations, and loss of heterozygosity (LOH) events in WGS and exome data. Here, we apply our methods to 507 invasive breast carcinomas that we have characterized as part of The Cancer Genome Atlas (TCGA). We detected over 30,000 somatic coding mutations (~60 per tumor), as well as extensive LOH and copy number changes. Integrating mutation, copy number, and clinical data revealed striking differences in the landscape of somatic alterations between the five major expression subtypes of breast cancer. View the poster (pdf, 2.2 Mb).

Combinatorial Data Sets: Pragmatic Applications Derived from Multiple Sequencing Technologies

Vincent Magrini, Jason Walker, Todd Wylie, Sean McGrath, Amy Ly, Jasreet Hundal, Ryan Demeter, Laura Gottschalk, Khaing Soe, Nathan Sander, Lisa Cook, Erica Sodergren, Wes Warren, George Weinstock, Richard K. Wilson, and Elaine R. Mardis.

To explore the benefits of combining next generation sequencing data sets, we constructed various libraries from the bacterium Enterococcus faecalis str. TX0309B and generated a suite of read types. In particular, we used Illumina-based paired-end, mate-pair, and overlapping reads, Ion Torrent and 454 FLX+ WGS reads, and Pacific Biosciences circular consensus reads (CCS) to support error correction of Pacific Biosciences Continuous Long Reads (CLR). In combination, CLRs provide long-range linking information into an E. faecalis assembly generated from fragment and/or paired-end reads. View the poster (pdf, 3.7 Mb).

Copyright © 1993-2012 Washington University in St. Louis. All rights reserved.
