TIGRA_SV

Introduction

Tigra_sv is a program that performs targeted local assembly of structural variants (SVs) using the iterative graph routing assembly (TIGRA) algorithm (L. Chen, unpublished). It takes as input a list of putative SV calls and a set of bam files containing reads mapped to a reference genome such as NCBI build36. For each SV call, it assembles the reads that were mapped or partially mapped to the region of interest (ROI) in the corresponding bam files. Instead of outputting a single consensus sequence, tigra_sv outputs all the alternative alleles in the ROI that received sufficient sequence coverage (usually >= 2x). Tigra_sv has been shown to improve SV prediction accuracy in short-read analysis, and it can produce accurate breakpoint sequences that are valuable for understanding the origin, mechanism and pathology underlying the SVs.

This is an early beta release, version 0.0.1.

Install

Tigra_sv uses the samtools C API, so please install samtools (http://samtools.sourceforge.net/) first. Then modify the Makefile so that the compiler and linker point to the correct locations of samtools on your system.

Usage

./tigra_sv <SV file> <a.bam> <b.bam> ...


Or:

./tigra_sv <SV file> <bam_list_file>

 

Options:

-A INT Estimated maximal insert size [500]
-l INT Flanking size [500]
-w INT Pad local reference by additional [200] bp on both ends
-q INT Only assemble reads with mapping quality > [1]
-N INT Number of mismatches required to be tagged as poorly mapped [5]
-p INT Ignore cases that have average read depth greater than [1000]
-I STRING Read intermediate files from DIR instead of creating them
-Q INT Minimal BreakDancer score required for analysis [0]
-L STRING Ignore calls supported by libraries that contain (comma separated) STRING
-b Set this flag when the SV file is in BreakDancer format
-r Write the local reference to a file with .ref.fa as the suffix
-d Write the dumped reads to a file with .fa as the suffix
-R Reference file location with the full path
-c STRING Specify the chromosome for position 2 to parallelize jobs

Input

The first argument is an SV prediction file. The remaining arguments can be either a bam_list_file, in the format described below, or the full paths of the bam files, separated by spaces, starting from the second argument.

We currently support two kinds of input.

1. Population based SV assembly such as those in the 1000 Genomes project.

The SV prediction file should be in a tab-delimited format with the following columns:
CHR
START_OUTER
START_INNER
END_INNER
END_OUTER
TYPE_OF_EVENT
SIZE_PREDICTION
MAPPING_ALGORITHM
SEQUENCING_TECHNOLOGY
SAMPLEs
TYPE_OF_COMPUTATIONAL_APPROACH
GROUP
OPTIONAL_ID

It is critical to have accurate information in CHR, START_INNER, END_INNER, TYPE_OF_EVENT, SIZE_PREDICTION, and SAMPLEs.

SAMPLEs should be the sample names separated by commas.

For example:
1       829757  829757  829865  829865  DEL       116     MAQ     SLX     NA19238,NA19240    RP      WashU
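The column layout above can be checked before launching an assembly. The following is a minimal Python sketch, not part of tigra_sv: the function name parse_sv_line is hypothetical, the coordinate and size columns are assumed to be integers, and OPTIONAL_ID is assumed to be omittable because the example row above has only twelve fields.

```python
# Hypothetical helper for reading one row of the population-based SV file.
COLUMNS = [
    "CHR", "START_OUTER", "START_INNER", "END_INNER", "END_OUTER",
    "TYPE_OF_EVENT", "SIZE_PREDICTION", "MAPPING_ALGORITHM",
    "SEQUENCING_TECHNOLOGY", "SAMPLEs", "TYPE_OF_COMPUTATIONAL_APPROACH",
    "GROUP", "OPTIONAL_ID",
]
# Assumption: these columns hold integers.
INT_COLUMNS = {"START_OUTER", "START_INNER", "END_INNER", "END_OUTER",
               "SIZE_PREDICTION"}


def parse_sv_line(line):
    """Split one tab-delimited row into a dict keyed by column name."""
    fields = line.rstrip("\r\n").split("\t")
    # Assumption: only the last column (OPTIONAL_ID) may be missing.
    if len(fields) < len(COLUMNS) - 1:
        raise ValueError("expected at least %d columns, got %d"
                         % (len(COLUMNS) - 1, len(fields)))
    record = dict.fromkeys(COLUMNS)  # missing columns stay None
    for name, value in zip(COLUMNS, fields):
        if name in INT_COLUMNS:
            record[name] = int(value)
        elif name == "SAMPLEs":
            record[name] = [s for s in value.split(",") if s]
        else:
            record[name] = value
    return record


# The example row from this README, joined with real tab characters.
example = "\t".join(["1", "829757", "829757", "829865", "829865", "DEL",
                     "116", "MAQ", "SLX", "NA19238,NA19240", "RP", "WashU"])
call = parse_sv_line(example)
print(call["CHR"], call["TYPE_OF_EVENT"], call["SAMPLEs"])
```

A row that fails to parse here would most likely also be a problem for the columns marked as critical above.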

To tell the program the location of each bam file and its sample name, the second argument should be a bam_list_file. Each line has the format sample_name:bam_file_location, with no spaces in between.

For example:
NA19238:/gscuser/1000genomes/ftp/data/NA19238/alignment/NA19238.chrom1.SLX.SRP000032.2009_07.bam
Each row declares one sample.
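As an illustration of the sample_name:bam_file_location format, here is a minimal Python sketch that reads such lines into a dictionary. The function name parse_bam_list is hypothetical, and collecting several rows under the same sample name is an assumption rather than documented tigra_sv behavior.

```python
def parse_bam_list(lines):
    """Map each sample name to the bam paths listed for it."""
    samples = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first colon only, so any colon in the path is kept.
        name, sep, path = line.partition(":")
        if not sep or not name or not path:
            raise ValueError("expected sample_name:bam_file_location, got %r"
                             % line)
        # Assumption: a sample may appear on more than one row.
        samples.setdefault(name, []).append(path)
    return samples


rows = [
    "NA19238:/gscuser/1000genomes/ftp/data/NA19238/alignment/"
    "NA19238.chrom1.SLX.SRP000032.2009_07.bam",
]
bams = parse_bam_list(rows)
print(bams)
```

Comparing the keys of this dictionary with the SAMPLEs column of the SV file is a quick way to spot samples that have no bam file declared.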

2. Individual sample based SV assembly, such as those in The Cancer Genome Atlas (TCGA), which interrogates matched tumor/normal genomes.

The SV prediction file should be in BreakDancer format. Please use option -b to declare the input as a BreakDancer file. The rest of the arguments should be the bam files that are to be assembled.

Required TYPE format

We currently support the assembly of the following types of events:
DEL: deletion;
INS: insertion;
ITX: tandem duplication;
CTX: transchromosomal translocation.

Note that in a BreakDancer file the types are already in the required format. For the population based assembly, please make sure that every value in the TYPE_OF_EVENT column is one of the four types above, written as three capital letters (DEL, INS, ITX, CTX).
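The vocabulary check described above can be done before running the program. This is a minimal Python sketch; the function name unsupported_types is hypothetical and the check is case-sensitive, following the requirement of three capital letters.

```python
# The four event types listed in this README.
SUPPORTED_TYPES = {
    "DEL": "deletion",
    "INS": "insertion",
    "ITX": "tandem duplication",
    "CTX": "transchromosomal translocation",
}


def unsupported_types(types):
    """Return the distinct type labels that are not in the supported set."""
    return sorted({t for t in types if t not in SUPPORTED_TYPES})


# "del" is rejected because the labels must be in capital letters.
bad = unsupported_types(["DEL", "del", "INV", "CTX", "INS"])
print(bad)
```

An empty result means every label in the TYPE_OF_EVENT column is one the program can assemble.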

Some notes about options

-R:  If you'd like to see whether parts of the contigs are novel relative to the reference, i.e., supported by unmapped or poorly mapped reads, please provide the samtools faidx-indexed reference file with the -R option followed by its full path. The novel parts of the contigs will be in CAPITAL letters, while the parts identical to the reference will be in lower case. This feature facilitates consistency analysis with split-read algorithms (such as Pindel) that directly examine unmapped or poorly mapped reads.
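Because the novel and reference-identical parts of a contig differ only in letter case, they can be separated with a few lines of code. The following Python sketch is an illustration only: the function name case_segments and the example sequence are made up, and the coordinates are 0-based, end-exclusive offsets into the contig string.

```python
from itertools import groupby


def case_segments(contig):
    """Split a contig into runs of upper-case (novel) and lower-case
    (identical to the reference) bases, with their offsets."""
    segments = []
    pos = 0
    for is_upper, group in groupby(contig, key=str.isupper):
        seq = "".join(group)
        label = "novel" if is_upper else "reference"
        segments.append((label, pos, pos + len(seq), seq))
        pos += len(seq)
    return segments


# Made-up contig: 4 reference bases, 4 novel bases, 3 reference bases.
segments = case_segments("acgtTTGAccg")
for label, start, end, seq in segments:
    print(label, start, end, seq)
```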

-c:  If you'd like to parallelize the jobs by chromosome, please use option -c followed by the chromosome id (without the CHR prefix), so that the program will skip the other chromosomes for this job. Please make sure that the bams in bam_list_file contain the chromosome of interest.
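One way to set up such per-chromosome jobs is to generate one command line per chromosome and hand each to a job scheduler. The Python sketch below only builds the commands and does not run them. The file names calls.sv and bam_list.txt are placeholders, the function name is hypothetical, and placing the option before the positional arguments follows the -b example in this README.

```python
import shlex


def per_chromosome_commands(sv_file, bam_list_file, chromosomes,
                            binary="./tigra_sv"):
    """Build one tigra_sv argument list per chromosome id."""
    return [[binary, "-c", str(chrom), sv_file, bam_list_file]
            for chrom in chromosomes]


# Placeholder input names; chromosome ids are given without a CHR prefix.
commands = per_chromosome_commands("calls.sv", "bam_list.txt", ["1", "2", "X"])
for cmd in commands:
    # Quote each argument so the line can be pasted into a shell script.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Since results are written to the directory where the program is launched, running each job from its own directory may help keep the outputs apart.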

-r:  If you'd like to obtain a local reference file excised from the reference for comparison or annotation purposes, please add the option -r. It is off by default.

-d:  If you'd like to dump the reads to separate files so that they can be examined by other assemblers, please add the option -d.

Results

The results will be written to the directory from which you launched the program, so please make sure it has enough free space.

For each examined SV call, you will obtain two files:

chr1.start.chr2.end.type.size.orientation.fa.contigs.fa (contigs file by TIGRA)
chr1.start.chr2.end.type.size.orientation.fa.contigs.het.fa (alternative paths from TIGRA contigs)

in the running directory.

The first file contains the basic contigs reported by TIGRA (similar to those from Velvet or ABySS), and the second contains alternative contigs constructed from the contig graphs. The SV alleles could be in either file, depending on the alternative allele frequency.

If -r is set, a local reference file will be created:
chr1.start.chr2.end.type.size.orientation.ref.fa

If -d is set, the reads will be saved in:
chr1.start.chr2.end.type.size.orientation.fa
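With many SV calls the running directory fills up quickly, so it can help to group the output files by SV call. The Python sketch below relies only on the file name suffixes listed above; the function name group_outputs and the kind labels are hypothetical, and the file names in the example are the literal patterns from this README rather than real output.

```python
# Suffixes from this README, ordered so that longer ones are tried first
# (".fa" alone would otherwise match every file).
SUFFIXES = {
    ".fa.contigs.het.fa": "het_contigs",
    ".fa.contigs.fa": "contigs",
    ".ref.fa": "reference",
    ".fa": "reads",
}


def group_outputs(filenames):
    """Group output file names by their shared per-call prefix."""
    grouped = {}
    for name in filenames:
        for suffix, kind in SUFFIXES.items():
            if name.endswith(suffix):
                prefix = name[:-len(suffix)]
                grouped.setdefault(prefix, {})[kind] = name
                break
    return grouped


prefix = "chr1.start.chr2.end.type.size.orientation"
files = [
    prefix + ".fa.contigs.fa",
    prefix + ".fa.contigs.het.fa",
    prefix + ".ref.fa",
    prefix + ".fa",
]
outputs = group_outputs(files)
print(sorted(outputs[prefix]))
```

In practice the list of names could come from os.listdir on the running directory.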

Example running command

/gscuser/tigra_sv washu_pilot2_trio_large_deletions.pcr pcr_filelist.txt


to run pcr format SV calls.

Or

/gscuser/tigra_sv -b breakdancer.sv /gscuser/tumor.bam /gscuser/normal.bam


to run breakdancer format SV calls.

Copyright © 1993-2012 Washington University in St. Louis. All rights reserved.
