VERSION

    Version 1.02


SYNOPSIS

    VarScan is a tool for detecting variants from alignments of next-generation sequencing data.


USAGE: varscan.pl [command]

COMMANDS:

easyrun [ALIGNMENTS] [OPTIONS]

    Perform all tasks (parse alignments, combine variants, get readcounts) in one step
        ALIGNMENTS: File of read alignments in Blat (PSLX), Bowtie, cross_match, Novoalign, or Newbler format (required).
                    See "THE FRONT-END ALIGNMENT" section below for recommended alignment parameters.
        
        DETECTION OPTIONS:
        --fasta-file            File of read sequences in FASTA format (required for indel genotypes)
        --quality-file          File of read quality scores in FASTA format (required for base qualities)
        --ref-dir               Directory containing reference sequence FASTAs (required for indel genotypes)
        --min-align-score       Specifies a minimum BLAST-like alignment score (matches - mismatches - gaps) [25]
        --min-identity          Specifies a minimum sequence identity for alignments (matches / bases) [90]
        --primer-trim           Length of M13/MID primer tail at start of read; variants within it are ignored [0]
        --default-qual-score    Quality score to assign to bases when there is no quality information [15]

        FILTERING OPTIONS:
        --num-samples           Number of samples in the pool; used to auto-set the following parameters [1]
        --min-coverage          Minimum total coverage to call a variant [1]
        --min-reads2            Minimum variant-supporting reads [1]
        --min-avg-qual          Minimum variant base quality [0]
        --min-var-freq          Minimum variant allele frequency [0]
        --min-strands2          Minimum variant strands observed [1]

        OUTPUT OPTIONS:
        --output-dir            Output directory where results will be saved [./]
        --sample                Sample name to use as base for output files [sample]

        Filtered SNP calls will be output to [output_dir]/[sample].snps.combined.readcounts.filtered
        Filtered indel calls will be output to [output_dir]/[sample].indels.combined.readcounts.filtered

        Intermediate output files will include:
        [output_dir]/[sample].alignments                   Alignments for uniquely-placed reads meeting criteria
        [output_dir]/[sample].snps                         Individual read-level SNP calls
        [output_dir]/[sample].indels                       Individual read-level indel calls
        [output_dir]/[sample].snps.combined.readcounts     Unfiltered SNPs with read counts
        [output_dir]/[sample].indels.combined.readcounts   Unfiltered indels with read counts
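
    The file names above follow a fixed pattern, so a wrapper script can predict them.
    The following is a minimal Python sketch of that pattern; the function name and the
    dictionary keys are hypothetical and are not part of VarScan itself.

```python
import os


def easyrun_output_paths(output_dir="./", sample="sample"):
    """Return the output file names documented for easyrun.

    The keys are illustrative labels chosen for this sketch; only the
    file name suffixes come from the documentation.
    """
    base = os.path.join(output_dir, sample)
    return {
        "alignments": base + ".alignments",
        "snps": base + ".snps",
        "indels": base + ".indels",
        "snps_readcounts": base + ".snps.combined.readcounts",
        "indels_readcounts": base + ".indels.combined.readcounts",
        "snps_filtered": base + ".snps.combined.readcounts.filtered",
        "indels_filtered": base + ".indels.combined.readcounts.filtered",
    }
```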

parse-alignments [ALIGNMENTS] [OPTIONS]

    Parse the alignments file, score alignments, and detect sequence changes.
        ALIGNMENTS: File of read alignments in Blat (PSLX), Bowtie, cross_match, Novoalign, or Newbler format (required).
        
        OPTIONS:
        --fasta-file            File of read sequences in FASTA format (required for indel genotypes)
        --quality-file          File of read quality scores in FASTA format (required for base qualities)
        --ref-dir               Directory containing reference sequence FASTAs (required for indel genotypes)
        --min-align-score       Specifies a minimum BLAST-like alignment score (matches - mismatches - gaps) [25]
        --min-identity          Specifies a minimum sequence identity for alignments (matches / bases) [90]
        --primer-trim           Length of M13/MID primer tail at start of read; variants within it are ignored [0]
        --default-qual-score    Quality score to assign to bases when there is no quality information [15]
        --min-qual-score        Minimum base quality score for variants to be called [15]
        --output-alignments     Output file to contain qualifying single best alignment for each read
        --output-snps           Output file to contain SNPs
        --output-indels         Output file to contain indels
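
    The two alignment thresholds can be illustrated with a short Python sketch of the
    formulas given for --min-align-score and --min-identity. It assumes that identity is
    compared as a percentage (suggested by the default of 90), that "gaps" is a simple
    count, and that both thresholds are inclusive. None of these details are confirmed
    here, and the function names are hypothetical.

```python
def alignment_score(matches, mismatches, gaps):
    """BLAST-like score as described: matches - mismatches - gaps."""
    return matches - mismatches - gaps


def percent_identity(matches, bases):
    """Identity as matches / bases, scaled to a percentage (an assumption)."""
    if bases <= 0:
        return 0.0
    return 100.0 * matches / bases


def passes_alignment_filters(matches, mismatches, gaps, bases,
                             min_align_score=25, min_identity=90):
    """True if an alignment meets both thresholds (inclusive, by assumption)."""
    score_ok = alignment_score(matches, mismatches, gaps) >= min_align_score
    identity_ok = percent_identity(matches, bases) >= min_identity
    return score_ok and identity_ok
```

    Under these assumptions, a 36-base read with 35 matches and 1 mismatch scores 34 and
    passes, while one with 30 matches and 6 mismatches scores 24 and is rejected.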

combine-variants [VARIANTS] [OUTPUT]

    Combine variants (SNPs or indels) detected across multiple reads
        VARIANTS:   File of variants from alignment parsing
        OUTPUT:     Output file for combined variants
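
    Conceptually, combining means collapsing read-level calls that describe the same
    change into one entry per variant. The Python sketch below shows that idea only.
    The tuple layout (chrom, position, ref, var, read_name) is hypothetical, since the
    actual VarScan file format is not described in this document.

```python
from collections import defaultdict


def combine_variants(read_calls):
    """Group read-level calls by variant site.

    read_calls: iterable of (chrom, position, ref, var, read_name) tuples,
    a layout assumed for illustration only.
    Returns a dict mapping (chrom, position, ref, var) to the sorted list
    of distinct read names reporting that variant.
    """
    reads_by_variant = defaultdict(set)
    for chrom, position, ref, var, read_name in read_calls:
        reads_by_variant[(chrom, position, ref, var)].add(read_name)
    return {site: sorted(reads) for site, reads in reads_by_variant.items()}
```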

get-readcounts [VARIANTS] [ALIGNMENTS] [OUTPUT] [OPTIONS]

    Determine read counts supporting each allele
        VARIANTS:   File of *combined* variants
        ALIGNMENTS: Original alignments file
        OUTPUT:     Output file for variant read counts
        OPTIONS:
                    --fasta-file            File of read sequences in FASTA format (required for indel genotypes)
                    --quality-file          File of read quality scores in FASTA format (required for base qualities)
                    --ref-dir               Directory containing reference sequence FASTAs (required for indel genotypes)
                    --default-qual-score    Quality score to assign to bases when there is no quality information [15]
                    --min-qual-score        Minimum base quality score for variants to be called [15]

combine-readcounts [VARIANTS1,VARIANTS2] [OUTPUT] [OPTIONS]

    Combine information from multiple read counts files
        VARIANTS:   Read counts files, separated by commas
        OUTPUT:     Output file for combined variant read counts

filter-variants [VARIANTS] [OPTIONS]

    Filter variants based on coverage, read counts, allele frequency, quality, etc.
        VARIANTS:       File of variants from alignment parsing
        
        OPTIONS:        --output-file       File to contain variants passing filter
                        --min-coverage      Minimum total coverage [1]
                        --min-reads2        Minimum variant-supporting reads [1]
                        --min-avg-qual      Minimum variant base quality [0]
                        --min-var-freq      Minimum variant allele frequency [0]
                        --min-strands2      Minimum variant strands observed [1]
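
    The filter options act as independent minimum thresholds. The Python sketch below
    shows how such a filter could be expressed. It assumes that variant allele frequency
    is reads2 / coverage expressed as a fraction and that every threshold is inclusive.
    The units VarScan expects for --min-var-freq are not stated here, and the function
    and argument names are hypothetical.

```python
def passes_filter(coverage, reads2, avg_qual, strands2,
                  min_coverage=1, min_reads2=1, min_avg_qual=0,
                  min_var_freq=0, min_strands2=1):
    """True if a variant meets every minimum threshold.

    var_freq is computed as reads2 / coverage (a fraction), which is an
    assumption made for this sketch.
    """
    var_freq = reads2 / coverage if coverage > 0 else 0.0
    return (coverage >= min_coverage
            and reads2 >= min_reads2
            and avg_qual >= min_avg_qual
            and var_freq >= min_var_freq
            and strands2 >= min_strands2)
```

    With the documented defaults, any variant with at least one supporting read passes;
    raising min_strands2 to 2 would reject variants seen on only one strand.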

limit-snps [VARIANTS] [POSITIONS] [OUTPUT]

    Restrict SNP calls to a given set of chromosome positions
        VARIANTS:       File of variants from alignment parsing
        POSITIONS:      Tab-delimited file of chrom-positions at which to include variants
        OUTPUT:         Output file for variants at the provided positions
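
    The restriction amounts to a set lookup on chromosome and position. The Python sketch
    below illustrates it under the assumption that the variants file is tab-delimited with
    chromosome and position in its first two columns, like the positions file. That layout
    is not specified here, and the function names are hypothetical.

```python
def load_positions(lines):
    """Collect (chrom, position) pairs from tab-delimited lines."""
    positions = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            positions.add((fields[0], fields[1]))
    return positions


def limit_snps(variant_lines, positions):
    """Keep variant lines whose first two columns match a wanted position.

    Assumes chrom and position are the first two tab-delimited columns.
    """
    kept = []
    for line in variant_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and (fields[0], fields[1]) in positions:
            kept.append(line)
    return kept
```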


THE FRONT-END ALIGNMENT

    VarScan performance relies heavily on the accuracy of the read alignments.
    To obtain alignments in a format compatible with VarScan, our recommendations are as follows:
    
    BLAT:        Run with the -out=pslx parameter.  Give VarScan a single file with all PSLX alignments.
    Newbler:     Run with the -pairt parameter.  Give VarScan the 454PairAlign.txt file.
    Bowtie:      Run with the -m 1 parameter.  Give VarScan the Bowtie output file.
    cross_match: For 454 data, run with these parameters: -minmatch 12 -minscore 25 -penalty -4 -discrep_lists -tags -gap_init -3 -gap_ext -1
                 For Illumina data, run with these parameters: -minmatch 12 -minscore 25 -minmargin 1 -discrep_lists -gap1_only -tags
                 Give VarScan the cross_match output file.
    Novoalign:   Run with the -a -t 120 parameters.  Give VarScan the Novoalign output file.
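
    For scripted pipelines, the recommendations above can be kept in a small lookup table.
    The Python sketch below only restates the parameters listed in this section. The
    function name, the key scheme, and the platform labels are hypothetical, and the
    sketch does not build complete aligner command lines.

```python
# Recommended aligner parameters, copied from the list above.
RECOMMENDED_PARAMS = {
    "blat": ["-out=pslx"],
    "newbler": ["-pairt"],
    "bowtie": ["-m", "1"],
    "novoalign": ["-a", "-t", "120"],
    "cross_match:454": (
        "-minmatch 12 -minscore 25 -penalty -4 -discrep_lists "
        "-tags -gap_init -3 -gap_ext -1"
    ).split(),
    "cross_match:illumina": (
        "-minmatch 12 -minscore 25 -minmargin 1 -discrep_lists "
        "-gap1_only -tags"
    ).split(),
}


def recommended_params(aligner, platform=None):
    """Return a copy of the recommended parameter list for an aligner.

    cross_match needs a platform ("454" or "illumina") because the
    recommendations differ between the two.
    """
    key = aligner.lower()
    if key == "cross_match":
        if platform is None:
            raise ValueError("cross_match requires a platform: 454 or illumina")
        key = "cross_match:" + platform.lower()
    return list(RECOMMENDED_PARAMS[key])
```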


AUTHOR

    Daniel C. Koboldt <dkoboldt at genome.wustl.edu>
    The Genome Center at Washington University School of Medicine
    St. Louis, Missouri, USA


COPYRIGHT

    Copyright 2009 Daniel C. Koboldt and Washington University
    All rights reserved.


LICENSE

    This program is free for non-commercial use.