PolyScan

Introduction

Small insertions and deletions (indels) and single nucleotide polymorphisms (SNPs) are common genetic variants that may be associated with a wide variety of diseases. Owing to the genome's size and complexity, manually characterizing each of these variations in an individual is not currently practical. While significant progress has been made in automated single-base mutation discovery from the sequences of diploid PCR products, automated and reliable detection and characterization of indels remains difficult. Here, we present PolyScan, a software tool specifically designed to provide de novo indel detection and SNP identification in the context of high-throughput medical sequencing. Rather than using the "poly" summary files computed by phred as the basis of analysis, PolyScan starts from the raw chromatograms and re-basecalls the signals in each of the fluorescence channels. This feature makes PolyScan applicable to both polymorphism and mutation discovery, which requires analyzing low-intensity signals. Alignment and Bayesian probabilistic methods are used to compute probabilities of SNPs as well as small and medium-sized indels (< 100 bp). Variants are detected based on statistical analysis of evidence across the entire assembly, including evidence from other comparable traces. In addition, heuristic filters based on our experience with manual analysis are implemented to reduce the influence of PCR and sequencing artifacts.
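To make the idea of a Bayesian variant probability concrete, the following is a minimal sketch of how a posterior over genotypes at one site could be computed from chromatogram peak intensities. It is not PolyScan's actual model, which is described in the paper. The Gaussian likelihood, the expected signal fractions, the noise level `sd`, and the names `EXPECTED_FRACTION` and `genotype_posterior` are all assumptions made for illustration.

```python
import math

# Hypothetical expected fraction of the variant-channel peak height
# (variant / (variant + reference)) under each diploid genotype.
EXPECTED_FRACTION = {"hom_ref": 0.0, "het": 0.5, "hom_var": 1.0}


def log_normal_pdf(x, mean, sd):
    """Log density of a normal distribution (assumed noise model)."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def genotype_posterior(fractions, prior, sd=0.15):
    """Posterior probability of each genotype at one site.

    fractions: observed variant signal fractions, one per trace covering
               the site (for example a forward and a reverse read).
    prior:     dict mapping genotype name to a prior probability > 0.
    sd:        assumed standard deviation of the measurement noise.
    """
    log_post = {}
    for genotype, p in prior.items():
        total = math.log(p)
        for f in fractions:
            total += log_normal_pdf(f, EXPECTED_FRACTION[genotype], sd)
        log_post[genotype] = total

    # Normalize in log space to avoid numerical underflow.
    peak = max(log_post.values())
    unnorm = {g: math.exp(v - peak) for g, v in log_post.items()}
    norm = sum(unnorm.values())
    return {g: v / norm for g, v in unnorm.items()}


if __name__ == "__main__":
    prior = {"hom_ref": 0.98, "het": 0.015, "hom_var": 0.005}
    print(genotype_posterior([0.45, 0.52], prior))
```

Combining several traces in one product is what lets weak but consistent secondary peaks outweigh a prior that strongly favors the reference genotype. A single noisy trace, by contrast, moves the posterior much less.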

PolyScan is written in C++ and can analyze large-scale projects (thousands of reads) in a relatively short amount of time (usually less than an hour) on a Linux workstation. The Staden library is used as part of PolyScan to provide input/output for chromatogram files in SCF format. PolyScan fits naturally into the phred/phrap/Consed pipeline and, in our study, demonstrated performance comparable or superior to that of other programs, especially in its ability to identify low-level mutations and its flexibility in achieving good sensitivity/specificity tradeoffs. A detailed description of the methodology and performance of PolyScan v2.0, in comparison with other programs, is available in our paper.

Diagram

Publication

Ken Chen*, Michael D. McLellan, Li Ding, Michael C. Wendl, Yumi Kasai, Richard K. Wilson, Elaine R. Mardis. PolyScan: an Automatic Indel and SNP Detection Approach to the Analysis of Human Re-sequencing Data. Genome Research, published 6 April 2007. doi:10.1101/gr.6151507

Download

The executables of PolyScan are currently available for the following platforms:

Contact

For technical questions, please contact the first author: Ken Chen

Copyright © 1993-2012 Washington University in St. Louis. All rights reserved.
