|
Introduction
The ICAtools are a set of programs that could be of use to anyone doing medium- to large-scale DNA sequencing projects. By structuring otherwise amorphous sets of DNA sequences, the ICAtools can quickly provide useful information that might otherwise lie undiscovered.
For example, when used for their primary task of guiding the efficient use of cDNA libraries, the programs can estimate the amount of redundancy (or conversely, normalization success) in a given library, and can predict the number of as yet unfound sequences that remain in any particular library. More generally, the ICAtools can be used to discover when similar subsequences, such as Alu repeats, are present in any set of sequences, or when vector or linker sequences have not been removed from otherwise dissimilar sequences. The programs can list the names and descriptions of those sequences that contain shared subsequences, and then display alignments that show exactly what the sequences have in common and to what extent.
The range of uses to which clustering can be put is vast, and the ICAtools are not going to be perfect for every case. Often, though, a combination of tools using different styles of clustering can reveal different kinds of sequence relationship that would not otherwise be apparent.
Clustering ESTs works best with anchored reads. If your ESTs come from all over your mRNA sequences, then you have a much harder problem to solve perfectly, and you will probably need an assembly program such as Gap or Phrap, though the ICAtools can still be used for imperfect, relative cDNA library comparisons.
Note that the ICAtools do not work properly with base ambiguity symbols (IUPAC codes such as R or Y). Use the UNIX sed command to change them to 'n' or 'N' before running the programs.
The tools can be tuned for specialized areas of inquiry, or made faster, by changing run-time and compile-time constants. Check the individual program descriptions and study the programs' source for details.
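The ambiguity-symbol replacement mentioned above can also be done with a short script instead of sed. The following is only a minimal sketch in Python, not part of the ICAtools. It assumes the input is FASTA-formatted text, with header lines starting with '>', and that the symbols to replace are the standard IUPAC ambiguity codes other than N. The function name mask_ambiguity is hypothetical.

```python
import re

# IUPAC ambiguity codes other than N, matched in either case.
# A, C, G, T and N themselves are left untouched.
AMBIGUITY = re.compile(r"[RYSWKMBDHV]", re.IGNORECASE)


def mask_ambiguity(lines):
    """Yield lines with ambiguity symbols replaced by 'n' or 'N'.

    Assumption: FASTA input, so lines starting with '>' are headers
    and are passed through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            yield line
        else:
            # Preserve case: upper-case codes become 'N', lower-case 'n'.
            yield AMBIGUITY.sub(
                lambda m: "N" if m.group().isupper() else "n", line
            )
```

To use it as a filter, pass it an open file or sys.stdin and write the result with sys.stdout.writelines. Skipping header lines matters because sequence names and descriptions often contain the same letters as the ambiguity codes.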
|