Improved tools for DNA comparison and clustering.

Comput Appl Biosci. 1995 Dec;11(6):603-13.


DNA sequence clustering is an effective aid to the comprehension, summarization and compression of DNA sequence databases. Previous work created programs suitable for comparing and clustering cDNA sequences; new, enhanced programs have now been written to cluster genomic DNA fragments, large EST projects and entire DNA databases. Three new programs (ICAtools) are discussed: ICAass, N2tool and ICAmatches. ICAass has been used to compress the EMBL database by hiding or removing sequences with various degrees of redundancy, and it also has the fastest database querying mode. N2tool provides fast and sensitive clustering of genomic fragment databases on the basis of small areas of local similarity. N2tool has proven useful for discovering contaminating vector or other artefactual sequence when the potential contaminant is not otherwise known. ICAmatches is a new cluster analysis program that uses a novel alignment style to present multiple alignment summaries. All the tools are convenient to use because they share a common, memory-frugal index format and accept most DNA sequence formats directly.
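The abstract does not describe how N2tool finds "small areas of local similarity" or how the shared index is laid out. To illustrate the general idea of index-based clustering, the following is a minimal sketch, assuming that local similarity can be approximated by exact shared words of length k and that clusters are formed by single linkage. This is a generic technique, not the ICAtools algorithm. The function names and the default `k=8` are hypothetical choices. The sketch also ignores reverse complements, which a real DNA tool would need to handle.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping k-length word in seq (case-insensitive)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def cluster_by_shared_words(seqs, k=8):
    """Hypothetical single-linkage clustering: two sequences fall in the
    same cluster if they are connected by a chain of shared exact k-mers.
    seqs maps sequence name -> sequence string.
    Returns a sorted list of sorted lists of names."""
    names = list(seqs)
    parent = {name: name for name in names}

    def find(x):
        # Union-find lookup with path halving to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Inverted index storing only the first sequence seen for each word.
    # This is enough for single linkage, because every later sequence
    # containing the word is merged with that first one.
    index = {}
    for name in names:
        for word in set(kmers(seqs[name], k)):
            if word in index:
                union(index[word], name)
            else:
                index[word] = name

    clusters = defaultdict(list)
    for name in names:
        clusters[find(name)].append(name)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    demo = {
        "a": "ACGTACGTTTGCA",
        "b": "GGGACGTACGTAA",  # shares the 8-mer ACGTACGT with "a"
        "c": "TTTTTTTTCCCC",   # shares no 8-mer with the others
    }
    print(cluster_by_shared_words(demo))
```

Storing one representative per word keeps the index small, in the spirit of a memory-frugal index. The trade-off is that single linkage can chain unrelated sequences together through repetitive or contaminating words, such as vector sequence. That chaining is also what makes shared-word clustering useful for flagging such contaminants.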


Parsons JD.
