Information on Pairwise Sequence Alignment Programs



NOTE: If the computation takes more than a few minutes, we suggest that the user obtain results via email by choosing the email option and providing an email address. Otherwise, an error may occur due to connection timeout.

FASTA Format:
The first line begins with the symbol '>' followed by the name of the sequence. The sequence itself is on the remaining lines. The sequence must not contain blanks. The sequence may be in upper or lower case. Below is an example sequence in FASTA format:
>DNA sequence
GCCCCCGGCCCCGCCCCGGCCCCGCCCCCGGCCCCGCCCCGCAAGGGTC
ACAGGTCACGGGGCGGGGCCGAGGCGGAAGCGCCCGCAGCCCGGTACCGG
CTCCTCCTGGGCTCCCTCTAGCGCCTTCCCCCCGGCCCGACTCCGCTGGT
CAGCGCCAAGTGACTTACGCCCCCGACCTCTGAGCCCGGACCGCTAGGCGA
GGAGGATCAGATCTCGCTCGAGAATCTGAAGGTGCCCTGGTCCTGGAGG
AGTTCCGTCCCAGCCCGCGGTCTCCCGGTACTGTCGGGCCCCGGCCCTCT
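
The format rules above can be checked before submitting a sequence. The following is a minimal sketch, not part of the server: the function name parse_fasta is hypothetical, and it assumes a single sequence per input.

```python
def parse_fasta(text):
    """Parse a single-record FASTA string into (name, sequence).

    Hypothetical helper illustrating the format rules described above;
    it is not the parser used by the server.
    """
    # Drop empty lines so a trailing newline does not matter.
    lines = [line for line in text.splitlines() if line]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("first line must begin with '>'")
    name = lines[0][1:].strip()
    # The sequence is everything on the remaining lines, joined together.
    sequence = "".join(lines[1:])
    if any(ch.isspace() for ch in sequence):
        raise ValueError("sequence must not contain blanks")
    return name, sequence
```

Upper and lower case are both accepted and left unchanged, since the format allows either.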

Loading a large sequence into the server:
The server allows the user to load a sequence by providing the name of the sequence file. The server requires that the sequence file, its parent directory, its grandparent directory, and so on up to and including the home directory all be readable by the world. One simple way to meet this requirement is to move the sequence file into the home directory and make both the file and the home directory readable by the world. To load the sequence file into the server, start Netscape from the home directory, click the "Browse" button, and provide the file name.
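
The permission requirement above can be checked locally before loading a file. The following is a minimal sketch for a POSIX system, not a tool provided by the server: the function name unreadable_components is hypothetical, and it assumes that "readable by the world" means the world-read permission bit is set.

```python
import os
import stat

def unreadable_components(path, home):
    """Return the paths, from the file up to `home`, lacking the world-read bit.

    Hypothetical helper; assumes "readable by the world" means that the
    S_IROTH permission bit is set on the file and on each directory.
    """
    path = os.path.abspath(path)
    home = os.path.abspath(home)
    if os.path.commonpath([path, home]) != home:
        raise ValueError("file is not under the home directory")
    problems = []
    current = path
    while True:
        if not os.stat(current).st_mode & stat.S_IROTH:
            problems.append(current)
        if current == home:
            break
        # Move up one level: file -> parent -> grandparent -> ... -> home.
        current = os.path.dirname(current)
    return problems
```

An empty list means that every component from the file up to the home directory has the world-read bit set.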

SIM:
The SIM program finds k best non-intersecting local alignments between two sequences. The two sequences must be of the same type, that is, both are DNA sequences or both are protein sequences. Using dynamic programming techniques, SIM is guaranteed to find optimal alignments. The alignments are reported in order of similarity score, with the highest scoring alignment first. The k best alignments share no aligned pairs. SIM requires space proportional to the sum of the input sequence lengths and the output alignment lengths. Thus SIM can handle sequences of tens of thousands of base pairs. SIM is described in the following papers:
Huang, X. and Miller, W. (1991) A Time-Efficient, Linear-Space Local Similarity Algorithm. Advances in Applied Mathematics 12, 337-357.
Huang, X., Hardison, R. C. and Miller, W. (1990) A Space-Efficient Algorithm for Local Similarities. Computer Applications in the Biosciences 6, 373-381.
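
To illustrate what a local alignment score is, the following minimal sketch computes the score of the single best local alignment by standard dynamic programming. It is not the SIM algorithm: it does not report the k best alignments or recover the alignments themselves, and the function name, the linear gap penalty, and the score values are assumptions chosen for illustration.

```python
def best_local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Score of the best local alignment of a and b (Smith-Waterman style).

    Illustration only: linear gap penalty and made-up score values.
    Keeps two rows of the matrix, so memory grows with len(b) only.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # A local alignment may start anywhere, hence the floor at 0.
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

The comparison is case-sensitive, so both sequences should be converted to the same case first.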

GAP:
The GAP program computes an optimal global alignment of two sequences without penalizing terminal gaps. A long gap in the shorter sequence is given a constant penalty. The two sequences must be of the same type, that is, both are DNA sequences or both are protein sequences. GAP delivers the alignment in linear space, so long sequences can be aligned. GAP is described in the following paper:
Huang, X. (1994) On Global Sequence Alignment. Computer Applications in the Biosciences 10, 227-235.
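
The effect of not penalizing terminal gaps can be sketched as follows. This is not the GAP algorithm: it computes a score only, uses a simple linear gap penalty, and does not model the constant penalty for long gaps. The function name and the score values are assumptions chosen for illustration.

```python
def end_gap_free_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best global alignment score of a and b when terminal gaps are free.

    Illustration only: linear gap penalty and made-up score values.
    """
    n, m = len(a), len(b)
    # Row 0 and column 0 stay at 0: leading gaps cost nothing.
    prev = [0] * (m + 1)
    best = 0  # alignment with no aligned pairs at all
    for i in range(1, n + 1):
        curr = [0] * (m + 1)
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
        # Ending in the last column: the rest of a is a free trailing gap.
        best = max(best, curr[m])
        prev = curr
    # Ending in the last row: the rest of b is a free trailing gap.
    return max(best, max(prev))
```

With free terminal gaps, a short sequence contained in a longer one, or two sequences that overlap at their ends, score as well as the matching region alone.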

NAP:
The NAP program computes a global alignment of a DNA sequence and a protein sequence without penalizing terminal gaps. NAP handles frameshifts and long introns in the DNA sequence. The program delivers the alignment in linear space, so long sequences can be aligned. It makes use of splice site consensuses in alignment computation. Both strands of the DNA sequence are compared with the protein sequence, and the alignment with the larger score is reported. NAP is described in the following paper:
Huang, X. and Zhang, J. (1996) Methods for Comparing a DNA Sequence with a Protein Sequence. Computer Applications in the Biosciences 12(6), 497-506.
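
The comparison of both strands can be sketched as follows. This is not NAP itself: the scoring function is a placeholder supplied by the caller, and the names reverse_complement and best_strand are hypothetical.

```python
# Complement table for the four bases in either case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(dna):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]

def best_strand(dna, score):
    """Score both strands with `score` and return the better one.

    `score` is a stand-in for a DNA-protein alignment score; the result
    is (strand, sequence, score). On a tie the forward strand is returned.
    """
    forward = ("+", dna, score(dna))
    reverse_seq = reverse_complement(dna)
    reverse = ("-", reverse_seq, score(reverse_seq))
    return max(forward, reverse, key=lambda t: t[2])
```

For example, with a toy scoring function that counts occurrences of ATG, best_strand("CATCAT", lambda s: s.count("ATG")) selects the reverse strand ATGATG.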

LAP2:
The LAP2 program finds k best non-intersecting local alignments between a DNA sequence and a protein sequence. LAP2 handles frameshifts and long introns in the DNA sequence. It delivers alignments in linear space, so long sequences can be aligned. It makes use of splice site consensuses in alignment computation. The reference information for LAP2:
Zhou, H., Joshi, C. P. and Huang, X. (1997) A Local Alignment Algorithm for Comparing a DNA Sequence with a Protein Sequence. In preparation.

GAP2:
The GAP2 program computes an optimal global alignment of a genomic sequence and a cDNA sequence without penalizing terminal gaps. A long gap in the cDNA sequence is given a constant penalty. The GAP2 program makes use of splice site consensuses in alignment computation. GAP2 delivers the alignment in linear space, so long sequences can be aligned. The reference information for GAP2:
Huang, X. (1994) On Global Sequence Alignment. Computer Applications in the Biosciences 10, 227-235.

Suggestions/Comments

Please contact Xiaoqiu Huang at huang@mtu.edu