Analysis and Annotation Tool for Finding Genes in Genomic Sequences


The AAT email server at Michigan Tech identifies genes in a DNA sequence by comparing the query sequence against cDNA and protein sequence databases:
(1) Human_Gene_Index, a database of human cDNA sequences at TIGR,
(2) dbEST, a database of EST sequences at NCBI,
(3) SwissProt, a database of protein sequences at University of Geneva,
(4) nr, a database of non-redundant protein sequences at NCBI.

The AAT package includes two sets of programs: one set (DPS/NAP) compares the query sequence with a protein database, and the other (DDS/GAP2) compares it with a cDNA database (Huang et al., 1997). Each set contains a fast database search program (DPS or DDS) and a rigorous alignment program (NAP or GAP2). The database search program quickly identifies regions of the query sequence that are similar to a database sequence. The alignment program then constructs an optimal alignment between each such region and the matching database sequence, and reports the coordinates of the exons in the query sequence.

To reduce the number of spurious matches caused by interspersed repeats, the DNA sequence is first screened with the RepeatMasker program (Smit and Green, unpublished results). The masked DNA sequence is used for the database search and the unmasked DNA sequence for the alignment, so the alignment program can identify the exact coordinates of exons even if parts of the exons are masked.
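The use of a masked and an unmasked copy of the same sequence can be illustrated with a small Python sketch. This is not RepeatMasker and is not meant to reflect how AAT implements the step: the function name mask_regions, the repeat coordinates, and the use of 'N' as the masking character are all assumptions made for illustration.

```python
def mask_regions(sequence, regions, mask_char="N"):
    """Return a copy of sequence with each 1-based, inclusive (start, end) region masked."""
    chars = list(sequence)
    for start, end in regions:
        if not (1 <= start <= end <= len(sequence)):
            raise ValueError(f"region ({start}, {end}) is outside the sequence")
        for i in range(start - 1, end):
            chars[i] = mask_char
    return "".join(chars)


# Hypothetical input: a repeat reported at positions 5-8 of a toy sequence.
unmasked = "ACGTTTTTACGT"
masked = mask_regions(unmasked, [(5, 8)])

# The masked copy would be passed to the database search and the unmasked
# copy to the alignment step. Both have the same length, so coordinates
# found in one apply directly to the other.
search_input = masked
alignment_input = unmasked
```

Because masking replaces characters in place rather than deleting them, a region found with the masked copy keeps the same coordinates in the unmasked copy.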

References

Huang, X., Adams, M.D., Zhou, H. and Kerlavage, A.R. (1997). A tool for analyzing and annotating genomic sequences, submitted.

Four sequence databases can be searched (one at a time) by sending a specially formatted mail message containing the query to the AAT server. A search is then performed against the specified database and the results are returned in an email message.

The AAT programs can also be used through a Web server.

1. Accessing the AAT Email Server

To access the server, send an electronic mail message containing the query, formatted as described below, to the following Internet address:

aat@cs.mtu.edu

2. Obtaining Help

To receive this set of instructions on using the AAT email server, send a mail message to:

aat@cs.mtu.edu

Put the word 'HELP' on a single line in the body of the mail message.
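As a minimal sketch, a help request could be composed with Python's standard email package as shown below. The helper name build_help_request and the sender address are hypothetical, and no subject line is set because these instructions do not mention one. Actually sending the message depends on a local mail setup that is not described here.

```python
from email.message import EmailMessage

AAT_ADDRESS = "aat@cs.mtu.edu"


def build_help_request(sender):
    """Compose a message whose body is the single line 'HELP'."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = AAT_ADDRESS
    msg.set_content("HELP\n")
    return msg


# Hypothetical sender address, used only for illustration.
request = build_help_request("user@example.org")

# Sending is left out: it would need an SMTP server, for example
# smtplib.SMTP("localhost").send_message(request), which is an assumption
# about the local environment.
```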

3. The Search Format

The first line of the search input must contain the mandatory search parameter 'DATABASE' followed by the name of the database to be searched. There must be a space between 'DATABASE' and the database name. The database names accepted by AAT are: hgi, dbest, swissprot, nr.

A single DNA query sequence in FASTA format must be specified on the remaining lines. That is, the second line begins with the symbol '>' followed by the name of the query sequence, and the sequence itself occupies the lines after it. The query sequence must not contain blanks and may be in upper or lower case. Only one sequence can be specified, and it must be a DNA sequence. Below is a search input example:


DATABASE  swissprot
>DNA sequence
GCCCCCGGCCCCGCCCCGGCCCCGCCCCCGGCCCCGCCCCGCAAGGGTC
ACAGGTCACGGGGCGGGGCCGAGGCGGAAGCGCCCGCAGCCCGGTACCGG
CTCCTCCTGGGCTCCCTCTAGCGCCTTCCCCCCGGCCCGACTCCGCTGGT
CAGCGCCAAGTGACTTACGCCCCCGACCTCTGAGCCCGGACCGCTAGGCGA
GGAGGATCAGATCTCGCTCGAGAATCTGAAGGTGCCCTGGTCCTGGAGG
AGTTCCGTCCCAGCCCGCGGTCTCCCGGTACTGTCGGGCCCCGGCCCTCT
GGAGCTTCAGGAGGCGGCCGTCAGGGTCGGGGAGTATTTGGGTCCGGGGT
CTCAGGGAAGGGCGGCGCCTGGGTCTGCGGTATCGGAAAGAGCCTGCTGG
AGCCAAGTAGCCCTCCCTCTCTTGGGACAGACCCCTCGGTCCCATGTCCA
TGGGGGCACCGCGGTCCCTCCTCCTGGCCCTGGCTGCTGGCCTGGCCGTT
GCCCGTCCGCCCAACATCGTGCTGATCTTTGCCGACGACCTCGGCTATGG
GGACCTGGGCTGCTAGGGCACCCCAGCTCTACCACTCCCAACCTGGACC
AGCTGGCGGCGGGGGGCTGCGGTTCACAGACTTCTACGTGCCTGTGTCT
CTGTGCACACCCTCTAGGTAAAGAGGGGGCCGCGCCTCTTCCCCGCCCCG
ATCCCTCCATCCCTTTCCTCCCAATGGATTGCAGGGGGGCGGGAAAAACGT
CTGTCTCTCTCTCTAGGGAAGGCCACATTTCTGTCTGTCTCAGGGACTCT
GTGACTTGTCCCGCAGGGCCGCCCTCCTGACCGGCCGGCTCCCGGTTCGG
ATGGGCATGTACCCTGGCGTCCTGGTGCCCAGCTCCCGGGGGGGCCTGCC
CCTGGAGGAGGTGACCGTGGCCGAAGTCCTGGCTGCCCGAGGCTACCTCA
CAGGAATGGCCGGCAAGTGGCACCTTGGGGTGGGGCCTGAGGGGGCCTTC
CTGCCCCCCCATCAGGGCTTCCATCGATTTCTAGGCATCCCGTACTCCCA
CGACCAGGTAGGAACCACCCGGGCCCTCAGCCACCCTCCCACCTCCCAAA
GTCCCCCAGCCCTTGATGCTCCCGCAGCCCCACCTGCCAGCCCAGCCCTC
ACGGCAGCTGCCCGCCTCAGGGCCCCTGCCAGAACCTGACCTGCTTCCCG
CCGGCCACTCCTTGCGACGGTGGCTGTGACCAGGGCCTGGTCCCCATCCC
ACTGTTGGCCAACCTTCCGTGGAGGCGCAGCCCCCCTGGCTGCCCGGAC
TAGAGGCCCGCTACTATGGCTTTCGCCCATGACCTCATGGCCGACGCCCAG
CGCCAGGATCGCCCCTTCTTCCTGTACTATGCCTCTCACGTAAGTGATCT
TGGCCCAACCCCCTGGCTGCCCGTGACCCCTACCCAGTGCTAACTCCAGT
CTTTGCCCCCAGCACACCCACTACCCTCAGTTCAGTGGGCAGAGCTTTGC
AGAGCGTTCAGGCCGCGGGCCATTTGGGGACTCCCTGATGGAGCTGGATG
CAGCTGTGGGGACCCTGATGACAGCCATAGGGGACCTGGGGCTGCTTGAA
GAGACGCTGGTCATCTTCACTGCAGACAATGGGTATGCCAGCAGGGCAGC
TGGGTGCTCCGGCCCTGTCACGGGCCAGGGCCCTGGAGGCCTTGCAGTTC
AGCTGCTTGCCAAGATCATAGTGGGTGAGGGGGTGCCAGGAGATGCTGGC
CACGTTGCAGGGGCCCAAGGTGTAGTCAGGAGACACAGTGCACAGAGAGC
TGGTCTTGGTAGGCCTGGGAGGTGCCGGGCTCATGCTGGGCACCTCCGGG
CAAGCTTTGTGACTTAGAGGTGTGGGGCCACTGGTCACCCTCGGTGGCTC
AGAGGCTGTGGCTCCTGGCTCATGAGCGCCTCCTGTGTCCCAGACCTGA
GACCATGCGTATGTCCCGAGGCGGCTGCTCCGGTCTCTTGCGGTGTGGAA
AGGGAACGACCTACGAGGGCGGTGTCCGAGAGCCTGCCTTGGCCTTCTGG
CCAGGTCATATCGCTCCCGGTCAGTCCGCAGGCCCTCTCCTTGGAACCCT
GGCCCCACCACCCCAACCTTGATGGCGAACTGAGTGACTGACCAGCCTCC
TGCCCCCAGGCGTGACCCACGAGCTGGCCAGCTCCCTGGACCTGCTGCCT
ACCCTGGCAGCCCTGGCTGGGGCCCCACTGCCCAATGTCACCTTGGATGG
CTTTGACCTCAGCCCCCTGCTGCTGGGCACAGGCAAGGTAGGGCCGGTGA
CCCCTGATCCCAGATCCTTGGCCCCTGTCCTGGCCTTCCCCTGGGGTGAG
TGTGGGCAGTGCCTGAGAGTCTGTGCCTCAGTGCCTCCTGCACTGAGTGG
CATCCAAGTGGCGCCTACCTCTCAGGTTCCTGGGTGGGCAAGAAGCGGTGC
ACGTCCAGGGCCTCCCACCAGGGCTGGCAGCCCCAGGTATGTGCAGTGCT
TGGGGCCTGCCCCGCCCCGTGACCCCTGACTCTGCCCCCAGAGCCCTCGG
CAGTCTCTCTTCTTCTACCCGTCCTACCCAGACGAGGTCCGTGGGGTTTT
TGCTGTGCGGACTGGAAGTACAAGGCTCACTTCTTCACCCAGGGTAACC
CCTCCCCGTGGATCCCTCCCCCCGACCTGCTGACCCCTCCCCGGAGCCCT
AGATCCCTGGCCCCTCCTCTCGCCCTTGCCCTGTGCACAGAATTGGCCCC
CTCCCCAGGCTCTGCCCACAGTGATACCACTGCAGACCCTGCCTGCCACG
CCTCCAGCTCTCTGACTGCTCATGAGCCCCCGCTGCTCTATGACCTGTCC
AAGGACCCTGGTGAGAACTACAACCTGCTGGGGGGTGTGGCCGGGGCCAC
CCCAGAGGTGCTGCAAGCCCTGAAACAGCTTCAGCTGCTCAAGGCCCAGT
TAGACGCAGCTGTGACCTTCGGCCCCAGCCAGGTGGCCCGGGGCGAGGAC
CCCGCCCTGCAGATCTGCTGTCATCCTGGCTGCACCCCCCGCCCAGCTTG
CTGCCATTGCCCAGATCCCCATGCCTGAGGGCCCCTCGGCTGGCCTGGGC
ATGTGATGGCT
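
A search input in this format could be assembled and checked with a short Python sketch such as the one below. The function name build_search_input, the line width of 50, and the set of characters accepted as DNA (A, C, G, T, N) are assumptions; these instructions state only that the query must be a single DNA sequence without blanks.

```python
ACCEPTED_DATABASES = ("hgi", "dbest", "swissprot", "nr")
ASSUMED_DNA_ALPHABET = set("ACGTN")  # assumption: not specified by the server


def build_search_input(database, name, sequence, width=50):
    """Return the mail body: a DATABASE line, a '>' name line, then the sequence."""
    if database not in ACCEPTED_DATABASES:
        raise ValueError(f"unknown database: {database!r}")
    if not name or "\n" in name:
        raise ValueError("the sequence name must be a single non-empty line")
    # Remove blanks and line breaks, since the query must not contain blanks.
    cleaned = "".join(sequence.split())
    if not cleaned:
        raise ValueError("the query sequence is empty")
    invalid = set(cleaned.upper()) - ASSUMED_DNA_ALPHABET
    if invalid:
        raise ValueError(f"non-DNA characters in query: {sorted(invalid)}")
    # Break the sequence into fixed-width lines; upper/lower case is kept as given.
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return f"DATABASE {database}\n>{name}\n" + "\n".join(lines) + "\n"


# Toy query with a deliberately small width to show the line wrapping.
body = build_search_input("swissprot", "DNA sequence", "gccc ccgg\nACGT", width=5)
```

The returned string is intended to be used as the body of the mail message sent to the server.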