Analysis and Annotation Tool for Finding Genes in Genomic Sequences
The AAT email server at Michigan Tech identifies genes in a DNA sequence
by comparing the query sequence against cDNA and protein sequence databases:
(1) Human_Gene_Index, a database of human cDNA sequences at TIGR,
(2) dbEST, a database of EST sequences at NCBI,
(3) SwissProt, a database of protein sequences at the University of Geneva,
(4) nr, a database of non-redundant protein sequences at NCBI.
The AAT package includes two sets of programs:
one set (DPS/NAP) for comparing the query sequence with a protein database,
and another (DDS/GAP2) for comparing the query sequence with a cDNA database
(Huang et al., 1997).
Each set contains a fast database search program and
a rigorous alignment program. The database search program
quickly identifies regions of the query sequence that
are similar to a database sequence. The alignment program then
constructs an optimal alignment between each region and the matching
database sequence.
The alignment program also reports the coordinates of exons in the query sequence.
To reduce the number of spurious matches caused by
interspersed repeats, the DNA sequence is first screened using the
RepeatMasker program (Smit and Green, unpublished results).
The masked DNA sequence is used for database searching, and
the unmasked DNA sequence for sequence alignment, which
allows the alignment program to identify the exact coordinates of exons
even if parts of the exons are masked.
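The reason both versions of the sequence can be used together is that
masking replaces bases without changing the sequence length, so a region
found in the masked sequence has the same coordinates in the unmasked one.
The following is a minimal sketch of that idea only, not of RepeatMasker
itself: the function name, the 1-based inclusive coordinate convention, and
the use of 'N' as the masking character are assumptions made for
illustration.

```python
def mask_repeats(sequence, repeats, mask_char="N"):
    """Return a copy of `sequence` with each repeat interval masked.

    `repeats` is a list of (start, end) pairs, assumed here to be
    1-based and inclusive. The result has the same length as the input,
    so coordinates are valid in both the masked and unmasked sequence.
    """
    chars = list(sequence)
    for start, end in repeats:
        # Clamp the interval to the sequence and convert to 0-based indices.
        for i in range(max(start, 1) - 1, min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


unmasked = "ACGTACGTAC"
masked = mask_repeats(unmasked, [(3, 5)])
# A hit reported at positions 6-10 of `masked` refers to the same bases
# at positions 6-10 of `unmasked`.
```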
References
Huang, X., Adams, M.D., Zhou, H. and Kerlavage, A.R. (1997).
A tool for analyzing and annotating genomic sequences, submitted.
Four sequence databases can be searched (one at a time)
by sending a specially formatted mail message containing the query to the
AAT server. A search is then performed against the specified database
and the results are returned in an email message.
The programs in the AAT tool can also be used through a Web server.
1. Accessing the AAT Email Server
To access the server, send an electronic mail message containing the
query formatted as described below to the following Internet address:
aat@cs.mtu.edu
2. Obtaining Help
To receive this set of instructions on using the AAT email server,
send a mail message to:
aat@cs.mtu.edu
Put the word 'HELP' on a single line in the body of the mail message.
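As an illustration, the help request could be composed with Python's
standard email library roughly as follows. This is a sketch under stated
assumptions: the function name and the sender address are hypothetical, no
subject line is set because these instructions do not mention one, and
actually delivering the message (for example through a local mail server)
is left out.

```python
from email.message import EmailMessage

AAT_ADDRESS = "aat@cs.mtu.edu"  # server address given in these instructions


def build_help_message(sender):
    """Compose a help request: the word HELP on a single line in the body."""
    msg = EmailMessage()
    msg["From"] = sender  # hypothetical sender supplied by the caller
    msg["To"] = AAT_ADDRESS
    msg.set_content("HELP\n")
    return msg


help_msg = build_help_message("user@example.org")
```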
3. The Search Format
The first line of the search input must contain the mandatory search
parameter 'DATABASE' followed by the name of the database to be searched.
There must be a space between 'DATABASE' and the database name.
The database names accepted by AAT are: hgi, dbest, swissprot, nr.
A DNA query sequence in FASTA format must be specified
on the remaining lines. That is, the second line begins with
the symbol '>' followed by the name of the query sequence.
The query sequence itself follows on the subsequent lines.
It must not contain blanks and may be in upper or lower case.
Only one sequence can be specified, and it must be a DNA sequence.
Below is a search input example:
DATABASE swissprot
>DNA sequence
GCCCCCGGCCCCGCCCCGGCCCCGCCCCCGGCCCCGCCCCGCAAGGGTC
ACAGGTCACGGGGCGGGGCCGAGGCGGAAGCGCCCGCAGCCCGGTACCGG
CTCCTCCTGGGCTCCCTCTAGCGCCTTCCCCCCGGCCCGACTCCGCTGGT
CAGCGCCAAGTGACTTACGCCCCCGACCTCTGAGCCCGGACCGCTAGGCGA
GGAGGATCAGATCTCGCTCGAGAATCTGAAGGTGCCCTGGTCCTGGAGG
AGTTCCGTCCCAGCCCGCGGTCTCCCGGTACTGTCGGGCCCCGGCCCTCT
GGAGCTTCAGGAGGCGGCCGTCAGGGTCGGGGAGTATTTGGGTCCGGGGT
CTCAGGGAAGGGCGGCGCCTGGGTCTGCGGTATCGGAAAGAGCCTGCTGG
AGCCAAGTAGCCCTCCCTCTCTTGGGACAGACCCCTCGGTCCCATGTCCA
TGGGGGCACCGCGGTCCCTCCTCCTGGCCCTGGCTGCTGGCCTGGCCGTT
GCCCGTCCGCCCAACATCGTGCTGATCTTTGCCGACGACCTCGGCTATGG
GGACCTGGGCTGCTAGGGCACCCCAGCTCTACCACTCCCAACCTGGACC
AGCTGGCGGCGGGGGGCTGCGGTTCACAGACTTCTACGTGCCTGTGTCT
CTGTGCACACCCTCTAGGTAAAGAGGGGGCCGCGCCTCTTCCCCGCCCCG
ATCCCTCCATCCCTTTCCTCCCAATGGATTGCAGGGGGGCGGGAAAAACGT
CTGTCTCTCTCTCTAGGGAAGGCCACATTTCTGTCTGTCTCAGGGACTCT
GTGACTTGTCCCGCAGGGCCGCCCTCCTGACCGGCCGGCTCCCGGTTCGG
ATGGGCATGTACCCTGGCGTCCTGGTGCCCAGCTCCCGGGGGGGCCTGCC
CCTGGAGGAGGTGACCGTGGCCGAAGTCCTGGCTGCCCGAGGCTACCTCA
CAGGAATGGCCGGCAAGTGGCACCTTGGGGTGGGGCCTGAGGGGGCCTTC
CTGCCCCCCCATCAGGGCTTCCATCGATTTCTAGGCATCCCGTACTCCCA
CGACCAGGTAGGAACCACCCGGGCCCTCAGCCACCCTCCCACCTCCCAAA
GTCCCCCAGCCCTTGATGCTCCCGCAGCCCCACCTGCCAGCCCAGCCCTC
ACGGCAGCTGCCCGCCTCAGGGCCCCTGCCAGAACCTGACCTGCTTCCCG
CCGGCCACTCCTTGCGACGGTGGCTGTGACCAGGGCCTGGTCCCCATCCC
ACTGTTGGCCAACCTTCCGTGGAGGCGCAGCCCCCCTGGCTGCCCGGAC
TAGAGGCCCGCTACTATGGCTTTCGCCCATGACCTCATGGCCGACGCCCAG
CGCCAGGATCGCCCCTTCTTCCTGTACTATGCCTCTCACGTAAGTGATCT
TGGCCCAACCCCCTGGCTGCCCGTGACCCCTACCCAGTGCTAACTCCAGT
CTTTGCCCCCAGCACACCCACTACCCTCAGTTCAGTGGGCAGAGCTTTGC
AGAGCGTTCAGGCCGCGGGCCATTTGGGGACTCCCTGATGGAGCTGGATG
CAGCTGTGGGGACCCTGATGACAGCCATAGGGGACCTGGGGCTGCTTGAA
GAGACGCTGGTCATCTTCACTGCAGACAATGGGTATGCCAGCAGGGCAGC
TGGGTGCTCCGGCCCTGTCACGGGCCAGGGCCCTGGAGGCCTTGCAGTTC
AGCTGCTTGCCAAGATCATAGTGGGTGAGGGGGTGCCAGGAGATGCTGGC
CACGTTGCAGGGGCCCAAGGTGTAGTCAGGAGACACAGTGCACAGAGAGC
TGGTCTTGGTAGGCCTGGGAGGTGCCGGGCTCATGCTGGGCACCTCCGGG
CAAGCTTTGTGACTTAGAGGTGTGGGGCCACTGGTCACCCTCGGTGGCTC
AGAGGCTGTGGCTCCTGGCTCATGAGCGCCTCCTGTGTCCCAGACCTGA
GACCATGCGTATGTCCCGAGGCGGCTGCTCCGGTCTCTTGCGGTGTGGAA
AGGGAACGACCTACGAGGGCGGTGTCCGAGAGCCTGCCTTGGCCTTCTGG
CCAGGTCATATCGCTCCCGGTCAGTCCGCAGGCCCTCTCCTTGGAACCCT
GGCCCCACCACCCCAACCTTGATGGCGAACTGAGTGACTGACCAGCCTCC
TGCCCCCAGGCGTGACCCACGAGCTGGCCAGCTCCCTGGACCTGCTGCCT
ACCCTGGCAGCCCTGGCTGGGGCCCCACTGCCCAATGTCACCTTGGATGG
CTTTGACCTCAGCCCCCTGCTGCTGGGCACAGGCAAGGTAGGGCCGGTGA
CCCCTGATCCCAGATCCTTGGCCCCTGTCCTGGCCTTCCCCTGGGGTGAG
TGTGGGCAGTGCCTGAGAGTCTGTGCCTCAGTGCCTCCTGCACTGAGTGG
CATCCAAGTGGCGCCTACCTCTCAGGTTCCTGGGTGGGCAAGAAGCGGTGC
ACGTCCAGGGCCTCCCACCAGGGCTGGCAGCCCCAGGTATGTGCAGTGCT
TGGGGCCTGCCCCGCCCCGTGACCCCTGACTCTGCCCCCAGAGCCCTCGG
CAGTCTCTCTTCTTCTACCCGTCCTACCCAGACGAGGTCCGTGGGGTTTT
TGCTGTGCGGACTGGAAGTACAAGGCTCACTTCTTCACCCAGGGTAACC
CCTCCCCGTGGATCCCTCCCCCCGACCTGCTGACCCCTCCCCGGAGCCCT
AGATCCCTGGCCCCTCCTCTCGCCCTTGCCCTGTGCACAGAATTGGCCCC
CTCCCCAGGCTCTGCCCACAGTGATACCACTGCAGACCCTGCCTGCCACG
CCTCCAGCTCTCTGACTGCTCATGAGCCCCCGCTGCTCTATGACCTGTCC
AAGGACCCTGGTGAGAACTACAACCTGCTGGGGGGTGTGGCCGGGGCCAC
CCCAGAGGTGCTGCAAGCCCTGAAACAGCTTCAGCTGCTCAAGGCCCAGT
TAGACGCAGCTGTGACCTTCGGCCCCAGCCAGGTGGCCCGGGGCGAGGAC
CCCGCCCTGCAGATCTGCTGTCATCCTGGCTGCACCCCCCGCCCAGCTTG
CTGCCATTGCCCAGATCCCCATGCCTGAGGGCCCCTCGGCTGGCCTGGGC
ATGTGATGGCT
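A search input of this form can also be assembled programmatically. The
sketch below builds the message body from a database name, a sequence name,
and a DNA string, following the rules in section 3. Several details are
assumptions rather than documented server behavior: the function name, the
50-character line width, the restriction of the alphabet to A, C, G, T and
N, and the exact-match check on the database name.

```python
import re

# Database names accepted by AAT, as listed in section 3.
ALLOWED_DATABASES = {"hgi", "dbest", "swissprot", "nr"}


def build_search_body(database, name, sequence, width=50):
    """Return the text of a search request for the AAT email server."""
    if database not in ALLOWED_DATABASES:
        raise ValueError(f"unknown database: {database!r}")
    if "\n" in name or "\r" in name:
        raise ValueError("the sequence name must fit on one line")

    # The query must not contain blanks, so drop all whitespace first.
    seq = "".join(sequence.split())
    # Assumed DNA alphabet; upper and lower case are both allowed.
    if not re.fullmatch(r"[ACGTNacgtn]+", seq):
        raise ValueError("the query must be a non-empty DNA sequence")

    lines = [f"DATABASE {database}", f">{name}"]
    # Wrap the sequence into fixed-width lines (width is an assumption).
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


body = build_search_body("swissprot", "DNA sequence", "GCCC CCGG\nacgt", width=4)
```

The returned string is meant to be used as the body of the mail message
sent to the server address given in section 1.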