Almost any bioinformatics workflow involves an alignment tool. The best known is BLAST. For my work, though, BLAT is better: it reports the chromosome in its output, so I don’t need to write extra code to look up the sequence ID in GenBank.
But what are the differences between these two alignment tools, apart from the S in their names?
First of all, the algorithms are structured differently. On DNA, BLAT works by keeping an index of an entire genome in memory. Thus, the target database of BLAT is not a set of GenBank sequences, but instead an index derived from the assembly of the entire genome. By default, the index consists of all non-overlapping 11-mers except for those heavily involved in repeats, and it uses less than a gigabyte of RAM. This smaller size means that BLAT is far more easily mirrored than BLAST. BLAT on DNA is designed to quickly find sequences of 95% and greater similarity of length 40 bases or more. It may miss more divergent or shorter sequence alignments.
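To make the indexing idea concrete, here is a minimal sketch in Python. It is a toy illustration and not BLAT's actual implementation: the function names `build_index` and `find_hits` are made up for this post, the target string is an invented 50-base sequence, and repeat masking is left out entirely. The target is cut into non-overlapping 11-mers, and every overlapping 11-mer of the query is then looked up in that index.

```python
from collections import defaultdict

def build_index(target, k=11):
    """Map each non-overlapping k-mer of the target to its start positions."""
    index = defaultdict(list)
    # Step by k so that the indexed tiles do not overlap.
    for start in range(0, len(target) - k + 1, k):
        index[target[start:start + k]].append(start)
    return index

def find_hits(query, index, k=11):
    """Look up every overlapping k-mer of the query in the target index."""
    hits = []
    for q_pos in range(len(query) - k + 1):
        for t_pos in index.get(query[q_pos:q_pos + k], []):
            hits.append((q_pos, t_pos))
    return hits

# Invented toy "genome" (50 bases) and a query copied from positions 11..39.
target = "ACGTACGTTAGCCGATAGGCTTAACGGATCCGATTACAGGCATGCAATCG"
query = target[11:40]

index = build_index(target, k=11)
hits = find_hits(query, index, k=11)

# Each hit is (position in query, position in target).
for q_pos, t_pos in hits:
    print(f"query {q_pos} matches target {t_pos} (diagonal {t_pos - q_pos})")
```

Because only the target tiles are stored, the index stays small, while sliding over every position of the query ensures that a tile inside a matching region is still found. A real aligner would then extend and stitch such hits into a full alignment.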
On proteins, BLAT uses 4-mers rather than 11-mers, finding protein sequences of 80% and greater similarity to the query of length 20+ amino acids. The protein index requires slightly more than 2 GB of RAM. In practice, due to sequence divergence rates over evolutionary time, DNA BLAT works well within humans and primates, while protein BLAT continues to find good matches within terrestrial vertebrates and even earlier organisms for conserved proteins. Within humans, protein BLAT gives a much better picture of gene families (paralogs) than DNA BLAT. However, BLAST and PSI-BLAST at NCBI can find much more remote matches.
From a practical standpoint, BLAT has several advantages over BLAST:
- speed (no queues, response in seconds) at the price of lesser homology depth
- the ability to submit a long list of simultaneous queries in FASTA format
- five convenient output sort options
- a direct link into the UCSC browser
- alignment block details in natural genomic order
- an option to launch the alignment later as part of a custom track
BLAT is commonly used to look up the location of a sequence in the genome or determine the exon structure of an mRNA, but expert users can run large batch jobs and make internal parameter sensitivity changes by installing command-line BLAT on their own Linux server.
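For the command-line route, a small Python wrapper is one way to keep batch runs reproducible. The sketch below only assembles the command; the helper names `blat_command` and `run_blat` and the file names `genome.2bit`, `queries.fa` and `hits.psl` are placeholders invented for this post. The options shown (`-tileSize`, `-minIdentity`, `-out`) are standard options of the standalone `blat` program, and the values used here are its documented nucleotide defaults, so changing them is where the sensitivity tuning happens.

```python
import shlex
import subprocess

def blat_command(database, query, output,
                 tile_size=11, min_identity=90, out_format="psl"):
    """Build the argument list for a standalone blat run."""
    return [
        "blat",
        database,                         # target genome, e.g. a .2bit file
        query,                            # FASTA file with one or more queries
        f"-tileSize={tile_size}",         # k-mer size used for the index
        f"-minIdentity={min_identity}",   # minimum percent identity to report
        f"-out={out_format}",             # output format
        output,
    ]

def run_blat(database, query, output, **options):
    """Run blat and raise if it exits with an error (needs blat on PATH)."""
    subprocess.run(blat_command(database, query, output, **options), check=True)

# Placeholder file names; nothing is executed here, the command is only shown.
cmd = blat_command("genome.2bit", "queries.fa", "hits.psl", min_identity=95)
print(shlex.join(cmd))
```

Calling `run_blat("genome.2bit", "queries.fa", "hits.psl")` would start the actual alignment, assuming `blat` is installed and the files exist.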
1. BLAT indexes the genome/protein database, retains the index in memory, and then scans the query sequence for matches. BLAST, on the other hand, builds an index of the query sequences and searches through the database for matches. A BLAST variant called MegaBLAST indexes 4 databases to speed up alignments.
2. BLAT can extend on multiple perfect and near-perfect matches (the default is 2 perfect matches of length 11 for nucleotide searches and 3 perfect matches of length 4 for protein searches), while BLAST extends only when one or two matches occur close together.
3. BLAT requires query sequences in FASTA format, while BLAST accepts both FASTA and queries by accession number.
4. BLAT connects each homologous area between two sequences into a single larger alignment, in contrast to BLAST, which returns each homologous area as a separate local alignment. For an mRNA aligned to a genome, the result of BLAST is a list of exons, with each alignment extending just past the end of the exon. BLAT, however, correctly places each base of the mRNA onto the genome, using each base only once, and can be used to identify intron-exon boundaries (splice sites).
5. BLAT is less sensitive than BLAST.
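Point 4 is easy to see in BLAT's default PSL output, where one line describes the whole stitched alignment as a series of blocks. The sketch below reads the standard 21-column PSL layout (chromosome in `tName`, plus `blockSizes` and `tStarts`) and turns the blocks into exon-like intervals and the gaps between them. The function names and the sample record are invented for illustration, and the code assumes a plain DNA-to-DNA alignment; translated searches and strand handling need more care.

```python
def parse_int_list(field):
    """Parse a PSL comma-separated list such as '100,120,80,'."""
    return [int(x) for x in field.split(",") if x]

def psl_blocks(line):
    """Extract chromosome, aligned blocks and target gaps from one PSL line."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 21:
        raise ValueError(f"expected 21 PSL columns, got {len(f)}")
    sizes = parse_int_list(f[18])      # blockSizes
    t_starts = parse_int_list(f[20])   # tStarts (0-based, on the target)
    # Each block is a half-open interval [start, end) on the target.
    blocks = [(start, start + size) for start, size in zip(t_starts, sizes)]
    # Target-side gaps between consecutive blocks: candidate introns.
    gaps = [(blocks[i][1], blocks[i + 1][0]) for i in range(len(blocks) - 1)]
    return {"query": f[9], "chrom": f[13], "strand": f[8],
            "blocks": blocks, "gaps": gaps}

# Invented example record: a 300-base mRNA aligned in three blocks to chr1.
record = "\t".join([
    "300", "0", "0", "0",        # matches, misMatches, repMatches, nCount
    "0", "0", "2", "7780",       # qNumInsert, qBaseInsert, tNumInsert, tBaseInsert
    "+", "my_mrna", "300", "0", "300",    # strand, qName, qSize, qStart, qEnd
    "chr1", "1000000", "1000", "9080",    # tName, tSize, tStart, tEnd
    "3", "100,120,80,", "0,100,220,", "1000,5000,9000,",
])

result = psl_blocks(record)
print(result["chrom"], result["blocks"], result["gaps"])
```

With BLAST, the same mRNA would come back as three separate local alignments that you would have to order and trim yourself.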
In short, BLAT is fast but demands more exact matches. BLAST allows lower-scoring hits and more gaps in alignments, so you will get more hits with BLAST, but it may be slower.