BLAT (BLAST-like alignment tool) is a pairwise sequence alignment algorithm developed by Jim Kent at the University of California, Santa Cruz (UCSC) in the early 2000s to assist in the assembly and annotation of the human genome. It was designed primarily to reduce the time needed to align millions of mouse genomic reads and expressed sequence tags against the human genome sequence. The alignment tools of the time could not perform these operations quickly enough to allow regular updates of the human genome assembly. Compared to pre-existing tools, BLAT was ~500 times faster at mRNA/DNA alignments and ~50 times faster at protein/protein alignments.
BLAT is one of several algorithms developed for the analysis and comparison of biological sequences such as DNA, RNA and proteins, with the primary goal of inferring homology in order to discover the biological function of genomic sequences. It first rapidly detects short sequences that are likely to be homologous, and then aligns and extends the homologous regions. In this respect it is similar to the heuristic BLAST family of algorithms.
BLAT can be used for tasks such as:
- Alignment of multiple mRNA sequences onto a genome assembly
- Alignment of a protein or mRNA sequence from one species onto a sequence database from another species to determine homology
- Alignment of two protein sequences
- Determination of the distribution of exonic and intronic regions of a gene
- Detection of gene family members of a specific gene query
- Display of the protein-coding sequence of a specific gene
BLAT is used to find regions in a target genomic database that are similar to a query sequence under examination. Its general algorithmic process is similar to BLAST’s in that it first searches for short segments in the database and query sequences that share a certain number of matching elements. These alignment seeds are then extended in both directions along the sequences to form high-scoring pairs. However, BLAT uses a different indexing approach from BLAST, which allows it to rapidly scan very large genomic and protein databases for similarities to a query sequence. BLAT builds an index of non-overlapping k-mers from the target database, then looks up every overlapping k-mer of the query sequence in that index, building up a list of hits where the sequences match.
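The seed-and-extend idea described above can be illustrated with a minimal Python sketch. It is not BLAT's actual implementation: the function names (`build_index`, `find_seeds`, `extend_seed`), the toy sequences and the value `k = 4` are all assumptions chosen for illustration, and the extension step here is a simple ungapped exact-match extension rather than BLAT's full alignment stage.

```python
from collections import defaultdict

def build_index(target, k):
    """Index the non-overlapping k-mers of the target by start position."""
    index = defaultdict(list)
    for pos in range(0, len(target) - k + 1, k):
        index[target[pos:pos + k]].append(pos)
    return index

def find_seeds(query, index, k):
    """Look up every overlapping k-mer of the query; return (q_pos, t_pos) hits."""
    hits = []
    for q_pos in range(len(query) - k + 1):
        for t_pos in index.get(query[q_pos:q_pos + k], ()):
            hits.append((q_pos, t_pos))
    return hits

def extend_seed(query, target, q_pos, t_pos, k):
    """Extend an exact seed in both directions while bases keep matching.

    Simplification (assumption): no mismatches or gaps are tolerated.
    Returns half-open coordinates (q_start, q_end, t_start, t_end).
    """
    q_start, t_start = q_pos, t_pos
    while q_start > 0 and t_start > 0 and query[q_start - 1] == target[t_start - 1]:
        q_start -= 1
        t_start -= 1
    q_end, t_end = q_pos + k, t_pos + k
    while q_end < len(query) and t_end < len(target) and query[q_end] == target[t_end]:
        q_end += 1
        t_end += 1
    return q_start, q_end, t_start, t_end

# Toy sequences, made up for this example.
target = "ACGTACGTTTGACCAGTACGGATCCA"
query = "TTGACCAGTACG"
k = 4

index = build_index(target, k)
seeds = find_seeds(query, index, k)
# Several seeds can extend to the same region, so duplicates are collapsed.
alignments = sorted({extend_seed(query, target, q, t, k) for q, t in seeds})
print(seeds)       # [(0, 8), (4, 12), (8, 16)]
print(alignments)  # [(0, 12, 8, 20)]
```

Indexing only non-overlapping k-mers of the target keeps the index small, while scanning every overlapping k-mer of the query ensures that a sufficiently long match cannot be missed because of its offset.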
BLAST and BLAT are available at many sites, such as:
- UCSC Genome Bioinformatics Site
- At genie.org
A BLAT search returns a list of results sorted by score in decreasing order. For each result, the results page shows the score of the alignment, the region of the query sequence that matches the database sequence, the size of the query sequence, the percentage identity of the alignment, and the chromosome and position to which the query sequence maps.
The user can obtain biological information associated with the alignment, such as information about the gene to which the query may match. The user is also provided with a link to view the alignment of the query sequence with the genome assembly. Matches between the query and the genome assembly are shown in blue, and the boundaries of the alignments are shown in a lighter colour. These exon boundaries indicate splice sites.
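As a rough illustration of the result list described above, the following Python sketch models one result row and sorts rows by decreasing score. The `Hit` class, its field names and the sample values are hypothetical and invented for this example; they do not reproduce BLAT's actual output format.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    # Hypothetical field names, chosen to mirror the columns described above.
    score: int
    q_start: int      # start of the matching region in the query
    q_end: int        # end of the matching region in the query
    q_size: int       # total size of the query sequence
    identity: float   # percentage identity of the alignment
    chrom: str        # chromosome the query maps to
    t_start: int      # start position on the chromosome
    t_end: int        # end position on the chromosome

def rank_hits(hits):
    """Return hits sorted by score, highest first."""
    return sorted(hits, key=lambda h: h.score, reverse=True)

# Made-up values, for illustration only.
hits = [
    Hit(120, 10, 140, 400, 95.4, "chrB", 5000, 5130),
    Hit(390, 1, 400, 400, 99.0, "chrA", 1000, 1400),
    Hit(45, 300, 350, 400, 91.2, "chrC", 200, 250),
]
ranked = rank_hits(hits)
for h in ranked:
    print(h.score, f"{h.q_start}-{h.q_end}/{h.q_size}", f"{h.identity}%", h.chrom, h.t_start, h.t_end)
```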
The alignment search criteria used by BLAST/BLAT are:
- Searching With Single Perfect Matches
- Searching With Single Almost Perfect Matches
- Searching With Multiple Perfect Matches
- Clumping Hits and Identifying Homologous Regions
- Searching for Near Perfect Matches
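Two of the criteria listed above can be sketched in Python under one plausible reading, which is an assumption rather than something stated here: an "almost perfect" match is taken to be a k-mer pair differing in at most one position, and hits are "clumped" when they lie on the same diagonal (target position minus query position) and close together. The helper names and the parameters `max_mismatches`, `min_hits` and `max_gap` are hypothetical.

```python
from collections import defaultdict

def is_near_perfect(kmer_a, kmer_b, max_mismatches=1):
    """True if two equal-length k-mers differ in at most max_mismatches positions."""
    if len(kmer_a) != len(kmer_b):
        return False
    return sum(x != y for x, y in zip(kmer_a, kmer_b)) <= max_mismatches

def clump_by_diagonal(hits, min_hits=2, max_gap=20):
    """Group (q_pos, t_pos) hits that share a diagonal and lie close together.

    Returns a list of (diagonal, [q_pos, ...]) clumps with at least min_hits hits.
    """
    by_diag = defaultdict(list)
    for q_pos, t_pos in hits:
        by_diag[t_pos - q_pos].append(q_pos)

    clumps = []
    for diag, q_positions in sorted(by_diag.items()):
        q_positions.sort()
        current = [q_positions[0]]
        for q in q_positions[1:]:
            if q - current[-1] <= max_gap:
                current.append(q)
            else:
                # Gap too large: close the current clump and start a new one.
                if len(current) >= min_hits:
                    clumps.append((diag, current))
                current = [q]
        if len(current) >= min_hits:
            clumps.append((diag, current))
    return clumps

# Made-up hits: three nearby hits on diagonal 8, plus two isolated hits.
hits = [(0, 8), (4, 12), (8, 16), (3, 40), (60, 68)]
clumps = clump_by_diagonal(hits)
print(clumps)                            # [(8, [0, 4, 8])]
print(is_near_perfect("ACGT", "ACGA"))   # True
print(is_near_perfect("ACGT", "AGGA"))   # False
```

Requiring several hits on one diagonal filters out isolated chance matches, at the cost of missing regions that contain only a single matching k-mer.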