Two members of the BLAST suite of programs are designed to make nucleotide-to-nucleotide alignments. The first is the original BLAST nucleotide search program, known as “BLASTn.” It is a sensitive, general-purpose nucleotide search and alignment program that can be used to align tRNA or rRNA sequences as well as mRNA or genomic DNA sequences containing a mix of coding and noncoding regions. A more recently developed nucleotide-level BLAST program, MegaBLAST, is about 10 times faster than “BLASTn” but is designed to align sequences that are nearly identical, differing from one another by only a few percent.
MegaBLAST allows the rapid mapping of a transcript onto a typical 3 billion base mammalian genome in seconds, and is useful for processing large batches of sequences. A refinement of MegaBLAST, known as discontiguous MegaBLAST, uses a discontiguous template to define an initial “word” in which characters in some positions, such as those in the wobble base position of codons, need not match. Discontiguous MegaBLAST allows rapid cross-species mappings involving coding regions in cases where species differences in codon usage would prevent alignments using the original MegaBLAST program.
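The idea of a discontiguous template can be illustrated with a minimal sketch. The 12-position template below, which ignores every third base, is a hypothetical choice made for illustration and is not one of the templates actually shipped with discontiguous MegaBLAST; the function names are likewise assumptions.

```python
# Hypothetical template: '1' = position must match, '0' = position is ignored.
# Every third position (the codon wobble position) is left unconstrained.
TEMPLATE = "110110110110"


def template_key(word: str, template: str = TEMPLATE) -> str:
    """Return only the characters of `word` at positions the template marks '1'."""
    if len(word) != len(template):
        raise ValueError("word and template must have the same length")
    return "".join(base for base, flag in zip(word, template) if flag == "1")


def words_match(a: str, b: str, template: str = TEMPLATE) -> bool:
    """Two words seed a hit when they agree at every required position."""
    return template_key(a, template) == template_key(b, template)


# Two 12-base words that differ only at the third base of each codon.
query = "GCTGAACTGAAA"
subject = "GCCGAGCTCAAG"

print(query == subject)             # contiguous exact-word comparison
print(words_match(query, subject))  # template-based comparison
```

In this sketch the two words differ at four positions, so a contiguous 12-base word would not seed a hit, while the template-based comparison still does because all four differences fall on ignored positions.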
Genome BLAST services are available at NCBI for a growing list of organisms, including human, mouse, rat, fruit fly, and many others. At a minimum, MegaBLAST and “BLASTn” searches against the complete genome are supported. These are usually offered in conjunction with “tBLASTn” searches against the genome, “BLASTp” and “BLASTx” searches against the proteins annotated on the genome, and MegaBLAST, “BLASTn” and “tBLASTn” searches against collections of transcript sequences that have been mapped to the genome. Hits to the genome are displayed graphically within NCBI’s MapViewer to show their genomic context.
MegaBLAST uses a greedy algorithm for the nucleotide sequence alignment search. The program is optimized for aligning sequences that differ slightly as a result of sequencing or other similar “errors.” When a larger word size is used, it is up to 10 times faster than more common sequence similarity programs. MegaBLAST can also efficiently handle much longer DNA sequences than the traditional BLASTn program.
MegaBLAST is most efficient with word sizes of 16 and larger, although word sizes as low as 8 can be used. If the word size W is divisible by 4, MegaBLAST is guaranteed to find and extend all perfect matches of length W + 3; perfect matches as short as W may also be found, but this is not guaranteed. Any value of W not divisible by 4 is equivalent to the nearest value divisible by 4 (with 4i+2 equivalent to 4i).
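The rounding rule above can be written out as a short sketch. The function names are hypothetical, and the sketch assumes that the W + 3 guarantee applies to the rounded value rather than to the value the user typed.

```python
def effective_word_size(w: int) -> int:
    """Map a requested word size to the multiple of 4 it is treated as."""
    remainder = w % 4
    if remainder == 3:
        return w + 1          # 4i+3 is nearest to 4i+4
    return w - remainder      # 4i, 4i+1 and 4i+2 all map to 4i


def guaranteed_match_length(w: int) -> int:
    """Shortest perfect-match length that is certain to be found (W + 3)."""
    return effective_word_size(w) + 3


for requested in (16, 17, 18, 19, 28):
    print(requested, effective_word_size(requested), guaranteed_match_length(requested))
```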
By default, non-affine gapping parameters are assumed in MegaBLAST. This means that the gap opening penalty is 0, and gap extension penalty E can be computed from match reward r and mismatch penalty q by the formula:
E = r/2 – q
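As a worked example of the formula, the sketch below evaluates E for a few reward and mismatch settings. It assumes that q is supplied as a signed (negative) score, as it is on the BLAST command line, so that E comes out as a positive penalty; the function name is hypothetical.

```python
from fractions import Fraction


def non_affine_gap_extension(reward: int, mismatch: int) -> Fraction:
    """Compute E = r/2 - q.

    Assumption: `mismatch` is the signed mismatch score (for example -3),
    so subtracting it yields a positive gap extension penalty.
    """
    return Fraction(reward, 2) - mismatch


for r, q in [(1, -3), (1, -2), (2, -3)]:
    print(f"r={r}, q={q}: E={float(non_affine_gap_extension(r, q))}")
```

Under this sign convention, r = 1 and q = -3 give E = 1/2 + 3 = 3.5.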
The non-affine version of MegaBLAST requires significantly less memory and is also significantly faster. Affine gapping parameters can also be used, preferably with larger word sizes. Non-affine gapping parameters tend to yield alignments with more gaps, but the gaps are shorter.
X-dropoff value. As in BLAST, this value provides a cutoff threshold for the extension algorithm's tree exploration. When the score of a given branch drops below the current best score minus the X-dropoff, the exploration of that branch stops. However, the actual X-dropoff values for MegaBLAST and for the traditional nucleotide BLAST algorithms are not necessarily compatible: with the same word size, the same match, mismatch and gapping penalties, and the same X-dropoff, the two algorithms might produce different results. This can be remedied by changing the X-dropoff value for one of the algorithms.
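The stopping rule can be illustrated with a deliberately simplified sketch: an ungapped, rightward-only extension that stops once the running score falls below the best score minus the X-dropoff. This is not MegaBLAST's greedy gapped algorithm; the function name and default scores are assumptions chosen for illustration.

```python
def extend_right(query, subject, q_start=0, s_start=0,
                 reward=1, mismatch=-3, x_dropoff=5):
    """Ungapped extension to the right with an X-dropoff cutoff.

    Returns (best_score, best_length), where best_length is the number of
    aligned positions at which the best score was reached.
    """
    score = best_score = 0
    best_length = 0
    limit = min(len(query) - q_start, len(subject) - s_start)
    for i in range(limit):
        score += reward if query[q_start + i] == subject[s_start + i] else mismatch
        if score > best_score:
            best_score, best_length = score, i + 1
        elif score < best_score - x_dropoff:
            break  # score fell more than X below the best: stop exploring
    return best_score, best_length


# A single mismatch flanked by matches on both sides.
q = "AAAAATAAAAA"
s = "AAAAACAAAAA"
print(extend_right(q, s, x_dropoff=5))  # tolerant cutoff
print(extend_right(q, s, x_dropoff=2))  # strict cutoff
```

With the tolerant cutoff the extension continues past the mismatch and covers all 11 positions; with the strict cutoff it stops at the mismatch and reports only the first five matches. This is one way to see why the same X-dropoff value can lead to different alignments when the extension procedure differs.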
For user convenience, the MegaBLAST page supports both MegaBLAST and regular BLASTn searches. MegaBLAST currently does not support the graphical overview of the alignments, so this option is automatically turned off when the MegaBLAST checkbox is checked.
MegaBLAST takes as input a set of FASTA-formatted DNA query sequences. These can be either pasted into the text area provided or uploaded from a file. It is preferable to submit many query sequences at a time, but no more than 16383.
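When a query set exceeds this limit, it can be split before submission. The sketch below is one possible way to do that, assuming plain FASTA text in which each record starts with a ">" header line; the helper names are hypothetical and not part of any NCBI tool.

```python
from typing import Iterator, List

MAX_QUERIES = 16383  # per-submission limit stated in the text


def read_fasta_records(text: str) -> List[str]:
    """Split FASTA text into records (header line plus its sequence lines)."""
    records: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.startswith(">"):
            if current:
                records.append("\n".join(current))
            current = [line]
        elif line.strip() and current:
            # Sequence line belonging to the current record; blank lines are skipped.
            current.append(line.strip())
    if current:
        records.append("\n".join(current))
    return records


def batches(records: List[str], size: int = MAX_QUERIES) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


example = ">a\nACGT\n>b\nGG\nTT\n>c\nA\n"
records = read_fasta_records(example)
for group in batches(records, size=2):
    print(len(group), "queries in this submission")
```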