BLASTn is the tool typically used when searching for nucleotide sequences that do not code for proteins; it is a poor tool for finding protein-coding sequences. This is in part due to the wobble position, the third nucleotide of most codons. Most amino acids can be encoded by multiple codons that differ in the third position. Thus the exact same amino acid sequence can be encoded by two nucleotide sequences that differ at every third position (since mutations in the third position often do not affect the resulting protein, such mutations typically accumulate quite rapidly). Because the amino acid sequences are identical, BLASTp would have no problem retrieving one sequence using the other as the query. BLASTn, however, uses a default word size of 11 nucleotides. This means the two sequences must share an exact match of at least 11 consecutive nucleotides for BLASTn to report any hit at all. When we set the word size to 6, the best hit had an E-value of 0.031. In this case, a perfect match of 6 nucleotides was found between the query and database sequences, but BLASTn was not able to extend this alignment very far, which explains the poor E-value (sometimes, this would not be considered a significant hit).
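The wobble-position argument above can be made concrete with a small sketch. The peptide, the two nucleotide sequences, and all function names below are invented for illustration; the codon table is deliberately restricted to four-fold degenerate codon families of the standard genetic code, and the longest shared exact match is only a rough stand-in for BLASTn's word-based seeding, not a reimplementation of it.

```python
# Toy illustration: one peptide, two encodings that differ at every third base.
# Only four-fold degenerate codon families are listed, so any third base
# gives the same amino acid (standard genetic code).
FAMILIES = {"GC": "A", "GG": "G", "CC": "P", "AC": "T",
            "GT": "V", "CT": "L", "TC": "S", "CG": "R"}
CODON_TABLE = {prefix + base: aa
               for prefix, aa in FAMILIES.items() for base in "ACGT"}
PREFIX_FOR = {aa: prefix for prefix, aa in FAMILIES.items()}


def encode(peptide, third_base):
    """Encode a peptide, using the same third base in every codon."""
    return "".join(PREFIX_FOR[aa] + third_base for aa in peptide)


def translate(dna):
    """Translate a DNA string codon by codon (length must be a multiple of 3)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def longest_common_substring(a, b):
    """Length of the longest exact run shared by a and b, at any offsets."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


peptide = "AGPTVLSR"          # hypothetical peptide
seq_a = encode(peptide, "A")  # every codon ends in A
seq_b = encode(peptide, "C")  # every codon ends in C
longest = longest_common_substring(seq_a, seq_b)

print("same protein:", translate(seq_a) == translate(seq_b))
print("longest shared exact match:", longest)
print("could seed a word of size 11:", longest >= 11)
```

Both sequences translate to the same peptide, so a protein-level comparison sees a perfect match, while the two nucleotide sequences share no exact run long enough to seed a hit at the default word size. This is an extreme constructed case; real homologous genes usually differ at only some third positions.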
When we are looking for nucleotide sequences that do not code for proteins, BLASTn can still be a poor tool. The reason is similar to the third-codon-position effect in protein-coding sequences: such non-coding sequences often give rise to functional RNA molecules rather than proteins. These RNA molecules carry a specific secondary structure, held together by base-pairing. Compensatory mutations may change the RNA sequence without changing the RNA secondary structure. For example, an A-U base pair may change to a G-C base pair, retaining the structure but changing the sequence. The resulting RNA may keep its functionality, which is determined by this structure, but the underlying sequence may change enough to become unrecognizable to the BLASTn algorithm. When looking for non-coding RNA sequences, specialized tools may be used instead of BLASTn. For example, tRNAscan-SE is a specialized tool made to recognize tRNA sequences, and Infernal (together with the Rfam database) is a general tool for finding RNA sequences. If it is necessary to use BLASTn, make sure to experiment with different word sizes before concluding that a sequence is not present in a database.
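The effect of compensatory mutations can be sketched with a toy hairpin. The two RNA sequences, the list of paired positions, and the helper names below are made up for illustration, and only Watson-Crick pairs are treated as valid (G-U wobble pairs are ignored for simplicity).

```python
# Toy hairpin: a 5-bp stem closed by a 4-nt loop. All values are hypothetical.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

# Paired positions (0-based) of the stem: the first base pairs with the last, etc.
STEM_PAIRS = [(0, 13), (1, 12), (2, 11), (3, 10), (4, 9)]

original = "GAUCA" + "UUCG" + "UGAUC"
# Same hairpin after every A-U / U-A pair has been replaced by G-C / C-G.
mutated = "GGCCG" + "UUCG" + "CGGCC"


def structure_preserved(seq, pairs):
    """True if every listed pair of positions can still base-pair."""
    return all((seq[i], seq[j]) in WATSON_CRICK for i, j in pairs)


def count_differences(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


differences = count_differences(original, mutated)

print("original folds:", structure_preserved(original, STEM_PAIRS))
print("mutated folds: ", structure_preserved(mutated, STEM_PAIRS))
print("positions changed:", differences, "of", len(original))
```

The stem remains fully paired in both versions, yet a sizeable fraction of the positions differ, which is the kind of divergence that a purely sequence-based search can miss and that structure-aware tools are built to tolerate.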
Two members of the BLAST suite of programs are designed to make nucleotide-to-nucleotide alignments. The first is the original BLAST nucleotide search program, known as BLASTn. BLASTn is a sensitive, general-purpose nucleotide search and alignment program that can be used to align tRNA or rRNA sequences as well as mRNA or genomic DNA sequences containing a mix of coding and noncoding regions. A more recently developed nucleotide-level BLAST program, MegaBLAST, is about 10 times faster than BLASTn but is designed to align sequences that are nearly identical, differing by only a few percent from one another.
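As a rough sketch of how the choice between the two programs, and the word-size experiments suggested earlier, might look in practice, the snippet below builds command lines for the `blastn` executable from NCBI BLAST+ (assumed to be installed). The `-task`, `-word_size`, `-query`, `-db`, `-evalue`, and `-outfmt` options are BLAST+ options; the file name `query.fasta`, the database name `my_nt_db`, and the helper `build_blastn_command` are placeholders invented for this example.

```python
import shlex


def build_blastn_command(query, db, task="blastn", word_size=None, evalue=10.0):
    """Return a BLAST+ blastn command as an argument list (nothing is executed)."""
    cmd = ["blastn", "-task", task, "-query", query, "-db", db,
           "-evalue", str(evalue), "-outfmt", "6"]  # outfmt 6: tabular output
    if word_size is not None:
        cmd += ["-word_size", str(word_size)]
    return cmd


# Fast search for nearly identical sequences.
megablast_cmd = build_blastn_command("query.fasta", "my_nt_db", task="megablast")

# More sensitive search, repeated with progressively smaller word sizes.
sensitive_cmds = [
    build_blastn_command("query.fasta", "my_nt_db", task="blastn", word_size=w)
    for w in (11, 8, 6)
]

print(shlex.join(megablast_cmd))
for cmd in sensitive_cmds:
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once a real query file and database exist. Smaller word sizes make the search slower and may produce more chance hits, so the E-values of any hits still need to be judged critically.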
Typical applications of nucleotide-to-nucleotide BLAST searches include:
- Mapping oligonucleotides and PCR products to a genome
- Screening repetitive elements
- Cross-species sequence exploration
- Annotating genomic DNA
- Clustering sequencing reads
- Vector clipping