In bioinformatics, a dot plot is a graphical method for comparing two biological sequences and identifying regions of close similarity between them.
A dot plot is a simple yet intuitive way to compare two sequences, either DNA or protein, and is probably the oldest such method.
Dot plots are two-dimensional graphs that show a comparison of two sequences. The principle used to produce a dot plot is as follows: the top (x) and left (y) axes of a rectangular array represent the two sequences to be compared.
• Columns = residues of sequence 1
• Rows = residues of sequence 2.
A dot is plotted at every coordinate where the residue in the column matches the residue in the row.
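The construction above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `dot_matrix` and `render` and the two example sequences are assumptions made for this sketch, and "similarity" is taken to mean exact identity of residues.

```python
def dot_matrix(seq1, seq2):
    """Build the dot matrix: columns follow seq1, rows follow seq2.

    A cell is True where the two residues are identical.
    """
    return [[a == b for a in seq1] for b in seq2]


def render(seq1, seq2, matrix):
    """Return a text picture with seq1 along the top and seq2 down the left."""
    lines = ["  " + " ".join(seq1)]
    for residue, row in zip(seq2, matrix):
        cells = " ".join("*" if hit else "." for hit in row)
        lines.append(residue + " " + cells)
    return "\n".join(lines)


# Hypothetical example sequences that differ at a single position.
seq1 = "GATTACA"
seq2 = "GATCACA"
matrix = dot_matrix(seq1, seq2)
print(render(seq1, seq2, matrix))
```

In the printed grid, the shared residues form a diagonal of `*` characters with a gap at the mismatched position, while the scattered off-diagonal `*` characters are chance matches.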
Any region of similarity is revealed by a diagonal row of dots. Isolated dots that do not lie on a diagonal represent random matches. Detection of matching regions can be improved by filtering out random matches with a sliding window: instead of comparing single sequence positions, several adjacent positions are compared at the same time, and a dot is plotted only if at least a certain minimum number of matches occurs within the window.
A dot matrix provides a global picture of local similarities between two sequences. Dot plots are appropriate:
- for comparing large sequences (several thousand residues)
- when one does not know in advance whether two sequences share detectable similarity or which parts of the sequences are related to each other
- for detecting repeats within protein sequences
- for detecting shared domains between protein sequences
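The sliding-window filter might be sketched as follows. The function name `windowed_dot_matrix`, the parameters `window` and `min_matches`, and the example sequences are assumptions chosen for illustration; a dot is indexed here by the start position of the window, which is one of several possible conventions.

```python
def windowed_dot_matrix(seq1, seq2, window=3, min_matches=3):
    """Dot matrix filtered with a sliding window.

    Cell (i, j) is True when the window starting at seq2[i] and the window
    starting at seq1[j] agree in at least `min_matches` of `window` positions.
    Rows follow seq2 and columns follow seq1.
    """
    rows = len(seq2) - window + 1
    cols = len(seq1) - window + 1
    matrix = []
    for i in range(rows):
        row = []
        for j in range(cols):
            matches = sum(seq1[j + k] == seq2[i + k] for k in range(window))
            row.append(matches >= min_matches)
        matrix.append(row)
    return matrix


def count_dots(matrix):
    """Number of dots (True cells) in a matrix."""
    return sum(sum(row) for row in matrix)


# Hypothetical sequences that share only the stretch "ACGT".
seq1 = "CCACGTGG"
seq2 = "TTACGTAA"

unfiltered = windowed_dot_matrix(seq1, seq2, window=1, min_matches=1)
filtered = windowed_dot_matrix(seq1, seq2, window=3, min_matches=3)
print(count_dots(unfiltered), "dots without a window")
print(count_dots(filtered), "dots with a window of 3")
```

With a window of 1 the function reduces to the plain single-residue comparison, so the same code shows both the noisy and the filtered plot. Lowering `min_matches` below `window` tolerates mismatches inside the window at the cost of letting more chance matches through.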
When the two sequences have substantial regions of similarity, many dots line up to form contiguous diagonal lines, which reveal the sequence alignment. If a diagonal line is interrupted and continues on a shifted diagonal, the break indicates an insertion or deletion. Parallel diagonal lines within the matrix represent repetitive regions of the sequences.
A problem with the dot matrix method when comparing large sequences is the high noise level. In most dot plots, dots are plotted all over the graph, obscuring identification of the true alignment. For DNA sequences the problem is particularly acute because DNA has only four possible characters, so each residue has roughly a one-in-four chance of matching a residue in another sequence. To reduce noise, instead of using a single residue to scan for similarity, a filtering technique is applied that uses a "window" of fixed length covering a stretch of residue pairs. This method has been shown to be effective in reducing the noise level. The window is also called a tuple, and its size can be adjusted so that a clear pattern of sequence matches is plotted. However, if the selected window is too long, the sensitivity of the alignment is lost.
There are many variations of the dot plot method. For example, a sequence can be compared with itself to identify internal repeat elements. In a self-comparison there is a main diagonal, because each residue matches itself perfectly. If repeats are present, short diagonals parallel to the main diagonal are observed above and below it.
Self-complementarity of DNA sequences (also called inverted repeats), for example, in regions that form the stems of a hairpin structure, can also be identified using a dot plot. In this case, a DNA sequence is compared with its reverse-complement sequence. Parallel diagonals represent the inverted repeats.
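Both variations can be illustrated with a short Python sketch. The helper names `window_hits` and `reverse_complement`, the window length of 5, and the two toy sequences are assumptions made for this example; windows are required to match exactly, and only the bases A, C, G, and T are handled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return dna.translate(COMPLEMENT)[::-1]


def window_hits(seq1, seq2, window):
    """Return (row, column) start positions of identical windows.

    Rows index seq2 and columns index seq1.
    """
    hits = []
    for i in range(len(seq2) - window + 1):
        for j in range(len(seq1) - window + 1):
            if seq2[i:i + window] == seq1[j:j + window]:
                hits.append((i, j))
    return hits


# Self-comparison: a toy sequence with the direct repeat "ATGCC".
repeat_seq = "ATGCC" + "TTT" + "ATGCC"
self_hits = window_hits(repeat_seq, repeat_seq, window=5)
off_diagonal = [(i, j) for i, j in self_hits if i != j]
print("repeat hits off the main diagonal:", off_diagonal)

# Inverted repeat: a toy hairpin with stem "GCATC" and loop "TTTT".
hairpin = "GCATC" + "TTTT" + reverse_complement("GCATC")
inverted_hits = window_hits(hairpin, reverse_complement(hairpin), window=5)
print("hits against the reverse complement:", inverted_hits)
```

In the self-comparison, every window matches itself on the main diagonal, so only the off-diagonal hits carry information about repeats; they appear symmetrically above and below the diagonal. In the second comparison, the hits mark the two arms of the stem, while the loop, which is not self-complementary, produces none.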