Sequence alignment is a method of arranging protein or DNA sequences to identify regions of similarity that may be a result of functional, structural, or evolutionary relationships between the sequences.
Aligned sequences of nucleotide or amino acid residues are typically represented as rows within a matrix. Gaps are inserted between the residues so that identical or similar characters are aligned in successive columns. Sequence alignments are also used for non-biological sequences, such as calculating the distance cost between strings in a natural language or in financial data.
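The row-and-column representation described above can be illustrated with a small sketch. It assumes that gaps are written as "-" and that both rows have already been aligned to the same length; the function name percent_identity and the example sequences are hypothetical, chosen only for illustration.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Fraction of gap-free columns in which both rows hold the same residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Keep only the columns where neither row contains a gap character.
    compared = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not compared:
        return 0.0
    matches = sum(1 for a, b in compared if a == b)
    return matches / len(compared)


# Two aligned sequences stored as rows; each string index is one column.
alignment = [
    "GATTACA-",
    "GC-TACAT",
]

identity = percent_identity(alignment[0], alignment[1])
print(f"{identity:.3f}")  # 5 identical columns out of 6 gap-free columns
```

Excluding gapped columns from the denominator is only one of several conventions; other definitions of identity divide by the full alignment length or by the length of the shorter sequence.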
Global alignments attempt to align every residue in every sequence and are most useful when the sequences in the query set are similar and of roughly equal size. A general global alignment technique is the Needleman–Wunsch algorithm, which is based on dynamic programming. Local alignments are more useful for dissimilar sequences that are suspected to contain regions of similarity or similar sequence motifs within their larger sequence context. The Smith–Waterman algorithm is a general local alignment method based on the same dynamic programming scheme.
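To make the shared dynamic programming scheme concrete, here is a minimal sketch that computes only the optimal score (no traceback) for both algorithms. The scoring values (match = 1, mismatch = -1, linear gap penalty = -1) are assumptions chosen for illustration; practical tools typically use substitution matrices and affine gap penalties.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Optimal global alignment score of a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Global alignment: leading gaps are penalised in the first row and column.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


def smith_waterman_score(a: str, b: str, match: int = 1,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score of any pair of substrings of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero, so an
            # alignment can restart anywhere in either sequence.
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            best = max(best, score[i][j])
    return best


print(needleman_wunsch_score("ACGT", "AGT"))          # 2: three matches, one gap
print(smith_waterman_score("TTTACGTTT", "GGACGGG"))   # 3: the shared "ACG"
```

The two functions differ only in initialisation, the floor at zero, and where the final score is read from: the bottom-right cell for the global case, the maximum cell for the local case.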
Interpretation of sequence alignments
- Analysis of disease-related regions of a gene
- Sequences with high similarity often share similar 3D structures and functions and are likely to have a common ancestor
- Detection of horizontal gene transfer among species
Pairwise alignment methods include the following:
- Dynamic programming method
- Word method
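Word methods (also called k-tuple methods) are heuristics, used by tools such as FASTA and BLAST, that avoid filling a full dynamic programming matrix by first locating short exact matches ("words"). The sketch below shows only this seeding step. The word length k = 3, the function names, and the diagonal-counting step are assumptions made for illustration and do not reproduce any particular tool.

```python
from collections import Counter, defaultdict


def kmer_index(seq: str, k: int) -> dict:
    """Map every length-k word in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def seed_hits(query: str, target: str, k: int = 3) -> list:
    """Return (query_pos, target_pos) pairs where the two sequences share a word."""
    index = kmer_index(query, k)
    hits = []
    for j in range(len(target) - k + 1):
        # .get() looks the word up without adding new keys to the index.
        for i in index.get(target[j:j + k], ()):
            hits.append((i, j))
    return hits


def best_diagonal(hits: list):
    """Return (offset, count) for the diagonal (target_pos - query_pos) with most hits."""
    if not hits:
        return None
    return Counter(j - i for i, j in hits).most_common(1)[0]


hits = seed_hits("ACGTAC", "TTACGTTT", k=3)
print(hits)                 # [(3, 1), (0, 2), (1, 3)]
print(best_diagonal(hits))  # (2, 2): two words agree on offset 2
```

Hits that fall on the same diagonal suggest an ungapped region of similarity. A full method would extend or join such seeds and score the candidate region, for example with a banded dynamic programming pass.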
Multiple sequence alignment methods include the following:
- Dynamic programming
- Progressive methods
- Iterative methods
Some sequence alignment tools are:
- Clustal Omega
Structural alignments are usually specific to protein and sometimes RNA sequences. They use information about the secondary and tertiary structure of the protein or RNA molecule to help align the sequences. Because protein and RNA structures are more evolutionarily conserved than sequences, structural alignments can be more reliable between sequences that are very distantly related and have diverged so extensively that sequence comparison cannot reliably detect their similarity.
Structural alignments are used as the “gold standard” in evaluating alignments for homology-based protein structure prediction because they explicitly align regions of the protein sequence that are structurally similar rather than relying exclusively on sequence information. However, structural alignments cannot themselves be used in structure prediction, because at least one sequence in the query set is the target to be modeled, for which the structure is not known. It has been shown that, given the structural alignment between a target and a template sequence, highly accurate models of the target protein sequence can be produced. A major stumbling block in homology-based structure prediction is therefore the production of structurally accurate alignments given only sequence information.
Structure prediction methods that depend on such alignments include:
- Homology Modelling
- Protein threading
Structural alignment is a form of alignment based on comparison of shape. These alignments attempt to establish equivalences between two or more polymer structures based on their shape and three-dimensional conformation. In contrast to simple structural superposition, where at least some equivalent residues of the two structures are known, structural alignment requires no a priori knowledge of equivalent positions. Structural alignment can therefore be used to imply evolutionary relationships between proteins that share very little common sequence.