Multiple alignments of protein sequences are important in many applications, including phylogenetic tree estimation, secondary structure prediction and critical residue identification. Many multiple sequence alignment (MSA) algorithms and tools have been proposed.
MUltiple Sequence Comparison by Log-Expectation (MUSCLE) is computer software for multiple sequence alignment of protein and nucleotide sequences. It is released into the public domain. MUSCLE alignment is also integrated into MEGA6, a software package used for phylogenetic tree construction.
Its prominent features are rapid sequence distance computation using k-mer counting, a profile function that computes a log-expectation score, and tree-dependent partitioning of the sequences. According to published benchmark tests, MUSCLE is one of the best-performing multiple alignment programs, with accuracy and speed that are consistently better than those of ClustalW or T-Coffee. MUSCLE can align hundreds of sequences in seconds. Most users learn everything they need to know about MUSCLE in a few minutes, because only a handful of command-line options are needed to perform common alignment tasks.
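The log-expectation (LE) profile function scores a pair of profile columns x and y. As we understand the original MUSCLE paper, it has the form LE(x, y) = (1 - f_G^x)(1 - f_G^y) log( sum_i sum_j f_i^x f_j^y p_ij / (p_i p_j) ), where f_i^x is the frequency of residue i in column x, f_G^x is the fraction of gaps in that column, p_i are background residue probabilities and p_ij are joint probabilities of residues i and j being aligned. The Python sketch below is illustrative only: it assumes residue frequencies are normalized over the non-gap positions, omits sequence weighting, treats all-gap columns as scoring zero, and uses an invented two-letter alphabet (TOY_P_SINGLE, TOY_P_PAIR are hypothetical values). A real implementation would take these probabilities from a protein substitution matrix.

```python
import math
from collections import Counter

GAP = "-"

# Hypothetical toy alphabet with invented probabilities (illustration only).
TOY_P_SINGLE = {"A": 0.5, "B": 0.5}
TOY_P_PAIR = {("A", "A"): 0.4, ("B", "B"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1}


def column_frequencies(column):
    """Residue frequencies over non-gap positions, plus the gap fraction."""
    residues = [c for c in column if c != GAP]
    gap_fraction = 1 - len(residues) / len(column)
    counts = Counter(residues)
    return {r: n / len(residues) for r, n in counts.items()}, gap_fraction


def log_expectation(col_x, col_y, p_single, p_pair):
    """LE score of two profile columns (no sequence weighting in this sketch)."""
    fx, gap_x = column_frequencies(col_x)
    fy, gap_y = column_frequencies(col_y)
    if not fx or not fy:
        return 0.0  # assumed convention for an all-gap column
    # Expected ratio of joint to background probability under the observed frequencies.
    expectation = sum(fi * fj * p_pair[(i, j)] / (p_single[i] * p_single[j])
                      for i, fi in fx.items() for j, fj in fy.items())
    # Scale by column occupancy so gappy columns contribute less.
    return (1 - gap_x) * (1 - gap_y) * math.log(expectation)
```

With these toy values, two all-A columns score log(1.6), about 0.47, while an all-A column against an all-B column scores log(0.4), which is negative; a column that is half gaps has its score halved by the occupancy factor.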
The MUSCLE algorithm proceeds in three stages. At the completion of each stage, a multiple alignment is available and the algorithm can be terminated.
Draft progressive stage
In the draft progressive stage, the algorithm produces a draft multiple alignment, prioritizing speed over accuracy. The similarity of each pair of sequences is computed either by k-mer counting or by constructing a global alignment of the pair and determining its fractional identity. A triangular distance matrix is computed from the pairwise similarities. A tree is constructed from the distance matrix using UPGMA or neighbor-joining, and a root is identified. A progressive alignment is then built by following the branching order of the tree, yielding a multiple alignment of all input sequences at the root.
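The tree-building part of the draft stage can be sketched in Python as a k-mer distance matrix followed by UPGMA clustering. The k-mer measure used here (F = number of shared k-mers divided by min(L_a, L_b) - k + 1, distance 1 - F) is an assumed form for illustration; MUSCLE's own implementation may differ in details such as the alphabet used for counting. The function names are hypothetical, the triangular matrix is stored as a dictionary keyed by unordered pairs, and the tree is returned as nested tuples rooted at the final UPGMA merge.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq, k):
    """Count every overlapping k-mer (substring of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=3):
    """1 - fraction of shared k-mers (assumed form of the measure)."""
    if min(len(a), len(b)) < k:
        raise ValueError("sequences must be at least k residues long")
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    shared = sum(min(n, cb[kmer]) for kmer, n in ca.items())  # Counter returns 0 if absent
    return 1.0 - shared / (min(len(a), len(b)) - k + 1)


def upgma(names, dist):
    """Average-linkage (UPGMA) clustering; returns a rooted tree as nested tuples.

    dist maps frozenset({x, y}) -> distance, one entry per cell of the
    triangular distance matrix.
    """
    size = {name: 1 for name in names}  # current clusters -> number of leaves
    d = dict(dist)
    while len(size) > 1:
        x, y = min(combinations(size, 2), key=lambda pair: d[frozenset(pair)])
        nx, ny = size.pop(x), size.pop(y)
        node = (x, y)
        for z in size:  # size-weighted average distance to the merged cluster
            d[frozenset((node, z))] = (nx * d[frozenset((x, z))]
                                       + ny * d[frozenset((y, z))]) / (nx + ny)
        size[node] = nx + ny
    return next(iter(size))  # the last merge is the root


def draft_tree(seqs, k=3):
    """k-mer distance matrix followed by UPGMA (sketch of the draft stage tree)."""
    names = list(seqs)
    dist = {frozenset((a, b)): kmer_distance(seqs[a], seqs[b], k)
            for a, b in combinations(names, 2)}
    return upgma(names, dist)
```

For example, draft_tree({"s1": "ACDEFGHIK", "s2": "ACDEFGHIR", "s3": "WYWYWYWYW"}) returns ("s3", ("s1", "s2")): s1 and s2 share 6 of their 7 3-mers (distance 1/7) and are joined first, while s3 shares none with either.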
Improved progressive stage
In the improved progressive stage, the tree is re-estimated using the Kimura distance, which is more accurate than the k-mer distance but requires an alignment. The draft alignment supplies it, and the new tree is used to build a more accurate multiple alignment. The similarity of each pair of sequences is computed as the fractional identity of their mutual alignment in the current multiple alignment. A tree is constructed by computing a Kimura distance matrix from these identities and applying a clustering method to it. The previous and new trees are compared to identify the set of internal nodes whose branching order has changed. A new progressive alignment is then built for those nodes, while the existing alignment is retained on each subtree whose branching order is unchanged.
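Two pieces of the improved progressive stage can be sketched in Python: the Kimura distance between two rows of the current alignment, using the standard approximation d = -ln(1 - D - 0.2 D^2) with D = 1 - fractional identity, and the comparison of the old and new trees. The simplifying assumptions here are ours, not MUSCLE's exact rules: trees are nested tuples, identity ignores columns where either row has a gap, an infinite distance is returned where the formula is undefined (D of about 0.854 or more), and a node counts as unchanged when an identical subtree, ignoring child order, occurs in the old tree. All function names are hypothetical.

```python
import math

GAP = "-"


def fractional_identity(row_a, row_b):
    """Identity over columns where both aligned rows have a residue."""
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != GAP and y != GAP]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def kimura_distance(row_a, row_b):
    """Kimura's approximation d = -ln(1 - D - 0.2 D^2), with D = 1 - identity."""
    D = 1.0 - fractional_identity(row_a, row_b)
    arg = 1.0 - D - 0.2 * D * D
    # The logarithm is undefined once D reaches about 0.854; this sketch returns infinity.
    return -math.log(arg) if arg > 0 else math.inf


def canonical(tree):
    """Child-order-independent form of a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return tree
    return tuple(sorted((canonical(child) for child in tree), key=repr))


def internal_nodes(tree):
    """Yield the canonical form of every internal node (subtree) of the tree."""
    if not isinstance(tree, str):
        yield canonical(tree)
        for child in tree:
            yield from internal_nodes(child)


def nodes_to_realign(old_tree, new_tree):
    """Internal nodes of new_tree whose subtree does not occur in old_tree."""
    unchanged = set(internal_nodes(old_tree))
    return [node for node in internal_nodes(new_tree) if node not in unchanged]
```

For the old tree (("a", "b"), ("c", "d")) and the new tree ((("a", "b"), "c"), "d"), nodes_to_realign reports only the two internal nodes above ("a", "b"), so the existing alignment of a and b can be reused.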
Refinement stage
The final refinement stage refines the improved alignment produced in stage two, at the cost of additional running time. An edge is deleted from the tree, dividing the sequences into two disjoint subsets (a bipartition); edges are visited in order of decreasing distance from the root. The profile (multiple alignment) of each subset is extracted from the current multiple alignment, and columns containing no residues are discarded. The two profiles are then re-aligned to each other using profile-profile alignment. If the sum-of-pairs score of the new alignment is higher, it is kept; otherwise it is discarded. These steps are repeated until the alignment converges or a user-defined iteration limit is reached.
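The refinement loop can be sketched in Python. This is a simplified illustration, not MUSCLE's implementation: the tree is a nested tuple, "distance from the root" is taken to be depth in edges (branch lengths are ignored), profiles are re-aligned with a plain Needleman-Wunsch over columns using a sum-of-pairs column score and a linear gap penalty, and the MATCH, MISMATCH and GAP_PENALTY values are hypothetical. MUSCLE itself scores profile columns with the log-expectation function and uses affine gap penalties.

```python
from itertools import combinations

GAP = "-"
MATCH, MISMATCH, GAP_PENALTY = 1, -1, -2  # hypothetical linear scoring scheme


def pair_score(x, y):
    """Score of one residue pair; gap-gap pairs count as zero."""
    if x == GAP and y == GAP:
        return 0
    if x == GAP or y == GAP:
        return GAP_PENALTY
    return MATCH if x == y else MISMATCH


def sp_score(msa):
    """Sum-of-pairs score of an alignment given as {name: row}."""
    rows = list(msa.values())
    return sum(pair_score(x, y) for a, b in combinations(rows, 2) for x, y in zip(a, b))


def leaves(tree):
    return [tree] if isinstance(tree, str) else [leaf for c in tree for leaf in leaves(c)]


def edges_far_to_near(tree):
    """Leaf set below each edge, deepest edges first (depth counted in edges)."""
    found = []

    def walk(node, depth):
        if not isinstance(node, str):
            for child in node:
                found.append((depth + 1, frozenset(leaves(child))))
                walk(child, depth + 1)

    walk(tree, 0)
    found.sort(key=lambda edge: -edge[0])  # stable: keeps left-to-right order per depth
    return [side for _, side in found]


def sub_profile(msa, names):
    """Extract the rows for `names` and discard columns that contain no residues."""
    rows = {name: msa[name] for name in names}
    width = len(next(iter(rows.values())))
    keep = [i for i in range(width) if any(row[i] != GAP for row in rows.values())]
    return {name: "".join(row[i] for i in keep) for name, row in rows.items()}


def align_profiles(p, q):
    """Needleman-Wunsch over profile columns; column score = sum of cross pair scores."""
    P = [list(col) for col in zip(*p.values())]  # columns of profile p
    Q = [list(col) for col in zip(*q.values())]
    gap_p, gap_q = [GAP] * len(p), [GAP] * len(q)

    def cross(cx, cy):
        return sum(pair_score(x, y) for x in cx for y in cy)

    n, m = len(P), len(Q)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + cross(P[i - 1], gap_q)
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + cross(gap_p, Q[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + cross(P[i - 1], Q[j - 1]),
                          F[i - 1][j] + cross(P[i - 1], gap_q),
                          F[i][j - 1] + cross(gap_p, Q[j - 1]))
    cols, i, j = [], n, m
    while i > 0 or j > 0:  # traceback from the bottom-right cell
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + cross(P[i - 1], Q[j - 1]):
            cols.append(P[i - 1] + Q[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + cross(P[i - 1], gap_q):
            cols.append(P[i - 1] + gap_q)
            i -= 1
        else:
            cols.append(gap_p + Q[j - 1])
            j -= 1
    cols.reverse()
    names = list(p) + list(q)
    return {name: "".join(col[k] for col in cols) for k, name in enumerate(names)}


def refine(msa, tree, max_rounds=10):
    """Delete each edge, realign the two sub-profiles, keep the result only if SP improves."""
    best, everyone = sp_score(msa), frozenset(msa)
    for _ in range(max_rounds):
        improved = False
        for side in edges_far_to_near(tree):
            candidate = align_profiles(sub_profile(msa, sorted(side)),
                                       sub_profile(msa, sorted(everyone - side)))
            score = sp_score(candidate)
            if score > best:
                msa, best, improved = candidate, score, True
        if not improved:  # converged: no edge gave a better SP score
            break
    return msa
```

Starting from {"s1": "ACGT-", "s2": "ACGT-", "s3": "-ACGT"} with the tree (("s1", "s2"), "s3"), realigning either single-sequence profile against the rest leaves the SP score at -10, but splitting off s3 raises it to 12, and refine returns ACGT for all three rows.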
Tools used for the assessment of MSA results include: