MAFFT (Multiple Alignment using Fast Fourier Transform) is a high-speed multiple sequence alignment program for Unix-like operating systems. It offers a range of multiple alignment methods, including L-INS-i (accurate; for alignments of fewer than about 200 sequences) and FFT-NS-2 (fast; for alignments of fewer than about 30,000 sequences).
In bioinformatics, MAFFT is used to create multiple sequence alignments of amino acid or nucleotide sequences. Published in 2002, the first version of MAFFT used an algorithm based on progressive alignment, in which the sequences were clustered with the help of the fast Fourier transform. Subsequent versions have added other algorithms and modes of operation, including options for faster alignment of large numbers of sequences, higher-accuracy alignments, alignment of non-coding RNA sequences, and the addition of new sequences to existing alignments.
MAFFT drastically reduces CPU time compared with existing methods. It includes two novel techniques:
- Homologous regions are rapidly identified by the fast Fourier transform (FFT), in which an amino acid sequence is converted to a sequence composed of volume and polarity values of each amino acid residue.
- A simplified scoring system reduces CPU time and increases alignment accuracy, even for sequences with large insertions or extensions and for distantly related sequences of similar length.
Two different heuristics, the progressive method (FFT-NS-2) and the iterative refinement method (FFT-NS-i), are implemented in MAFFT.
The performance of FFT-NS-2 and FFT-NS-i was compared with that of other methods by computer simulations and benchmark tests. FFT-NS-2 requires drastically less CPU time than CLUSTALW at comparable accuracy. FFT-NS-i is over 100 times faster than T-COFFEE when the number of input sequences exceeds 60, without sacrificing accuracy.
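The FFT-based search for homologous regions described in the first technique can be sketched as follows. This is a minimal illustration, not MAFFT's implementation: each residue is mapped to a (volume, polarity) pair, the two component signals are cross-correlated through an FFT, and the lag with the highest summed correlation suggests the offset at which the sequences share a similar region. The table `TOY_PROPS` holds hypothetical, already-normalized placeholder values for five residues only, and the names `fft`, `correlate`, `lag_scores`, and `best_lag` are assumptions made for this example.

```python
import cmath

# Hypothetical, already-normalized (volume, polarity) values for a few residues.
# Placeholders for illustration only; this is NOT the table MAFFT uses.
TOY_PROPS = {
    "G": (-1.0, 0.0),
    "W": (2.0, -1.0),
    "K": (1.0, 1.0),
    "D": (-0.5, 1.5),
    "L": (0.5, -1.5),
}


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def correlate(a, b, size):
    """Circular cross-correlation c[k] = sum_n a[n] * b[n + k], computed via FFT."""
    fa = fft([complex(v) for v in a] + [0j] * (size - len(a)))
    fb = fft([complex(v) for v in b] + [0j] * (size - len(b)))
    prod = [x.conjugate() * y for x, y in zip(fa, fb)]
    return [v.real / size for v in fft(prod, inverse=True)]


def lag_scores(seq1, seq2, props=TOY_PROPS):
    """Map each lag k to the summed volume + polarity correlation at that lag."""
    size = 1
    while size < len(seq1) + len(seq2):  # zero-pad so lags do not wrap into each other
        size *= 2
    total = [0.0] * size
    for comp in (0, 1):  # 0 = volume component, 1 = polarity component
        a = [props[r][comp] for r in seq1]
        b = [props[r][comp] for r in seq2]
        total = [t + v for t, v in zip(total, correlate(a, b, size))]
    # Negative lags are stored at the end of the circular result.
    return {k: total[k % size] for k in range(-(len(seq1) - 1), len(seq2))}


def best_lag(seq1, seq2, props=TOY_PROPS):
    """Return the lag with the highest correlation score."""
    scores = lag_scores(seq1, seq2, props)
    return max(scores, key=scores.get)
```

For example, `best_lag("WKDL", "GGWKDLG")` returns the offset at which the motif `WKDL` occurs in the second sequence. A real aligner would then examine the segments around the top-scoring lags rather than trust a single peak.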
The MAFFT MSA web interface is hosted at EBI:
- Click on the ‘more options’ button to set the alignment options.
- Change the output format to ‘clustalw’ (default value is: Pearson/FASTA [fasta]).
- Matrix (protein only): protein comparison matrix to be used when adding sequences to the alignment. Default value is: BLOSUM62
- Gap Open: penalty for the first base/residue in a gap. Default value is: 1.53
- Gap Extension: penalty for each additional base/residue in a gap. Default value is: 0.123
- The order in which the sequences appear in the final alignment. Default value is: aligned
- Tree Rebuilding Number. Default value is: 1
- Guide Tree Output: generate guide tree file. Default value is: ON [true]
- Max Iterate: maximum number of iterations to perform when refining the alignment. Default value is: 0. Change the Max Iterate value to ‘2’ to run two refinement iterations for a better alignment.
- Perform FFTS (Fast Fourier Transform). Default value is: local pair
- Click ‘submit’
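For readers who prefer a local installation to the web form, the settings above correspond roughly to options of the `mafft` command-line program. The sketch below builds such a command in Python. The flags `--localpair`, `--maxiterate`, `--op`, `--ep`, `--bl`, `--reorder`, and `--clustalout` are documented MAFFT options, but the mapping from the web form fields to these flags is an assumption made here, and the function names `build_mafft_command` and `run_mafft` are hypothetical.

```python
import subprocess


def build_mafft_command(input_fasta, gap_open=1.53, gap_ext=0.123,
                        blosum=62, max_iterate=2, clustal_output=True):
    """Build a mafft argument list mirroring the web form settings (assumed mapping)."""
    cmd = [
        "mafft",
        "--localpair",                     # pairwise stage: local alignments
        "--maxiterate", str(max_iterate),  # iterative refinement cycles
        "--op", str(gap_open),             # gap opening penalty
        "--ep", str(gap_ext),              # offset, acts like a gap extension penalty
        "--bl", str(blosum),               # BLOSUM matrix number
        "--reorder",                       # output order: aligned
    ]
    if clustal_output:
        cmd.append("--clustalout")         # Clustal-format output instead of FASTA
    cmd.append(input_fasta)
    return cmd


def run_mafft(input_fasta, output_path, **options):
    """Run mafft (must be installed and on PATH); the alignment is written to stdout."""
    cmd = build_mafft_command(input_fasta, **options)
    with open(output_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)
```

Calling `run_mafft("input.fasta", "aligned.aln")` would write the alignment to `aligned.aln`; both file names are placeholders.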
The N-terminal alignment generated by MAFFT is similar to the one generated by T-Coffee. The alignment in the middle and at the C terminus is entirely different from that of T-Coffee or MUSCLE. Although the alignments differ, conservation of the amino acids at the active site is retained.
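One simple way to check a claim like this is to list the fully conserved columns of each alignment and see whether the active-site residues appear among them. The sketch below assumes the alignment is available as a list of equal-length strings with `-` as the gap character; the function name `conserved_columns` and the example sequences are hypothetical.

```python
def conserved_columns(aligned):
    """Return indices of columns where all sequences share the same non-gap residue."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    columns = []
    for i in range(length):
        residues = {seq[i] for seq in aligned}
        if len(residues) == 1 and "-" not in residues:
            columns.append(i)
    return columns


# Made-up toy alignment, not real MAFFT output.
example = ["MK-HDS", "MKAHES", "M-AHDS"]
print(conserved_columns(example))
```

Running the same function on the output of each aligner shows which residues stay conserved; comparing them across alignments requires mapping column indices back to residue positions in a reference sequence.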