DbClustal is a protein multiple alignment program that aligns the top-scoring full-length sequences from a protein BLAST similarity search. Its alignment algorithm is based on the global alignment program ClustalW, modified to incorporate local alignment information.
DbClustal produces high-quality global alignments of the top-scoring sequences found in a BLAST database search, automatically and within the time limits essential to large-scale genome analysis projects. It combines the advantages of local and global alignment algorithms in a traditional tree-based progressive alignment. The widely used global alignment program ClustalW has been modified to incorporate local alignment data in the form of a list of anchor points between pairs of sequences in the dataset. A BLAST post-processing program, Ballast, is used to create the anchor points, although other sources of local conservation information could be used. Ballast identifies conserved regions in the sequences detected by BLAST and creates a file containing a list of self-consistent anchors between the query sequence and the database hits. By weighting the DbClustal global alignment towards the anchor points, accurate multiple alignments can be generated that incorporate very long gaps for terminal extensions and internal insertions. The weighting scheme used in DbClustal means that the global alignment is encouraged towards, but not constrained to, the conserved motifs.
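The idea of encouraging, rather than forcing, an alignment through anchor points can be illustrated with a small pairwise sketch. This is not DbClustal's actual weighting scheme or the Ballast file format: it is a plain Needleman-Wunsch global alignment in which residue pairs lying on an anchor receive an additive bonus. The `Anchor` tuple layout, the function names, and the scoring values are all assumptions made for the illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical anchor layout: (start in a, start in b, length, bonus weight).
Anchor = Tuple[int, int, int, float]


def anchor_bonus(anchors: Sequence[Anchor], len_a: int, len_b: int) -> List[List[float]]:
    """Return a len_a x len_b matrix that is non-zero only on anchored residue pairs."""
    bonus = [[0.0] * len_b for _ in range(len_a)]
    for start_a, start_b, length, weight in anchors:
        for k in range(length):
            i, j = start_a + k, start_b + k
            if i < len_a and j < len_b:
                bonus[i][j] = max(bonus[i][j], weight)
    return bonus


def align(a: str, b: str, anchors: Sequence[Anchor] = (),
          match: float = 1.0, mismatch: float = -1.0, gap: float = -1.0) -> Tuple[str, str]:
    """Global alignment (linear gap cost) with an additive bonus on anchored pairs."""
    n, m = len(a), len(b)
    bonus = anchor_bonus(anchors, n, m)

    def sub(i: int, j: int) -> float:
        # Substitution score for a[i-1] vs b[j-1], raised if the pair is anchored.
        return (match if a[i - 1] == b[j - 1] else mismatch) + bonus[i - 1][j - 1]

    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback from the bottom-right corner, preferring the diagonal move.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))
```

Because the bonus is only added to the score, an anchor that conflicts with strong evidence elsewhere can still be overridden, which mirrors the "encouraged towards, but not constrained to" behaviour described above. For example, `align("AB", "BA", [(0, 1, 1, 5.0)])` pairs the two `A` residues, while `align("AB", "BA", [(1, 0, 1, 5.0)])` pairs the two `B` residues instead.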
DbClustal addresses the important problem of the automatic multiple alignment of the top scoring full-length sequences detected by a database homology search.
The speed and reliability of DbClustal have been demonstrated using the recently annotated proteome of certain plants, where the proportion of alignments containing totally misaligned sequences was reduced from 20% to <2%. A website has been implemented that offers BLASTp database searches with automatic alignment of the top hits by DbClustal.
The evaluation of DbClustal is performed using three procedures: similarity calculation, distance calculation, and cluster validation.
The percent identity (ID) score is used to calculate the similarity between two sequences in the aligned matrices generated in the ‘alignment analyses’ step.
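As a minimal sketch of the similarity and distance steps, percent identity between two aligned rows can be computed as follows. Several conventions exist for the denominator. This version assumes that columns containing a gap in either row are ignored, and it assumes the distance is simply one minus the fractional identity. Both choices, and the function names, are illustrative rather than taken from the evaluation described here.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent identity over the columns where neither aligned row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(row_a, row_b):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are excluded from the count
        compared += 1
        if x == y:
            identical += 1
    # Rows with no comparable column are given 0% identity by convention here.
    return 100.0 * identical / compared if compared else 0.0


def identity_distance(row_a: str, row_b: str) -> float:
    """Assumed distance measure: 1 - (percent identity / 100)."""
    return 1.0 - percent_identity(row_a, row_b) / 100.0
```

Applied to every pair of rows in an alignment, these two functions yield the similarity and distance matrices on which a clustering step could then operate.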
PipeAlign2 is a protein family analysis tool integrating a multi-step process that ranges from the search for sequence homologues in protein and 3D structure databases to the structural and functional annotation of the family. The complete, automatic pipeline takes a single sequence or a set of sequences as input and constructs a high-quality, validated multiple sequence alignment (MSA) in which sequences are clustered into potential functional subgroups. Sequence annotations are automatically extracted from public databases (UniProt, InterPro, etc.), cross-validated and integrated into the MSA. For more experienced users, the PipeAlign2 server also provides options to run only part of the analysis and to modify the default parameters of each step.