Basic steps in construction of Phylogenetic Trees
Data selection – Amino acid or nucleotide
For a gene phylogeny, we first need to decide whether to work with nucleotide or amino acid data. Either type of data can be used to generate a tree.
Some argue that it is better to use amino acid data: because the genetic code is redundant, many nucleotide changes do not alter the encoded protein, so we will be able to recover more conserved sites in our alignment. However, any analysis we perform with amino acid data is more time consuming than its nucleotide counterpart. This is because each site can take 20 possible amino acid states, as opposed to only 4 nucleotide states.
Other scientists prefer to use nucleotide data. As mentioned above, nucleotide analyses are faster. In addition, nucleotide data carry more information about how a sequence has evolved: since 3 nucleotides code for 1 amino acid, changes that leave the amino acid unchanged are still visible at the nucleotide level.
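The trade-off between the two data types can be illustrated with a small sketch. The two DNA sequences below are hypothetical and were made up so that they differ only at third codon positions; the codon table is deliberately partial and covers only the codons used here. The helper names `translate` and `count_differences` are illustrative, not part of any library.

```python
# Partial standard codon table: only the codons used in this toy example.
CODON_TABLE = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",  # alanine
    "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",  # leucine
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",  # glycine
    "AAA": "K", "AAG": "K",                          # lysine
}


def translate(dna):
    """Translate an in-frame DNA string codon by codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def count_differences(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical sequences that differ only at third codon positions.
seq_a = "GCTCTTGGTAAA"
seq_b = "GCCCTGGGAAAG"

prot_a = translate(seq_a)
prot_b = translate(seq_b)

nucleotide_diffs = count_differences(seq_a, seq_b)
protein_diffs = count_differences(prot_a, prot_b)

print("nucleotide differences:", nucleotide_diffs)
print("amino acid differences:", protein_diffs)
```

In this toy case the nucleotide sequences record several changes while the translated proteins are identical, which is why the protein alignment looks more conserved and the nucleotide alignment carries more signal.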
Alignment programs shift our data by inserting gaps so that all the homologous (or conserved) sites line up in vertical columns. Many alignment programs are available, and several of them are common and well supported.
It is best to try at least 2 different parameter settings, if not more, and then view the resulting alignments to determine which is better.
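To make the idea of gap insertion and parameter choice concrete, here is a minimal sketch of pairwise global alignment (the Needleman-Wunsch dynamic programming method). It is not how any particular alignment program is implemented: real tools align many sequences at once and use more elaborate scoring. The function name `global_align`, the scoring values, and the example sequences are assumptions chosen for illustration.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment of two strings.

    Returns (aligned_a, aligned_b, score), with '-' marking gaps.
    """
    n, m = len(a), len(b)

    def pair_score(x, y):
        return match if x == y else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,   # gap in b
                score[i][j - 1] + gap,   # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1])):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


# Try two gap penalties on the same hypothetical pair and inspect both results.
for gap_penalty in (-1, -3):
    top, bottom, total = global_align("GATTACA", "GCATGCT", gap=gap_penalty)
    print("gap penalty", gap_penalty, "score", total)
    print(top)
    print(bottom)
```

Running the loop with different gap penalties and viewing the printed alignments side by side mirrors the advice above: the same sequences can align differently depending on the settings.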
Substitution models consist of parameters that describe the substitution rates in our data. A model-selection program estimates which model best captures the way our data set is evolving or changing. This model is used later to build our tree.
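One common way to compare candidate models is an information criterion such as AIC, which rewards a high log-likelihood and penalizes extra parameters. The text above does not say which criterion is used, so treat AIC here as an assumption. The log-likelihood values below are invented purely for illustration and do not come from any real data set; the parameter counts are the free substitution parameters of each model, excluding branch lengths.

```python
def aic(log_likelihood, n_free_parameters):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2 * n_free_parameters - 2 * log_likelihood


# Hypothetical log-likelihoods (lnL), made up for this sketch.
# k = free substitution parameters (branch lengths are not counted).
candidates = {
    "JC69": {"lnL": -2150.0, "k": 0},
    "K80": {"lnL": -2100.0, "k": 1},
    "HKY85": {"lnL": -2085.0, "k": 4},
    "GTR": {"lnL": -2083.5, "k": 8},
}

scores = {name: aic(c["lnL"], c["k"]) for name, c in candidates.items()}
best_model = min(scores, key=scores.get)

for name, value in sorted(scores.items(), key=lambda item: item[1]):
    print(name, value)
print("selected model:", best_model)
```

With these invented numbers, the most parameter-rich model does not win: its small gain in likelihood is outweighed by the penalty for the extra parameters.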
Maximum likelihood (ML) takes the best tree to be the one under which the observed data are most probable, given a certain model. ML takes into account all the data we have generated so far in order to construct our final tree. It is a commonly used tree-building method that gives us a single tree as output.
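The principle can be sketched in its simplest possible form: two sequences joined by a single branch, under the Jukes-Cantor (JC69) model. This is not a tree search; real ML programs optimize over tree topologies and many branch lengths at once. The sequences and the function names `jc69_log_likelihood` and `ml_distance` are assumptions for illustration. The sketch picks the branch length that makes the observed data most probable and compares it with the known closed-form JC69 distance.

```python
import math


def jc69_log_likelihood(d, n_sites, n_diffs):
    """Log-likelihood of two aligned sequences separated by distance d under JC69.

    d is the expected number of substitutions per site.
    """
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay   # probability a site shows the same base
    p_diff = 0.25 - 0.25 * decay   # probability of one specific different base
    return (n_sites * math.log(0.25)            # equal base frequencies
            + (n_sites - n_diffs) * math.log(p_same)
            + n_diffs * math.log(p_diff))


def ml_distance(seq_a, seq_b, step=0.001, max_d=3.0):
    """Grid search for the distance that maximizes the JC69 likelihood."""
    n_sites = len(seq_a)
    n_diffs = sum(x != y for x, y in zip(seq_a, seq_b))
    grid = [step * i for i in range(1, int(max_d / step) + 1)]
    return max(grid, key=lambda d: jc69_log_likelihood(d, n_sites, n_diffs))


# Hypothetical aligned sequences: 20 sites, 4 of which differ.
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "CCGTAAGTACTTACGAACGT"

d_hat = ml_distance(seq_a, seq_b)

# Closed-form JC69 distance for comparison: d = -3/4 * ln(1 - 4p/3).
p = sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)
d_formula = -0.75 * math.log(1.0 - 4.0 * p / 3.0)

print("grid-search ML estimate:", d_hat)
print("closed-form estimate:", d_formula)
```

The grid search and the formula should agree to within the grid spacing, which shows that "most likely under a certain model" is a concrete optimization over the model's parameters.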
Making it pretty
Once we have created our tree, it is time to make it publication ready.
If we need to change the taxon names, font, or size, we can use Adobe Illustrator or a similar image manipulation program. Make sure the taxon names can be clearly read and the bootstrap values are visible above each node.
Not all data will require such a robust analysis. However, we cannot know for certain how much better or different a tree produced from a more robust analysis will be until that analysis is performed.
In general, the output tree of a phylogenetic analysis is an estimate of the character's phylogeny (a gene tree) and not the phylogeny of the taxa (a species tree), though ideally the two should be very close. Gene trees do not necessarily represent the species' evolutionary history accurately, because the analysis can be confounded by horizontal gene transfer, hybridization between species, convergent evolution, and conserved sequences. Evolutionary rates also differ across the data:
- Noncoding regions are more variable than coding regions
- Some positions in protein-coding genes are more variable than others
- Some genes evolve faster than others
Our company, BioinfoLytics, is a project affiliated with BioCode. We cover many topics in genomics and proteomics and their analysis with a range of tools, including Sequence Alignment & Analysis, Bioinformatics Scripting & Software Development, Phylogenetic and Phylogenomic Analysis, Functional Analysis, Biological Data Analysis & Visualization, Custom Analysis, Biological Database Analysis, Molecular Docking, Protein Structure Prediction, and Molecular Dynamics. These services are offered so that BioCode users can develop their interests further, meet their requirements, and obtain the results they need. Our platform provides opportunities to learn, to have research projects analyzed, and to get help and knowledge in molecular, computational, and analytical biology.
We provide a “Phylogenetic Tree Construction and Analysis” service for researchers who want to infer the evolutionary relationships of species or to build a genome-wide phylogenetic tree for a large group of species, covering a large number of genes with long nucleotide sequences.