Secondary Structure Prediction Methods for RNA

The ab initio approach makes structural predictions based on a single RNA sequence. The rationale is that the structure of an RNA molecule is determined solely by its sequence, so algorithms can be designed to search for the stable RNA structure with the lowest free energy. Generally, when a base pair forms, the energy of the molecule is lowered because of attractive interactions between the two strands. As a first approximation, therefore, ab initio programs search for the structure with the maximum number of base pairs.

Free energy can be calculated from parameters empirically derived for small molecules. Not all base pairs contribute equally: G–C base pairs are more stable than A–U base pairs, which in turn are more stable than G–U base pairs. Base-pair formation is also not an independent event. The energy needed to form an individual base pair is influenced by adjacent base pairs through helical stacking forces, a phenomenon called cooperativity in helix formation. When a base pair lies next to other base pairs, they stabilize each other through attractive stacking interactions between the aromatic rings of the bases, which lowers the energy even further. Parameters describing this cooperativity have been determined and can be used for structure prediction.
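To make the idea of pair-dependent energies and cooperativity concrete, the following Python sketch scores a perfectly paired helix. The numbers in `PAIR_ENERGY` and `STACK_BONUS` are illustrative placeholders in arbitrary units, not measured thermodynamic parameters; they only reproduce the ordering G–C, then A–U, then G–U described above. The function name `helix_energy` is likewise hypothetical.

```python
# Illustrative placeholder energies (arbitrary units), NOT measured parameters.
# More negative means more stable; only the ordering G-C < A-U < G-U matters here.
PAIR_ENERGY = {
    frozenset("GC"): -3.0,
    frozenset("AU"): -2.0,
    frozenset("GU"): -1.0,
}
# Hypothetical extra stabilization for each pair of adjacent (stacked) base pairs.
STACK_BONUS = -1.0


def helix_energy(strand5, strand3):
    """Score a perfect helix.

    strand5 is written 5'->3' and strand3 is written 3'->5',
    so position k of one strand pairs with position k of the other.
    """
    if len(strand5) != len(strand3):
        raise ValueError("strands must have equal length")
    energy = 0.0
    for a, b in zip(strand5, strand3):
        key = frozenset((a, b))
        if key not in PAIR_ENERGY:
            raise ValueError(f"{a}-{b} is not an allowed base pair")
        energy += PAIR_ENERGY[key]
    # Cooperativity: each neighbouring couple of base pairs adds a stacking term.
    energy += STACK_BONUS * max(len(strand5) - 1, 0)
    return energy


if __name__ == "__main__":
    for top, bottom in [("GGG", "CCC"), ("AAA", "UUU"), ("GGG", "UUU")]:
        print(top, bottom, helix_energy(top, bottom))
```

Under these placeholder values a three-pair helix is more stable than three isolated pairs of the same type, which is the qualitative effect of stacking. Real prediction programs use experimentally derived nearest-neighbor tables instead of a single constant bonus.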

In searching for the lowest-energy form, all possible base-pair patterns have to be examined. Several methods can find all possible base-paired regions in a given nucleic acid sequence, of which the dot matrix is a well-known example. Both the dot matrix method and the dynamic programming method can be used to detect self-complementary regions of a sequence. A simple dot matrix finds all possible base-pairing patterns of an RNA sequence by comparing the sequence with itself. In this case, dots are placed in the matrix to mark complementary bases rather than identical ones. The diagonals perpendicular to the main diagonal represent regions that can self-hybridize to form double-stranded structure with traditional A–U and G–C base pairs. Normally, a window of four consecutive base matches is used, so that a dot is drawn only where four consecutive bases can pair. If the dot plot shows more than one feasible structure, the one with the lowest energy is chosen.
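A self-complementarity dot matrix of this kind can be sketched in a few lines of Python. This is a minimal illustration, not any particular program's implementation: the function names `complement_dot_matrix` and `render` are hypothetical, only A–U and G–C pairs are counted, and a dot at cell (i, j) means that the `window` bases starting at position i can pair, antiparallel, with the `window` bases ending at position j. Only non-overlapping segments in the upper triangle are reported.

```python
# Watson-Crick pairs only, as in the simple dot matrix described above.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def complement_dot_matrix(seq, window=4):
    """Return an n x n matrix of booleans marking potential stems.

    dots[i][j] is True when seq[i], seq[i+1], ... can pair with
    seq[j], seq[j-1], ... for `window` consecutive positions.
    """
    n = len(seq)
    dots = [[False] * n for _ in range(n)]
    for i in range(n - window + 1):
        # Start j far enough right that the two segments do not overlap.
        for j in range(i + 2 * window - 1, n):
            # Walk along the anti-diagonal: i increases while j decreases.
            if all((seq[i + k], seq[j - k]) in PAIRS for k in range(window)):
                dots[i][j] = True
    return dots


def render(seq, dots):
    """Draw the matrix as text: '*' for a dot, '.' for an empty cell."""
    lines = ["  " + seq]
    for base, row in zip(seq, dots):
        lines.append(base + " " + "".join("*" if d else "." for d in row))
    return "\n".join(lines)


if __name__ == "__main__":
    rna = "GGGGAAAACCCC"
    print(render(rna, complement_dot_matrix(rna, window=4)))
```

Lowering `window` produces more dots, many of them spurious, while raising it keeps only longer potential stems.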

However, if a large molecule has multiple secondary structure segments, choosing the energetically most stable combination from a large number of possibilities can be a daunting task. To overcome this problem, a quantitative approach such as dynamic programming can be used to arrive at a final structure with optimal base-paired regions. In this approach, the RNA sequence is compared with itself, and a scoring scheme fills the matrix with match scores based on Watson–Crick base complementarity. Often, G–U base pairing and energy terms for the base pairs are also incorporated into the scoring. The path with the maximal score through the scoring matrix, after the entire sequence has been taken into account, represents the most probable secondary structure. The dynamic programming method produces one structure with a single best score. This is a potential disadvantage, because in reality the RNA may exist in multiple alternative forms with near-minimum energy, which are not necessarily the ones with the maximum number of base pairs.
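The dynamic programming idea can be illustrated with a base-pair maximization sketch in the style of the Nussinov algorithm. This is a simplification rather than a full energy-based method: every allowed pair (including G–U) scores 1, the minimum hairpin loop size `MIN_LOOP = 3` is an assumption, and the function name `fold` is hypothetical. The result is a single structure in dot-bracket notation, which also shows the single-answer limitation noted above.

```python
# Every allowed pair scores 1 here; an energy-based scheme would use
# different weights per pair type and per stacking context.
PAIR_SCORE = {
    ("A", "U"): 1, ("U", "A"): 1,
    ("G", "C"): 1, ("C", "G"): 1,
    ("G", "U"): 1, ("U", "G"): 1,
}
MIN_LOOP = 3  # assumed minimum number of unpaired bases inside a hairpin


def fold(seq):
    """Return (best score, dot-bracket string) for one optimal nested structure."""
    n = len(seq)
    score = [[0] * n for _ in range(n)]
    # Fill the matrix for increasingly long subsequences i..j.
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            best = score[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - MIN_LOOP):  # case 2: base j pairs with k
                s = PAIR_SCORE.get((seq[k], seq[j]))
                if s is None:
                    continue
                left = score[i][k - 1] if k > i else 0
                best = max(best, left + s + score[k + 1][j - 1])
            score[i][j] = best

    # Trace back one path that achieves the maximal score.
    pairs = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        if score[i][j] == score[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - MIN_LOOP):
            s = PAIR_SCORE.get((seq[k], seq[j]))
            if s is None:
                continue
            left = score[i][k - 1] if k > i else 0
            if score[i][j] == left + s + score[k + 1][j - 1]:
                pairs.append((k, j))
                if k > i:
                    stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break

    structure = ["."] * n
    for i, j in pairs:
        structure[i], structure[j] = "(", ")"
    return (score[0][n - 1] if n else 0), "".join(structure)


if __name__ == "__main__":
    print(fold("GGGAAAACCC"))
```

When several structures share the maximal score, the traceback reports only the first one it encounters; the others are silently ignored.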

The limitation that dynamic programming selects only a single structure can be addressed by adding a probability distribution function, known as the partition function, which calculates the distribution of probable base pairs at thermodynamic equilibrium. This function helps to select a number of suboptimal structures within a certain energy range of the optimum.
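As a rough illustration of the partition function, the sketch below sums Boltzmann weights over all nested structures of a short sequence. It rests on loudly simplified assumptions: every base pair contributes the same hypothetical energy `pair_energy` (in kcal/mol), stacking is ignored, and hairpin loops must enclose at least `MIN_LOOP = 3` bases. The function names are hypothetical. Real programs apply the same kind of recursion to full nearest-neighbor energies.

```python
import math

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3                 # assumed minimum hairpin loop size
RT = 0.0019872 * 310.15      # gas constant (kcal/mol/K) times 37 degrees C in kelvin


def partition_function(seq, pair_energy=-1.0):
    """Sum exp(-E/RT) over all nested structures, with E = pair_energy * pairs.

    pair_energy is a hypothetical uniform energy per base pair.
    """
    n = len(seq)
    weight = math.exp(-pair_energy / RT)  # Boltzmann factor of a single pair
    z = [[1.0] * n for _ in range(n)]

    def get(i, j):
        # An empty interval has exactly one structure: the empty one.
        return z[i][j] if i <= j else 1.0

    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            total = get(i, j - 1)  # base j unpaired
            for k in range(i, j - MIN_LOOP):  # base j paired with base k
                if (seq[k], seq[j]) in PAIRS:
                    total += get(i, k - 1) * weight * get(k + 1, j - 1)
            z[i][j] = total
    return z[0][n - 1] if n else 1.0


def structure_probability(seq, n_pairs, pair_energy=-1.0):
    """Equilibrium probability of one specific structure with n_pairs pairs."""
    boltzmann = math.exp(-n_pairs * pair_energy / RT)
    return boltzmann / partition_function(seq, pair_energy)


if __name__ == "__main__":
    rna = "GGGAAAACCC"
    print("Z =", partition_function(rna))
    print("P(three-pair hairpin) =", structure_probability(rna, 3))
```

A probability well below 1 for the best-scoring structure signals that alternative, suboptimal structures carry a substantial share of the equilibrium population. Setting `pair_energy=0.0` makes every structure equally likely, so the partition function then simply counts the possible structures.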
