Predicting RNA secondary structure from sequence data is a long-standing computational problem. It has been approached in two distinct ways.
- If a multiple sequence alignment of a collection of homologous sequences is available, the comparative method uses phylogeny to determine conserved base pairs that are more likely to form as a result of billions of years of evolution than by chance.
- In the case of single sequences, recursive algorithms have been developed that compute minimum free energy structures using empirically derived energy parameters.
The latter approach, RNA folding prediction by energy minimization, is widely used to predict RNA secondary structure from sequence. For a significant number of RNA molecules, the secondary structure is indicative of function, so predicting it computationally by minimizing free energy is important for functional analysis. Dynamic programming is a general method for predicting RNA secondary structures by free energy minimization, although other optimization methods, along with empirically derived energy parameters, have been developed as well.
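The dynamic programming idea can be illustrated with a deliberately simplified sketch. It assumes a Nussinov-style score that maximizes the number of base pairs, rather than a real free energy model with empirically derived parameters. The pairing rules, the `min_loop` parameter, and the function name `nussinov` are choices made for this illustration only.

```python
# Allowed base pairs in this toy model: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Return (max_pairs, dot_bracket) for seq under a pair-counting score.

    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    n = len(seq)
    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive).
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Traceback: recover one optimal structure (there may be several).
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # interval too short to contain any pair
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return (best[0][n - 1] if n else 0), "".join(structure)


score, structure = nussinov("GGGAAAUCCC")
print(score, structure)
```

Energy-based folding follows the same recursive decomposition but replaces the pair count with loop-dependent energy contributions, which is why this sketch should not be read as a substitute for a thermodynamic predictor.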
Current RNA secondary-structure prediction methods can be classified into comparative sequence analysis and folding algorithms with thermodynamic, statistical, or probabilistic scoring schemes.
- Comparative sequence analysis determines base pairs conserved among homologous sequences. These methods are highly accurate if a large number of homologous sequences are available and those sequences are manually aligned with expert knowledge.
- In the folding algorithms approach, RNA structure is divided into substructures such as loops and stems according to the nearest-neighbor model. Dynamic programming algorithms are then employed for locating the global minimum or probabilistic structures from these substructures. The scoring parameters of each substructure can be obtained experimentally or by machine learning.
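The covariation signal that comparative sequence analysis looks for can be illustrated with a small sketch. It assumes mutual information between alignment columns as the covariation measure, which is one common choice and not necessarily the one used by any particular tool. The toy alignment is invented for the example, and the function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    alignment is a list of equal-length aligned sequences. Gap characters
    are treated like any other symbol, which is a simplification.
    """
    n = len(alignment)
    count_i = Counter(row[i] for row in alignment)
    count_j = Counter(row[j] for row in alignment)
    count_ij = Counter((row[i], row[j]) for row in alignment)
    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def covarying_pairs(alignment, threshold):
    """List (i, j, MI) for all column pairs with MI >= threshold."""
    length = len(alignment[0])
    hits = []
    for i in range(length):
        for j in range(i + 1, length):
            mi = mutual_information(alignment, i, j)
            if mi >= threshold:
                hits.append((i, j, mi))
    return hits


# Invented toy alignment: the first and last columns change together so
# that they can always pair, while the middle columns never vary.
toy_alignment = ["GAAAC", "CAAAG", "AAAAU", "UAAAA"]
print(covarying_pairs(toy_alignment, threshold=1.0))
```

Columns that vary in a coordinated way score high, while fully conserved columns score zero even if they pair. This is one reason such methods depend on a large and well-aligned set of homologous sequences.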
RNAfold is one of the core programs of the Vienna RNA package. It predicts the minimum free energy (MFE) secondary structure of a single sequence using a dynamic programming algorithm. In addition to MFE folding, equilibrium base-pairing probabilities are calculated via the partition function (PF) algorithm.
The input of RNAfold is a single RNA or DNA sequence in plain text or FASTA format, which can be pasted into the text box or uploaded as a file. Additionally, one can enter structure constraints in a separate text box. By default, both the MFE structure and the partition function are computed.
The RNAfold server output contains the predicted MFE secondary structure in the usual dot-bracket notation. Additionally, mfold-style Connect (ct) files can be downloaded. The secondary structure together with the sequence can be passed on to the RNAeval web server, which gives a detailed thermodynamic description according to the loop-based energy model.
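To make the two output formats concrete, the following sketch parses a dot-bracket string into a pair table and writes a ct-style listing. It assumes the commonly documented ct layout: a header line with the sequence length and a title, then one line per base giving index, base, previous index, next index, pairing partner (0 if unpaired), and index again. The exact whitespace of files produced by the server may differ, and both function names are hypothetical.

```python
def pair_table(structure):
    """Convert dot-bracket notation to a 1-based pair table.

    table[i] is the partner of position i, or 0 if i is unpaired;
    table[0] is unused. Only '(', ')' and '.' are accepted here.
    """
    table = [0] * (len(structure) + 1)
    stack = []
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            partner = stack.pop()
            table[pos] = partner
            table[partner] = pos
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return table


def to_ct(sequence, structure, title="sequence"):
    """Render sequence and structure as tab-separated ct-style text."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure differ in length")
    table = pair_table(structure)
    n = len(sequence)
    lines = [f"{n}\t{title}"]
    for i in range(1, n + 1):
        next_index = i + 1 if i < n else 0
        lines.append(f"{i}\t{sequence[i - 1]}\t{i - 1}\t{next_index}\t{table[i]}\t{i}")
    return "\n".join(lines)


print(to_ct("GGAACC", "((..))", title="toy hairpin"))
```

The pair table is the more convenient form for further processing, since each position's partner can be looked up directly instead of re-parsing brackets.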
Since RNA structure prediction is error-prone, it is important to augment predicted structures with reliability information. Different kinds of reliability annotation can be derived from the results of partition function folding.
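One reliability measure that can be derived from base-pairing probabilities is the positional entropy, sketched below as an illustration. It assumes the pair probabilities are already available as a dictionary, for example exported from a partition function calculation. The input format, the function name, and the choice of base-2 logarithms are assumptions made for this example.

```python
import math


def positional_entropy(pair_probs, length):
    """Per-position entropy (in bits) from base-pairing probabilities.

    pair_probs maps 0-based index pairs (i, j) with i < j to the
    probability that i and j are paired. For each position, the
    distribution consists of its pairing probabilities plus the
    probability of being unpaired.
    """
    per_position = [[] for _ in range(length)]
    for (i, j), p in pair_probs.items():
        per_position[i].append(p)
        per_position[j].append(p)

    entropies = []
    for probs in per_position:
        # Clamp at zero to absorb small rounding errors in the input.
        unpaired = max(0.0, 1.0 - sum(probs))
        terms = probs + [unpaired]
        entropies.append(-sum(p * math.log2(p) for p in terms if p > 0))
    return entropies


# Invented probabilities: pair (0, 5) is certain, pair (1, 4) is a coin flip.
toy_probs = {(0, 5): 1.0, (1, 4): 0.5}
print(positional_entropy(toy_probs, length=6))
```

Positions with entropy close to zero are well defined, being either almost always in the same pair or almost always unpaired. High values flag regions where the ensemble contains competing alternatives and the single predicted structure is less trustworthy.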