HHMMiR is a de novo approach for predicting miRNA hairpins in the absence of evolutionary conservation. It uses a Hierarchical Hidden Markov Model (HHMM) that exploits region-based structural and sequence information from miRNA precursors.
MicroRNAs (miRNAs) are small non-coding single-stranded RNAs (20-23 nt) that act as post-transcriptional and translational regulators of gene expression. Although they were initially overlooked, their role in many important biological processes, such as development, cell differentiation, and cancer, has since been established. Despite their biological significance, the identification of miRNA genes in newly sequenced organisms still relies, to a large degree, on evolutionary conservation, which is not always available.
- HHMMiR first establishes a template for the structure of a typical miRNA hairpin by summarizing data from publicly available databases.
- It then uses this template to develop the HHMM topology.
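To make the idea of an HHMM topology concrete, the following is a minimal sketch of a hierarchical state tree, in which internal states contain sub-states and only leaf (production) states emit symbols. The region names (`region_1` to `region_4`) and the emission classes (`match`, `mismatch`, `indel`) are hypothetical placeholders chosen for illustration; they are not taken from the source and should not be read as the actual HHMMiR topology.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class State:
    """A node in a hierarchical HMM: internal if it has children, otherwise a production state."""
    name: str
    children: List["State"] = field(default_factory=list)

    def is_production(self) -> bool:
        # Leaf states are the only ones that emit symbols.
        return not self.children


def production_states(state: State) -> List[str]:
    """Return the names of all leaf (emitting) states below `state`, depth-first."""
    if state.is_production():
        return [state.name]
    names: List[str] = []
    for child in state.children:
        names.extend(production_states(child))
    return names


def build_toy_topology() -> State:
    """Build a toy two-level topology: one root, four regions, three leaves per region."""
    column_types = ["match", "mismatch", "indel"]  # hypothetical emission classes
    regions = [
        State(f"region_{i}", [State(f"region_{i}/{c}") for c in column_types])
        for i in range(1, 5)  # four placeholder regions
    ]
    return State("hairpin", regions)


root = build_toy_topology()
print(production_states(root))
```

Keeping the hierarchy explicit as a tree means each region can carry its own sub-model, while the root only needs to describe how control passes from one region to the next.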
The algorithm achieves an average sensitivity of 84% and specificity of 88% in 10-fold cross-validation on human miRNA precursor data. The model also works well on hairpins from other vertebrate and invertebrate species. Its success across such a diverse set of species suggests that sequence conservation is not necessary for miRNA hairpin prediction, which may enable efficient prediction of miRNA genes in virtually any organism.
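For readers unfamiliar with these metrics, the sketch below shows how sensitivity and specificity are computed from binary labels, and how a dataset can be partitioned into 10 folds. The function names and the unshuffled fold assignment are assumptions made for illustration; the source does not describe how its folds were constructed.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) for labels where 1 = true hairpin, 0 = negative."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def k_fold_indices(n: int, k: int = 10) -> List[List[int]]:
    """Partition indices 0..n-1 into k disjoint folds of nearly equal size (no shuffling)."""
    return [list(range(i, n, k)) for i in range(k)]


# Toy labels, invented for illustration only.
sens, spec = sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
folds = k_fold_indices(25, k=10)
print(sens, spec, [len(f) for f in folds])
```

In a cross-validation run, each fold serves once as the held-out test set, and the per-fold sensitivity and specificity are averaged to give figures comparable to those reported above.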
This algorithm has been used repeatedly for the classification and identification of miRNA hairpins. Probabilistic learning had previously been applied to identifying the miRNA pattern/motif within hairpins.
- The advantage of the hierarchy used in HHMMiR is that it parses each hairpin into four distinct regions and processes each of them separately. This better represents the biological role of each region, which is reflected in the distinct length distribution and neighborhood base-pairing characteristics of that region.
- Furthermore, the underlying HHMM provides an intuitive modelling of these regions.
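The per-region characteristics mentioned above (length and base-pairing) can be illustrated with a small sketch that summarizes regions of a hairpin given in dot-bracket notation. The toy structure, the region names, and the region boundaries below are all hypothetical; they are not the four regions HHMMiR uses, which the text does not name.

```python
from typing import Dict, List, Tuple


def pair_table(structure: str) -> List[int]:
    """Map each position to the index of its pairing partner, or -1 if unpaired."""
    partner = [-1] * len(structure)
    stack: List[int] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return partner


def region_summary(structure: str, regions: Dict[str, Tuple[int, int]]) -> Dict[str, Dict[str, float]]:
    """Report the length and paired fraction of each half-open region [start, end)."""
    partner = pair_table(structure)
    summary: Dict[str, Dict[str, float]] = {}
    for name, (start, end) in regions.items():
        length = end - start
        paired = sum(1 for i in range(start, end) if partner[i] != -1)
        summary[name] = {
            "length": length,
            "paired_fraction": paired / length if length else 0.0,
        }
    return summary


# Toy hairpin and hypothetical region boundaries on the 5' arm.
toy_structure = "((((..((((....))))..))))"
toy_regions = {
    "outer_stem": (0, 4),
    "internal_loop": (4, 6),
    "inner_stem": (6, 10),
    "terminal_loop": (10, 14),
}
summary = region_summary(toy_structure, toy_regions)
print(summary)
```

Collecting such summaries over many known hairpins would give the kind of region-specific length and pairing distributions that motivate treating each region separately.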
The drawback of HHMMiR is that it depends on the minimum free energy (MFE) structure returned by the RNAfold program. Future work may test additional folding algorithms, or use the probability distribution over several of the top-scoring structures returned by a folding algorithm rather than a single MFE structure.
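One way such a probability distribution over several candidate structures could be formed is by Boltzmann-weighting their free energies. This is an assumption made for illustration, since the text does not say how the distribution would be computed, and the energies below are invented values rather than output from any folding program.

```python
import math
from typing import List, Sequence

GAS_CONSTANT = 0.0019872  # kcal/(mol*K)


def boltzmann_weights(energies: Sequence[float], temperature_c: float = 37.0) -> List[float]:
    """Convert free energies (kcal/mol) into normalized Boltzmann probabilities."""
    if not energies:
        raise ValueError("at least one energy is required")
    rt = GAS_CONSTANT * (temperature_c + 273.15)
    e_min = min(energies)
    # Subtracting the minimum energy keeps the exponentials numerically stable
    # without changing the normalized probabilities.
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical energies for the three lowest-energy structures of one hairpin.
toy_energies = [-30.0, -29.0, -28.5]
probs = boltzmann_weights(toy_energies)
print(probs)
```

With weights like these, a downstream score could be averaged over the candidate structures instead of relying on the single lowest-energy fold.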