MicroRNAs (miRNAs) are small single-stranded RNA molecules, approximately 22 nt in length, that play an important role in post-transcriptional gene regulation, either by repressing translation of their target mRNAs or by cleaving them. Since their discovery, continuous efforts to identify miRNA genes have led to the discovery of miRNAs in both plants and animals. Owing to the limitations of molecular genetic techniques for miRNA identification, computational approaches were introduced for better and more affordable in silico miRNA prediction.
miPRED is a de novo Support Vector Machine (SVM) classifier for identifying pre-miRs without depending on phylogenetic conservation. To gain significantly higher sensitivity and specificity than existing de novo predictors, it employs a Gaussian radial basis function (RBF) kernel as a similarity measure over 29 global and intrinsic hairpin folding attributes. These attributes characterize a pre-miR at the dinucleotide sequence, hairpin folding, non-linear statistical thermodynamics and topological levels.
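To make the idea of an RBF kernel as a similarity measure concrete, here is a minimal sketch in plain Python. The three-element vectors and the `gamma` value are illustrative assumptions only; miPRED works on 29 attributes per hairpin, and its actual kernel parameters are not given in this text.

```python
import math


def rbf_kernel(x, y, gamma=0.1):
    """Gaussian RBF similarity: exp(-gamma * ||x - y||^2).

    gamma is a hypothetical value chosen for illustration.
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Toy feature vectors (made up); real inputs would hold 29 attributes each.
hairpin_a = [0.30, -0.45, 1.20]
hairpin_b = [0.28, -0.40, 1.10]
hairpin_c = [0.90, 0.10, 0.20]

print(rbf_kernel(hairpin_a, hairpin_a))  # identical vectors give the maximum, 1.0
print(rbf_kernel(hairpin_a, hairpin_b))  # nearby vectors score close to 1
print(rbf_kernel(hairpin_a, hairpin_c))  # distant vectors score lower
```

The kernel value lies in (0, 1] and falls off with squared Euclidean distance, so hairpins with similar attribute profiles are treated as similar by the SVM.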
The original intent of miPRED is to distinguish pre-miRs spanning diverse species from genomic pseudo hairpins, using a classifier model trained solely on human data sets. Since ncRNAs and mRNAs were not included in the initial training, it is instructive to assess how well miPRED can discriminate them as non-pre-miRs without relying on their specific nucleotide sequence, structural and topological characteristics.
- Trained on 200 human pre-miRs and 400 pseudo hairpins, miPRED achieves a 5-fold cross-validation accuracy of 93.50% and a ROC score of 0.9833.
- Tested on the remaining 123 human pre-miRs and 246 pseudo hairpins, it reports 84.55% (sensitivity), 97.97% (specificity) and 93.50% (accuracy).
- Validated on 1918 pre-miRs across 40 non-human species and 3836 pseudo hairpins, it yields 87.65% (92.08%), 97.75% (97.42%) and 94.38% (95.64%) for the mean (overall) sensitivity, specificity and accuracy.
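The sensitivity, specificity and accuracy figures above follow the usual confusion-matrix definitions. The sketch below recomputes the human test-set figures (123 pre-miRs, 246 pseudo hairpins). The individual counts (104 true positives, 241 true negatives) are not stated in the text; they are inferred here as the integers consistent with the reported percentages.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of real pre-miRs recovered
    specificity = tn / (tn + fp)  # fraction of pseudo hairpins rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Assumed counts, inferred from the reported percentages:
# 104 of 123 pre-miRs and 241 of 246 pseudo hairpins classified correctly.
sens, spec, acc = classification_metrics(tp=104, fn=19, tn=241, fp=5)
print(f"{sens:.2%} {spec:.2%} {acc:.2%}")  # 84.55% 97.97% 93.50%
```

Because the test set holds twice as many pseudo hairpins as pre-miRs, accuracy is weighted toward specificity, which is why it sits well above the sensitivity.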
MiPred, a random forest-based predictor, differentiates real pre-miRNAs from other hairpin sequences with similar stem-loops (pseudo pre-miRNAs) using a hybrid feature set that consists of the local contiguous structure-sequence composition, the minimum free energy (MFE) of the secondary structure and the P-value of a randomization test. The classifier is a random forest (RF), a machine-learning algorithm. Given a sequence, MiPred first decides whether it is a pre-miRNA-like hairpin sequence. If it is, the RF classifier then predicts whether it is a real pre-miRNA or a pseudo one.
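As an illustration of what a local contiguous structure-sequence composition can look like, the sketch below counts triplet elements: each window of three adjacent positions in a dot-bracket structure, combined with the middle nucleotide, giving 4 x 8 = 32 possible features. This follows the triplet-style encoding only as an assumption; the exact feature definition used by MiPred is not given in this text, and the function name and toy hairpin are hypothetical.

```python
from collections import Counter
from itertools import product


def triplet_features(sequence, structure):
    """Normalized frequencies of the 32 structure-sequence triplet elements.

    sequence: RNA string over A, C, G, U.
    structure: dot-bracket string of the same length.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    # Collapse both bracket types to "(" (paired); "." stays unpaired.
    paired = structure.replace(")", "(")
    keys = [nt + "".join(s) for nt in "ACGU" for s in product("(.", repeat=3)]
    counts = Counter({key: 0 for key in keys})
    for i in range(1, len(sequence) - 1):
        key = sequence[i].upper() + paired[i - 1 : i + 2]
        if key in counts:  # skip windows with unexpected symbols
            counts[key] += 1
    total = sum(counts.values())
    return {key: (counts[key] / total if total else 0.0) for key in keys}


# Toy hairpin (made up): a 3-bp stem closing a 3-nt loop.
features = triplet_features("GGGAAACCC", "(((...)))")
nonzero = {k: round(v, 3) for k, v in features.items() if v > 0}
print(len(features), nonzero)
```

In a full pipeline, a vector like this would be concatenated with the MFE and the randomization-test P-value before being passed to the classifier; those two quantities require an RNA folding program and are left out of this sketch.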
The following tools are used to evaluate the results of miPRED and its predictive performance on real and pseudo precursor miRNA datasets:
- SVM Bayes MiRNA find
- One Class miRNA find
- Bayes SVM miRNA find
For real/pseudo miRNA classification, MiPred is more sensitive than Triplet-SVM in identifying pseudo miRNAs. For mature miRNA prediction, the one-class SVM classifier shows the best specificity, while Bayes SVM miRNA find shows the lowest.