NOVOMIR is a program for identifying miRNA genes in plant genomes. It uses a series of filter steps and a statistical model to discriminate pre-miRNAs from other RNAs, and it depends neither on prior knowledge of a miRNA target nor on comparative genomics.
MicroRNAs (miRNAs) are genome-encoded, single-stranded RNA molecules of about 22 nt in length that play a significant role in the regulation of gene expression in eukaryotes. Many details of miRNA biogenesis and interactions are known. miRNAs can be encoded by dedicated miRNA genes, but they can also be generated from other RNA transcripts (e.g., from introns of protein-coding genes).
Plant and animal miRNAs differ to some extent in their biogenesis and structural characteristics, and also in their mode of action.
- In plants, most if not all miRNAs are transcribed from genes by DNA-dependent RNA polymerase II (Pol II) into primary transcripts called pri-miRNAs; these transcripts fold into stem-loop structures.
- In the cytoplasm, the miRNA is incorporated into the RNA-induced silencing complex (RISC), and base-pairing of the miRNA with complementary messenger RNA (mRNA) regions leads to mRNA degradation or to inhibition of mRNA translation.
Most plant miRNAs base-pair with their target mRNAs in the coding region with perfect or near-perfect complementarity, leading to cleavage (and degradation) of the mRNAs; animal miRNAs usually base-pair with untranslated regions (mainly the 3' UTR) through imperfect complementarity, leading to translational repression.
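To make "perfect or near-perfect complementarity" concrete, the following minimal Python sketch counts mismatches and G:U wobble pairs when a miRNA is aligned antiparallel to an equal-length target site. It assumes an ungapped duplex and the RNA alphabet (A, C, G, U). The function names and the mismatch cutoff in `is_near_perfect` are hypothetical illustrations, not part of NOVOMIR, which does not use target information.

```python
# Watson-Crick complements in the RNA alphabet.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an RNA sequence (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def pairing_profile(mirna: str, target: str) -> tuple[int, int]:
    """Count (mismatches, G:U wobbles) for an ungapped antiparallel duplex.

    Both sequences are given 5'->3'; miRNA position i pairs with
    target position len(target) - 1 - i.
    """
    if len(mirna) != len(target):
        raise ValueError("sequences must have equal length")
    mismatches = wobbles = 0
    for m, t in zip(mirna, reversed(target)):
        if COMPLEMENT[m] == t:
            continue  # canonical Watson-Crick pair
        if {m, t} == {"G", "U"}:
            wobbles += 1  # G:U wobble, tolerated but counted separately
        else:
            mismatches += 1
    return mismatches, wobbles


def is_near_perfect(mirna: str, target: str, max_mismatches: int = 3) -> bool:
    """Hypothetical cutoff: at most `max_mismatches` non-wobble mismatches."""
    mismatches, _ = pairing_profile(mirna, target)
    return mismatches <= max_mismatches
```

For an illustrative sequence such as `UGACAGAAGAGAGUGAGCAC`, pairing it against its own `reverse_complement` yields `(0, 0)`, i.e. a perfectly complementary site; each substituted base in the site then shows up as either a wobble or a mismatch.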
Finding miRNA genes requires either costly experimental approaches, for example genetics (which led to the detection of the first animal miRNAs), cloning and sequencing of cDNA, or deep sequencing, or computational prediction methods, which facilitate subsequent experimental verification or falsification.
The different properties of miRNAs in plants and animals have given rise to different computational approaches. Most of these tools, however, rely on the following features:
- The miRNA resides in a stem-loop structure that has high thermodynamic stability and contains no large internal loops or asymmetric bulges, at least in the region of the mature miRNA.
- In addition, many tools take phylogenetic conservation of the pre-miRNA structure and the miRNA sequence into account, which limits their ability to detect non-conserved, evolutionarily young miRNA genes.
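The structural criteria in the first item can be sketched as a simple filter. This minimal Python example assumes the hairpin's dot-bracket structure and minimum free energy (MFE) have already been computed (for instance with a folding program such as RNAfold from the ViennaRNA Package), and that the mature miRNA lies on the 5' arm at the 0-based, half-open interval `[start, end)`. All cutoff values are hypothetical placeholders, not NOVOMIR's actual thresholds.

```python
def pair_table(dot_bracket: str) -> list[int]:
    """Map each position to its base-pairing partner, or -1 if unpaired."""
    stack: list[int] = []
    table = [-1] * len(dot_bracket)
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            j = stack.pop()
            table[i], table[j] = j, i
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return table


def passes_stem_filter(
    dot_bracket: str,
    mfe: float,
    start: int,
    end: int,
    max_mfe: float = -20.0,  # hypothetical stability cutoff (kcal/mol)
    max_loop: int = 3,       # hypothetical: largest loop/bulge allowed in the mature region
    max_asymmetry: int = 2,  # hypothetical: largest |left - right| loop-size difference
) -> bool:
    """Check a hairpin whose mature miRNA occupies [start, end) on the 5' arm."""
    if mfe > max_mfe:
        return False  # not thermodynamically stable enough
    table = pair_table(dot_bracket)
    # Mature-region positions that pair with a downstream partner (5' arm).
    paired = [i for i in range(start, end) if table[i] > i]
    if not paired:
        return False
    for a, b in zip(paired, paired[1:]):
        if table[b] > table[a]:
            return False  # not a single continuous stem
        left = b - a - 1                 # unpaired bases on the miRNA side
        right = table[a] - table[b] - 1  # unpaired bases on the opposite arm
        if max(left, right) > max_loop or abs(left - right) > max_asymmetry:
            return False  # large internal loop or asymmetric bulge
    return True
```

A perfect 10-bp stem such as `((((((((((....))))))))))` with an MFE of -25.0 passes for `start=0, end=10`, whereas a 5-nt one-sided bulge inside the mature region is rejected as asymmetric.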
NOVOMIR uses a set of heuristic filters and a statistical model to discriminate miRNA precursors from all other RNAs. The data for this model were collected from a set of true positive (pre-miRNA) sequences and a set of true negative (non-miRNA) sequences. All thresholds for the filter steps and the probabilities of the hidden Markov model were selected on the basis of receiver operating characteristic (ROC) curves.
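Choosing a filter threshold from an ROC curve can be illustrated with a short Python sketch. Given scores for known positives and known negatives, it computes the true positive rate (TPR) and false positive rate (FPR) at each candidate threshold. The source does not state which point on the curve NOVOMIR picks; this example assumes, purely for illustration, the threshold that maximizes Youden's J statistic (TPR minus FPR). The function names are hypothetical.

```python
from collections.abc import Sequence


def roc_points(
    pos_scores: Sequence[float], neg_scores: Sequence[float]
) -> list[tuple[float, float, float]]:
    """Return (threshold, FPR, TPR) triples; a candidate is positive if score >= threshold."""
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = []
    for t in thresholds:
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((t, fpr, tpr))
    return points


def youden_threshold(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Pick the threshold maximizing Youden's J = TPR - FPR (highest threshold on ties)."""
    best = max(roc_points(pos_scores, neg_scores), key=lambda p: p[2] - p[1])
    return best[0]
```

For example, with positive scores `[0.95, 0.9, 0.8, 0.7, 0.3]` and negative scores `[0.6, 0.4, 0.2, 0.1]`, `youden_threshold` returns `0.7` (TPR 0.8, FPR 0.0). Other selection rules, such as a fixed maximum FPR, could be substituted in `youden_threshold` without changing `roc_points`.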