Protein–RNA interactions, although key to many biological processes, have remained refractory to prediction algorithms. RNA–protein complexes in the Protein Data Bank were decomposed into small peptide–oligonucleotide interacting fragment pairs, which were used as building blocks to assemble larger scaffolds representing complex RNA–protein interactions. This method has already been successful for designing DNA–protein and protein–protein interfaces. Areas under the curve (AUC) of up to 0.86 were achieved for binding site prediction, demonstrating the accuracy and coverage of our approach on established and in-house benchmarking sets.
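To make the reported metric concrete, the following is a minimal sketch of how an area under the curve can be computed for per-residue binding site predictions. It assumes the curve in question is the ROC curve and uses the pairwise-ranking definition of the AUC; the function name `roc_auc` and the labels and scores are hypothetical illustrations, not data or code from the study.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a binding residue outscores a non-binding one.

    labels: 1 for residues in the binding site, 0 otherwise (hypothetical data).
    scores: predicted binding propensity per residue; higher means more likely to bind.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one binding and one non-binding residue")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # correctly ranked pair
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example: four residues, two of them in the binding site.
example_labels = [1, 0, 1, 0]
example_scores = [0.9, 0.8, 0.3, 0.1]
print(roc_auc(example_labels, example_scores))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of binding from non-binding residues.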
RNA–protein interactions are crucial for such key biological processes as regulation of transcription, splicing, translation, and gene silencing, among many others. Knowing where an RNA molecule interacts with a target protein and engineering an RNA molecule to specifically bind to a protein could allow for rational interference with these cellular processes and the design of novel therapies.
A significant part of biology involves the formation of RNA–protein complexes. X-ray crystallography has added a few solved RNA–protein complexes to the repertoire; however, it remains challenging to capture these complexes and often only the unbound structures are available. This has inspired a growing interest in finding ways to predict these RNA–protein complexes.
Protein–RNA interactions play an important role in many biological processes. Computational methods such as docking have been developed to complement existing biophysical and structural biology techniques. Computational prediction of protein–RNA complex structures involves two steps:
· Generating candidate structures from the individual protein and RNA
· Scoring the generated poses to pick out the correct one
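The two steps can be sketched as a toy rigid-body search. This is an illustration under strong assumptions rather than any published docking protocol: molecules are reduced to lists of 3D points, sampling is limited to random rotations about the z-axis plus translations, and the scoring function is a hypothetical contact count with a clash penalty. All function names and parameter values are invented for the example.

```python
import math
import random


def transform(coords, angle, shift):
    """Rotate points about the z-axis by `angle` (radians), then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]


def generate_poses(rna, n_poses, max_shift, rng):
    """Step 1: sample candidate rigid-body placements of the RNA."""
    poses = []
    for _ in range(n_poses):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        shift = tuple(rng.uniform(-max_shift, max_shift) for _ in range(3))
        poses.append(transform(rna, angle, shift))
    return poses


def score_pose(protein, rna_pose, contact=6.0, clash=3.0):
    """Step 2: toy score, +1 per contact pair and -10 per clashing pair (higher is better)."""
    score = 0.0
    for p in protein:
        for r in rna_pose:
            d = math.dist(p, r)
            if d < clash:
                score -= 10.0
            elif d <= contact:
                score += 1.0
    return score


def dock(protein, rna, n_poses=200, max_shift=10.0, seed=0):
    """Generate poses, score each one, and return the top-ranked pose."""
    rng = random.Random(seed)
    poses = generate_poses(rna, n_poses, max_shift, rng)
    return max(poses, key=lambda pose: score_pose(protein, pose))
```

Real docking programs sample full three-dimensional rotations and use far richer scoring functions, but the separation into a sampling stage and a ranking stage is the same.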
Direct readout methods are based mainly on machine learning algorithms and include;
On the other hand, NAPS, RNABindR, PRBR, and catRapid use decision trees, the naïve Bayes classifier, and random forest algorithms, respectively, incorporating the physicochemical properties of amino acids along with predictable features such as secondary structure, sequence conservation, and solvent accessibility. In contrast, indirect readout approaches were initially adapted from existing protein–protein docking algorithms. However, these programs often fail to generate native-like structures because of incomplete sampling of the conformational space and deficiencies in the scoring functions, which are not specifically designed for RNA–protein interactions (RPIs). Nonetheless, docking algorithms such as HADDOCK, RosettaDock server, ClusPro, GRAMM-X, 3D-Garden, HEX server, SwarmDock, ZDOCK server, PatchDock, ATTRACT, pyDockSAXS, InterEvDock, NPDock, and HDOCK, with the exception of NPDock, have been adapted to accept nucleic acids. Rigid-body approaches face problems arising from the intrinsic flexibility of RNA and the electrostatic nature of the forces driving RPIs. To account for these properties, some of the methods taken from protein–protein docking have been updated by adding electrostatic terms to the energy function.
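As an illustration of what adding an electrostatic term to an energy function can look like, the sketch below computes a Coulomb energy with a distance-dependent dielectric and adds it to an existing score. It is not the energy function of any of the programs named above. The dielectric model (eps = 4r), the weight, the point charges, and the function names are assumptions chosen for the example; 332.06 is the usual conversion factor that yields kcal/mol for charges in electron units and distances in Å.

```python
import math

COULOMB_CONSTANT = 332.06  # kcal*Å/(mol*e^2)


def coulomb_energy(protein_atoms, rna_atoms, dielectric_scale=4.0, min_dist=1.0):
    """Pairwise Coulomb energy between two sets of charged points.

    Each atom is (x, y, z, charge). A distance-dependent dielectric
    eps(r) = dielectric_scale * r is assumed, so each pair contributes
    332.06 * q_i * q_j / (dielectric_scale * r**2). Distances below
    `min_dist` are clamped to avoid division by very small numbers.
    """
    energy = 0.0
    for (*p_xyz, q_p) in protein_atoms:
        for (*r_xyz, q_r) in rna_atoms:
            r = max(math.dist(p_xyz, r_xyz), min_dist)
            energy += COULOMB_CONSTANT * q_p * q_r / (dielectric_scale * r * r)
    return energy


def total_energy(base_energy, protein_atoms, rna_atoms, w_elec=1.0):
    """Existing docking energy plus a weighted electrostatic term (lower is better)."""
    return base_energy + w_elec * coulomb_energy(protein_atoms, rna_atoms)


# Hypothetical charges: a lysine-like +1 facing a phosphate-like -1 at 5 Å.
lys = [(0.0, 0.0, 0.0, +1.0)]
phosphate = [(0.0, 0.0, 5.0, -1.0)]
print(total_energy(-10.0, lys, phosphate))
```

Because the RNA backbone carries a negative charge on every phosphate, a term of this kind rewards poses that place basic residues near the backbone, which shape-complementarity scores alone do not capture.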