Hypothetical proteins (HPs) are predicted by gene prediction software during genome annotation. When the bioinformatics tool used for gene identification finds a large open reading frame without a characterized homologue in the protein database, it returns “hypothetical protein” as the annotation.
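To make the notion of a “large open reading frame” concrete, the following is a minimal sketch of a naive ORF scan. It is only an illustration: real gene predictors rely on statistical models rather than a plain ATG-to-stop search, and the function name `find_orfs`, the forward-strand-only scan and the 100-codon default threshold are all assumptions made here, not details taken from any particular tool. The homology search that would follow is not shown.

```python
# Naive ORF scan (illustrative sketch, not a real gene predictor).
# Assumptions: forward strand only, ATG as the only start codon,
# and a hypothetical minimum length of 100 codons.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=100):
    """Return (start, end, frame) tuples for ATG..stop ORFs.

    `start` and `end` are 0-based, end-exclusive positions in `dna`;
    the stop codon is included in the reported span.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3  # coding codons, stop excluded
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs
```

An ORF reported by such a scan would only be labelled “hypothetical protein” if a subsequent database search found no characterized homologue for its translation.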
In silico methods for annotating hypothetical proteins are cost-effective and fast enough to explore their function. Multiple algorithm-based software tools have been used to predict hypothetical protein function, which may lead to the identification of novel pharmacological targets for screening, drug discovery and drug design for the treatment of HAdV infections.
The physicochemical properties of all HPs can be analyzed with the ProtParam tool on the ExPASy server. This tool computes theoretical values of physicochemical properties such as isoelectric point, molecular weight, aliphatic index, grand average of hydropathicity (GRAVY) and instability index.
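Two of these properties are simple enough to compute by hand, which helps in interpreting the server output. The sketch below assumes the Kyte-Doolittle hydropathy scale for GRAVY and the standard aliphatic index formula (mole percent of Ala + 2.9 × Val + 3.9 × (Ile + Leu)); the function names are hypothetical, and the code is an illustration of the definitions rather than a reimplementation of ProtParam.

```python
# Illustrative calculation of GRAVY and aliphatic index.
# Assumes sequences contain only the 20 standard residues (uppercase).

# Kyte-Doolittle hydropathy values per residue.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Relative volume occupied by aliphatic side chains (A, V, I, L)."""
    n = len(seq)

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))
```

A positive GRAVY value suggests an overall hydrophobic protein and a negative value a hydrophilic one, while a higher aliphatic index is usually read as an indicator of greater thermostability.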
Many recent bioinformatics tools, such as the Conserved Domain Architecture Retrieval Tool (CDART), the Simple Modular Architecture Research Tool (SMART), CATH, Pfam, SUPERFAMILY and SVMProt, have been developed to assign functions to HPs from many species. These tools draw on the data available in many databases, using domain, family and ontology information to support protein function characterization. In addition, studying protein-protein interactions (PPIs) with interaction search resources such as the STRING database is necessary for understanding the role of a protein in a biological network. These interactions play an important role in cellular processes, and studying them allows the function of an HP to be understood and biological functions to be inferred for these uncharacterized proteins.
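The idea of inferring an HP's function from its interaction partners can be sketched as a simple neighbor vote. The example below is a minimal illustration under stated assumptions: the interaction list is supplied by hand as (protein, protein, confidence score) tuples with scores between 0 and 1, the 0.7 cutoff is an assumed threshold, and the function name and toy identifiers are hypothetical. It does not query STRING or reproduce its scoring.

```python
# Guilt-by-association sketch: score candidate functions for an
# uncharacterized protein from the annotations of its interaction partners.
from collections import Counter


def predict_by_neighbors(protein, interactions, annotations, min_score=0.7):
    """Rank functional terms for `protein` by summed interaction confidence.

    interactions: iterable of (protein_a, protein_b, score) tuples
    annotations:  dict mapping protein name -> set of functional terms
    min_score:    assumed confidence cutoff; weaker edges are ignored
    """
    votes = Counter()
    for a, b, score in interactions:
        if score < min_score:
            continue
        # Interactions are undirected, so check both ends of the edge.
        if a == protein:
            partner = b
        elif b == protein:
            partner = a
        else:
            continue
        for term in annotations.get(partner, ()):
            votes[term] += score
    return votes.most_common()


# Toy data, invented purely for illustration.
interactions = [
    ("HP1", "polA", 0.9),
    ("HP1", "ligA", 0.8),
    ("recA", "HP1", 0.75),
    ("HP1", "ftsZ", 0.3),  # below the cutoff, ignored
]
annotations = {
    "polA": {"DNA replication"},
    "ligA": {"DNA replication", "DNA repair"},
    "recA": {"DNA repair"},
    "ftsZ": {"cell division"},
}
ranked = predict_by_neighbors("HP1", interactions, annotations)
```

Weighting each vote by the interaction confidence, rather than counting partners equally, is one design choice among several; it keeps a single low-confidence edge from dominating the ranking.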
HPs are of great importance, as many of them might be associated with human diseases and thus fall into functional families. Despite their lack of functional characterization, they play an important role in understanding biochemical and physiological pathways; for example, in finding new structures and functions, markers and pharmacological targets, in early detection, and in benefiting proteomic and genomic research. In recent years, many efficient approaches have been developed, and tools for predicting the function of HPs are publicly available.
Although high-throughput experimental methods such as the yeast two-hybrid (Y2H) method and mass spectrometry are available to discern the function of these proteins, the datasets generated by these methods tend to be incomplete and to contain false positives. Along with PPIs, there are other methods to identify the essentiality of proteins, such as antisense RNA, RNA interference, single-gene deletions and transposon mutagenesis.
Furthermore, tools such as “LOCALIZER”, which predicts the subcellular localization of both plant and effector proteins in the plant cell, and lncLocator, which predicts the subcellular localization of long non-coding RNAs using stacked ensemble classifiers, have been useful. On the other hand, combined analysis of all these methods or datasets is considered to be more predictive because it integrates heterogeneous biological datasets. Genome-wide expression analysis, machine learning, data mining, deep learning and Markov random fields are other widely employed prediction methods, whereas Support Vector Machines (SVM), Neural Networks, Bayesian Networks, Probabilistic Decision Trees, Rosetta Stone, Gene Clustering and Network Neighborhood analyses have been used to combine different biological data sources to interpret biological relationships. Although these have been shown to be successful in predicting protein function, annotation based on feature selection for inferring the function of HPs is still lacking.