In biochemistry, a hypothetical protein is a protein whose existence has been predicted but for which there is no experimental evidence of expression in vivo. Genome sequencing has yielded many predicted open reading frames to which functions cannot be readily assigned. Such proteins make up roughly 20% to 40% of the proteins encoded in each newly sequenced genome. Even when there is sufficient evidence that the gene product is expressed, from techniques such as microarray analysis and mass spectrometry, it is difficult to assign it a function because it lacks identity to protein sequences with annotated biochemical function. Nowadays, most protein sequences are derived from computational analysis of genomic DNA sequences.
Hypothetical proteins are usually produced by gene prediction software during genome analysis. When the bioinformatics tool used for gene identification finds a large open reading frame without a characterized homologue in the protein database, it returns “hypothetical protein” as an annotation remark.
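The idea of finding a "large open reading frame" can be illustrated with a minimal Python sketch. It is not how any particular gene predictor works (real tools use statistical models rather than a plain codon scan); it only looks for ATG-to-stop stretches of a minimum length on both strands. The function names and the `min_codons` threshold are assumptions made for this illustration.

```python
# Illustrative ORF scan only; real gene finders use statistical models.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=100):
    """Yield (strand, frame, start, end) for each ATG...stop ORF.

    start and end are 0-based offsets on the scanned strand; end is
    exclusive and includes the stop codon. min_codons (a hypothetical
    threshold) counts codons from the ATG up to, but not including,
    the stop codon.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        yield strand, frame, start, i + 3
                    start = None  # look for the next ORF in this frame
```

Calling `list(find_orfs(genome_string, min_codons=100))` returns the candidate ORFs; each one would then be compared against a protein database before receiving an annotation.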
The function of a hypothetical protein can be predicted by:
- Domain homology searches, with various confidence levels. Conserved domains present in a hypothetical protein are compared with the domains of known families, so that the protein can be classified into a particular protein family even though it has not been investigated in vivo.
- Homology modeling, in which the hypothetical protein is aligned with a protein sequence whose three-dimensional structure is known. If a structure can be predicted by the modeling method, the capability of the hypothetical protein to function can be assessed computationally.
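As a rough illustration of the domain-based approach, the Python sketch below scans a protein sequence for a conserved sequence motif. It is a deliberately simplified stand-in: real domain searches typically rely on profile-based models rather than a single regular expression. The motif used is the P-loop (Walker A) pattern [AG]-x(4)-G-K-[ST]; the `MOTIFS` table and the `scan_motifs` function are hypothetical names introduced for this example.

```python
import re

# P-loop (Walker A) motif [AG]-x(4)-G-K-[ST], written as a regular
# expression. A real analysis would use a curated collection of
# domain models, not one hand-written pattern.
MOTIFS = {"P-loop (Walker A)": re.compile(r"[AG].{4}GK[ST]")}


def scan_motifs(protein_seq, motifs=MOTIFS):
    """Return (motif_name, start, matched_text) for every motif hit.

    start is the 0-based position of the match in the sequence.
    """
    hits = []
    seq = protein_seq.upper()
    for name, pattern in motifs.items():
        for match in pattern.finditer(seq):
            hits.append((name, match.start(), match.group()))
    return hits
```

A hit suggests, but does not prove, that the hypothetical protein belongs to a family sharing that motif; the confidence of such an assignment depends on how specific the pattern is.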
Other approaches to annotating the function of hypothetical proteins include determining their three-dimensional structure through structural genomics initiatives, understanding the nature and mode of prosthetic group or metal ion binding, identifying fold similarity with proteins of known function, and annotating possible catalytic and regulatory sites.
Uses of hypothetical proteins (HPs)
- These proteins are involved in many different cellular and signaling pathways. Structural and functional characterization of HPs has revealed crucial roles in microorganisms, especially in pathogens associated with human diseases.
- They play major roles in processes vital for life, including host adaptation, wound healing and chemotaxis.
- In the current era of drug and antibiotic resistance, HPs can be novel targets for treating the related diseases. The identification and characterization of most HPs is still in progress, and genomic and bioinformatics techniques for studying them are expected to be among the most promising for structure-based drug design and vaccine production in the future.
HPs are uncharacterized genes or gene products that have no significant homology or similarity with any characterized gene or gene product. These genes are predicted by gene-finding programs. For example, GLIMMER is a gene-finding program that finds >97% of the genes annotated in the literature, but a significant portion of a genome consists of genes that are predicted by software and have no homologs in online databases, for example when searched against NCBI databases with BLAST. Because they lack homology, these genes are considered unique, and their products may catalyze unique functions.
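The "no homology in the database" criterion can be sketched in Python as a filter over BLAST results. The example assumes BLAST+ tabular output (`-outfmt 6`) with the default 12 tab-separated columns, in which the query ID is the first column and the E-value the eleventh. The E-value cutoff of 1e-5, the labels, and the function names are assumptions chosen for illustration. The sketch also does not check whether a hit is itself characterized, which a real annotation pipeline would need to do.

```python
def significant_queries(blast_tab_lines, max_evalue=1e-5):
    """Return the set of query IDs with at least one hit at or below max_evalue.

    Expects lines of BLAST+ tabular output (-outfmt 6, default columns).
    max_evalue is an illustrative cutoff, not a standard value.
    """
    found = set()
    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue <= max_evalue:
            found.add(query_id)
    return found


def label_genes(predicted_ids, blast_tab_lines, max_evalue=1e-5):
    """Map each predicted gene ID to a coarse annotation label."""
    hits = significant_queries(blast_tab_lines, max_evalue)
    return {
        gene_id: "has homolog" if gene_id in hits else "hypothetical protein"
        for gene_id in predicted_ids
    }
```

Genes that produce no BLAST output at all, and genes whose only hits are above the cutoff, both end up labeled "hypothetical protein" in this sketch.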
Structural analysis of a protein is important for studying its properties, for example:
- conformational changes
- rotation about bond axes (torsion angles)
- binding to its target
- protein activity
Like biochemical characterization, structural and functional prediction of proteins includes both experimental and in silico approaches. Two experimental approaches used in the structural determination of HPs are X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy.