UniProtKB/Swiss-Prot is the manually annotated and reviewed section of the UniProt Knowledgebase (UniProtKB). It is a high-quality, non-redundant protein sequence database that gives researchers a combination of experimental results, computed features and scientific conclusions. The UniProt consortium has maintained it since 2002, and it is accessible through the UniProt website.
Importance of UniProt Knowledgebase
The UniProt Knowledgebase is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation. In addition to capturing the core data mandatory for each UniProtKB entry (mainly the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and clear indications of the quality of annotation in the form of evidence attribution of experimental and computational data.
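The split between mandatory core data and evidence-tagged annotation can be pictured as a small data model. The following is only a minimal sketch: the class and field names are hypothetical and do not reflect UniProt's internal schema, the record is a made-up placeholder (its accession is deliberately not a valid UniProtKB accession), and only two evidence codes from the Evidence and Conclusion Ontology (ECO) are shown.

```python
from dataclasses import dataclass, field

# Two ECO codes, used here only as examples of evidence attribution.
ECO_EXPERIMENTAL = "ECO:0000269"  # experimental evidence used in manual assertion
ECO_SIMILARITY = "ECO:0000250"    # sequence similarity evidence used in manual assertion


@dataclass
class Annotation:
    topic: str     # e.g. "function", "subcellular location"
    text: str
    evidence: str  # ECO code stating where the statement comes from


@dataclass
class Entry:
    # Core data that every entry must carry.
    accession: str
    protein_name: str
    organism: str
    taxon_id: int
    sequence: str
    citations: list = field(default_factory=list)
    # Additional annotation, each item tagged with its evidence.
    annotations: list = field(default_factory=list)

    def experimental(self):
        """Return only the annotations backed by experimental evidence."""
        return [a for a in self.annotations if a.evidence == ECO_EXPERIMENTAL]


# Hypothetical toy record; the values are placeholders, not real UniProtKB data.
entry = Entry(
    accession="X00001",
    protein_name="Example kinase",
    organism="Homo sapiens",
    taxon_id=9606,
    sequence="MKTAYIAKQR",
    citations=["Placeholder citation"],
    annotations=[
        Annotation("function", "Phosphorylates example substrates.", ECO_EXPERIMENTAL),
        Annotation("subcellular location", "Cytoplasm.", ECO_SIMILARITY),
    ],
)

print([a.topic for a in entry.experimental()])
```

Keeping the evidence code next to each statement is what lets a reader separate experimentally supported annotation from annotation inferred by similarity.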
The UniProt Knowledgebase consists of two sections:
- UniProtKB/Swiss-Prot: manually annotated records with information extracted from the literature and curator-evaluated computational analysis.
- UniProtKB/TrEMBL: computationally analysed records that await full manual annotation.
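In FASTA files downloaded from UniProt, the section an entry belongs to is visible in the header line: the prefix `sp` marks a reviewed (Swiss-Prot) record and `tr` an unreviewed (TrEMBL) one. The sketch below parses such a header; the function name `parse_header` is hypothetical, and the example headers are invented placeholders written in the UniProt header layout rather than real records.

```python
import re

SECTIONS = {
    "sp": "UniProtKB/Swiss-Prot (reviewed)",
    "tr": "UniProtKB/TrEMBL (unreviewed)",
}

# >db|Accession|EntryName ProteinName OS=... OX=... [GN=...] PE=... SV=...
HEADER = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)\s+(.*)$")
FIELD = re.compile(r"\b(OS|OX|GN|PE|SV)=")


def parse_header(line):
    """Split a UniProtKB FASTA header into its section and labelled fields."""
    match = HEADER.match(line.strip())
    if match is None:
        raise ValueError("not a UniProtKB FASTA header")
    db, accession, entry_name, rest = match.groups()
    # Splitting on a capturing group keeps the keys:
    # [protein name, key1, value1, key2, value2, ...]
    parts = FIELD.split(rest)
    info = {
        "section": SECTIONS[db],
        "accession": accession,
        "entry_name": entry_name,
        "protein_name": parts[0].strip(),
    }
    for key, value in zip(parts[1::2], parts[2::2]):
        info[key] = value.strip()
    return info


# Placeholder header, not a real entry.
example = ">sp|X00001|EXMP_HUMAN Example protein OS=Homo sapiens OX=9606 GN=EX1 PE=1 SV=1"
print(parse_header(example)["section"])
```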
More than 95% of the protein sequences in UniProtKB are derived from the translation of coding sequences (CDS) that have been submitted to the public nucleic acid databases (EMBL-Bank, GenBank and DDBJ). All these sequences, together with the related data submitted by the authors, are automatically integrated into UniProtKB/TrEMBL.
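To make the idea of a translated CDS concrete, here is a minimal sketch that converts a coding sequence into a protein sequence with the standard genetic code. It is an illustration only, not UniProt's import pipeline: the function name `translate_cds` is hypothetical, and real submissions also involve alternative genetic codes, partial sequences and other cases that are ignored here.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds):
    """Translate a CDS codon by codon, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of three")
    protein = []
    for i in range(0, len(cds), 3):
        # Codons containing ambiguous bases are reported as X.
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate_cds("ATGGCCAAGTAA"))  # prints MAK
```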
Minimal redundancy is UniProtKB's first step towards improving sequence reliability. All protein sequences encoded by the same gene are merged into a single UniProtKB/Swiss-Prot entry. Differences found between the various sequencing reports are analysed and fully described in the feature table. Once an entry has been integrated into UniProtKB/Swiss-Prot, it is removed from UniProtKB/TrEMBL.
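The merging step can be sketched as grouping sequence reports by gene and recording where they disagree. This is a simplified illustration under stated assumptions, not the curators' actual procedure: the function name `merge_by_gene` and the dictionary keys are hypothetical, reports are grouped by gene and organism, the first report in each group is arbitrarily taken as the displayed sequence, and only same-length sequences are compared position by position.

```python
from collections import defaultdict


def merge_by_gene(reports):
    """Merge sequence reports for the same gene into one record each."""
    groups = defaultdict(list)
    for report in reports:
        groups[(report["gene"], report["organism"])].append(report)

    merged = []
    for (gene, organism), group in groups.items():
        canonical = group[0]["sequence"]  # assumption: first report is displayed
        conflicts = []
        for other in group[1:]:
            seq = other["sequence"]
            if len(seq) != len(canonical):
                conflicts.append({"source": other["source"], "note": "length differs"})
                continue
            # Record each disagreeing residue with a 1-based position.
            for pos, (a, b) in enumerate(zip(canonical, seq), start=1):
                if a != b:
                    conflicts.append({
                        "position": pos,
                        "canonical": a,
                        "reported": b,
                        "source": other["source"],
                    })
        merged.append({
            "gene": gene,
            "organism": organism,
            "sequence": canonical,
            "sources": [r["source"] for r in group],
            "conflicts": conflicts,
        })
    return merged


# Hypothetical toy reports, not real data.
reports = [
    {"gene": "EX1", "organism": "Homo sapiens", "source": "report A", "sequence": "MKTAYIAK"},
    {"gene": "EX1", "organism": "Homo sapiens", "source": "report B", "sequence": "MKTAYLAK"},
    {"gene": "EX2", "organism": "Homo sapiens", "source": "report C", "sequence": "MSSP"},
]
for record in merge_by_gene(reports):
    print(record["gene"], record["sources"], record["conflicts"])
```

The `conflicts` list plays the role of the feature table here: the merged record keeps one sequence and documents the differences instead of storing each report as a separate entry.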
Manual annotation in UniProtKB consists of a critical review of experimentally proven or computer-predicted data about each protein, including the protein sequences. Data are continuously updated by an expert team of biologists.
Here are some of the features and functions that have made UniProtKB popular among scientists and researchers:
- Annotation of data
- The sequence data
- The citation information
- The taxonomic data
- Function(s) of the protein
- Post-translational modification(s), such as attached carbohydrates, phosphorylation, acetylation and GPI anchors
- Domains and sites, e.g. homeoboxes, SH2 and SH3 domains, and kringle domains
- Secondary structure, e.g. alpha helix, beta sheet
- Quaternary structure, e.g. homodimer, heterotrimer, etc.
- Similarities to other proteins
- Disease(s) associated with any number of deficiencies in the protein
- Sequence conflicts, variants etc.
- Minimal redundancy
- Integration with other databases
- Sequence curation
- Sequence analysis
- Literature curation
- Family-based curation
- Evidence attribution
- Quality assurance, integration and updates
In UniProtKB, the annotation of a protein describes the following: function(s), enzyme-specific information, biologically relevant domains and sites, post-translational modifications, subcellular location(s), tissue specificity, developmentally specific expression, structure, interactions, splice isoform(s), diseases associated with deficiencies or abnormalities, etc. Another important part of the annotation process is the merging of different reports for a single protein.
Once a protein sequence has been selected for manual annotation on the basis of UniProt's curation priorities, BLAST searches are run against UniProtKB to identify additional sequences from the same gene and to identify homologs.
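A search of this kind can be approximated locally with the NCBI BLAST+ `blastp` program. The sketch below only builds the command line and parses tabular (`-outfmt 6`) results; it does not run BLAST. The file and database names are placeholders, the helper names are hypothetical, the sample hits are invented, and the 98% identity cut-off used to separate likely same-gene sequences from homologs is an arbitrary assumption for illustration, not a UniProt curation rule.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def blastp_command(query_fasta, database, evalue=1e-5):
    """Build a blastp command line; pass it to subprocess.run to execute it."""
    return [
        "blastp", "-query", query_fasta, "-db", database,
        "-evalue", str(evalue), "-outfmt", "6",
    ]


def parse_tabular(text):
    """Parse -outfmt 6 output into a list of dictionaries."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(COLUMNS, line.split("\t")))
        for key in ("pident", "evalue", "bitscore"):
            row[key] = float(row[key])
        hits.append(row)
    return hits


def split_hits(hits, same_gene_identity=98.0):
    """Separate near-identical hits from more distant homologs (assumed cut-off)."""
    same_gene = [h for h in hits if h["pident"] >= same_gene_identity]
    homologs = [h for h in hits if h["pident"] < same_gene_identity]
    return same_gene, homologs


# Invented sample output, not real search results.
sample = "\n".join([
    "\t".join(["query1", "sp|X00001|EXMP_HUMAN", "100.000", "120", "0", "0",
               "1", "120", "1", "120", "1e-80", "240"]),
    "\t".join(["query1", "tr|X00002|X00002_MOUSE", "85.000", "118", "17", "1",
               "1", "118", "3", "120", "2e-60", "190"]),
])
same_gene, homologs = split_hits(parse_tabular(sample))
print(len(same_gene), len(homologs))
```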