UniProt is a free database of protein sequences and functional information that gives comprehensive access to what is known about proteins. It is a central resource that stores the biological function of proteins together with their annotations, which are derived from the research literature and from genome sequencing projects. It is freely accessible to researchers and scientists and is easy to use. UniProt also accepts primary protein sequences derived from peptide sequencing experiments. It stores and interconnects large volumes of data from different sources.
The rapid and continuing growth in protein sequences predicted by high-throughput genome sequencing for an increasingly diverse range of organisms, the expansion of large-scale proteomics (e.g. gene expression profiling and protein–protein interactions) and the emergence of structural genomics have combined to provide a wealth of data for researchers to analyze and use. This created a widely recognized need for a centralized repository of protein sequences with comprehensive coverage and a systematic approach to protein annotation, one that incorporates, integrates and standardizes data from these various sources. Expert curators have collected detailed, curated annotations from the literature for more than half a million of these proteins.
UniProt is developed using the scientific expertise and extensive bioinformatics infrastructure at EMBL-EBI. The UniProt consortium and its host institutions, EMBL-EBI, SIB and PIR, have committed to the long-term preservation of the UniProt databases. In addition, more than 100 people and project institutions contribute to this commitment through tasks such as data curation, software development and user support.
UniProt comprises four components, each developed for a different use:
- The UniProt Knowledgebase (UniProtKB)
It is an expertly curated database that serves as a central access point for integrated protein information, with cross-references to multiple sources.
- The UniProt Archive (UniParc)
It is a comprehensive sequence repository that records the history of all protein sequences across different species.
- UniProt Reference Clusters (UniRef)
It merges closely related sequences based on sequence identity to speed up searches.
- The UniProt Metagenomic and Environmental Sequences (UniMES) database
It is a repository developed for the newly expanding area of metagenomics and environmental data.
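The idea behind UniRef, grouping closely related sequences under one representative so that searches have fewer entries to scan, can be illustrated with a small sketch. This is not UniProt's actual clustering procedure: the `identity` function below uses Python's `difflib` as a rough stand-in for a real alignment-based sequence identity, and the function names, the greedy strategy and the 0.8 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Rough similarity score in [0, 1].

    Assumption: this is only a proxy. Real sequence identity is
    computed from a proper sequence alignment.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(sequences, threshold=0.8):
    """Greedily group sequences around representative sequences.

    Each sequence joins the first cluster whose representative scores
    at or above the threshold; otherwise it starts a new cluster.
    Returns a list of (representative, members) tuples.
    """
    clusters = []
    # Process longer sequences first so they become representatives.
    for seq in sorted(sequences, key=len, reverse=True):
        for representative, members in clusters:
            if identity(representative, seq) >= threshold:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return clusters


if __name__ == "__main__":
    # Hypothetical toy sequences, not real UniProt entries.
    toy = ["MKTAYIAKQR", "MKTAYIAKQK", "GGGGGGGGGG"]
    for representative, members in greedy_cluster(toy):
        print(representative, "->", members)
```

A search then only needs to compare a query against the representatives rather than every member, which is where the speed-up comes from.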
Benefits of using UniProt
- Finding protein sequences
- Inferring evolutionary information
- Finding the names and taxonomy of proteins
- Following links to other sites related to the proteins of interest
- Studying biological processes
- Studying molecular function
- Studying post-translational modifications
- Studying protein interactions with other molecules
- Finding protein locations in cells and organisms
- Running BLAST searches on sequences
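Several of the items above (sequence, protein name, taxonomy) can be read directly from a sequence file downloaded from UniProt. The sketch below parses a FASTA header, assuming the UniProtKB layout `>db|accession|entry_name protein_name OS=organism OX=taxon_id [GN=gene] PE=n SV=n`. The function and field names are hypothetical, and the sample sequence is deliberately truncated for illustration.

```python
import re

# Assumed UniProtKB FASTA header layout; GN= is optional.
HEADER_RE = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry_name>\S+)\s+"
    r"(?P<protein_name>.*?)\s+OS=(?P<organism>.*?)\s+OX=(?P<taxon_id>\d+)"
    r"(?:\s+GN=(?P<gene>\S+))?\s+PE=(?P<pe>\d)\s+SV=(?P<sv>\d+)$"
)


def parse_uniprot_fasta(text):
    """Return one dict per FASTA record, with header fields and sequence."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            match = HEADER_RE.match(line)
            if match is None:
                raise ValueError(f"Unrecognized header: {line}")
            record = match.groupdict()
            record["sequence"] = ""
            records.append(record)
        elif records:
            # Sequence lines are concatenated onto the current record.
            records[-1]["sequence"] += line
    return records


# Sample record; the sequence is truncated for illustration only.
SAMPLE = """>sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMF
LSFPTTKTYFPHFDLSH
"""

if __name__ == "__main__":
    for rec in parse_uniprot_fasta(SAMPLE):
        print(rec["accession"], rec["protein_name"], rec["organism"], rec["taxon_id"])
```

Headers that do not follow the assumed layout raise a `ValueError` instead of being silently skipped, so unexpected input is noticed early.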
The continued success of UniProt is driven by:
- Growth of sequences in UniProt
- Reference proteomes
- Expert curation progress
- Automatic annotation progress
An important aspect of UniProt is linking publications to the relevant entries. This is ongoing work. To make the best use of UniProt, both existing and new users are encouraged to take the training available on the web and to become familiar with the website, the API and the downloads from the FTP site.
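As a starting point for programmatic access, the sketch below builds request URLs for single entries and for searches. It assumes the REST service is reachable at `https://rest.uniprot.org/uniprotkb` and accepts the `query`, `format` and `size` parameters; check the current UniProt API documentation before relying on these details. The helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: base URL of the UniProtKB REST service.
BASE_URL = "https://rest.uniprot.org/uniprotkb"


def entry_url(accession, fmt="fasta"):
    """Build the URL for a single entry, e.g. P69905 in FASTA format."""
    return f"{BASE_URL}/{accession}.{fmt}"


def search_url(query, fmt="fasta", size=10):
    """Build a search URL; the query string is URL-encoded."""
    params = urlencode({"query": query, "format": fmt, "size": size})
    return f"{BASE_URL}/search?{params}"


def fetch(url, timeout=30):
    """Download the response body as text (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(entry_url("P69905"))
    print(search_url("organism_id:9606 AND gene:HBA1"))
    # Uncomment to perform a real request:
    # print(fetch(entry_url("P69905")))
```

Keeping URL construction separate from the network call makes the helpers easy to check offline, and `fetch` can be swapped for any other HTTP client.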