Genome sequences are available for increasing numbers of organisms. The proteomes (protein complement expressed by the genome) of many such organisms are being studied with two-dimensional (2D) gel electrophoresis. The theoretical N and C termini of 15,519 proteins, representing all SWISS-PROT entries for the organisms studied, were analyzed. Sequence tags were found to be surprisingly specific: N-terminal tags of four amino acid residues were unique for between 43% and 83% of proteins, and C-terminal tags of four amino acid residues were unique for between 74% and 97% of proteins, depending on the species studied. Sequence tags of five amino acid residues were found to be even more specific. To exploit this specificity for protein identification, a world-wide web-accessible protein identification program, TagIdent, has been created. It matches sequence tags of up to six amino acid residues, as well as estimated protein pI and mass, against proteins in the SWISS-PROT database.
The utility of this identification approach was demonstrated with sequence tags generated from 91 different E. coli proteins purified by 2D gel electrophoresis. Fifty-one proteins were unambiguously identified by virtue of their sequence tags and estimated pI and mass, and a further 11 proteins were identified when sequence tags were combined with protein amino acid composition data. The TagIdent identification approach is best suited to the identification of proteins from prokaryotes whose complete genome sequences are available. The approach is less well suited to proteins from eukaryotes, because many eukaryotic proteins are not amenable to sequencing via Edman degradation, and tag-based protein identification cannot be unambiguous unless an organism’s complete sequence is available.
TagIdent makes use of the high specificity of short amino acid sequence tags in molecularly well-defined organisms with small proteomes, a low degree of post-translational modification and, in particular, few N-terminally blocked proteins. If more than one protein satisfies the user-specified tag and pI/Mw ranges, TagIdent produces an unranked list of all candidate database entries.
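The notion of tag specificity can be illustrated with a small calculation: for a set of protein sequences, count how many carry a terminal tag that no other sequence in the set shares. The following is a minimal sketch, not the analysis used in the study; the function name `unique_tag_fraction` and the four toy sequences are invented for illustration.

```python
from collections import Counter


def unique_tag_fraction(sequences, tag_length=4, terminus="N"):
    """Return the fraction of sequences whose terminal tag is unique in the set.

    terminus: "N" takes the first tag_length residues, "C" the last ones.
    Sequences shorter than tag_length are ignored.
    """
    if terminus not in ("N", "C"):
        raise ValueError("terminus must be 'N' or 'C'")
    usable = [s for s in sequences if len(s) >= tag_length]
    if terminus == "N":
        tags = [s[:tag_length] for s in usable]
    else:
        tags = [s[-tag_length:] for s in usable]
    counts = Counter(tags)
    if not tags:
        return 0.0
    # A tag is "unique" when exactly one sequence in the set carries it.
    return sum(1 for t in tags if counts[t] == 1) / len(tags)


# Toy sequences (made up, not real database entries).
toy = ["MKTAYIAK", "MKTAQLLG", "MSEQNNTE", "MAHHHHHK"]
n_fraction = unique_tag_fraction(toy, 4, "N")  # two sequences share "MKTA"
c_fraction = unique_tag_fraction(toy, 4, "C")  # all C-terminal tags differ
```

In this toy set the C-terminal tags are more specific than the N-terminal ones, simply because two sequences happen to share the same first four residues.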
The TagIdent tool serves two main purposes.
- Firstly, it can create lists of proteins from one or more organisms that are within a user-specified pI or Mw range. This is useful to find proteins from the database that may be in a region of interest on a 2-D gel.
- Secondly, the program can identify proteins from 2-D gels by virtue of their estimated pI and Mw, and a short protein “sequence tag” of up to 6 amino acids. The sequence tag can be derived from protein N-termini, C-termini, or internally, and generated by chemical or mass spectrometric sequencing techniques.
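Both purposes amount to filtering database entries by a pI window, a Mw window and, optionally, a short tag. The sketch below shows one way such a filter could look; it is an assumption-laden illustration, not TagIdent's actual implementation. The `Protein` class, the `find_candidates` function, the `where` parameter and all values in the toy data are hypothetical, and pI and Mw are taken as precomputed numbers.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    entry: str       # hypothetical entry name
    sequence: str    # one-letter amino acid sequence
    pi: float        # precomputed theoretical pI
    mw: float        # precomputed theoretical mass in Da


def find_candidates(proteins, pi_range, mw_range, tag=None, where="any"):
    """Return all proteins inside both windows that also carry the tag.

    where: "N" (tag at N-terminus), "C" (tag at C-terminus) or "any"
    (tag anywhere in the sequence). The result is an unranked list.
    """
    if tag is not None and not 1 <= len(tag) <= 6:
        raise ValueError("tag must be 1 to 6 residues long")
    if where not in ("N", "C", "any"):
        raise ValueError("where must be 'N', 'C' or 'any'")
    pi_lo, pi_hi = pi_range
    mw_lo, mw_hi = mw_range
    hits = []
    for p in proteins:
        if not (pi_lo <= p.pi <= pi_hi and mw_lo <= p.mw <= mw_hi):
            continue
        if tag is not None:
            if where == "N" and not p.sequence.startswith(tag):
                continue
            if where == "C" and not p.sequence.endswith(tag):
                continue
            if where == "any" and tag not in p.sequence:
                continue
        hits.append(p)
    return hits


# Toy data with invented sequences and values.
proteins = [
    Protein("TOY1", "MKTAYIAKQR", 5.2, 30000.0),
    Protein("TOY2", "MSEQKTAYLL", 5.4, 31000.0),
    Protein("TOY3", "MKTAGGGGGG", 9.1, 30500.0),
]

# Purpose 1: list everything in a gel region, no tag.
region = find_candidates(proteins, (5.0, 5.5), (29000.0, 32000.0))

# Purpose 2: identify a spot from its region plus an N-terminal tag.
spot = find_candidates(proteins, (5.0, 5.5), (29000.0, 32000.0),
                       tag="MKTA", where="N")

# Searching on Mw only: make the pI window wide enough to cover everything.
mw_only = find_candidates(proteins, (0.0, 14.0), (29000.0, 32000.0),
                          tag="MKTA", where="N")
```

Returning a plain list mirrors the unranked output described above; when `spot` contains more than one entry, the tag and windows alone do not settle the identification.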
If desired, a name can be given to the query, which will appear as the subject of the e-mail message. We should specify the pI and Mw regions within which we would like to search.
If we would like to search using only one of the pI or Mw parameters, we can specify an unrestricted window to cover all possibilities for the other parameter.
Finally, we can specify one or more keywords matching those in the Swiss-Prot OS (species) or OC (classification) lines to limit the search to one organism, or a range of organisms. Thus, if we want to investigate proteins exclusively from S. cerevisiae, we can specify “CEREVISIAE”. This is better than specifying “YEAST”, a word common to the classification of many yeasts, which would retrieve not only proteins from S. cerevisiae but also those from Candida albicans and Schizosaccharomyces pombe. To submit the request to ExPASy, select the “Start TagIdent” button. Results are displayed immediately or, if we supplied an e-mail address (for longer jobs), sent to that address within a few minutes.
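The difference between “CEREVISIAE” and “YEAST” can be seen by matching keywords against the OS and OC lines of flat-file records. The sketch below assumes case-insensitive substring matching and that a record is kept when any one keyword matches; both are assumptions, not documented TagIdent behaviour. The helper names are hypothetical and the three records are heavily abbreviated toy entries.

```python
def organism_text(record):
    """Join the content of the OS and OC lines of one flat-file record."""
    parts = []
    for line in record.splitlines():
        # In the Swiss-Prot flat-file format the two-letter line code is
        # followed by three spaces before the content starts.
        if line.startswith(("OS   ", "OC   ")):
            parts.append(line[5:].strip())
    return " ".join(parts)


def matches_keywords(record, keywords):
    """True if any keyword occurs in the OS/OC text (case-insensitive)."""
    text = organism_text(record).upper()
    return any(k.upper() in text for k in keywords)


# Abbreviated toy records; real entries contain many more line types.
records = [
    "ID   TOY1\nOS   Saccharomyces cerevisiae (Baker's yeast).\nOC   Eukaryota; Fungi.",
    "ID   TOY2\nOS   Candida albicans (Yeast).\nOC   Eukaryota; Fungi.",
    "ID   TOY3\nOS   Schizosaccharomyces pombe (Fission yeast).\nOC   Eukaryota; Fungi.",
]

cerevisiae_hits = [r for r in records if matches_keywords(r, ["CEREVISIAE"])]
yeast_hits = [r for r in records if matches_keywords(r, ["YEAST"])]
```

With these toy records, `cerevisiae_hits` holds only the first entry, whereas `yeast_hits` holds all three, which is why the more specific keyword is preferable.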
If protein identification results with TagIdent show more than one protein carrying the sequence tag in the expected region, the same sequence tag, pI and Mw data can be used in conjunction with protein AA composition for identification with the AACompIdent tool.