The Third Party Annotation (TPA) database is designed to capture experimental or inferential results that support submitter-provided annotation for sequence data that the submitter did not directly determine but derived from GenBank primary data.
TPA records are divided into two categories:
- TPA:experimental: Annotation of sequence data is supported by peer-reviewed wet-lab experimental evidence.
- TPA:inferential: Annotation of sequence data is supported by inference (where the source molecule or its product(s) have not been the subject of direct experimentation).
TPA database records differ from GenBank and RefSeq records:
- GenBank: An archival database of primary nucleotide sequences that were directly sequenced by the submitter.
- RefSeq: A curated, non-redundant database that includes genomic DNA, transcript (RNA), and protein products for major organisms. The sequence data are derived from GenBank primary data, and the annotation is computational, drawn from published literature, or provided by domain experts.
A TPA sequence is derived or assembled from primary sequence data currently found in the DDBJ/EMBL/GenBank databases of the International Nucleotide Sequence Database Collaboration (INSDC). It can be a genomic or mRNA sequence, and it can be assembled or derived from primary genomic and/or mRNA sequences. TPA sequences are submitted to DDBJ/EMBL/GenBank as part of the process of publishing biological experiments that include the annotation of existing primary nucleotide sequences.
Examples of TPA sequences are:
- mRNA assembled from overlapping EST sequences.
- mRNA derived from an unannotated section of genomic sequence by comparison with another known mRNA from a different organism.
- mRNA assembled from overlapping EST sequences, other partial mRNAs, and/or genomic sequences.
- previously unannotated genomic sequence now described with the exons, introns, and coding region information (CDS) of a new gene.
Primary sequences used to assemble a TPA sequence are those that have been experimentally determined and are now publicly available in the GenBank/EMBL/DDBJ databases. These may also be trace data sequences and Whole Genome Shotgun (WGS) sequences. They must not come from a proprietary database. Each primary sequence used to assemble a TPA sequence must be identified by its accession number in the submission of the TPA sequence.
- Primary entries used to build a TPA sequence are those that have been experimentally determined and are publicly available in INSDC. Each primary entry must be identified in the TPA entry.
- Primary entries are sometimes not yet public when the TPA sequence is submitted. However, they must be made public by the time the TPA sequence is released to the public.
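The requirements on primary entries can be sketched as a small pre-submission check. This is only an illustration: the `PrimaryEntry` type, the `check_primary_entries` function, and the accession pattern are hypothetical and are not part of any DDBJ/EMBL/GenBank tool. The pattern is a deliberately simplified assumption that does not cover WGS or trace data identifiers.

```python
import re
from dataclasses import dataclass

# Assumed, simplified pattern for INSDC-style nucleotide accessions
# (e.g. "U12345" or "AF123456.1"). Real formats are more varied.
ACCESSION_RE = re.compile(r"^[A-Z]{1,2}\d{5,8}(\.\d+)?$")


@dataclass
class PrimaryEntry:
    """Hypothetical description of one primary entry cited by a TPA record."""
    accession: str
    is_public: bool


def check_primary_entries(entries, releasing):
    """Return a list of problems; an empty list means no problem was found.

    entries   -- PrimaryEntry objects cited by the TPA record
    releasing -- True when the TPA record is about to be made public
    """
    problems = []
    if not entries:
        problems.append("no primary entries listed")
    for entry in entries:
        if not ACCESSION_RE.match(entry.accession):
            # Covers missing accessions and identifiers from other sources.
            problems.append(f"not a recognizable accession: {entry.accession!r}")
        elif releasing and not entry.is_public:
            # Unreleased primary entries are tolerated only before release.
            problems.append(f"primary entry not yet public: {entry.accession}")
    return problems
```

Calling `check_primary_entries(entries, releasing=False)` at submission time tolerates primary entries that are not yet public, while `releasing=True` reports them, mirroring the two bullets above.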
The following cases are NOT acceptable in TPA:
- Annotation of repeat (and no other) features.
- Annotation that has arisen from an automated tool, such as GeneMark, tRNA scan, or ORF finder, where no further evidence, experimental or otherwise, is presented for the annotation. The annotation in these cases has not been subject to the peer review of the publication.
- A record representing a completely sequenced genome that includes only features that have not been assigned gene symbols or product identifiers and none of which is supported by wet-lab experimental evidence.
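The three exclusions above can be read as simple rules over a record's features. The sketch below is one possible reading, not an official acceptance test: the `Feature` and `Record` types, their field names, and the `rejection_reasons` function are all hypothetical, and the second rule is applied only when every feature is tool-predicted without further evidence, which is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical summary of one annotated feature."""
    kind: str                       # e.g. "CDS", "repeat_region"
    automated_only: bool = False    # tool prediction with no further evidence
    has_identifier: bool = False    # gene symbol or product identifier assigned
    wet_lab_evidence: bool = False  # supported by wet-lab experiments


@dataclass
class Record:
    """Hypothetical summary of a candidate TPA record."""
    features: list = field(default_factory=list)
    complete_genome: bool = False


def rejection_reasons(record):
    """Return the exclusion rules that the record appears to trigger."""
    reasons = []
    feats = record.features
    # Rule 1: repeat features and nothing else.
    if feats and all(f.kind == "repeat_region" for f in feats):
        reasons.append("only repeat features annotated")
    # Rule 2 (assumed reading): every feature is an unsupported tool prediction.
    if feats and all(f.automated_only for f in feats):
        reasons.append("annotation from automated tools without further evidence")
    # Rule 3: complete genome whose features all lack identifiers and evidence.
    if (record.complete_genome and feats
            and not any(f.has_identifier or f.wet_lab_evidence for f in feats)):
        reasons.append("complete genome with no identified or experimentally supported features")
    return reasons
```

An empty list from `rejection_reasons` means only that none of these three exclusions was detected; it does not imply that a record would be accepted.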