General Feature Format/Gene Transfer Format

The GFF and GTF formats are used to annotate genomic intervals with high-level features such as genes, transcripts and exons. In bioinformatics, the General Feature Format (GFF; also expanded as gene-finding format or generic feature format) is a file format used for describing genes and other features of DNA, RNA and protein sequences.

GFF files are plain tab-delimited text, so any text editor can open them. The .gff extension is also associated with SignalMap from NimbleGen Systems Inc., which can display GFF data, but no special software is required to read the files. GFF data is provided by resources such as UniProt and is used by tools such as GBrowse, Jalview, JBrowse and ZENBU.

Each GFF feature line has nine tab-separated fields:

  1. seqname: name of the chromosome or scaffold. Chromosome names can be given with or without the ‘chr’ prefix. When uploading data to Ensembl, the seqname must be one used within Ensembl, such as a standard chromosome name or an Ensembl identifier such as a scaffold ID, without any additional content such as species or assembly
  2. source: name of the program that generated this feature, or the data source (database or project name)
  3. feature: feature type name, e.g. Gene, Variation, Similarity
  4. start: start position of the feature, with sequence numbering starting at 1
  5. end: end position of the feature, with sequence numbering starting at 1 (the end position is included in the feature)
  6. score: a floating point value, or ‘.’ if there is no score
  7. strand: defined as + (forward) or - (reverse), or ‘.’ if the strand is unknown or not relevant
  8. frame: one of ‘0’, ‘1’ or ‘2’, or ‘.’ for features that are not coding sequence. ‘0’ indicates that the first base of the feature is the first base of a codon, ‘1’ that the second base is the first base of a codon, and so on
  9. attribute: a semicolon-separated list of tag-value pairs, providing additional information about each feature
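As a minimal sketch of the nine-field layout listed above, the following Python splits one tab-separated feature line into named fields. The `GffRecord` class and `parse_gff_line` function are hypothetical names chosen for illustration. The sketch assumes ‘.’ marks an empty score or frame, and it leaves the attribute column as a raw string because its syntax differs between GFF versions and GTF. Because start and end are both 1-based and inclusive, a feature's length is end - start + 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GffRecord:
    seqname: str
    source: str
    feature: str
    start: int              # 1-based, inclusive
    end: int                # 1-based, inclusive
    score: Optional[float]  # None when the column is '.'
    strand: str             # '+', '-' or '.'
    frame: Optional[int]    # 0, 1, 2, or None when the column is '.'
    attribute: str          # raw ninth column, left unparsed here

    @property
    def length(self) -> int:
        # Both ends are inclusive, so add 1.
        return self.end - self.start + 1


def parse_gff_line(line: str) -> GffRecord:
    """Split one feature line into its nine tab-separated fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attribute = fields
    return GffRecord(
        seqname=seqname,
        source=source,
        feature=feature,
        start=int(start),
        end=int(end),
        score=None if score == "." else float(score),
        strand=strand,
        frame=None if frame == "." else int(frame),
        attribute=attribute,
    )


if __name__ == "__main__":
    # Hypothetical example line; the values are invented for illustration.
    line = "chr1\tmy_tool\texon\t1000\t1200\t.\t+\t.\tID=exon1"
    rec = parse_gff_line(line)
    print(rec.feature, rec.start, rec.end, rec.length)  # exon 1000 1200 201
```

Keeping the ninth column as a string lets the same parser serve both GFF and GTF; attribute parsing can then be layered on top for whichever dialect a file uses.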

The Gene Transfer Format (GTF) is a file format used to hold information about gene structure. It is a tab-delimited text format based on the General Feature Format (GFF): it borrows the GFF layout but adds conventions and structure specific to gene information. A significant feature of GTF is that it can be validated: given a sequence and a GTF file, one can check that the format is correct. This significantly reduces problems when groups exchange data. GTF files are easy to obtain from the UCSC Table Browser and from Ensembl, and GTF is a widely used format for storing gene annotations.

Like GFF, this format contains nine fields, and the fields must be tab-separated. In each feature line, every field except the final one must contain a value; “empty” columns are denoted with a ‘.’.
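The column rules above (nine tab-separated fields, every field except the last holding a value, ‘.’ for an empty column) are easy to check mechanically. The sketch below is a hypothetical `check_line` helper, not an official validator: it returns a list of problems found in one line. The optional `seq_lengths` mapping (sequence name to length) is an assumption added to show, in the spirit of validating a GTF file against its sequence, how one might confirm that a feature does not run past the end of its sequence.

```python
from typing import Dict, List, Optional

VALID_STRANDS = {"+", "-", "."}
VALID_FRAMES = {"0", "1", "2", "."}


def check_line(line: str, seq_lengths: Optional[Dict[str, int]] = None) -> List[str]:
    """Return the problems found in one GFF/GTF feature line (empty list if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return [f"expected 9 tab-separated fields, got {len(fields)}"]

    problems: List[str] = []
    # All but the final field must hold a value; '.' stands in for "empty".
    for i, value in enumerate(fields[:8], start=1):
        if value == "":
            problems.append(f"field {i} is blank; use '.' for an empty column")

    seqname, _source, _feature, start, end, score, strand, frame, _attrs = fields

    try:
        s, e = int(start), int(end)
    except ValueError:
        problems.append("start and end must be integers")
    else:
        if s < 1:
            problems.append("start must be >= 1 (numbering starts at 1)")
        if s > e:
            problems.append("start must not exceed end")
        # Optional check against known sequence lengths (hypothetical input).
        if seq_lengths is not None and seqname in seq_lengths and e > seq_lengths[seqname]:
            problems.append(f"end {e} lies beyond the length of {seqname}")

    # Blank score/strand/frame were already reported above, so skip them here.
    if score not in ("", "."):
        try:
            float(score)
        except ValueError:
            problems.append("score must be a floating point value or '.'")
    if strand != "" and strand not in VALID_STRANDS:
        problems.append("strand must be '+', '-' or '.'")
    if frame != "" and frame not in VALID_FRAMES:
        problems.append("frame must be '0', '1', '2' or '.'")
    return problems


if __name__ == "__main__":
    good = 'chr1\tmy_tool\texon\t1000\t1200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";'
    print(check_line(good, {"chr1": 5000}))  # []
    bad = "chr1\tmy_tool\texon\t1200\t1000\t\t*\t.\t"
    for problem in check_line(bad):
        print(problem)
```

Returning every problem rather than stopping at the first makes the helper more useful when cleaning up a file by hand; a stricter tool would also check the feature and attribute columns.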

The first eight GTF fields are the same as in GFF. The feature field is the same as in GFF, except that it also allows the following optional values: 5UTR, 3UTR, inter, inter_CNS and intron_CNS. The GFF group field has been expanded in GTF into a list of attributes, each consisting of a type/value pair. Both formats therefore use the ninth column for attributes, but its content and format differ between them. In GTF, each attribute must end in a semicolon and be separated from any following attribute by exactly one space.
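To illustrate the attribute rules, here is a minimal Python sketch that parses a GTF attribute column into a dictionary and writes one back out, with each attribute ending in a semicolon and exactly one space between attributes. The function names are hypothetical. The sketch assumes the common `tag "value";` convention with double-quoted values (`gene_id` and `transcript_id`, used in the example, are the two attributes GTF2 requires on every line). It does not handle semicolons inside quoted values, and a repeated tag simply overwrites the earlier value.

```python
import re
from typing import Dict

# One attribute: a tag, whitespace, then a double-quoted value or a bare token,
# terminated by a semicolon.
_ATTR_RE = re.compile(r'\s*([^\s;]+)\s+("([^"]*)"|[^;]*?)\s*;')


def parse_gtf_attributes(column: str) -> Dict[str, str]:
    """Parse 'gene_id "g1"; transcript_id "t1";' into {'gene_id': 'g1', ...}."""
    attrs: Dict[str, str] = {}
    for match in _ATTR_RE.finditer(column):
        tag, raw, quoted = match.group(1), match.group(2), match.group(3)
        # Use the text inside the quotes when the value was quoted;
        # otherwise keep the bare token. Repeated tags overwrite earlier ones.
        attrs[tag] = quoted if quoted is not None else raw
    return attrs


def format_gtf_attributes(attrs: Dict[str, str]) -> str:
    """Write attributes so each ends in ';' and exactly one space separates them."""
    return " ".join(f'{tag} "{value}";' for tag, value in attrs.items())


if __name__ == "__main__":
    column = 'gene_id "g1"; transcript_id "t1"; exon_number 1;'
    attrs = parse_gtf_attributes(column)
    print(attrs)  # {'gene_id': 'g1', 'transcript_id': 't1', 'exon_number': '1'}
    print(format_gtf_attributes(attrs))
    # gene_id "g1"; transcript_id "t1"; exon_number "1";
```

The formatter quotes every value for simplicity; files that write numeric values unquoted would need a rule for which tags to leave bare.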
