The BED (Browser Extensible Data) format is a text file format used to store genomic regions as coordinates with associated annotations; each file defines a feature track. The data are arranged in columns separated by tabs or spaces. The format was developed during the Human Genome Project and later adopted by other sequencing projects, and it is now widely used. A BED file can have any extension, but .bed is recommended. The BED format long lacked an official specification.
One advantage of this format is that it manipulates coordinates instead of nucleotide sequences, which reduces the computing power and time needed to compare all or part of genomes. In addition, its simplicity makes it easy to read, parse and manipulate coordinates or annotations using text-processing and scripting languages such as Python, Ruby or Perl, or more specialized tools such as BEDTools. The use of BED files has spread rapidly with the emergence of new sequencing techniques and the handling of ever larger sequence files. BED files make this work more efficient, because coordinates can be used to extract sequences of interest from sequencing data sets or to compare and manipulate two sets of coordinates directly.
In detail, BED files describe each feature as an interval that includes the start position and excludes the end position. Positions are 0-based, and a tab or space is used as the field separator. BED files should only be used to submit chromosome or scaffold locations. No header or pragmas are required for BED-formatted data.
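As a small illustration of the 0-based, start-inclusive and end-exclusive convention, the following Python sketch computes the length of a feature and converts it to the 1-based, fully closed coordinates used by formats such as GFF. The function names are hypothetical and chosen only for this example.

```python
def bed_length(chrom_start: int, chrom_end: int) -> int:
    """Length of a BED interval; the end is exclusive, so no +1 is needed."""
    return chrom_end - chrom_start


def bed_to_one_based(chrom_start: int, chrom_end: int) -> tuple[int, int]:
    """Convert 0-based half-open [start, end) to 1-based closed [start, end]."""
    return chrom_start + 1, chrom_end


# The first 100 bases of a chromosome are written as start=0, end=100 in BED.
print(bed_length(0, 100))        # 100
print(bed_to_one_based(0, 100))  # (1, 100)
```

Because the end is exclusive, two adjacent features such as 0..100 and 100..200 share no base, and their lengths add up without any off-by-one correction.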
The BED format consists of one line per feature, each containing 3 to 12 columns of data, plus optional track definition lines. The first 3 fields are required:
- chrom: The name of the chromosome (e.g. chr3, chrY, chr2_random) or scaffold (e.g. scaffold10671)
- chromStart: The starting position of the feature in the chromosome or scaffold
- chromEnd: The ending position of the feature in the chromosome or scaffold
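A minimal Python sketch of reading these three required fields might look as follows. The `Bed3` type and the `parse_bed3` function are hypothetical names introduced for illustration; the sketch assumes that lines whose first word is `track` or `browser`, as well as blank lines and lines starting with `#`, carry no feature and can be skipped.

```python
from typing import Iterable, Iterator, NamedTuple


class Bed3(NamedTuple):
    chrom: str
    chrom_start: int
    chrom_end: int


def parse_bed3(lines: Iterable[str]) -> Iterator[Bed3]:
    """Yield the three required fields of each feature line."""
    for line in lines:
        fields = line.split()  # splits on tabs or spaces
        # Skip blank lines, comments, and track/browser definition lines.
        if not fields or fields[0].startswith("#") or fields[0] in ("track", "browser"):
            continue
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 fields, got: {line!r}")
        yield Bed3(fields[0], int(fields[1]), int(fields[2]))


example = ["track name=demo", "chr3\t1000\t5000", "scaffold10671\t0\t250"]
for feature in parse_bed3(example):
    print(feature.chrom, feature.chrom_start, feature.chrom_end)
```

Any further columns on a line are simply ignored here, so the same function also accepts files that carry the optional fields.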
The 9 additional optional BED fields are:
- name: Defines the name of the BED line. This label is displayed to the left of the BED line in the Genome Browser window
- score: A score between 0 and 1000. If the track line useScore attribute is set to 1 for this annotation data set, the score value will determine the level of gray in which this feature is displayed
- strand: Defines the strand: either “.” (no strand), “+” or “-”.
- thickStart: The starting position at which the feature is drawn thickly (for example, the start codon in gene displays).
- thickEnd: The ending position at which the feature is drawn thickly (for example, the stop codon in gene displays)
- itemRgb: An RGB value of the form R,G,B (e.g. 255,0,0). If the track line itemRgb attribute is set to “On”, this RGB value will determine the display color of the data contained in this BED line
- blockCount: The number of blocks (exons) in the BED line
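- blockSizes: A comma-separated list of the block sizes. The number of items in this list should correspond to blockCount
- blockStarts: A comma-separated list of block starts, each given relative to chromStart. The number of items in this list should correspond to blockCount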
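To show how the three block fields fit together, the following Python sketch turns them into absolute coordinates for each block. It assumes the UCSC Genome Browser convention that block starts are offsets from chromStart and that the lists may end with a trailing comma; `block_intervals` is a hypothetical helper name, not part of any library.

```python
def block_intervals(chrom_start: int, block_count: int,
                    block_sizes: str, block_starts: str) -> list[tuple[int, int]]:
    """Return absolute 0-based, half-open (start, end) pairs, one per block."""
    # Drop empty items so that a trailing comma ("567,488,") is tolerated.
    sizes = [int(s) for s in block_sizes.split(",") if s]
    starts = [int(s) for s in block_starts.split(",") if s]
    if len(sizes) != block_count or len(starts) != block_count:
        raise ValueError("blockSizes and blockStarts must each have blockCount items")
    # Each block start is an offset from chromStart (assumed convention).
    return [(chrom_start + offset, chrom_start + offset + size)
            for offset, size in zip(starts, sizes)]


# A feature at 1000..5000 with two blocks of 567 and 488 bases.
print(block_intervals(1000, 2, "567,488,", "0,3512"))
# [(1000, 1567), (4512, 5000)]
```

In this example the first block begins at chromStart and the last block ends at chromEnd (5000), so the blocks span the whole feature.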