A DNA region involved in regulating the transcription of a gene is called a promoter. It contains binding sites for RNA polymerase and binding sites for transcription factors. The activity of the protein complexes bound to a promoter region can activate the gene, repress its transcription, or produce an intermediate level of expression.
Eukaryotic regulatory regions are identified on the basis of a set of known transcription factor binding sites (TFBSs), which can be represented as sequence patterns with a certain degree of degeneracy.
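One common way to write such a degenerate pattern is a consensus string in IUPAC nucleotide codes, where for example W stands for A or T. The sketch below is only an illustration: the function names `consensus_to_regex` and `find_sites`, the TATA-box-like pattern, and the test sequence are all assumptions made for this example, not taken from any particular tool.

```python
import re

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a compiled regular expression."""
    parts = []
    for code in consensus.upper():
        bases = IUPAC[code]
        # Unambiguous codes stay literal; degenerate codes become character classes.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("".join(parts))

def find_sites(consensus, sequence):
    """Return (start, matched_text) for every match, including overlapping ones."""
    pattern = consensus_to_regex(consensus).pattern
    # A lookahead lets the scan report matches that overlap each other.
    lookahead = re.compile("(?=(" + pattern + "))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(sequence.upper())]

# Hypothetical pattern and sequence, chosen only to illustrate degeneracy.
hits = find_sites("TATAWAW", "GGCTATAAATAGGCTATATATCC")
for start, site in hits:
    print(start, site)
```

A consensus string treats every allowed base at a position as equally good, so it cannot express differences in binding affinity between similar sites.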
Binding of a transcription factor (TF) to its DNA binding sites is typically an essential step in initiating the transcription of its target genes. A TF binding site (TFBS) is usually 5 to 15 base pairs (bp) long and lies within the promoter of the target gene, and a TF protein can usually recognize a set of similar DNA sequences with varying degrees of binding affinity. In view of the importance of TFBSs in gene regulation, it is useful to know how the TFBSs of a gene are spatially distributed in its promoter region.
Binding of transcription factors to TFBSs is central to transcriptional regulation. Information on experimentally validated functional TFBSs is limited, and consequently there is a need for accurate prediction of TFBSs, both for gene annotation and for applications such as evaluating the role of single nucleotide variations in causing disease.
Tools for TFBS recognition include:
- Position weight matrix (PWM)
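
As a rough illustration of how a PWM can be used, the sketch below builds a log-odds matrix from a handful of aligned sites and scans a sequence for windows that score above a threshold. It is a minimal sketch under stated assumptions: the aligned sites, the uniform background of 0.25 per base, the pseudocount of 1, the threshold of 5.0, and the names `build_pwm`, `score_window` and `scan` are all hypothetical choices made for this example.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM from equal-length aligned binding sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        scores = {}
        for base in BASES:
            # The pseudocount keeps unseen bases from producing log(0).
            freq = (column.count(base) + pseudocount) / total
            scores[base] = math.log2(freq / background)
        pwm.append(scores)
    return pwm

def score_window(pwm, window):
    """Sum the per-position log-odds scores for one candidate site."""
    return sum(column[base] for column, base in zip(pwm, window))

def scan(pwm, sequence, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous characters such as N
        score = score_window(pwm, window)
        if score >= threshold:
            hits.append((start, window, score))
    return hits

# Hypothetical aligned sites, used only to illustrate the calculation.
example_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA", "TGAGTCA"]
pwm = build_pwm(example_sites)
hits = scan(pwm, "CCGTGACTCAGGTTGAGTAACC", threshold=5.0)
for start, window, score in hits:
    print(start, window, round(score, 2))
```

Unlike a consensus string, the matrix gives a graded score, so weaker variants of a site still register but rank below the preferred sequence. Under the same assumptions, calling `score_window` on a reference window and on the same window carrying a single nucleotide variant gives a crude estimate of how much the variant changes the predicted binding strength.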
Experimental identification of transcription factor binding sites
There are many in vitro and in vivo experimental approaches that have been used to identify transcription factor binding sites. These methods include:
- The Electrophoretic Mobility Shift Assay (EMSA), which uses a non-denaturing polyacrylamide gel as a molecular sieve to separate protein-bound DNA from unbound DNA.
- The DNase I footprinting/protection assay, which combines the cleavage reaction of DNase I with EMSA. A major problem with both EMSA and DNase I footprinting is the detection of unwanted protein-DNA interactions that result from non-specific DNA-binding proteins.
- Systematic Evolution of Ligands by EXponential enrichment (SELEX), which works by screening a large pool of short, random oligonucleotide probes to identify those bound by a TF of interest. A refinement, SELEX-seq, subjects the selected dsDNAs to massively parallel sequencing.
- The Chromatin ImmunoPrecipitation (ChIP) assay, a variation of the ‘pull down’ class of assay in which the DNA-binding protein of interest is cross-linked to the DNA using formaldehyde. The DNA is then sheared into fragments of around 100–1000 bp, and an antibody specific for a given transcription factor is used to immunoprecipitate the DNA-protein complex. The cross-links are then reversed, releasing the DNA for PCR amplification. High-throughput versions of the ChIP assay involve hybridizing the resulting fragments to genomic tiling microarrays, an approach known as ChIP-chip, or sequencing the resulting DNA fragments directly, an approach known as ChIP-seq.