Bioinformatics Sequence Analysis

Sequence Logos

A graphical method is introduced for showing the patterns in a set of aligned sequences. The characters representing the sequence are stacked on top of each other for each position in the aligned sequences. The height of each letter is made proportional to its frequency, and the letters are sorted so the most common one is on top. The height of the complete stack is then adjusted to signify the information content of the sequences at that position. From these ‘sequence logos’, one can find not only the consensus sequence but also the relative frequency of bases and the information content (measured in bits) at every position in a site or sequence. The logo displays both significant residues and subtle sequence patterns.
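The stack and letter heights described above can be written out explicitly. This is the standard information-theoretic formulation, not something spelled out in the text above. Assume position i has residue frequencies f_{b,i} over an alphabet of N symbols (N = 4 for DNA, N = 20 for proteins), and e_n is a small-sample correction for n aligned sequences:

```latex
\begin{align*}
H_i &= -\sum_{b} f_{b,i}\,\log_2 f_{b,i}                && \text{(uncertainty at position } i\text{)} \\
R_i &= \log_2 N - \left(H_i + e_n\right)                && \text{(total stack height, in bits)} \\
\mathrm{height}_{b,i} &= f_{b,i}\,R_i                   && \text{(height of letter } b\text{)}
\end{align*}
```

The maximum stack height is therefore 2 bits for DNA and about 4.32 bits for proteins. For example, take a DNA column with G in three of four sequences and T in the fourth. It has H ≈ 0.81 bits, so, ignoring e_n, R ≈ 1.19 bits, split into a G of about 0.89 bits drawn on top of a T of about 0.30 bits.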

When searching for homologous sequences, researchers are often interested in residues that are conserved or, conversely, in positions that vary a lot. Most researchers use multiple sequence alignments to visualize homology within a set of DNA or protein sequences. In proteins, the active sites of a protein family are often highly conserved, so these positions appear fully or nearly fully conserved in an alignment. Antigen-binding sites in the variable regions of immunoglobulins, on the other hand, differ considerably, whereas the rest of the protein remains relatively unchanged. In DNA, promoters and other protein-binding sites are highly conserved, as are repressor binding sites.

When such sequences are aligned, whether they are highly variable or highly conserved at specific sites, it is very difficult to generate a consensus sequence that captures the actual variability at a given position. To better understand the information content or significance of individual positions, a sequence logo can be used. A sequence logo displays the information content of every position in an alignment as residues or nucleotides stacked on top of each other. It therefore provides a far more detailed view of the entire alignment than a simple consensus sequence. Sequence logos can help identify protein binding sites on DNA sequences and conserved residues in aligned protein domains, and they have a wide range of other applications.
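The limitation of a consensus can be made concrete with two made-up toy alignments. One is fully conserved and the other has substitutions at most positions. The minimal sketch below assumes a simple majority-rule consensus, though other consensus rules exist. Under that rule both alignments give the same string, while the per-column frequency profile, which is what a logo draws, tells them apart. The functions `consensus` and `column_profile` are hypothetical helpers written for illustration.

```python
from collections import Counter


def consensus(alignment):
    """Majority-rule consensus: keep only the most frequent residue per column.

    Assumes all sequences are already aligned (equal length); zip() would
    silently truncate sequences of unequal length.
    """
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*alignment))


def column_profile(alignment):
    """Relative frequency of every residue observed in each column."""
    n = len(alignment)
    return [
        {res: count / n for res, count in sorted(Counter(col).items())}
        for col in zip(*alignment)
    ]


# Toy alignments invented for illustration, not real binding sites.
conserved = ["TATAAT", "TATAAT", "TATAAT", "TATAAT"]
variable = ["TATAAT", "TATAAT", "CAGAAT", "TGTACA"]

print(consensus(conserved), consensus(variable))  # identical consensus strings
print(column_profile(conserved)[0])  # only T at the first position
print(column_profile(variable)[0])   # T and C at the first position
```

The consensus hides the difference entirely. The profile keeps it, and a logo turns that profile into letter heights.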

The total height of a logo position depends on the degree of conservation in the corresponding column of the multiple sequence alignment: highly conserved columns produce tall stacks.
The height of each letter in a logo position is proportional to the observed frequency of the corresponding residue (nucleotide or amino acid) in that alignment column.
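These two rules can be turned into a short computation. In this minimal Python sketch, a column's total height is log2 of the alphabet size minus the column's Shannon entropy. Each letter then gets its frequency times that total. The function name `logo_heights` is hypothetical, and the sketch assumes an ungapped alignment. It also omits the small-sample correction that logo tools typically apply, so heights from only a few sequences come out somewhat too high.

```python
from collections import Counter
from math import log2


def logo_heights(alignment, alphabet="ACGT"):
    """Compute letter heights (in bits) for every column of an alignment.

    Returns one list per column of (letter, height) pairs, sorted from the
    tallest letter (drawn on top of the stack) to the shortest.
    Assumes an ungapped alignment over `alphabet`; no small-sample correction.
    """
    if not alignment:
        raise ValueError("empty alignment")
    length = len(alignment[0])
    for seq in alignment:
        if len(seq) != length:
            raise ValueError("sequences must all have the same length")
        if set(seq) - set(alphabet):
            raise ValueError(f"unexpected characters in {seq!r}")

    max_bits = log2(len(alphabet))  # 2 bits for DNA, about 4.32 for proteins
    columns = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment)
        n = sum(counts.values())
        freqs = {b: c / n for b, c in counts.items()}
        entropy = -sum(f * log2(f) for f in freqs.values())  # Shannon entropy
        info = max_bits - entropy  # total stack height in bits
        stack = [(b, f * info) for b, f in freqs.items()]
        stack.sort(key=lambda pair: pair[1], reverse=True)  # most common on top
        columns.append(stack)
    return columns


heights = logo_heights(["ACGT", "ACGA", "ACTT", "ACGT"])
print(heights[0])  # fully conserved column: [('A', 2.0)]
print(heights[2])  # mixed column: G on top of T, about 1.19 bits in total
```

For protein alignments, pass the 20 amino-acid letters as `alphabet`. The maximum stack height then rises from 2 bits to about 4.32 bits.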

Tools for generating sequence logos include:

  • LogOddsLogo
  • pLogo
  • BlockLogo
  • Seq2Logo
  • RILogo
  • CorreLogo
  • enoLOGOS
  • ggseqlogo
  • Logomaker
  • WebLogo

WebLogo is an interactive tool for generating sequence logos. The user enters a multiple sequence alignment, for example in FASTA format, and the program computes the logo and returns it as a graphics file.
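A logo is computed column by column, so the input must be an alignment in which all sequences have the same length, not a set of raw sequences. A small helper can check this before writing the sequences as FASTA text for WebLogo. The sketch uses only the Python standard library. `to_fasta` and its `width` parameter are hypothetical and not part of WebLogo, and the default line width of 60 is an arbitrary choice.

```python
def to_fasta(records, width=60):
    """Format (name, aligned_sequence) pairs as FASTA text.

    Raises ValueError if the sequences differ in length, since a logo
    needs an alignment rather than unaligned sequences.
    """
    if not records:
        raise ValueError("no sequences given")
    if len({len(seq) for _, seq in records}) != 1:
        raise ValueError("sequences are not aligned: lengths differ")
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        # Wrap long sequences; FASTA records may span several lines.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


fasta_text = to_fasta([("seq1", "TATAAT"), ("seq2", "TACAAT"), ("seq3", "TA-AAT")])
print(fasta_text, end="")
```

The resulting text can then be saved to a file or pasted into WebLogo as its input alignment.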

