
Genome Retrieval & Analysis

A sequence retrieval tool lets users download nucleotide and protein sequences, including chromosomes, scaffolds, genes, mRNAs, transcript coding sequences, proteins, reftrans contigs, unigene contigs, and even whole genome sequences. Retrieving data from different databases requires a search capability, which a data retrieval system (tool) provides. Some common data retrieval systems are:

  1. Entrez/GQuery
  2. DBGET/LinkDB
  3. Sequence Retrieval System (SRS)
  4. Retrieval system from EMBL-EBI

SRS is a homogeneous interface to over 80 biological databases, developed at the European Bioinformatics Institute (EBI) at Hinxton, UK. It covers databases of sequences, metabolic pathways, transcription factors, application results (such as BLAST, SSEARCH, and FASTA), protein 3-D structures, genomes, mappings, mutations, and locus-specific mutations. SRS supports the data structure of these libraries by providing special indices for feature tables and for hierarchically structured data fields (e.g. taxonomic classification). A language (ODD) has been designed for conveniently specifying the format and organization of a library, the representation of individual data fields within the system (the design of indices), and the structure of other data needed during retrieval. This gives SRS the flexibility required to cope with different library formats, which are subject to continuous change. Queries and inspection of retrieved entries can be performed from a user interface with pull-down menus and windows. SRS supports many input and output formats but is particularly well adapted to the GCG programs.

DBGET is an integrated database retrieval system developed at Kyoto University and the University of Tokyo.

Entrez is a molecular biology database and retrieval system developed by the National Center for Biotechnology Information (NCBI). It is an entry point for exploring distinct but integrated databases.
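As a rough illustration of programmatic retrieval through Entrez, the sketch below builds a request URL for NCBI's E-utilities `efetch` endpoint and parses a FASTA response into a dictionary. The function names `build_efetch_url` and `parse_fasta` are hypothetical helpers written for this example, and the accession `NM_000518` is used only as a sample identifier. The code does not contact the network; fetching the URL and respecting NCBI's usage guidelines is left to the reader.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nucleotide", rettype="fasta"):
    """Return an efetch URL for one record (hypothetical helper)."""
    params = {
        "db": db,            # Entrez database to query
        "id": accession,     # accession or UID of the record
        "rettype": rettype,  # requested record format
        "retmode": "text",   # plain-text response
    }
    return EFETCH_URL + "?" + urlencode(params)


def parse_fasta(text):
    """Parse FASTA text into {header: sequence} (hypothetical helper)."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; drop the leading '>'.
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines are concatenated per record.
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


if __name__ == "__main__":
    print(build_efetch_url("NM_000518"))
    print(parse_fasta(">seq1 demo\nACGT\nTTGA\n"))
```

The response text from such a URL could be passed directly to `parse_fasta`; for production use, an established library such as Biopython is likely a better choice than hand-written parsing.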

In bioinformatics, sequence analysis is the process of subjecting a DNA, RNA, or peptide sequence to any of a wide range of analytical methods in order to understand its features, function, structure, or evolution. It includes:

  • Sequencing
  • Sequence assembly
  • Alignment
  • Searching in Database

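DNA sequencing is used to determine the sequence of individual genes, full chromosomes, or entire genomes of an organism. It has also become the most efficient way to sequence RNA or proteins indirectly: RNA is typically read after conversion to complementary DNA, and protein sequences are inferred from open reading frames.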
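One reason a DNA sequence can stand in for a protein sequence is that a coding region can be translated computationally. The sketch below is a minimal example, assuming the standard genetic code, a sequence that starts in frame, and no introns. The function name `translate` is chosen for this illustration, and unrecognized codons are reported as `X`.

```python
# Standard genetic code, with codons ordered by base in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with T
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)

# Map each three-letter codon to its one-letter amino acid ('*' = stop).
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame DNA coding sequence into a protein string."""
    dna = dna.upper()
    protein = []
    # Step through complete codons; trailing bases that do not fill
    # a codon are ignored.
    for pos in range(0, len(dna) - 2, 3):
        codon = dna[pos:pos + 3]
        protein.append(CODON_TABLE.get(codon, "X"))  # 'X' = unknown codon
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))  # prints MA*
```

Real gene prediction also has to locate the correct reading frame and handle splicing, which this sketch deliberately leaves out.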

Sequence analysis involves DNA and protein sequencing. Methods for DNA sequencing include:

  • Sanger Sequencing Method
  • Pyrosequencing Method
  • Shotgun Sequencing Method

Methods for protein sequencing include:

  • Edman Degradation 
  • Mass spectrometry

Databases for searching include:

  • UniGene at NCBI
  • DNA Data Bank of Japan
  • EMBL

Methodologies used in sequence analysis include sequence alignment, searches against biological databases, and others. Since the development of high-throughput methods for producing gene and protein sequences, the rate at which new sequences are added to the databases has increased exponentially. Such a collection of sequences does not, by itself, increase the scientist's understanding of the biology of organisms. However, comparing these new sequences to sequences with known functions is a key way of understanding the biology of the organism from which a new sequence comes. Sequence analysis can therefore be used to assign function to genes and proteins by studying the similarities between the compared sequences. Today, many tools and techniques are available to perform these sequence comparisons (sequence alignment) and to analyze the resulting alignment for its biological meaning.
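To make the idea of sequence comparison concrete, the following sketch computes a global alignment score with the Needleman-Wunsch dynamic programming algorithm. It is one classic alignment method, not necessarily the one used by any particular tool mentioned above. The scoring values (match +1, mismatch -1, gap -2) and the function name `global_alignment_score` are assumptions chosen for illustration; real tools generally use substitution matrices and more elaborate gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against an empty sequence costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            # Pair a[i-1] with b[j-1] (match or mismatch).
            diagonal = score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch
            )
            # Pair a[i-1] with a gap in b.
            up = score[i - 1][j] + gap
            # Pair b[j-1] with a gap in a.
            left = score[i][j - 1] + gap
            score[i][j] = max(diagonal, up, left)

    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))  # prints 7
    print(global_alignment_score("ACGT", "ACG"))         # prints 1
```

A higher score indicates greater similarity under the chosen scoring scheme. Database search tools typically rely on faster heuristic methods, since filling a full dynamic programming table for every database entry would be too slow.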

