After a genome has been sequenced, assembled, and annotated, it needs to be stored and shared in a format that is easily and freely accessible to all. This is typically done through a database with a web interface known as a genome browser, where the stored genome data can be used by anyone at any time. This open availability created a need for data retrieval methods.
Researchers and scientists need this data for purposes such as:
- Analyzing an organism's functional and evolutionary history, which requires combining disparate data from a variety of sources
- Building reliable information resources that compile data on sequenced genomes and link it to the wealth of associated functional data
- Studying comparative genomics
- Working toward personalized modern medicine
- Understanding the blueprint for building any organism
- Learning more about the functions of genes and proteins, knowledge that will have a major impact on medicine, biotechnology, and the life sciences
- Identifying regions of DNA that have other important roles, such as the regulation of genes
The amount of genome-related information stored in public databases and freely available to anyone with Internet access is enormous. In practice, however, many researchers who should benefit most from this information are not comfortable navigating these databases, let alone assessing the reliability of the data.
To retrieve an entire genome sequence, users should first check whether the data of interest (genome, proteome, CDS, RNA, GFF or GTF annotation, or genome assembly statistics) is available for download. This check is made using the scientific name of the organism of interest.
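The availability check described above can also be scripted. The following is a minimal sketch, not an official client: it assumes the NCBI E-utilities `esearch` endpoint, the `assembly` database, the `[Organism]` search field, and a JSON reply that carries the hit count under `esearchresult.count`. The function names are hypothetical and chosen only for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(scientific_name, db="assembly"):
    """Return an esearch URL that looks up records for one organism."""
    params = {
        "db": db,  # assumption: the Entrez database to search
        "term": f"{scientific_name}[Organism]",
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def record_count(response_text):
    """Extract the number of matching records from an esearch JSON reply."""
    result = json.loads(response_text)["esearchresult"]
    return int(result["count"])


def genome_available(scientific_name):
    """Query NCBI over the network; True if at least one record matches."""
    with urlopen(build_search_url(scientific_name), timeout=30) as reply:
        return record_count(reply.read().decode("utf-8")) > 0
```

Calling `genome_available("Danio rerio")` would send one request to NCBI. Building the URL and parsing the reply are kept in separate functions so that each can be checked without network access.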
List of genome retrieval databases
- NCBI Genomes
- Ensembl Genomes
- Personal Genome Project
- GMOD Project
Sequence retrieval from NCBI
The Genome database contains sequence and map data from the whole genomes of over 1000 species or strains. It includes both completely sequenced genomes and those with sequencing in progress. All three main domains of life (bacteria, archaea, and eukaryota) are represented, as well as many viruses, phages, viroids, plasmids, and organelles. To begin, visit the NCBI site, select the “Genome” database, and enter the name of the organism whose genome is required.
A page for that organism opens. From the FTP site you can download the genome sequence for an organism, all the cDNA, genes, proteins, or ncRNAs for a species, and more. For example, you can get the whole mouse genome sequence, all the proteins in the human genome, or the genes for zebrafish. GenBank files, gene sets in GTF format, and the MySQL tables themselves are also available for download.
Sequence retrieval from Ensembl
See the README file in the top-level directory for general information about how the FTP files are organized.
- Locate the directory for your organism of interest. Within that directory a README file will describe the various files available. In many cases, the sequence data of the genome is segregated into directories for each chromosome.
- Use any FTP client to download the data.
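The steps above can be sketched with Python's standard `ftplib` module acting as the FTP client. This is an illustration under stated assumptions: the host `ftp.ensembl.org`, the `/pub/current_fasta/<species>/dna` layout, and the lowercase, underscore-separated species folder names are assumptions about the server, and the helper names are hypothetical. Check the README files on the server for the actual organization.

```python
from ftplib import FTP
from pathlib import Path

# Assumption: public FTP host that accepts anonymous logins.
HOST = "ftp.ensembl.org"


def species_directory(scientific_name, base="/pub/current_fasta"):
    """Map a name like 'Homo sapiens' to an assumed DNA directory path."""
    folder = scientific_name.strip().lower().replace(" ", "_")
    return f"{base}/{folder}/dna"


def list_directory(host, directory):
    """Return the file names found in one directory on the server."""
    with FTP(host, timeout=60) as ftp:
        ftp.login()  # no arguments means an anonymous login
        ftp.cwd(directory)
        return ftp.nlst()


def download_file(host, directory, filename, dest_dir="."):
    """Fetch one file in binary mode and return the local path."""
    target = Path(dest_dir) / filename
    with FTP(host, timeout=60) as ftp:
        ftp.login()
        ftp.cwd(directory)
        with open(target, "wb") as handle:
            ftp.retrbinary(f"RETR {filename}", handle.write)
    return target
```

A typical session might first call `list_directory(HOST, species_directory("Homo sapiens"))` to see which files exist, then `download_file(HOST, species_directory("Homo sapiens"), "README")` to read the description of those files before fetching the much larger sequence files.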