Bioinformatics Databases


The National Center for Biotechnology Information (NCBI) has created dbGaP, a public repository for individual-level phenotype, exposure, genotype, and sequence data, and the associations between them. dbGaP assigns stable, unique identifiers to studies and to subsets of information from those studies, including documents, individual phenotypic variables, tables of trait data, sets of genotype data, computed phenotype-genotype associations, and groups of study subjects who have given similar consent for use of their data.
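As a minimal sketch of how such stable identifiers might be handled in code, the following parser splits an accession string into its parts. The pattern used here (a three-letter `ph` type prefix, a number, a `.v` version, and an optional `.p` participant-set suffix, as in a string shaped like `phs000001.v1.p1`) and the prefix-to-object mapping are assumptions made for illustration, not an official specification; the names `Accession` and `parse_accession` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed mapping from accession prefix to object type (illustrative only).
PREFIXES = {
    "phs": "study",
    "phd": "document",
    "phv": "phenotype variable",
    "pht": "phenotype table",
    "phg": "genotype data set",
    "pha": "association analysis",
}

# Assumed shape: prefix, digits, ".v<version>", optional ".p<participant set>".
ACCESSION_RE = re.compile(r"^(ph[a-z])(\d+)\.v(\d+)(?:\.p(\d+))?$")


@dataclass(frozen=True)
class Accession:
    kind: str                       # object type looked up from the prefix
    number: int                     # numeric part of the identifier
    version: int                    # value after ".v"
    participant_set: Optional[int]  # value after ".p", if present


def parse_accession(text: str) -> Accession:
    """Parse an accession-like string or raise ValueError."""
    match = ACCESSION_RE.match(text.strip())
    if match is None or match.group(1) not in PREFIXES:
        raise ValueError(f"not a recognised accession: {text!r}")
    prefix, number, version, pset = match.groups()
    return Accession(
        kind=PREFIXES[prefix],
        number=int(number),
        version=int(version),
        participant_set=int(pset) if pset is not None else None,
    )
```

Keeping the version and participant-set parts separate from the base number lets a citation point at one exact release of a study while the base identifier stays the same across releases.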

Technical advances in high-throughput genotyping, together with its declining cost, give investigators new opportunities to perform increasingly complex analyses of genetic associations with phenotypic and disease characteristics. The leading candidates for such genome-wide association studies (GWAS) are existing large-scale cohort and clinical studies that collected rich sets of phenotype data. dbGaP was developed to give investigators access to data with stable identifiers, so that published studies can discuss or cite the primary data in a specific and uniform way. dbGaP provides access to the large-scale genetic and phenotypic datasets required for GWAS designs. This includes public access to study documents linked to summary data on specific phenotype variables, statistical overviews of the genetic information, and the genomic positions of published associations, as well as authorized access to individual-level data.

This description of dbGaP has three purposes:

  • to describe dbGaP’s functionality for users and submitters
  • to describe dbGaP’s design and operational processes for database methodologists to emulate or improve upon
  • to reassure the lay and scientific public that individual-level phenotype and genotype data are securely and responsibly managed

dbGaP accommodates studies of varying design. It contains four basic types of data: 

  • Study documentation, including study descriptions, protocol documents, and data collection instruments, such as questionnaires
  • Phenotypic data for each variable assessed, at both an individual level and in summary form
  • Genetic data, including study subjects’ individual genotypes, pedigree information, fine mapping results, and re-sequencing traces
  • Statistical results, including association and linkage analyses, when available

The dbGaP document system formats, accessions, indexes, and displays all submitted study documentation so that:

  • links are created between the variable values in the database and their references in the documents
  • document representation can be rendered in a web browser for easy navigation between variable summary report pages and their referencing documents
  • a framework is established that will support future generation of web-based forms and questionnaires that capture variable data directly into dbGaP
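The linking of variables to the documents that reference them can be sketched as a small index. This is only an illustration: it uses plain case-insensitive name matching as a stand-in, the function name `link_variables` and the sample documents are hypothetical, and nothing here is meant to describe how dbGaP actually builds its links.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def link_variables(
    documents: Dict[str, str], variables: List[str]
) -> Dict[str, List[Tuple[str, int]]]:
    """Map each variable name to (document id, character offset) mentions."""
    links: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for name in variables:
        # Whole-word, case-insensitive match on the variable name.
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for doc_id, text in documents.items():
            for match in pattern.finditer(text):
                links[name].append((doc_id, match.start()))
    return dict(links)


# Hypothetical study documents and variable names.
docs = {
    "questionnaire": "Record SBP and age.",
    "protocol": "AGE in years",
}
index = link_variables(docs, ["age", "sbp"])
```

With an index of this kind, a variable summary page can list every document passage that mentions the variable, and a rendered document can turn each mention into a link back to that summary page.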

The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the data and results from studies investigating the interaction of genotype and phenotype in humans.

Genome-wide association studies (GWAS) are hypothesis-free methods for identifying associations between genetic regions (loci) and traits, including diseases. GWAS datasets are stored in dbGaP, which has two levels of access: open (public) access for summary information and controlled access for individual-level information. Investigators submitting data to dbGaP provide a description of their study, which is viewable by the public. Submitted data must be de-identified according to the criteria defined in the HIPAA Privacy Rule, to protect the privacy of study participants.

Access to data in dbGaP depends on two key components:

  • Authentication
  • Authorization
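The distinction between the two components, combined with the open and controlled access levels, can be sketched as follows. This is a simplified model under stated assumptions: the `User` class, the `approved_studies` set, and the `can_access` function are hypothetical names, and the real approval process is not modeled here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class AccessLevel(Enum):
    OPEN = "open"              # summary information, viewable by anyone
    CONTROLLED = "controlled"  # individual-level information


@dataclass
class User:
    name: str
    authenticated: bool = False  # authentication: identity has been verified
    approved_studies: Set[str] = field(default_factory=set)  # authorization


def can_access(user: Optional[User], study_id: str, level: AccessLevel) -> bool:
    """Decide whether a request for study data should be allowed."""
    if level is AccessLevel.OPEN:
        return True  # open data needs neither login nor approval
    if user is None or not user.authenticated:
        return False  # authentication failed or was never attempted
    # Authorization: the verified user must hold approval for this study.
    return study_id in user.approved_studies
```

Authentication answers who is making the request, while authorization answers whether that person has been approved for the specific study. In this sketch a logged-in user without approval is still refused controlled data.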

