dbVar is a database of human genomic structural variation where users can search, view, and download data from submitted studies. dbVar provides access to the raw data whenever available, as well as links to additional resources, from both NCBI and elsewhere.
dbVar provides archiving, accessioning and distribution services for genomic structural variation (GSV) data. It is a comprehensive resource that includes data from the 1000 Genomes Project, the Wellcome Trust Sanger Institute Mouse Genomes Project, the COSMIC project and many clinical genetics studies. Users can navigate to particular studies or perform text-based searches using the standard NCBI Entrez search interface.
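As a rough illustration of a text-based Entrez search, the sketch below builds a query URL for the NCBI E-utilities ESearch endpoint with the database set to dbvar. The helper name build_dbvar_search_url, the example search term and the retmax default are assumptions made for this example, and the code only constructs the URL rather than sending a request.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_dbvar_search_url(term: str, retmax: int = 20) -> str:
    """Return an ESearch URL that queries the dbVar Entrez database.

    `build_dbvar_search_url` is a hypothetical helper for illustration;
    it is not part of any NCBI client library.
    """
    params = {
        "db": "dbvar",      # Entrez database name for dbVar
        "term": term,       # free-text Entrez query
        "retmax": retmax,   # maximum number of record IDs to return
        "retmode": "json",  # ask for a JSON response
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# Illustrative query only; any valid Entrez search term could be used.
url = build_dbvar_search_url("BRCA1 AND copy number variation")
print(url)
```

Fetching the resulting URL with any HTTP client should return the matching dbVar record IDs, subject to NCBI's usage guidelines for E-utilities.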
In recent years there have been unprecedented advances in the technologies that characterize genomic variation, and it is well known that variation at the single nucleotide level is abundant across the genomes of all species. However, it is becoming clear that genomic structural variation (variation ranging from tens to millions of base pairs in size, including insertions, deletions, inversions, translocations and locus copy number changes) accounts for more of the individual differences at the base pair level in humans than single nucleotide variation does, and is likely to play a major role in disease. Two other areas of research that are becoming increasingly important in this field are discovering how genomic structural variation affects an individual’s characteristics, and understanding the role it has played in the evolution of species.
The National Center for Biotechnology Information (NCBI) creates and maintains a set of databases that archive, process, display and report information related to human germline and somatic variants. These databases, primarily the Database of Short Genetic Variations (dbSNP) and the Database of Genomic Structural Variations (dbVar), represent almost 2 billion submitted human variants. The primary roles of both databases are to process submissions, archive the data, annotate variants on the genome and NCBI Reference Sequences (RefSeqs), and distribute the data worldwide. The data are important for studying the basis of human diseases to improve diagnosis, treatment, and prevention, and for research in a variety of fields such as species diversity, evolution, and conservation. Submissions are accepted in various formats, including VCF for reporting the large numbers of variations generated by high-throughput sequencing (HTS) projects across multiple populations, and may include a wide variety of associated data such as genotype and allele frequency data. Each submitted variant is assigned a database identifier (ss# in dbSNP or nsv#/esv# in dbVar) that can be cited in publications, allows cross-referencing to other databases and linking to related data, facilitates annotation, and promotes data exchange. Submissions are then processed to aggregate information from multiple submitters (rs# in dbSNP), to calculate locations and functional consequences on RefSeqs, and to integrate with other NCBI resources including Gene, PubMed, Nucleotide, Protein, and Genome. dbVar data are updated during the regular build cycle with annotations on new assemblies and RefSeqs, and are distributed in diverse ways: Entrez searches, study-specific reports, annotation on the genome, Sequence Viewer, and FTP downloads in BED, VCF, and other formats.
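The identifier scheme described above can be sketched as a small lookup that maps an identifier prefix to its database and record type. This is a minimal illustration that covers only the four prefixes named in the text (ss, rs, nsv, esv); the function name classify_identifier and the returned labels are assumptions, and other identifier types used by these databases are deliberately treated as unrecognized.

```python
import re

# Prefix -> (database, record kind), taken from the description above.
# The labels are illustrative wording, not official NCBI terminology.
ID_PREFIXES = {
    "ss": ("dbSNP", "submitted variant"),
    "rs": ("dbSNP", "aggregated variant"),
    "nsv": ("dbVar", "submitted variant"),
    "esv": ("dbVar", "submitted variant"),
}

# A known prefix followed by one or more digits, e.g. "rs12345".
_ID_PATTERN = re.compile(r"^(ss|rs|nsv|esv)(\d+)$")


def classify_identifier(identifier: str):
    """Return (database, kind, number) for a recognized ID, else None.

    `classify_identifier` is a hypothetical helper for this example.
    """
    match = _ID_PATTERN.match(identifier.strip().lower())
    if match is None:
        return None
    prefix, number = match.groups()
    database, kind = ID_PREFIXES[prefix]
    return database, kind, int(number)


# Made-up identifiers, used only to exercise the function.
for example in ("rs12345", "nsv1000", "unknown42"):
    print(example, classify_identifier(example))
```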
Structural variation (SV) is generally defined as variation affecting a region of DNA approximately 1 kb or larger in size. It includes inversions and balanced translocations as well as genomic imbalances (insertions and deletions), the latter commonly referred to as copy number variants (CNVs). CNVs often overlap with segmental duplications, which are regions of DNA >1 kb that are present more than once in the genome and whose copies are >90% identical. If present at >1% in a population, a CNV may be referred to as a copy number polymorphism (CNP).
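The size, identity and frequency thresholds in these definitions can be written out as simple checks. The sketch below is illustrative only: the function and constant names are assumptions, and the "approximately 1 kb" boundary is treated as a hard cutoff of 1,000 bp purely for the sake of the example.

```python
# Thresholds taken from the definitions above. The 1 kb boundary is
# described as approximate; a hard cutoff is a simplifying assumption.
SV_MIN_LENGTH_BP = 1_000      # structural variation: ~1 kb or larger
SEGDUP_MIN_LENGTH_BP = 1_000  # segmental duplication: > 1 kb
SEGDUP_MIN_IDENTITY = 0.90    # segmental duplication: copies > 90% identical
CNP_MIN_FREQUENCY = 0.01      # copy number polymorphism: > 1% of a population


def is_structural_variant(length_bp: int) -> bool:
    """True if the affected region meets the ~1 kb size threshold."""
    return length_bp >= SV_MIN_LENGTH_BP


def is_segmental_duplication(length_bp: int, copies: int, identity: float) -> bool:
    """True for a region > 1 kb, present more than once, > 90% identical."""
    return (
        length_bp > SEGDUP_MIN_LENGTH_BP
        and copies > 1
        and identity > SEGDUP_MIN_IDENTITY
    )


def is_copy_number_polymorphism(population_frequency: float) -> bool:
    """True if a CNV is present in more than 1% of a population."""
    return population_frequency > CNP_MIN_FREQUENCY


# Made-up values, used only to exercise the checks.
print(is_structural_variant(5_000))
print(is_segmental_duplication(12_000, copies=2, identity=0.95))
print(is_copy_number_polymorphism(0.03))
```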
The easiest way to browse all dbVar clinical variants is to visit the Clinical Structural Variants data track in NCBI’s Variation Viewer or connect to the Public dbVar Hub at the UCSC Genome Browser.