A BioSample record contains descriptive information about the physical biological specimen from which your experimental data are derived, or about the biological source materials used in experimental assays. The BioSample database stores this submitter-supplied descriptive information, or metadata, for the biological materials from which data in NCBI’s primary data archives are derived. Because NCBI’s archives host data from many types of samples from any species, the BioSample database is similarly diverse; typical examples of a BioSample include a cell line, a tissue biopsy, or an environmental isolate.
As the number and complexity of primary data archives supported by NCBI expand, a need has emerged for a shared database that hosts information about the biological samples from which those data are derived. NCBI’s BioSample database fills this role by storing descriptions of the biological materials examined in a project.
The BioSample database was launched in 2011 to help address these needs. It facilitates the capture and management of structured metadata for diverse biological samples and encourages data producers to supply a rich set of contextual metadata with their data submissions. The database was initially populated with existing sample descriptions extracted from SRA, dbGaP, EST and GSS. The database aims to:
- capture sample metadata in a structured way by promoting use of controlled vocabularies for sample attribute field names
- link sample information to corresponding experimental data across multiple archival databases
- reduce submitter burden by enabling one-time upload of a sample description, then referencing that sample as appropriate when making data deposits to other archives
- support cross-database queries by sample description
- connect sample records with associated literature and molecular data in other NCBI databases
- index BioSample records so that they are searchable
- promote the use of structured and consistent attribute names and values to describe sample properties and origin
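Because BioSample records are indexed in Entrez, they can also be queried programmatically. The sketch below assumes NCBI’s public E-utilities ESearch endpoint; the helper name `biosample_search_url` and the example query term are illustrative, and the code only builds the request URL without contacting NCBI.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def biosample_search_url(term: str, retmax: int = 20) -> str:
    """Build (but do not fetch) an ESearch URL that queries the BioSample index.

    Fetching the resulting URL, e.g. with urllib.request, would return the
    UIDs of matching BioSample records as JSON.
    """
    params = {
        "db": "biosample",   # search the BioSample database
        "term": term,        # Entrez query string (illustrative)
        "retmax": retmax,    # maximum number of UIDs to return
        "retmode": "json",   # JSON instead of the default XML
    }
    # urlencode percent-escapes the query, e.g. spaces become "+".
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(biosample_search_url("Homo sapiens[Organism] AND liver"))
```

Building the URL separately from fetching it keeps the example testable offline; a real client should also follow NCBI’s usage guidelines for E-utilities.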
The BioSample database supports capture of various types of relationships between samples. For example, for samples that represent cell lines derived from individuals with known family relationships, pedigree information can be captured and used to group related samples together, which facilitates linking to additional relevant records. The BioSample database does not support controlled-access mechanisms and thus cannot host descriptions of human clinical samples that may have associated privacy concerns. Each BioSample record also contains identifiers (IDs), the organism, a title, a description, links, owner information, and access information.
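The record fields and pedigree grouping described above can be modelled roughly as follows. This is a minimal sketch, not NCBI’s schema: the `BioSampleRecord` class, its field names, and the `family_id` attribute used here to express pedigree are hypothetical stand-ins.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class BioSampleRecord:
    """Simplified, hypothetical stand-in for a BioSample record."""
    accession: str                      # sample identifier (illustrative values below)
    organism: str
    title: str
    description: str = ""
    owner: str = ""
    access: str = "public"
    links: list[str] = field(default_factory=list)
    attributes: dict[str, str] = field(default_factory=dict)


def group_by_family(records):
    """Group accessions that share the hypothetical 'family_id' attribute."""
    families = defaultdict(list)
    for rec in records:
        family = rec.attributes.get("family_id")
        if family is not None:          # samples without pedigree info are skipped
            families[family].append(rec.accession)
    return dict(families)


if __name__ == "__main__":
    samples = [
        BioSampleRecord("sample-1", "Homo sapiens", "Cell line A", attributes={"family_id": "F1"}),
        BioSampleRecord("sample-2", "Homo sapiens", "Cell line B", attributes={"family_id": "F1"}),
        BioSampleRecord("sample-3", "Homo sapiens", "Cell line C"),
    ]
    print(group_by_family(samples))  # {'F1': ['sample-1', 'sample-2']}
```

Keeping pedigree as an ordinary attribute means grouping is just a lookup over records; the real database may represent relationships differently.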
A BioSample submission records the nature of the biological material that has been sequenced. In the BioSample submission portal, submitters describe the biological material under investigation in their project. After specifying the sample type, the submitter is presented with a list of required and optional attribute fields to fill in, as well as the opportunity to supply any number of custom descriptive attributes. Ultimately, however, BioSample is a submitter-driven repository: submitters are responsible for the quality and content of their deposits.
BioSample maintains a list of recognized attributes, each of which participates in one or more BioSample packages. In addition to recognized attributes, submitters may provide any number of custom attributes to fully describe a sample. Each recognized attribute is described by a name, a harmonized name (HarmonizedName), synonyms (Synonym), a description (Description), and an expected value format (Format).
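The attribute handling described above (recognized attributes with harmonized names and synonyms, package-specific required and optional fields, and free-form custom attributes) can be sketched as below. Everything here is hypothetical: the `Attribute` class, the three attribute definitions and their synonyms, the `PACKAGE` contents, and the helper names are illustrative and are not NCBI’s actual attribute list, package definitions, or validation logic.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    """One recognized attribute; fields mirror Name, HarmonizedName, Synonym, Description, Format."""
    name: str
    harmonized_name: str
    synonyms: list[str] = field(default_factory=list)
    description: str = ""
    value_format: str = ""


# Hypothetical recognized attributes (not copied from NCBI's list).
RECOGNIZED = [
    Attribute("collection date", "collection_date", ["date of collection"]),
    Attribute("geographic location", "geo_loc_name", ["location"]),
    Attribute("tissue", "tissue"),
]

# Hypothetical package: harmonized names that are required or optional.
PACKAGE = {"required": {"collection_date", "geo_loc_name"}, "optional": {"tissue"}}


def harmonize(raw_name, recognized=RECOGNIZED):
    """Map a submitted name to its harmonized name, or None if unrecognized."""
    key = raw_name.strip().lower()
    for attr in recognized:
        candidates = {attr.name.lower(), attr.harmonized_name.lower()}
        candidates.update(s.lower() for s in attr.synonyms)
        if key in candidates:
            return attr.harmonized_name
    return None


def check_sample(submitted, package=PACKAGE):
    """Split a submission into harmonized attributes, custom attributes, and missing required names."""
    harmonized, custom = {}, {}
    for raw_name, value in submitted.items():
        name = harmonize(raw_name)
        if name is None:
            custom[raw_name] = value    # kept as a submitter-defined custom attribute
        else:
            harmonized[name] = value
    missing = package["required"] - set(harmonized)
    return harmonized, custom, missing


if __name__ == "__main__":
    submission = {"Date of Collection": "2020-05-01", "tissue": "liver", "host_diet": "grass"}
    harmonized, custom, missing = check_sample(submission)
    print(harmonized)  # {'collection_date': '2020-05-01', 'tissue': 'liver'}
    print(custom)      # {'host_diet': 'grass'}
    print(missing)     # {'geo_loc_name'}
```

Unrecognized names are kept rather than rejected, matching the idea that custom attributes are allowed alongside recognized ones; only missing required fields are reported.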