Gene annotation is a routine part of bioinformatics work. In recent years, more and more biological data has become available, and knowing how to access and analyze these data resources is essential for comprehensive bioinformatics analysis. BioMart is a very useful tool for this.
BioMart is a community-driven project to provide a single point of access to distributed research data. The BioMart project contributes open source software and data services to the international scientific community. Although the BioMart software is primarily used by the biomedical research community, it is designed in such a way that any type of data can be incorporated into the BioMart framework. The BioMart project originated at the European Bioinformatics Institute as a data management solution for the Human Genome Project. Since then, BioMart has grown to become a multi-institute collaboration involving various database projects on five continents.
The first way to use BioMart is through its web interface, for example for online ID conversion. We can go to the BioMart website, select the corresponding dataset, filters and attributes, and then click the ‘Results’ button to see the final output.
The second way is to use biomaRt, which is an R Bioconductor package.
BioMart allows databases hosted on different servers to be presented seamlessly to users, which facilitates collaborative projects. BioMart applies several levels of query optimization to manage large data sets efficiently, and it offers a diverse selection of graphical user interfaces and application programming interfaces so that queries can be performed in whatever manner is most convenient for the user. BioMart’s capabilities are extended by integration with several widely used software packages such as Bioconductor, Galaxy, Cytoscape, and Taverna.
BioMart is an easy-to-use web-based tool that allows extraction of data without any programming knowledge or understanding of the underlying database structure. You can navigate through the BioMart web interface using the left panel. Filters and attributes can be selected in the right panel.
1. Select a mart database (a type of data)
First, select a mart database, which corresponds to the type of data you are interested in. In Ensembl, you can choose data from one of four mart databases:
- Ensembl Genes: This mart contains the Ensembl gene set and allows us to retrieve Ensembl genes, transcripts and proteins as well as external references, microarrays, protein domains, structure, sequences, variants (only variants mapped to Ensembl Transcripts) and homology data.
- Ensembl Variation: This mart allows us to retrieve germline and somatic variants as well as germline and somatic structural variants.
- Ensembl Regulation: This mart allows us to retrieve regulatory features, evidence and segments, miRNA target regions, binding motifs and other regulatory regions.
- Vega: This mart contains the Ensembl Vega gene set (manual annotation produced by the Havana team) and allows us to retrieve Ensembl Vega genes, transcripts and proteins as well as external references, structures, sequences and protein domains.
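Behind the web interface, a BioMart server answers HTTP requests at a `martservice` endpoint, and the same endpoint can list which marts and datasets it offers. The minimal sketch below only builds those listing URLs. The host and the internal mart names (`ENSEMBL_MART_ENSEMBL` and so on) are assumptions based on Ensembl's public BioMart and may differ between releases or mirrors; the `type=registry` listing is the place to confirm them. Vega is left out because its internal name is not assumed here.

```python
from urllib.parse import urlencode

# Assumed host: Ensembl's public BioMart. Mirrors and archive sites use other hosts.
MARTSERVICE = "https://www.ensembl.org/biomart/martservice"

# Assumed internal names for the marts described above; verify via registry_url().
MART_NAMES = {
    "Ensembl Genes": "ENSEMBL_MART_ENSEMBL",
    "Ensembl Variation": "ENSEMBL_MART_SNP",
    "Ensembl Regulation": "ENSEMBL_MART_FUNCGEN",
}


def registry_url() -> str:
    """URL that asks the server to list every mart it exposes."""
    return MARTSERVICE + "?" + urlencode({"type": "registry"})


def datasets_url(mart_label: str) -> str:
    """URL that lists the datasets (one per species) available in a mart."""
    mart = MART_NAMES[mart_label]  # raises KeyError for unknown labels
    return MARTSERVICE + "?" + urlencode({"type": "datasets", "mart": mart})


print(registry_url())
print(datasets_url("Ensembl Genes"))
```

Opening either URL in a browser returns the listing as plain text, which is a quick way to check the names before building a query.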
2. Select a mart dataset (a species)
Next, select a mart dataset, which corresponds to the species we are interested in and want to retrieve data from.
The set of available species differs between the marts:
- Ensembl Genes
- Ensembl Variation
- Ensembl Regulation
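In the Ensembl Genes mart, dataset names usually follow a simple pattern: the first letter of the genus, the species epithet, and the suffix `_gene_ensembl` (for example `hsapiens_gene_ensembl` for human). The sketch below encodes that convention as a heuristic only. It is an assumption, not a rule: some datasets (strains, breeds, or renamed species) may not match, and the other marts use different suffixes, so the datasets listing for the mart remains the authoritative source.

```python
def gene_dataset_name(species: str) -> str:
    """Guess an Ensembl Genes dataset name from a binomial species name.

    Heuristic (assumed convention): genus initial + species epithet,
    lower-cased, followed by "_gene_ensembl".
    """
    parts = species.strip().lower().split()
    if len(parts) < 2:
        raise ValueError(f"expected 'Genus species', got {species!r}")
    genus, epithet = parts[0], parts[1]
    return f"{genus[0]}{epithet}_gene_ensembl"


print(gene_dataset_name("Homo sapiens"))   # hsapiens_gene_ensembl
print(gene_dataset_name("Mus musculus"))   # mmusculus_gene_ensembl
```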
3. Filter your mart query (query restriction and input data)
BioMart allows us to restrict our query using information we already have, e.g. by inputting a list of IDs or restricting the query to a genomic region. We can access the filter page by clicking on the “Filters” button located on the left panel.
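In BioMart's XML query format, each restriction chosen on the filter page becomes a `Filter` element with a `name` and a `value`, and a list of IDs is passed as one comma-separated value. The sketch below builds such elements with Python's standard `xml.etree.ElementTree`. The filter names `ensembl_gene_id`, `chromosome_name`, `start` and `end` are assumed to match those of the human Ensembl Genes dataset; other datasets may name their filters differently. The gene IDs and region coordinates are illustrative only.

```python
import xml.etree.ElementTree as ET


def id_list_filter(filter_name: str, ids: list[str]) -> ET.Element:
    """Filter restricting the query to a list of IDs (comma-separated value)."""
    return ET.Element("Filter", name=filter_name, value=",".join(ids))


def region_filters(chromosome: str, start: int, end: int) -> list[ET.Element]:
    """Filters restricting the query to a genomic region (assumed filter names)."""
    return [
        ET.Element("Filter", name="chromosome_name", value=chromosome),
        ET.Element("Filter", name="start", value=str(start)),
        ET.Element("Filter", name="end", value=str(end)),
    ]


ids = id_list_filter("ensembl_gene_id", ["ENSG00000139618", "ENSG00000141510"])
print(ET.tostring(ids, encoding="unicode"))
for f in region_filters("1", 1_000_000, 2_000_000):
    print(ET.tostring(f, encoding="unicode"))
```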
4. Select mart attributes (desired output)
By clicking on the “Attributes” button on the left panel, we will access the mart attribute page. This page allows us to select our desired output; the default output is “Ensembl Gene ID” and “Ensembl Transcript ID” in the Ensembl Genes mart.
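The selected attributes become `Attribute` elements inside the same `Dataset` element as the filters, and the whole selection is wrapped in a `Query` element. This is also what online ID conversion amounts to: filter on one ID type and request another ID type as an attribute. The sketch below assembles a complete query string. The top-level attribute values (`virtualSchemaName="default"`, `datasetConfigVersion="0.6"`) and the attribute name `external_gene_name` are assumptions modelled on queries exported from Ensembl's web interface; `ensembl_gene_id` and `ensembl_transcript_id` are assumed to be the internal names of the two default attributes mentioned above.

```python
import xml.etree.ElementTree as ET


def build_query(dataset: str, filters: dict[str, str], attributes: list[str],
                unique_rows: bool = False, formatter: str = "TSV") -> str:
    """Assemble a BioMart XML query string (sketch; values assumed)."""
    query = ET.Element(
        "Query",
        virtualSchemaName="default",
        formatter=formatter,                     # e.g. TSV, CSV, HTML
        header="1",                              # first output line holds column names
        uniqueRows="1" if unique_rows else "0",  # drop duplicate rows server-side
        datasetConfigVersion="0.6",
    )
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", name=name, value=value)
    for name in attributes:
        ET.SubElement(ds, "Attribute", name=name)
    return ('<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
            + ET.tostring(query, encoding="unicode"))


# ID conversion: Ensembl gene IDs in, transcript IDs and gene symbols out.
xml = build_query(
    "hsapiens_gene_ensembl",
    {"ensembl_gene_id": "ENSG00000139618,ENSG00000141510"},
    ["ensembl_gene_id", "ensembl_transcript_id", "external_gene_name"],
)
print(xml)
```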
5. Display and retrieve your query
Clicking on the “Results” button brings us to the mart result page. By default, this page shows a preview of the first 10 results of the query in HTML format. The number of previewed results and the output format can be changed, as indicated by number 1 on the image below. We can also remove duplicate rows from the output by selecting the “Unique results only” option.
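Outside the web interface, a finished XML query can be sent to the same `martservice` endpoint and the results read as text. The sketch below assumes Ensembl's public host and posts the XML in a form field named `query`, which is how BioMart's REST interface is commonly called. On the server side, the “Unique results only” option appears to correspond to `uniqueRows="1"` in the query; `parse_tsv` shows the equivalent deduplication done client-side on TSV output. Only the parsing is exercised here; the network call is untested.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint: Ensembl's public BioMart.
MARTSERVICE = "https://www.ensembl.org/biomart/martservice"


def fetch_results(query_xml: str, timeout: float = 120.0) -> str:
    """POST an XML query to martservice and return the raw response text."""
    body = urlencode({"query": query_xml}).encode("utf-8")
    req = Request(MARTSERVICE, data=body)  # passing data= makes this a POST
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_tsv(text: str, unique_only: bool = True) -> list[list[str]]:
    """Split TSV output into rows, optionally dropping repeated rows
    (a client-side analogue of "Unique results only")."""
    rows: list[list[str]] = []
    seen: set[tuple[str, ...]] = set()
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        row = line.split("\t")
        key = tuple(row)
        if unique_only and key in seen:
            continue  # this exact row was already kept
        seen.add(key)
        rows.append(row)
    return rows
```

If the query was built with `header="1"`, the first parsed row holds the column names rather than data.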