Ontologies are a concept imported from computing science to describe different conceptual frameworks that guide the collection, organization and publication of biological data. An ontology is similar to a paradigm but has very strict implications for formatting and meaning in a computational context.
The Gene Ontology (GO) provides a system for hierarchically classifying genes or gene products into terms organized in a graph structure (or an ontology). Each gene can be described (annotated) with multiple terms. The GO is actively used to classify genes from humans, model organisms and a variety of other species.
The GO database is a relational database comprising the GO ontologies and the annotations of genes and gene products in terms of the GO. The advantage of housing both the ontologies and annotations in a single database is that powerful queries can be performed over annotations using the ontology. The GO database is built from source data at regular intervals, and is currently maintained as a MySQL database.
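As an illustration of a query "over annotations using the ontology", the following minimal sketch finds every gene product annotated to a term or to any of its descendants. It uses an in-memory SQLite database rather than MySQL so that it runs without a server. The table names (`term`, `term2term`, `association`), their columns, the `EX:` accessions and the gene names are simplified, hypothetical stand-ins and do not reproduce the actual GO schema or real GO identifiers.

```python
import sqlite3

# Hypothetical, simplified tables; the real GO schema has more tables and columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term (id INTEGER PRIMARY KEY, acc TEXT UNIQUE, name TEXT);
CREATE TABLE term2term (parent_id INTEGER, child_id INTEGER);
CREATE TABLE association (gene_product TEXT, term_id INTEGER);

-- Toy ontology: binding <- nucleic acid binding <- DNA binding
INSERT INTO term VALUES
    (1, 'EX:0001', 'binding'),
    (2, 'EX:0002', 'nucleic acid binding'),
    (3, 'EX:0003', 'DNA binding');
INSERT INTO term2term VALUES (1, 2), (2, 3);

-- Toy annotations (made-up gene products)
INSERT INTO association VALUES ('geneA', 3), ('geneB', 2), ('geneC', 1);
""")


def products_annotated_under(conn, acc):
    """Return gene products annotated to term `acc` or to any descendant term."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(id) AS (
            SELECT id FROM term WHERE acc = ?
            UNION
            SELECT r.child_id
            FROM term2term AS r JOIN subtree ON r.parent_id = subtree.id
        )
        SELECT DISTINCT a.gene_product
        FROM association AS a JOIN subtree ON a.term_id = subtree.id
        ORDER BY a.gene_product
        """,
        (acc,),
    ).fetchall()
    return [row[0] for row in rows]


print(products_annotated_under(conn, "EX:0002"))  # ['geneA', 'geneB']
```

Because the ontology and the annotations sit in the same database, the descendant lookup and the annotation join happen in one statement; a query on the general term also picks up gene products annotated only to its more specific children.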
The Gene Ontology (GO) project provides structured, controlled vocabularies and classifications that cover several domains of molecular and cellular biology and are freely available for community use in the annotation of genes, gene products and sequences. Many model organism databases and genome annotation groups use the GO and contribute their annotation sets to the GO resource. The GO database integrates the vocabularies and contributed annotations and provides full access to this information in many formats. Members of the GO Consortium continually work collectively, involving outside experts as needed, to expand and update the GO vocabularies. The GO Web resource also provides access to extensive documentation about the GO project and links to applications that use GO data for functional analyses.
The GO database consists of a MySQL database that captures GO content and a Perl object model and Application Programmer Interface (API) to simplify database access and help programmers write tools that use the GO data. The GO relational database is released monthly in several versions:
- termdb includes the ontologies, definitions and cross-references to other databases.
- assocdb includes all data in termdb plus associations to gene products.
- seqdb adds protein sequences for annotated gene products (where available).
- A fourth version, seqdblite, is equivalent to seqdb without the IEA-based associations; this version is used by the AmiGO browser.
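The idea of a release without IEA-based associations can be sketched as a simple filter on the evidence code attached to each association (IEA stands for "Inferred from Electronic Annotation"). The `Association` record, the `without_iea` helper and the sample data below are hypothetical and only illustrate the filtering step; they are not how the actual release is produced.

```python
from typing import List, NamedTuple


class Association(NamedTuple):
    """Hypothetical, minimal annotation record."""
    gene_product: str
    term_acc: str
    evidence: str  # GO evidence code, e.g. "IEA", "IDA", "TAS"


def without_iea(associations: List[Association]) -> List[Association]:
    """Keep only associations whose evidence code is not IEA."""
    return [a for a in associations if a.evidence != "IEA"]


# Made-up sample data; accessions and gene names are placeholders.
sample = [
    Association("geneA", "EX:0003", "IDA"),
    Association("geneB", "EX:0002", "IEA"),
    Association("geneC", "EX:0001", "TAS"),
]

print([a.gene_product for a in without_iea(sample)])  # ['geneA', 'geneC']
```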
The GO database schema models generic graphs relationally, including the GO structure itself, which is a directed acyclic graph (DAG). At the core of the schema are two relational tables: one captures all terms (also called nodes) and the other captures term–term relationships (arcs). The two relationship types, ‘is-a’ and ‘part-of,’ are represented as a ‘relationship type’ attribute in the relationship table. The GO database is generated from the most recent version of the ontology and the annotation files contributed by members of the GO Consortium.
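The two-table design described above can be sketched as follows, again with an in-memory SQLite database so that the example is self-contained. The table and column names, the `EX:` accessions and the toy terms are assumptions made for illustration and are simpler than the real schema. Note that one term has two parents, which is what makes the structure a DAG rather than a tree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Nodes: one row per term (hypothetical, simplified columns).
CREATE TABLE term (id INTEGER PRIMARY KEY, acc TEXT UNIQUE, name TEXT);

-- Arcs: one row per term-term relationship, typed by an attribute.
CREATE TABLE term2term (
    relationship_type TEXT CHECK (relationship_type IN ('is_a', 'part_of')),
    parent_id INTEGER REFERENCES term(id),
    child_id  INTEGER REFERENCES term(id)
);

INSERT INTO term VALUES
    (1, 'EX:0001', 'cell'),
    (2, 'EX:0002', 'organelle'),
    (3, 'EX:0003', 'nucleus'),
    (4, 'EX:0004', 'nucleolus');

INSERT INTO term2term VALUES
    ('part_of', 1, 2),  -- organelle part_of cell
    ('is_a',    2, 3),  -- nucleus is_a organelle
    ('part_of', 1, 3),  -- nucleus part_of cell (second parent)
    ('part_of', 3, 4);  -- nucleolus part_of nucleus
""")


def parents(conn, acc):
    """Direct parents of a term as (relationship_type, parent_name) tuples."""
    return conn.execute(
        """
        SELECT r.relationship_type, p.name
        FROM term AS c
        JOIN term2term AS r ON r.child_id = c.id
        JOIN term AS p ON p.id = r.parent_id
        WHERE c.acc = ?
        ORDER BY p.name
        """,
        (acc,),
    ).fetchall()


def ancestors(conn, acc):
    """Names of all terms reachable by following arcs upward from `acc`."""
    rows = conn.execute(
        """
        WITH RECURSIVE up(id) AS (
            SELECT r.parent_id
            FROM term2term AS r JOIN term AS c ON c.id = r.child_id
            WHERE c.acc = ?
            UNION
            SELECT r.parent_id
            FROM term2term AS r JOIN up ON r.child_id = up.id
        )
        SELECT t.name FROM term AS t JOIN up ON t.id = up.id ORDER BY t.name
        """,
        (acc,),
    ).fetchall()
    return [row[0] for row in rows]


print(parents(conn, "EX:0003"))    # [('part_of', 'cell'), ('is_a', 'organelle')]
print(ancestors(conn, "EX:0004"))  # ['cell', 'nucleus', 'organelle']
```

Storing the relationship type as an attribute of the arc, rather than in separate tables per type, keeps the schema generic: another relationship type would only require a new allowed value, not a new table.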
The AmiGO browser and search engine provides web browser-based access to the GO database. As well as allowing users to search, browse, and download terms and annotations, AmiGO has analysis tools for further data processing. There are several sites that provide mirrors of the GO database accessible through a MySQL client.