


The completion of the Human Genome Project at the start of the 21st century, together with the rapid advancement of sequencing technologies since then, has resulted in exponential growth of biological data. In genetics, this has given rise to many variation databases, built to store and annotate the ever-expanding set of known mutations. Most of these databases focus on variation at the sequence level. Some instead focus on analyzing variation at the 3D level, for example by mapping, visualizing, and determining the effects of variation in protein structures. Even so, such web servers rarely incorporate tools to help analyze this data.

A new mutation analysis web server, Human Mutation Analysis (HUMA), has been presented to address this gap. HUMA integrates sequence, structure, variation, and disease data into a single, connected database. A user-friendly interface provides click-based data access and visualization, while a RESTful web API provides programmatic access to the data. Tools have been integrated into HUMA so that initial analyses can be carried out on the server. Furthermore, users can upload their private variation datasets, which are automatically mapped to public data and can be analyzed using the integrated tools.

In short, the HUMA database and web server was developed as a platform for analyzing genetic variation in humans.

It was developed at the Research Unit in Bioinformatics (RUBi) at Rhodes University and is freely available to all academic users.

HUMA data is split across two separate databases. The public database stores data that has been aggregated from the various public data sources. The private database stores data that should not automatically be shared between users, including user account details, user groups, private datasets, and job results.

The HUMA database has been populated using a semi-automated pipeline consisting of a mixture of C++ and Python scripts. The C++ scripts were written to parse large files that took too long to process with Python. The compute-intensive mapping of variants to proteins and genes was also performed using C++ scripts.

HUMA calculates CDS ranges on the chromosome by concatenating exons to generate the coding DNA (cDNA) and finding the position at which the CDS starts within the cDNA. Combined with the chromosomal coordinates of the exons, this allows the chromosomal coordinates of the CDS ranges to be calculated. Variants can then be mapped to the CDS based on chromosomal coordinates and, from there, to the protein sequence. This process is carried out by the cds_mapper and variant_mapper scripts.
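The mapping described above can be illustrated with a minimal Python sketch. This is not the code of cds_mapper or variant_mapper; the function names, the exon record layout, and the toy sequences are all hypothetical, and the sketch assumes a forward-strand transcript with 1-based inclusive coordinates and a CDS that occurs exactly once in the concatenated exons.

```python
from typing import List, Optional, Tuple

# Hypothetical exon record: (chromosomal start, chromosomal end, sequence),
# with 1-based inclusive coordinates on the forward strand only.
Exon = Tuple[int, int, str]
Range = Tuple[int, int]


def cds_chromosomal_ranges(exons: List[Exon], cds_seq: str) -> List[Range]:
    """Return the chromosomal ranges covered by the CDS."""
    exons = sorted(exons)
    # Concatenate the exons to build the cDNA, then locate the CDS in it.
    cdna = "".join(seq for _, _, seq in exons)
    cds_start = cdna.find(cds_seq)  # 0-based offset into the cDNA
    if cds_start < 0:
        raise ValueError("CDS not found in concatenated exons")
    cds_end = cds_start + len(cds_seq)  # exclusive

    ranges = []
    offset = 0  # number of cDNA bases contributed by earlier exons
    for start, end, _ in exons:
        length = end - start + 1
        # Overlap between this exon and the CDS, in cDNA coordinates.
        lo = max(cds_start, offset)
        hi = min(cds_end, offset + length)
        if lo < hi:
            ranges.append((start + lo - offset, start + hi - offset - 1))
        offset += length
    return ranges


def protein_position(ranges: List[Range], chrom_pos: int) -> Optional[int]:
    """Map a chromosomal position to a 1-based residue number, or None."""
    cds_offset = 0  # CDS bases covered by earlier ranges
    for start, end in ranges:
        if start <= chrom_pos <= end:
            cds_index = cds_offset + (chrom_pos - start)  # 0-based CDS index
            return cds_index // 3 + 1  # three bases per codon
        cds_offset += end - start + 1
    return None  # position lies outside the CDS


# Toy data (invented for illustration): two 10-base exons.
exons = [(100, 109, "GGCCATGGCT"), (200, 209, "AAATTTTAGC")]
ranges = cds_chromosomal_ranges(exons, "ATGGCTAAATTT")
# ranges == [(104, 109), (200, 205)]
residue = protein_position(ranges, 202)
# residue == 3
```

In this toy case the CDS starts four bases into the first exon and ends six bases into the second, so a variant at chromosomal position 202 falls in the third codon. A reverse-strand transcript would need the coordinates and sequence handled in the opposite direction, which the sketch leaves out.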

Before the variant_mapper script is executed, all indices, primary keys, and foreign keys are manually removed from the table. This improves performance when inserting data into the table. Once the data has been inserted, the keys and indices are reintroduced to improve lookup performance.
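The drop, load, and rebuild pattern can be sketched as follows. The text does not say which database engine HUMA uses, so this example uses Python's built-in sqlite3 module purely as a stand-in, and the table, column, and index names are hypothetical. It covers only a secondary index, since SQLite cannot drop primary or foreign keys from an existing table.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical row layout: (chromosome, position, protein_position).
VariantRow = Tuple[str, int, int]


def bulk_load_variants(conn: sqlite3.Connection, rows: Iterable[VariantRow]) -> None:
    """Insert rows with the index removed, then rebuild the index."""
    cur = conn.cursor()
    # Drop the index so each insert does not have to update it.
    cur.execute("DROP INDEX IF EXISTS idx_variant_position")
    cur.executemany(
        "INSERT INTO variant (chromosome, position, protein_position) "
        "VALUES (?, ?, ?)",
        rows,
    )
    # Rebuild the index once, after all rows are in place, for fast lookups.
    cur.execute(
        "CREATE INDEX idx_variant_position ON variant (chromosome, position)"
    )
    conn.commit()


# In-memory database with an illustrative schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variant ("
    "chromosome TEXT, position INTEGER, protein_position INTEGER)"
)
conn.execute("CREATE INDEX idx_variant_position ON variant (chromosome, position)")

bulk_load_variants(conn, [("1", 104, 1), ("1", 202, 3)])
row_count = conn.execute("SELECT COUNT(*) FROM variant").fetchone()[0]
```

Rebuilding an index once over the full table is generally cheaper than maintaining it row by row during a large load, which is the trade-off the paragraph above describes.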

HUMA aggregates data from:

  • UniProt
  • Ensembl
  • HGNC
  • dbSNP
  • ClinVar
  • PDB
  • OMIM
  • Pfam
