Progress in biological research in recent years has led to a multiplicity of databases and information systems. Molecular biology deals with complex problems, and high-throughput techniques produce an enormous amount of diverse data. Hence, the total number of databases, as well as the volume of data itself, is continuously increasing, and with it the distribution and heterogeneity of the data rise. The importance of database integration has been recognized for many years.
Data warehousing (DW) is the process of collecting and managing data from varied sources to provide meaningful business insights. A data warehouse is typically used to connect and analyze business data from heterogeneous sources. In a Bioinformatics system built for data analysis and reporting, the data warehouse is the core component. Data warehouses (DWH) are a widely used architecture for materialized integration in informatics, and especially in Bioinformatics. More generally, data warehouses are used in the field of information management.
A data warehouse is a blend of technologies and components that supports the strategic use of data. It is the electronic storage of a large amount of information by an organization, designed for query and analysis rather than for transaction processing.
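To make the idea of materialized integration concrete, the following is a minimal sketch in Python using the standard `sqlite3` module. It assumes two hypothetical sources with different layouts and key conventions, copies both into a single warehouse table, and then runs an analytical (aggregate) query against it. The table name `gene_fact`, the gene symbols, and all values are made up for illustration and do not come from any real database.

```python
import sqlite3

# Hypothetical source A: a gene catalogue stored as dictionaries (toy data).
SOURCE_A = [
    {"symbol": "GeneA", "organism": "Homo sapiens"},
    {"symbol": "GeneB", "organism": "Homo sapiens"},
    {"symbol": "GeneC", "organism": "Mus musculus"},
]

# Hypothetical source B: protein records as (gene name, length) tuples,
# using lower-case names instead of the mixed case used by source A.
SOURCE_B = [("genea", 100), ("geneb", 200), ("genec", 120)]


def build_warehouse(conn):
    """Copy both sources into one integrated table (materialized integration)."""
    conn.execute(
        "CREATE TABLE gene_fact ("
        "symbol TEXT PRIMARY KEY, organism TEXT, protein_length INTEGER)"
    )
    # Normalize the differing key conventions to upper case before merging.
    lengths = {name.upper(): length for name, length in SOURCE_B}
    rows = [
        (rec["symbol"].upper(), rec["organism"], lengths.get(rec["symbol"].upper()))
        for rec in SOURCE_A
    ]
    conn.executemany("INSERT INTO gene_fact VALUES (?, ?, ?)", rows)
    conn.commit()


def genes_per_organism(conn):
    """Analytical query: gene count and mean protein length per organism."""
    cur = conn.execute(
        "SELECT organism, COUNT(*), AVG(protein_length) "
        "FROM gene_fact GROUP BY organism ORDER BY organism"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
build_warehouse(conn)
summary = genes_per_organism(conn)
print(summary)
```

The point of the sketch is the division of labor: the data is copied and harmonized once at load time, and later queries read only the integrated table rather than the original sources.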
Decision support technologies help make use of the data available in a data warehouse. They let executives use the warehouse quickly and effectively: executives can collect data, analyze it, and make decisions based on the information present in the warehouse. The information gathered in a warehouse can be used in any of the following domains:
- Tuning Production Strategies
- Customer Analysis
- Operations Analysis
Newly developed methods and instrumentation, such as high-throughput sequencing and automation in genomics and proteomics, produce volumes of raw biological data at an explosive rate. In parallel with the growth of data, computational tools for improved data analysis and management have emerged. These tools help extract relevant parts of the data (data reduction), establish correlations between different views of the data (correlation analysis), and convert the information into knowledge discoveries (data mining). In addition, recent research has expanded into data storage and data management, focusing on the structure of the databases (data modeling), storage media (relational, flat file-based, XML, and others), and quality assurance of data.
Molecular biology data management systems usually take the form of publicly accessible biological databases. A database is designed to manage a large amount of persistent, homogeneous, and structured data that is shared among distributed users and processes. When a dataset is organized in the form of a database, it must remain manageable and usable, supporting both data growth and an increase in the number of database queries. In Bioinformatics, the development of databases has been driven by an explosive growth of data as well as increasing user access to this data.
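As a rough illustration of two of the analysis steps named above, data reduction and correlation analysis, the sketch below filters a small set of records by a quality score and then computes the Pearson correlation between two measurements of the remaining records. The record fields (`quality`, `view_a`, `view_b`), the threshold, and all values are hypothetical and chosen only to keep the example small. Pearson correlation is used here as one common choice, not as the method prescribed by any particular tool.

```python
from math import sqrt

# Toy records: each sample has a quality score and two measurements ("views").
RECORDS = [
    {"id": "s1", "quality": 0.90, "view_a": 1.0, "view_b": 2.0},
    {"id": "s2", "quality": 0.20, "view_a": 5.0, "view_b": 0.0},
    {"id": "s3", "quality": 0.80, "view_a": 2.0, "view_b": 4.0},
    {"id": "s4", "quality": 0.95, "view_a": 3.0, "view_b": 6.0},
]


def reduce_records(records, min_quality):
    """Data reduction: keep only records at or above the quality threshold."""
    return [r for r in records if r["quality"] >= min_quality]


def pearson(xs, ys):
    """Correlation analysis: Pearson correlation coefficient of two sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


kept = reduce_records(RECORDS, min_quality=0.5)
r = pearson([rec["view_a"] for rec in kept], [rec["view_b"] for rec in kept])
print([rec["id"] for rec in kept], r)
```

In this toy data the low-quality record `s2` is dropped, and the two views of the remaining records are perfectly linearly related, so the coefficient is 1 up to floating-point rounding.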