Microarray data analysis is the final step in reading and processing data produced by a microarray chip. Samples undergo several preparatory processes, including purification and scanning on the chip, which produce a large amount of data that must be processed with computer software. The analysis involves many distinct steps. Changing any one of these steps changes the outcome of the analysis, so certain projects were created to identify a set of standard strategies.
Microarrays can be used in many types of experiments including genotyping, epigenetics, translation profiling and gene expression profiling.
Gene expression profiling is by far the most common use of microarray technology. Both one- and two-colour microarrays can be used for this type of experiment. The process of analyzing gene expression data is similar for both types of microarrays and comprises the following steps:
- feature extraction
- quality control
- differential expression analysis
- biological interpretation of the results
- submission of data to a public database
Microarray analysis gathers massive amounts of information about the gene expression profiles of different cells and experimental conditions. Analyzing these data can often be laborious, with ongoing debate over which statistical analyses are appropriate for a given experiment. As a result, many different methods of data analysis have evolved.
Image processing and analysis
The first step in the analysis of microarray data is to process the image produced by the scanner. Most manufacturers of microarray scanners provide their own software; however, it is important to understand how the data are actually extracted from the images, as this is the primary data collection step and forms the basis of any further analysis. Image processing involves the following steps:
- Identification of the spots and distinguishing them from spurious signals.
- Determination of the spot area to be surveyed and of the local region used to estimate background hybridization.
- Reporting summary statistics and assigning a spot intensity after subtracting the background intensity.
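The last step above can be sketched in a few lines. This is a minimal illustration, not the method used by any particular scanner software: the function name `spot_intensity`, the use of the median as the summary statistic, and the `floor` parameter are all assumptions made for the example.

```python
from statistics import median


def spot_intensity(foreground_pixels, background_pixels, floor=1.0):
    """Return a background-corrected intensity for a single spot.

    foreground_pixels: pixel values inside the spot area (assumed input).
    background_pixels: pixel values in the local background region.
    floor: hypothetical lower bound, so that a spot dimmer than its
           background does not yield a zero or negative intensity.
    """
    # Summarize each region with the median, which is one common choice
    # because it is less sensitive to a few very bright or dark pixels.
    foreground = median(foreground_pixels)
    background = median(background_pixels)

    # Subtract the local background and clamp the result at the floor.
    return max(foreground - background, floor)
```

Clamping at a small positive value is one possible design choice; it keeps later ratios and logarithms defined, at the cost of hiding how far below background a spot actually was.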
Expression ratios: the primary comparison
The relative expression level of a gene can be measured as the amount of red or green light emitted after excitation. The most common metric used to relate this information is the expression ratio.
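As a hedged illustration, the expression ratio for one gene on a two-colour array can be computed as the red intensity divided by the green intensity. The choice of red as the numerator and the function name `expression_ratio` are assumptions of this sketch; the intensities are taken to be already background-corrected.

```python
def expression_ratio(red, green):
    """Return the expression ratio red / green for one gene.

    red, green: background-corrected intensities of the two channels.
    Which channel goes in the numerator is an assumption of this sketch.
    """
    if green <= 0:
        # A ratio is undefined for a zero or negative denominator.
        raise ValueError("green intensity must be positive")
    return red / green
```

For example, `expression_ratio(2000.0, 1000.0)` gives 2.0, while equal intensities in both channels give a ratio of 1.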
Transformations of the expression ratio
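The expression ratio is a useful and very intuitive way of showing expression differences. For example, genes that do not differ in their expression level have an expression ratio of 1. However, this representation treats up-regulation and down-regulation asymmetrically: a gene up-regulated by a factor of 2 has a ratio of 2, whereas a gene down-regulated by the same factor has a ratio of 0.5, so all down-regulation is compressed into the range between 0 and 1.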
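One widely used transformation is the base-2 logarithm of the ratio, which places up- and down-regulation of the same magnitude at equal distances from zero. The sketch below only illustrates that property; the function name `log2_ratio` is hypothetical.

```python
import math


def log2_ratio(ratio):
    """Return the base-2 logarithm of a positive expression ratio."""
    if ratio <= 0:
        # The logarithm is undefined for zero or negative ratios.
        raise ValueError("expression ratio must be positive")
    return math.log2(ratio)
```

With this transformation a ratio of 2 maps to 1, a ratio of 0.5 maps to -1, and an unchanged gene (ratio 1) maps to 0.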
Normalization is the process of removing unwanted variation so that data obtained from the two samples can be compared appropriately. There are many methods of normalization. The first step in a normalization procedure is to choose a gene-set consisting of genes whose expression levels should not change under the conditions studied; that is, the expression ratio for every gene in the gene-set is expected to be 1.
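A simple way to use such a gene-set, shown here only as a sketch under stated assumptions, is to work with log2 ratios and shift all values so that the mean log2 ratio of the gene-set becomes 0 (equivalent to an expression ratio of 1). The function name `normalize_log_ratios`, the dictionary layout, and the choice of the mean as the centring statistic are assumptions of this example; it is one of many possible normalization methods.

```python
from statistics import mean


def normalize_log_ratios(log_ratios, reference_genes):
    """Centre log2 ratios on a reference gene-set.

    log_ratios: dict mapping gene name -> log2 expression ratio.
    reference_genes: names of genes assumed not to change under the
                     conditions studied (the chosen gene-set).
    """
    # The offset is the average log2 ratio of the reference genes;
    # after normalization it should be 0 by construction.
    offset = mean(log_ratios[gene] for gene in reference_genes)

    # Subtract the same offset from every gene on the array.
    return {gene: value - offset for gene, value in log_ratios.items()}
```

For instance, if the reference genes have log2 ratios of 1.5 and 0.5, the offset is 1.0, and a gene with a raw log2 ratio of 2.5 is reported as 1.5 after normalization.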