Plotting of Data in R

Strip Charts

A strip chart is the most basic type of plot available. It plots the data in order along a line, with each data point represented as a box. To create a strip chart, use the “stripchart” command.
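As a minimal sketch, the strip chart described above could be drawn as follows. The article's own data set is not shown, so the waiting times from R's built-in faithful dataset are used here as a stand-in (an assumption).

```r
# Stand-in data (assumption): waiting times from the built-in 'faithful' dataset.
w <- faithful$waiting

# Default strip chart: each observation is drawn as a small box along one axis.
stripchart(w)

# Identical values overlap by default; method = "jitter" (or "stack")
# spreads them out so individual points stay visible.
stripchart(w, method = "jitter", xlab = "Waiting time (min)")
```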

Histograms

A histogram is a very common plot. It shows how often the data fall within certain ranges (bins). To plot a histogram, use the “hist” command.
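A short sketch of the hist command, again using the built-in faithful dataset as stand-in data (an assumption, since the article's data are not shown). hist() also returns the bin boundaries and counts it computed, which can be inspected directly.

```r
# Stand-in data (assumption): waiting times from the built-in 'faithful' dataset.
w <- faithful$waiting

# Draw the histogram and keep the returned object.
h <- hist(w, main = "Histogram of waiting times", xlab = "Waiting time (min)")

# Bin boundaries and the number of observations in each bin.
h$breaks
h$counts

# 'breaks' given as a single number is a suggestion for the number of bins;
# R may adjust it to get round bin boundaries.
hist(w, breaks = 20)
```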

Box Plots

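A box plot provides a graphical view of the median, quartiles, maximum, and minimum of a data set. To draw one, use the “boxplot” command. By default, R draws points lying more than 1.5 times the interquartile range beyond the box individually rather than including them in the whiskers.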
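One way to draw a box plot is base R's boxplot() function. The sketch below uses the built-in faithful dataset as stand-in data (an assumption) and compares the plotted statistics with the numeric summary.

```r
# Stand-in data (assumption): waiting times from the built-in 'faithful' dataset.
w <- faithful$waiting

# Draw a horizontal box plot and keep the statistics boxplot() computed.
b <- boxplot(w, horizontal = TRUE, xlab = "Waiting time (min)")

# Five values: lower whisker, lower hinge, median, upper hinge, upper whisker.
b$stats

# Numeric summary for comparison (minimum, quartiles, median, mean, maximum).
summary(w)
```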

Scatter Plots

A scatter plot provides a graphical view of the relationship between two sets of numbers. The command to plot each pair of values as an x-coordinate and a y-coordinate is “plot”.
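For illustration, a scatter plot of two paired variables might look like this. The two columns of the built-in faithful dataset stand in for the article's data (an assumption).

```r
# Stand-in data (assumption): the built-in 'faithful' dataset.
x <- faithful$eruptions  # eruption duration, used as the x-coordinate
y <- faithful$waiting    # waiting time, used as the y-coordinate

# Each (x[i], y[i]) pair becomes one point.
plot(x, y)
```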

Normal QQ Plots

The final type of plot that we look at is the normal quantile plot. This plot is used to judge whether our data are close to being normally distributed. It cannot prove that the data are normally distributed, but it can reveal clear departures from normality. The command to generate a normal quantile plot is “qqnorm”. We can give it one argument, the univariate data set of interest.
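A minimal sketch of qqnorm follows, using the built-in faithful dataset as stand-in data (an assumption). The call to qqline() is an addition not mentioned above; it draws a reference line that makes departures from normality easier to see.

```r
# Stand-in data (assumption): waiting times from the built-in 'faithful' dataset.
w <- faithful$waiting

# Normal quantile plot: theoretical normal quantiles (x) against the data (y).
q <- qqnorm(w)

# Reference line through the first and third quartiles; points that bend
# away from it suggest the data are not normally distributed.
qqline(w)
```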

The most used plotting function in R is plot(). It is a generic function, meaning that it has many methods, and the one called depends on the type of object passed to plot().

In the simplest case, we pass in a single vector and get a scatter plot of its values against their index. More commonly, we pass in two vectors, and the corresponding pairs are plotted as points.
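The behaviour described above can be sketched as follows. The vector v holds made-up values, and the density object is just one convenient example of a non-vector object that plot() knows how to draw.

```r
# Made-up values for illustration.
v <- c(2, 5, 3, 9, 4)

# One vector: values are plotted against their index 1, 2, ..., 5.
plot(v)

# The same points with the x-coordinates given explicitly.
plot(seq_along(v), v)

# Generic dispatch: a "density" object is handled by the plot.density
# method, which draws a curve instead of a scatter plot.
d <- density(faithful$waiting)
class(d)
plot(d)
```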

Adding Titles and Labeling Axes

We can add a title to our plot with the parameter “main”. Similarly, “xlab” and “ylab” can be used to label the x-axis and y-axis respectively.
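As a small example, the three parameters can be combined in a single call. The built-in faithful dataset is used as stand-in data, and the label texts are only illustrative.

```r
# Illustrative labels (assumptions, chosen to match the stand-in data).
title_text <- "Old Faithful eruptions"
x_label <- "Eruption duration (min)"
y_label <- "Waiting time to next eruption (min)"

# main sets the title; xlab and ylab label the axes.
plot(faithful$eruptions, faithful$waiting,
     main = title_text,
     xlab = x_label,
     ylab = y_label)
```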

Changing Color and Plot Type

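The default color of the plot is black; we can change it with the argument “col”. We can change the plot type with the argument “type”, for example “p” for points, “l” for lines, or “b” for both.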
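A brief sketch of these options, using a sine curve as made-up data. Color is set with the standard graphical parameter col, and type selects how the data are drawn.

```r
# Made-up data: a sine curve sampled at 50 points.
x <- seq(-pi, pi, length.out = 50)

# Blue line instead of the default black points.
plot(x, sin(x), col = "blue", type = "l")

# Red points joined by line segments.
plot(x, sin(x), col = "red", type = "b")

# Vertical lines from the axis to each value.
plot(x, sin(x), type = "h")
```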

Overlaying Plots and Using the legend() Function

Each call to plot() draws a new graph in the same window, replacing the previous one. Sometimes, however, we wish to overlay plots in order to compare results. The functions lines() and points() make this possible by adding lines and points, respectively, to the existing plot. The legend() function can then label each overlaid series.
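An overlay with a legend might be sketched like this. The curves are made-up data, and the colors and legend position are arbitrary choices.

```r
# Made-up data: three curves on a common x range.
x <- seq(-pi, pi, length.out = 50)

# The first call creates the plot; ylim is fixed so every curve fits.
plot(x, sin(x), type = "l", col = "blue", ylim = c(-1, 1),
     ylab = "y", main = "Overlaying plots")

# lines() and points() draw on the existing plot instead of replacing it.
lines(x, cos(x), col = "red")
points(x, 0.5 * sin(x), col = "darkgreen", pch = 16)

# legend() labels each series; NA switches off the line or symbol
# for entries that do not use it.
legend("topleft",
       legend = c("sin(x)", "cos(x)", "0.5 sin(x)"),
       col = c("blue", "red", "darkgreen"),
       lty = c(1, 1, NA),
       pch = c(NA, NA, 16))
```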

Plotting with ggplot2

ggplot2 is a plotting package that makes it simple to create complex plots from data in a data frame. It provides a more programmatic interface for specifying which variables to plot, how they are displayed, and general visual properties. As a result, only minimal changes are needed if the underlying data change or if we decide to switch from a bar plot to a scatter plot. This helps in creating publication-quality plots with minimal adjustment and tweaking.

ggplot2 functions work best with data in the ‘long’ format, that is, a column for every variable and a row for every observation. Well-structured data will save us lots of time when making figures with ggplot2.
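To make the ‘long’ format concrete, here is a small sketch that reshapes a made-up ‘wide’ data frame (one column per sample) into long form with base R's stack(). The data frame and its column names are hypothetical.

```r
# Made-up 'wide' data: one column of measurements per sample.
wide <- data.frame(sample_a = c(1.2, 1.5, 1.1),
                   sample_b = c(2.3, 2.1, 2.6))

# stack() returns one row per measurement, with the values in one column
# and the name of the source column in another.
long <- stack(wide)
names(long) <- c("value", "sample")

long
```

The long version has one row per observation, so a single column (here value) can be mapped to an axis and another (here sample) to color or grouping.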

ggplot graphics are built step by step by adding new elements. Adding layers in this fashion allows for extensive flexibility and customization of plots.

While it’s relatively easy to create standard plots in R, if we need to make a custom plot, things can get hairy fast. That’s why ggplot2 was born: to make building custom plots easier.

In the words of its creator, ggplot2 “takes care of many of the fiddly details that make plotting a hassle (like drawing legends) as well as providing a powerful model of graphics that makes it easy to produce complex multi-layered graphics.”

ggplot2 is based on The Grammar of Graphics, a system for understanding graphics as composed of certain layers that together create a complete plot. With ggplot2, we can, for instance, start building our plot with axes, then add points, then a line, a confidence interval, and so on.
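The layer-by-layer approach could be sketched as below. This assumes the ggplot2 package is installed and again uses the built-in faithful dataset as stand-in data. The linear fit is only an illustrative choice of line.

```r
library(ggplot2)  # assumes the ggplot2 package is installed

# Start with the data and the axes only.
p <- ggplot(faithful, aes(x = eruptions, y = waiting))

# Add a layer of points.
p <- p + geom_point()

# Add a fitted line; se = TRUE draws a confidence band around it.
p <- p + geom_smooth(method = "lm", formula = y ~ x, se = TRUE)

# Add a title and axis labels.
p <- p + labs(title = "Old Faithful eruptions",
              x = "Eruption duration (min)",
              y = "Waiting time (min)")

# Nothing is drawn until the plot object is printed.
print(p)
```

Because each step only adds to the plot object p, a layer can be swapped or removed without touching the rest of the code.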
