
Data Structures in R


A data structure is a special way of organizing data in a computer so that it can be used effectively. The idea is to reduce the space and time complexity of different tasks. Data structures in R are tools for holding multiple values. R’s base data structures can be organized by their dimensionality (1D, 2D, or nD) and by whether they are homogeneous or heterogeneous.

Homogeneous: all elements must be of the same type

Heterogeneous: the elements can be of different types

R has several base data structures. The main ones are:

  • atomic vector
  • list
  • matrix
  • data frame
  • factor

Vectors

A vector is the most common and basic data structure in R and is pretty much the workhorse of R. Technically, vectors can be one of two types:

  • atomic vectors
  • lists

However, the term “vector” most commonly refers to the atomic types, not to lists.
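As a small sketch of this distinction (the variable names `x` and `y` are made up for illustration), both atomic vectors and lists count as vectors in R, but only the first is atomic:

```r
# An atomic vector holds values of a single type.
x <- c(1.5, 2, 3.25)

# A list is also a vector, but its elements may differ in type.
y <- list(1.5, "two", TRUE)

is.vector(x)  # TRUE
is.vector(y)  # TRUE
is.atomic(x)  # TRUE
is.atomic(y)  # FALSE
is.list(y)    # TRUE
```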

The Different Vector Modes

A vector is a collection of elements, most commonly of one of the following modes:

  • Character
  • Logical
  • Numeric (stored as integer or double)
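The modes listed above can be checked with `mode()`, and the underlying storage type with `typeof()`. The following is a minimal sketch with made-up values; it also shows that combining different types in one atomic vector forces them into a single common type:

```r
chr <- c("A", "C", "G", "T")
lgl <- c(TRUE, FALSE, TRUE)
int <- c(1L, 2L, 3L)      # the L suffix requests integer storage
dbl <- c(0.5, 1.5, 2.5)

mode(chr)    # "character"
mode(lgl)    # "logical"
mode(int)    # "numeric"
mode(dbl)    # "numeric"
typeof(int)  # "integer"
typeof(dbl)  # "double"

# Atomic vectors are homogeneous, so mixed input is coerced to one type.
mixed <- c(1L, TRUE, "a")
typeof(mixed)  # "character"
```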

Matrix

In R, matrices are an extension of numeric or character vectors. They are not a separate type of object but simply an atomic vector with dimensions: the number of rows and columns. As with atomic vectors, the elements of a matrix must all be of the same data type.
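To illustrate that a matrix is an atomic vector with a dimension attribute, here is a short sketch (the names `m` and `w` are arbitrary). It builds the same matrix twice, once with `matrix()` and once by setting `dim()` on a plain vector:

```r
# Build a 2 x 3 matrix; values are filled column by column.
m <- matrix(1:6, nrow = 2, ncol = 3)

dim(m)        # 2 3
is.atomic(m)  # TRUE
typeof(m)     # "integer"

# Start from a plain vector and attach dimensions to it.
w <- 1:6
dim(w) <- c(2, 3)

is.matrix(w)     # TRUE
identical(m, w)  # TRUE
```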

List

In R, lists act as containers. Unlike atomic vectors, the contents of a list are not restricted to a single mode and can encompass any mixture of data types. Lists are sometimes called generic vectors, because the elements of a list can be any type of R object, including other lists. This property makes them fundamentally different from atomic vectors.
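A minimal sketch of a heterogeneous, nested list follows. The object `sample_info` and its fields are hypothetical and serve only to show mixed element types:

```r
# Hypothetical record mixing several types, including a nested list.
sample_info <- list(
  id        = "S1",
  reads     = c(120L, 98L, 143L),
  passed_qc = TRUE,
  meta      = list(tissue = "liver", replicate = 2)
)

length(sample_info)          # 4
sapply(sample_info, typeof)  # storage type of each element

sample_info$meta$tissue      # "liver"
sample_info[["reads"]][2]    # 98
```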

Lists can be extremely useful inside functions. Because a function in R can return only a single object, a list lets us “staple” together many different kinds of results into one object that the function can return. A list does not print to the console like a vector. Instead, each element of the list starts on a new line.
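As a sketch of the “staple together” idea, the function below bundles several results into one list. The function name `summarize_values` and its fields are invented for this example:

```r
# Hypothetical helper: return several results as one list.
summarize_values <- function(x) {
  list(
    n     = length(x),
    mean  = mean(x),
    range = range(x)
  )
}

res <- summarize_values(c(2, 4, 6, 8))

res$n      # 4
res$mean   # 5
res$range  # 2 8
```

Printing `res` at the console shows each element under its own `$name` heading rather than on a single line.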

Data Frame

A data frame is a very important data type in R. It’s pretty much the de facto data structure for most tabular data and what we use for statistics. A data frame is a special type of list in which every element has the same length (i.e. a data frame is a “rectangular” list).

Some additional information about data frames:

  • Data frames are usually created by read.csv() or read.table(), i.e. when importing data into R.
  • If all columns in a data frame are of the same type, it can be converted to a matrix with data.matrix() (preferred) or as.matrix(). Otherwise the columns are coerced to a common type, and the results may not always be what you expect.
  • A new data frame can also be created with the data.frame() function.
  • Find the number of rows and columns with nrow(dat) and ncol(dat), respectively.
  • Row names are often generated automatically.
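The points above can be sketched as follows. The data frame `dat` and its column names and values are made up for illustration:

```r
# Hypothetical table with one column per type.
dat <- data.frame(
  gene  = c("geneA", "geneB", "geneC"),
  count = c(10L, 25L, 7L),
  score = c(0.5, 1.5, 2.5)
)

nrow(dat)      # 3
ncol(dat)      # 3
is.list(dat)   # TRUE: a data frame is a list of equal-length columns
rownames(dat)  # "1" "2" "3" (generated automatically)

# Numeric columns convert cleanly to a numeric matrix.
num_mat <- data.matrix(dat[, c("count", "score")])
typeof(num_mat)  # "double"

# Mixed column types are coerced to a common type.
chr_mat <- as.matrix(dat)
typeof(chr_mat)  # "character"
```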

Factors

One important use of attributes is to define factors. A factor is a vector that can contain only predefined values and is used to store categorical data. Factors are built on top of integer vectors using two attributes: the class, “factor”, which makes them behave differently from regular integer vectors, and the levels, which define the set of allowed values.
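A short sketch of these two attributes, using a made-up `tissue` factor, might look like this. The last lines show what happens when a value outside the levels is assigned:

```r
# Hypothetical categorical variable with three allowed values.
tissue <- factor(
  c("liver", "brain", "liver"),
  levels = c("brain", "liver", "kidney")
)

class(tissue)       # "factor"
typeof(tissue)      # "integer"
levels(tissue)      # "brain" "liver" "kidney"
as.integer(tissue)  # 2 1 2 (positions in the levels)
table(tissue)       # counts per level, including 0 for "kidney"

# A value that is not a level becomes NA (R also emits a warning,
# which is suppressed here to keep the example quiet).
tissue2 <- tissue
suppressWarnings(tissue2[1] <- "heart")
is.na(tissue2[1])   # TRUE
```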
