A data structure is a particular way of organizing data in a computer so that it can be used effectively. The idea is to reduce the space and time complexity of different tasks. Data structures in R are tools for holding multiple values. R’s base data structures are often organized by their dimensionality (1D, 2D, or nD) and by whether they’re homogeneous or heterogeneous.
Homogeneous: all elements must be of the same type.
Heterogeneous: the elements can be of different types.
R has many data structures. These include:
- atomic vector
- data frame
A vector is the most common and basic data structure in R and is pretty much the workhorse of R. Technically, vectors can be one of two types:
- atomic vectors
- lists
The term “vector”, however, most commonly refers to the atomic types, not to lists.
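The distinction between atomic vectors and lists can be checked directly in the console. The following is a minimal sketch; the variable names v and l are only illustrative.

```r
# An atomic vector: every element has the same type
v <- c(1.5, 2, 3)

# A list: elements may have different types
l <- list(1.5, "a", TRUE)

is.atomic(v)  # TRUE
is.atomic(l)  # FALSE
is.list(l)    # TRUE

# Both count as vectors in the broad sense
is.vector(v)  # TRUE
is.vector(l)  # TRUE
```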
The Different Vector Modes
A vector is a collection of elements that are most commonly of mode:
- Integer or numeric
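As a small illustration of the integer and numeric modes, the sketch below compares typeof() and mode() on two example vectors. The names x_int, x_dbl, and mixed are made up for this example.

```r
x_int <- c(1L, 2L, 3L)   # the L suffix creates integers
x_dbl <- c(1, 2.5, 3)    # plain numbers are stored as doubles

typeof(x_int)  # "integer"
typeof(x_dbl)  # "double"

# mode() reports both as "numeric"
mode(x_int)    # "numeric"
mode(x_dbl)    # "numeric"

# Combining the two coerces the result to double
mixed <- c(1L, 2.5)
typeof(mixed)  # "double"
```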
In R, matrices are an extension of numeric or character vectors. They are not a separate type of object but simply an atomic vector with dimensions: the number of rows and columns. As with atomic vectors, the elements of a matrix must all be of the same data type.
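That a matrix is just an atomic vector with a dimension attribute can be seen by setting dim() on an ordinary vector. This is a minimal sketch with illustrative names (v, m).

```r
# Start with a plain atomic vector
v <- 1:6

# Adding a dim attribute (2 rows, 3 columns) turns it into a matrix
dim(v) <- c(2, 3)
is.matrix(v)     # TRUE

# matrix() builds the same object in one step (filled column by column)
m <- matrix(1:6, nrow = 2, ncol = 3)
identical(v, m)  # TRUE

# The elements keep their original type
typeof(m)        # "integer"
dim(m)           # 2 3
```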
In R, lists act as containers. Unlike atomic vectors, the contents of a list are not restricted to a single mode and can encompass any mixture of data types. Lists are sometimes called generic vectors, because the elements of a list can be any type of R object, even lists containing further lists. This property makes them fundamentally different from atomic vectors.
Lists can be extremely useful inside functions. Because a function in R can return only a single object, we can “staple” together lots of different kinds of results into a single list that the function returns. A list does not print to the console like a vector. Instead, each element of the list starts on a new line.
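The idea of “stapling” several results into one return value might look like the sketch below. The function summarise_values() and its field names are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: returns several results of different types in one list
summarise_values <- function(x) {
  list(
    n     = length(x),   # integer count
    mean  = mean(x),     # single number
    range = range(x),    # numeric vector of length 2
    label = "summary"    # character string
  )
}

res <- summarise_values(c(2, 4, 6))

res$n      # 3
res$mean   # 4
res$range  # 2 6

# str() gives a compact view of the list's structure
str(res)
```

Each element is accessed by name with $, so the caller can pick out only the results it needs.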
A data frame is a very important data type in R. It’s pretty much the de facto data structure for most tabular data and what we use for statistics. A data frame is a special type of list in which every element has the same length (i.e. a data frame is a “rectangular” list).
Some additional information about data frames:
- Usually created by read.csv() and read.table(), i.e. when importing the data into R.
- If all columns in a data frame are of the same type, the data frame can be converted to a matrix with data.matrix() (preferred) or as.matrix(). Otherwise, type coercion will be enforced and the results may not always be what you expect.
- A new data frame can also be created with the data.frame() function.
- Find the number of rows and columns with nrow(dat) and ncol(dat), respectively.
- Row names are often automatically generated.
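The points above can be tried out on a small data frame. This is a minimal sketch: the data frame dat and its columns (id, score, name) are made-up example data.

```r
# Build a small data frame by hand
dat <- data.frame(
  id    = 1:3,
  score = c(90.5, 82, 77.25)
)

nrow(dat)      # 3
ncol(dat)      # 2
is.list(dat)   # TRUE: a data frame is a list of equal-length columns
rownames(dat)  # "1" "2" "3" (generated automatically)

# All columns are numeric, so conversion to a matrix is straightforward
m <- data.matrix(dat)
dim(m)         # 3 2
typeof(m)      # "double"

# After adding a character column, as.matrix() coerces every value to character
dat$name <- c("a", "b", "c")
typeof(as.matrix(dat))  # "character"
```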
One important use of attributes is to define factors. A factor is a vector that can contain only predefined values, and is used to store categorical data. Factors are built on top of integer vectors using two attributes: the class, “factor”, which makes them behave differently from regular integer vectors, and the levels, which define the set of allowed values.
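The two attributes can be inspected on a small factor. This is a minimal sketch with made-up category labels ("low", "high", "medium").

```r
# A factor with two allowed values
f <- factor(c("low", "high", "low"), levels = c("low", "high"))

class(f)       # "factor"
levels(f)      # "low" "high"
typeof(f)      # "integer": the underlying storage
as.integer(f)  # 1 2 1 (positions in the levels)

# Values outside the predefined levels become NA
g <- factor(c("low", "medium"), levels = c("low", "high"))
is.na(g)       # FALSE TRUE
```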