Data Structures in R

This article surveys the main data structures available in R and when each one is used.

A data structure is a collection of elements stored together. The collection may contain elements of the same type or of different types. Each data structure has its own attributes, such as length or dimensions, which can be inspected with built-in functions.

The major data structures in R are as follows:

Vectors in R

A vector is a homogeneous collection of elements stored together sequentially. Vectors are also known as one-dimensional arrays. Each vector has a length, which is the number of elements it contains; an empty vector has length 0. A vector can hold elements of any basic data type, such as integer, logical, or character, but all elements of a given vector share the same type. If values of different types are combined, R coerces them to a common type.

#declaring vectors
vec1 <- c(2,5,-1,8,10)
vec2 <- c("Hi","There")
#printing the contents of vec1, followed by a newline
cat("Vec1", vec1, "\n")
#printing the length of vec1, followed by a newline
cat("length of vec1 ", length(vec1), "\n")
The code produces the following output:
Vec1 2 5 -1 8 10
length of vec1  5
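
The homogeneity rule described above can be checked directly. The following is a minimal sketch; the variable names (empty_vec, mixed, nums) are illustrative and do not come from the examples above.

```r
# An empty numeric vector has length 0
empty_vec <- numeric(0)
empty_len <- length(empty_vec)    # 0

# Mixing types in c() does not raise an error;
# R coerces every element to a common type instead
mixed <- c(1, "a", TRUE)
mixed_type <- typeof(mixed)       # "character"

# Integer, double and logical values combine into a double vector
nums <- c(2L, 5.5, TRUE)
nums_type <- typeof(nums)         # "double"

print(mixed)
print(nums)
```

Here every element of mixed becomes a string, and TRUE in nums becomes 1, which is why a vector is not the right container for genuinely mixed data.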

Lists in R

A list is a heterogeneous collection of elements stored together. A list may contain matrices, vectors, other lists, or single values. Lists are also known as generic vectors, because each element may have a different data type. The list() function is used to create a list in R.

#declaring list
list_obj <- list("a",c(1:4),4+3i,list(TRUE,-10))

#printing the contents of the list
print("list Contents")
print(list_obj)
The code produces the following output:
[1] "list Contents"
[[1]]
[1] "a"

[[2]]
[1] 1 2 3 4

[[3]]
[1] 4+3i

[[4]]
[[4]][[1]]
[1] TRUE

[[4]][[2]]
[1] -10
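
Individual list elements can be pulled out by position or by name. The sketch below reuses the list from the example above; the named list and the variable names second, sub, nested and named are illustrative additions.

```r
list_obj <- list("a", c(1:4), 4+3i, list(TRUE, -10))

# [[ ]] extracts the element itself
second <- list_obj[[2]]          # integer vector 1 2 3 4

# [ ] returns a sub-list that still wraps the element
sub <- list_obj[2]               # a list of length 1

# Nested lists are reached by chaining [[ ]]
nested <- list_obj[[4]][[2]]     # -10

# Elements can also be named and accessed with $
named <- list(language = "R", scores = c(90, 85))
print(named$language)
print(named$scores)
```

The difference between [[ ]] and [ ] matters in practice: the first returns the stored value, the second returns a list containing it.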

Data Frames in R

A data frame is a collection of heterogeneous elements stored together in tabular form. The elements are arranged in rows and columns. Data frames are two-dimensional and can be created using the data.frame() function in R. Every column in the data frame must have the same number of elements. The elements within a single column share the same data type, while different columns may have different types. The syntax of this function is:

data.frame(col1, …, coln)

where col1, …, coln are vectors of equal length, each holding values of a single data type and forming one column

#declaring a data frame
data_frame <- data.frame(col1 = c(1:3),  #col1 containing numerical values
                         col2 = c("Hi","Readers","This is about DS"))  #col2 containing string values

print("Data Frame")
#printing the contents of the data frame
print(data_frame)
The code produces the following output:
[1] "Data Frame"
  col1             col2
1    1               Hi
2    2          Readers
3    3 This is about DS
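
The column rules can be verified on the same data. This is a minimal sketch: stringsAsFactors = FALSE is passed explicitly so the result does not depend on the R version, and the names col_types, cell and bad are illustrative.

```r
data_frame <- data.frame(col1 = c(1:3),
                         col2 = c("Hi", "Readers", "This is about DS"),
                         stringsAsFactors = FALSE)

# Number of rows and columns
print(dim(data_frame))                 # 3 2

# Each column has its own type
col_types <- sapply(data_frame, class)
print(col_types)                       # col1: integer, col2: character

# A single cell, addressed by row number and column name
cell <- data_frame[2, "col2"]          # "Readers"

# Columns whose lengths cannot be reconciled are rejected with an error
bad <- tryCatch(data.frame(a = 1:3, b = 1:2),
                error = function(e) conditionMessage(e))
print(bad)
```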


Matrices in R

A matrix is an ordered collection of homogeneous elements arranged in rows and columns. It may be square or rectangular. Matrices are two-dimensional R objects, created by the matrix() function. By default, the elements are filled in column-wise order. Specifying either the number of rows or the number of columns is enough to get the intended shape, because R infers the other from the length of the data. The function has the following syntax:

matrix(seq, nrow = , ncol = )

where seq is the vector of elements, nrow is the number of rows, and ncol is the number of columns

#declaring a matrix
mat <- matrix(c(1:12), ncol = 4)

#printing the contents of the matrix
print("Matrix")
print(mat)
The code produces the following output:
[1] "Matrix"
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
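
The column-wise fill order can be compared with row-wise filling, which matrix() supports through its byrow argument. A minimal sketch, with illustrative variable names:

```r
# Default: values fill the matrix column by column
by_col <- matrix(c(1:12), ncol = 4)

# byrow = TRUE fills row by row instead
by_row <- matrix(c(1:12), ncol = 4, byrow = TRUE)

print(dim(by_col))     # 3 4: the row count is inferred from the data length
print(by_col[1, ])     # 1 4 7 10
print(by_row[1, ])     # 1 2 3 4

# A single element is addressed as [row, column]
print(by_col[2, 3])    # 8
print(by_row[2, 3])    # 7
```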


Arrays in R

Arrays are n-dimensional R objects containing homogeneous elements. An array is created with the array() function, which takes a vector of elements and arranges it according to the specified dimensions. The syntax is:

array(vec, dim = c(row, col, num))

where dim specifies num matrices, each with row x col dimensions.

#declaring an array
arr <- array(c(1:20),
             dim = c(5,2,2))
#printing the array elements
print("Array Contents : ")
print(arr)
The code produces the following output:
[1] "Array Contents : "
, , 1

     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10

, , 2

     [,1] [,2]
[1,]   11   16
[2,]   12   17
[3,]   13   18
[4,]   14   19
[5,]   15   20
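
Array elements are addressed with one index per dimension. The sketch below reuses the array from the example above; the names first_slice and elem are illustrative.

```r
arr <- array(c(1:20), dim = c(5, 2, 2))

# The dim attribute holds the size of each dimension
print(dim(arr))              # 5 2 2

# Leaving an index empty selects everything along that dimension,
# so this is the whole first 5 x 2 matrix
first_slice <- arr[, , 1]
print(first_slice)

# Row 2, column 1 of the second matrix
elem <- arr[2, 1, 2]
print(elem)                  # 12
```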


Factors in R

A factor is a vector of values in which each unique value is mapped to a level. By default, the number of levels in a factor equals the number of unique values within it. Factors represent categorical data and are commonly used in statistical modelling and machine learning.

#declaring a factor
fac <- factor(c("R","Python", "C++","Python", "C++","R","R","C++"))
print("Unique Levels of factor")
#printing the factor along with its levels
print(fac)
The output produced by the code is as follows:
[1] "Unique Levels of factor"
[1] R      Python C++    Python C++    R      R      C++   
Levels: C++ Python R
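
The relationship between values and levels can be inspected with a few base R functions. A minimal sketch on the same factor; the names lev, codes and counts are illustrative.

```r
fac <- factor(c("R", "Python", "C++", "Python", "C++", "R", "R", "C++"))

# The distinct levels, sorted alphabetically by default
lev <- levels(fac)
print(lev)              # "C++" "Python" "R"
print(nlevels(fac))     # 3

# Internally, each value is stored as the integer position of its level
codes <- as.integer(fac)
print(codes)            # 3 2 1 2 1 3 3 1

# Frequency of each level
counts <- table(fac)
print(counts)
```

The integer codes are what make factors suitable for categorical variables: the labels are stored once as levels, and each observation only records which level it belongs to.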