Working with Columns and Rows of Data Frames in R


When working with data, it’s common to have a set of observations organized into a table. In R, tables are represented by a data structure called a data frame. The data frame is one of the most commonly used R data structures and works much like a spreadsheet: it has rows and columns, and while every value within a column shares one type, different columns can hold different types of data (numbers, text, logical values, and so on).

In this blog post, you will learn what Data Frames are, how they work with column names and row indices, how to create them from scratch, load them from different file formats and manage their attributes in order to make your analysis easier.

Accessing individual elements of the data frame

A single cell of the data frame can be accessed by specifying its row index followed by its column index. Indices in R start at 1. The syntax for extracting a value is as follows:

Syntax

data-frame[row-indx, col-indx]

Example

#creating a data frame 
data_frame = data.frame(col1 = c(1:5),
                        col2 = c("Amma","baba","cathy","daddy","emma"),
                        col3 = c(T,F,T,T,F),
                        col4 = letters[1:5])


#print data frame
print("Data Frame")
print(data_frame)

#accessing element of second column and third row 
ele = data_frame[3,2]

cat("Element at 3,2 position : ", ele)
Output
[1] "Data Frame"
  col1  col2  col3 col4
1    1  Amma  TRUE    a
2    2  baba FALSE    b
3    3 cathy  TRUE    c
4    4 daddy  TRUE    d
5    5  emma FALSE    e
Element at 3,2 position :  cathy
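
The same cell can also be reached by column name instead of column number, which tends to be easier to read when columns are reordered. The following is a minimal sketch that rebuilds the sample data frame from the example above; the variable names by_name and by_dollar are illustrative only, and stringsAsFactors = FALSE is added so the result does not depend on the R version.

```r
# Sample data frame from the example above.
# stringsAsFactors = FALSE is an addition: it keeps text columns as
# character vectors on R versions older than 4.0 as well.
data_frame <- data.frame(col1 = 1:5,
                         col2 = c("Amma", "baba", "cathy", "daddy", "emma"),
                         col3 = c(TRUE, FALSE, TRUE, TRUE, FALSE),
                         col4 = letters[1:5],
                         stringsAsFactors = FALSE)

# Row 3 of the column named "col2" (the same cell as data_frame[3, 2])
by_name <- data_frame[3, "col2"]

# Extract the whole column with $, then take its third element
by_dollar <- data_frame$col2[3]

print(by_name)
print(by_dollar)
```

Both expressions return the value stored in the third row of col2.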

Accessing entire row data from the data frame 

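An entire row can be extracted from the data frame by leaving the column index empty. The result is a one-row data frame rather than a vector, because the columns of a row may hold different types. The syntax required to access an entire row is: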

Syntax 

data-frame [ row-indx , ]

Where the row-indx is the row to be retrieved. 

Example

#creating a data frame 
data_frame = data.frame(col1 = c(1:5),
                        col2 = c("Amma","baba","cathy","daddy","emma"),
                        col3 = c(T,F,T,T,F),
                        col4 = letters[1:5])


#print data frame
print("Data Frame")
print(data_frame)

#accessing 3rd row data from the data frame
vec = data_frame[3,]
print("Elements of third row")
print(vec)
Output
[1] "Data Frame"
  col1  col2  col3 col4
1    1  Amma  TRUE    a
2    2  baba FALSE    b
3    3 cathy  TRUE    c
4    4 daddy  TRUE    d
5    5  emma FALSE    e
[1] "Elements of third row"
  col1  col2 col3 col4
3    3 cathy TRUE    c
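
To check what kind of object a row extraction returns, and to flatten it when a plain vector is really needed, something like the following sketch may help. The names row3, row_list and row_vec are illustrative only, and stringsAsFactors = FALSE is added so the behaviour is the same across R versions.

```r
# Sample data frame from the example above
# (stringsAsFactors = FALSE is an addition for version-independent results)
data_frame <- data.frame(col1 = 1:5,
                         col2 = c("Amma", "baba", "cathy", "daddy", "emma"),
                         col3 = c(TRUE, FALSE, TRUE, TRUE, FALSE),
                         col4 = letters[1:5],
                         stringsAsFactors = FALSE)

# A single row is still a data frame
row3 <- data_frame[3, ]
print(class(row3))
print(is.data.frame(row3))

# as.list() keeps every value in its original type
row_list <- as.list(row3)
str(row_list)

# unlist() flattens the row into one atomic vector; because the row mixes
# numbers, text and logicals, every value is coerced to character
row_vec <- unlist(row3)
print(row_vec)
```

Because unlist() coerces everything to a common type, as.list() is usually the safer choice when the original types matter.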

Accessing column data from the data frame

An entire column can be extracted from the data frame as a vector by leaving the row index empty. The syntax required to access an entire column is:

Syntax

data-frame [ , col-indx ]

Where the col-indx is the column to be retrieved. 

Example

#creating a data frame 
data_frame = data.frame(col1 = c(1:5),
                        col2 = c("Amma","baba","cathy","daddy","emma"),
                        col3 = c(T,F,T,T,F),
                        col4 = letters[1:5])


#print data frame
print("Data Frame")
print(data_frame)

#accessing 2nd column data from the data frame
vec = data_frame[,2]
print("Elements of second column")
print(vec)
Output
[1] "Data Frame"
  col1  col2  col3 col4
1    1  Amma  TRUE    a
2    2  baba FALSE    b
3    3 cathy  TRUE    c
4    4 daddy  TRUE    d
5    5  emma FALSE    e
[1] "Elements of second column"
[1] "Amma"  "baba"  "cathy" "daddy" "emma" 
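
A column can be pulled out in several equivalent ways, and the drop argument of the [ operator controls whether a single column comes back as a vector or stays a data frame. The sketch below rebuilds the sample data frame; the variable names are illustrative only, and stringsAsFactors = FALSE is added so the column is a character vector on every R version.

```r
# Sample data frame from the example above
# (stringsAsFactors = FALSE is an addition for version-independent results)
data_frame <- data.frame(col1 = 1:5,
                         col2 = c("Amma", "baba", "cathy", "daddy", "emma"),
                         col3 = c(TRUE, FALSE, TRUE, TRUE, FALSE),
                         col4 = letters[1:5],
                         stringsAsFactors = FALSE)

# Default behaviour: a single column is simplified to a vector
col_vec <- data_frame[, 2]

# drop = FALSE keeps the result as a one-column data frame
col_df <- data_frame[, 2, drop = FALSE]

# Name-based alternatives that also return a vector
col_brackets <- data_frame[["col2"]]
col_dollar <- data_frame$col2

print(class(col_vec))
print(class(col_df))
```

Keeping drop = FALSE can be useful in functions that expect a data frame regardless of how many columns are selected.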

Accessing a range of rows and columns from the data frame

We may need to extract a range of rows or columns from the data frame. A range is written as a start index and an end index separated by the colon (:) operator.

Syntax

data-frame[ st-row-indx:end-row-indx , st-col-indx:end-col-indx]

Where:
  • st-row-indx - starting row index
  • end-row-indx - ending row index 
  • st-col-indx - starting column index
  • end-col-indx - ending column index

The result is the subset of the data frame formed by the intersection of the selected rows and columns. If the column indices are left empty, all columns are returned for the specified rows. If the row indices are left empty, all rows are returned for the specified columns.

Example

#creating a data frame 
data_frame = data.frame(col1 = c(1:5),
                        col2 = c("Amma","baba","cathy","daddy","emma"),
                        col3 = c(T,F,T,T,F),
                        col4 = letters[1:5])


#print data frame
print("Data Frame")
print(data_frame)

#accessing 2nd,3rd and 4th row data from the data frame
vec = data_frame[2:4,]
print("Elements of second to fourth row")
print(vec)

#accessing data from first to second column in the data frame
vec2 = data_frame[,1:2]
print("Elements of first to second column")
print(vec2)

#accessing data from intersection of 4th and 5th row and 2nd and 3rd column
vec3 = data_frame[4:5,2:3]
print("Elements from intersection of 4th and 5th row and 2nd and 3rd column")
print(vec3)
Output
[1] "Data Frame"
  col1  col2  col3 col4
1    1  Amma  TRUE    a
2    2  baba FALSE    b
3    3 cathy  TRUE    c
4    4 daddy  TRUE    d
5    5  emma FALSE    e
[1] "Elements of second to fourth row"
  col1  col2  col3 col4
2    2  baba FALSE    b
3    3 cathy  TRUE    c
4    4 daddy  TRUE    d
[1] "Elements of first to second column"
  col1  col2
1    1  Amma
2    2  baba
3    3 cathy
4    4 daddy
5    5  emma
[1] "Elements from intersection of 4th and 5th row and 2nd and 3rd column"
   col2  col3
4 daddy  TRUE
5  emma FALSE
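
Ranges built with the colon operator only cover consecutive indices. For rows or columns that are not next to each other, an index vector built with c() can be used, and negative indices exclude positions instead of selecting them. A minimal sketch, with illustrative variable names:

```r
# Sample data frame from the example above
# (stringsAsFactors = FALSE is an addition for version-independent results)
data_frame <- data.frame(col1 = 1:5,
                         col2 = c("Amma", "baba", "cathy", "daddy", "emma"),
                         col3 = c(TRUE, FALSE, TRUE, TRUE, FALSE),
                         col4 = letters[1:5],
                         stringsAsFactors = FALSE)

# Rows 1, 3 and 5, and the columns named col1 and col4
picked <- data_frame[c(1, 3, 5), c("col1", "col4")]
print(picked)

# Negative indices drop positions: everything except row 1 and column 3
trimmed <- data_frame[-1, -3]
print(trimmed)
```

Positive and negative indices cannot be mixed within the same index vector.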

Extracting the number of rows and columns of the data frame

The number of rows and columns of the data frame can be obtained using the nrow() and ncol() functions respectively. The number of columns can also be obtained using the length() function, because a data frame is stored as a list of columns. Each of these functions takes the data frame as its argument.

Example

#creating a data frame 
data_frame = data.frame(col1 = c(1:5),
                        col2 = c("Amma","baba","cathy","daddy","emma"),
                        col3 = c(T,F,T,T,F),
                        col4 = letters[1:5])


#print data frame
print("Data Frame")
print(data_frame)

# cat() does not add a line break by itself, so "\n" ends each line
cat("Number of rows : ", nrow(data_frame), "\n")
cat("Number of columns : ", ncol(data_frame), "\n")
Output
[1] "Data Frame"
  col1  col2  col3 col4
1    1  Amma  TRUE    a
2    2  baba FALSE    b
3    3 cathy  TRUE    c
4    4 daddy  TRUE    d
5    5  emma FALSE    e
Number of rows :  5
Number of columns :  4

The dim() function can be used to retrieve the number of rows and the number of columns together, as a vector of the form (rows, columns).
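
As a short illustration of dim() alongside length(), the sketch below rebuilds the sample data frame used throughout this post; the variable name dims is illustrative only.

```r
# Sample data frame from the examples above
data_frame <- data.frame(col1 = 1:5,
                         col2 = c("Amma", "baba", "cathy", "daddy", "emma"),
                         col3 = c(TRUE, FALSE, TRUE, TRUE, FALSE),
                         col4 = letters[1:5])

# dim() returns an integer vector: rows first, then columns
dims <- dim(data_frame)
print(dims)

# The two counts can be picked out by position
cat("Rows:", dims[1], "Columns:", dims[2], "\n")

# length() counts the columns, matching ncol()
print(length(data_frame))
```

For this data frame, dims holds 5 and 4, and length() returns 4.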