Data Frames in R

In R, data frames are used to store and manipulate tabular data in a single variable. Unlike a matrix, which holds values of a single type, a data frame lets each column hold its own type. This article covers what data frames are and how they can be created and inspected in the R programming language.

Introduction to Data Frames in R

A data frame is a tabular representation of data, arranged in rows and columns. It is similar to the matrix data structure, but each column in a data frame can have a different data type. A column can hold any of the basic data types, such as integer, character, or logical. A data frame in R has the following properties:

  1. There are a fixed number of rows and columns in the data frame. 
  2. The names of columns in a data frame cannot be empty.
  3. The names of rows cannot repeat. Each row name must be unique.
  4. Every column contains the same number of data items. 

R provides us with a large number of functions to create a data frame as well as access the various attributes associated with it. 
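
The properties listed above can be checked directly with a few base R functions. The following is a minimal sketch; the data frame df and its columns id and name are hypothetical, made up only for illustration.

```r
# Hypothetical example data (not from the article)
df <- data.frame(id = 1:3,
                 name = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

dims <- dim(df)              # number of rows, then number of columns
n_rows <- nrow(df)
n_cols <- ncol(df)
col_names <- names(df)       # column names
row_names <- rownames(df)    # default row names are "1", "2", "3"
col_lengths <- lengths(df)   # every column holds the same number of items

# Trying to assign duplicate row names is rejected with an error
dup_result <- suppressWarnings(tryCatch({
  rownames(df) <- c("x", "x", "y")
  "accepted"
}, error = function(e) "rejected"))

print(dims)
print(col_names)
print(dup_result)
```

Here tryCatch() is used only so that the script keeps running after the error; in an interactive session the assignment would simply fail with a message.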

Creating a Data Frame in R

A data frame in R can be created using the data.frame() function, which takes the column names and their data as arguments. Each column name is paired with a vector of values, and the columns may or may not share the same data type. Every column should have the same number of entries.

# creating a data frame
data_frame <- data.frame(col1 = 1:5,
                         col2 = c("Amma", "baba", "cathy", "daddy", "emma"),
                         col3 = c(TRUE, FALSE, TRUE, TRUE, FALSE))
# printing the data frame
print("Data Frame")
print(data_frame)

The code produces the following output:
[1] "Data Frame"
  col1  col2  col3
1    1  Amma  TRUE
2    2  baba FALSE
3    3 cathy  TRUE
4    4 daddy  TRUE
5    5  emma FALSE
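
The requirement that every column has the same number of entries can be seen in a short sketch. The variable names ok, recycled, and mismatch are illustrative only. Note one nuance of data.frame(): a shorter vector is recycled when its length divides the number of rows evenly, and an error is raised otherwise.

```r
# Columns of equal length: the usual case
ok <- data.frame(x = 1:4,
                 y = c("a", "b", "c", "d"),
                 stringsAsFactors = FALSE)

# A length-2 vector is recycled to fill 4 rows, because 4 is a multiple of 2
recycled <- data.frame(x = 1:4,
                       y = c("a", "b"),
                       stringsAsFactors = FALSE)

# Lengths 3 and 2 cannot be reconciled, so data.frame() signals an error;
# tryCatch() stores the error message instead of stopping the script
mismatch <- tryCatch(
  data.frame(x = 1:3, y = c("a", "b")),
  error = function(e) conditionMessage(e)
)

print(nrow(ok))
print(recycled$y)
print(mismatch)
```

Relying on recycling can hide mistakes in the input data, so supplying vectors of equal length is usually the safer habit.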

Structure of the Data Frame

The structure of a data frame describes the object and its elements: the number of rows and columns, and the name, type, and first few values of each column. The built-in str() function in R displays this internal structure. It takes the data frame as its argument.

# creating a data frame
data_frame <- data.frame(col1 = 1:5,
                         col2 = c("Amma", "baba", "cathy", "daddy", "emma"),
                         col3 = c(TRUE, FALSE, TRUE, TRUE, FALSE))
# printing the data frame
print("Data Frame")
print(data_frame)

# printing the structure of the data frame
print("DataFrame Structure")
str(data_frame)

Output:
[1] "Data Frame"
  col1  col2  col3
1    1  Amma  TRUE
2    2  baba FALSE
3    3 cathy  TRUE
4    4 daddy  TRUE
5    5  emma FALSE
[1] "DataFrame Structure"
'data.frame':	5 obs. of  3 variables:
 $ col1: int  1 2 3 4 5
 $ col2: chr  "Amma" "baba" "cathy" "daddy" ...
 $ col3: logi  TRUE FALSE TRUE TRUE FALSE
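
Because str() prints its report rather than returning it, the text has to be captured if it is needed as a value. The sketch below assumes the same data_frame as above; the names structure_lines, line_count, and column_classes are introduced here for illustration. It also shows sapply() with class() as one way to get the column types as a regular vector.

```r
data_frame <- data.frame(col1 = 1:5,
                         col2 = c("Amma", "baba", "cathy", "daddy", "emma"),
                         col3 = c(TRUE, FALSE, TRUE, TRUE, FALSE),
                         stringsAsFactors = FALSE)

# capture.output() collects the lines that str() would print
structure_lines <- capture.output(str(data_frame))

# One header line plus one line per column
line_count <- length(structure_lines)

# The class of each column, as a named character vector
column_classes <- sapply(data_frame, class)

print(line_count)
print(column_classes)
```

Setting stringsAsFactors = FALSE explicitly keeps col2 as a character column on R versions older than 4.0, where the default converted strings to factors.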

Summary of the Data Frame in R

The summary() function in R generates summary statistics for each column individually. For a numeric column, it returns the minimum, first quartile, median, mean, third quartile, and maximum. For a character column, it returns the length, class, and mode. For a logical column, it returns the mode along with the counts of FALSE and TRUE values.

# creating a data frame
data_frame <- data.frame(col1 = 1:5,
                         col2 = c("Amma", "baba", "cathy", "daddy", "emma"),
                         col3 = c(TRUE, FALSE, TRUE, TRUE, FALSE))

# printing the summary of the data frame
print("DataFrame Summary")
summary(data_frame)

Output:
[1] "DataFrame Summary"
      col1       col2              col3        
 Min.   :1   Length:5           Mode :logical  
 1st Qu.:2   Class :character   FALSE:2        
 Median :3   Mode  :character   TRUE :3        
 Mean   :3                                     
 3rd Qu.:4                                     
 Max.   :5
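
Individual statistics can also be pulled out of a summary by name. The sketch below assumes the same data_frame as above and applies summary() to a single column; the variable names col1_summary, col1_median, col1_mean, and col3_counts are illustrative. The table() function is used here as one way to obtain the FALSE and TRUE counts of the logical column.

```r
data_frame <- data.frame(col1 = 1:5,
                         col2 = c("Amma", "baba", "cathy", "daddy", "emma"),
                         col3 = c(TRUE, FALSE, TRUE, TRUE, FALSE),
                         stringsAsFactors = FALSE)

# Summary of a single numeric column; elements are named
# "Min.", "1st Qu.", "Median", "Mean", "3rd Qu.", "Max."
col1_summary <- summary(data_frame$col1)
col1_median <- col1_summary[["Median"]]
col1_mean <- col1_summary[["Mean"]]

# Counts of FALSE and TRUE in the logical column
col3_counts <- table(data_frame$col3)

print(col1_median)
print(col1_mean)
print(col3_counts)
```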