Operations on DataFrames in R

In this article, we will discuss some key operations that can be performed on data frames in R.

A data frame is a tabular structure that organizes data into rows and columns. The intersection of a row and a column forms a cell. Data frames can be subjected to different types of transformations, such as shifting, adding, or removing elements. The data stored in the cells can also be changed.
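As a minimal sketch of these transformations, the snippet below changes one cell, adds a column, and removes a row using base R indexing. The data frame `df` and the column name `passed` are illustrative assumptions and are not used in the later examples.

```r
# Hypothetical data frame used only for illustration
df <- data.frame(id = 1:3,
                 marks = c(23, 54, 32))

# Change the value of a single cell (row 1, column "marks")
df[1, "marks"] <- 40

# Add a new column computed from an existing one
df$passed <- df$marks > 35

# Remove the second row
df <- df[-2, ]

print(df)
```

After these steps, `df` holds the rows with id 1 and 3, with `marks` of 40 and 32.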

Subsetting the dataframe in R

The subset() function in R retrieves a smaller subset of an existing data frame based on a specified condition. The condition is a logical expression that is evaluated for each row of the data frame: rows for which it evaluates to TRUE are returned, and rows for which it evaluates to FALSE are dropped. The function has the following syntax:

subset(df, cond, select)

Arguments : 

  • df - The data frame to use for subsetting 
  • cond - The logical condition used to decide which rows to keep
  • select - The columns of the data frame to be selected. If multiple columns are to be selected, the c() function is used to list the columns to be extracted.

Below are some examples of operations on DataFrames in R:

Examples of Operations on DataFrames in R

#creating a data frame 
data_frame = data.frame(id = c(1:5),
                        names = c("Amma","baba","cathy","daddy","emma"),
                        marks = c(23,54,32,44,52))


#print data frame
print("Data Frame")
print(data_frame)

#subsetting the data frame with marks >35
subset_df = subset(data_frame,marks>35,select = c(id,names))
print("Subset Data Frame")
print(subset_df)
Output
[1] "Data Frame"
  id names marks
1  1  Amma    23
2  2  baba    54
3  3 cathy    32
4  4 daddy    44
5  5  emma    52
[1] "Subset Data Frame"
  id names
2  2  baba
4  4 daddy
5  5  emma
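The select argument is optional. As a small sketch that rebuilds the same data_frame (the variable names all_cols and no_marks are illustrative assumptions), omitting select keeps every column of the matching rows, while a minus sign in select drops the named column:

```r
# Same data frame as in the example above
data_frame = data.frame(id = c(1:5),
                        names = c("Amma","baba","cathy","daddy","emma"),
                        marks = c(23,54,32,44,52))

# No select argument: all columns are kept for the matching rows
all_cols = subset(data_frame, marks > 35)
print(all_cols)

# Negative select: keep every column except marks
no_marks = subset(data_frame, marks > 35, select = -marks)
print(no_marks)
```

Both calls return the rows with id 2, 4, and 5. Only the set of columns differs.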

Multiple conditions can also be combined using logical operators, such as & or |. When the & operator is used, a row is returned only if all the conditions evaluate to TRUE for it. For instance, in the following example, the age of the student should be less than 15 and the marks should be greater than 35. The attributes "id" and "names" are displayed for the rows that satisfy both conditions, which here is only the row with id 5.

#creating a data frame 
data_frame = data.frame(id = c(1:5),
                        names = c("Amma","baba","cathy","daddy","emma"),
                        marks = c(23,54,32,44,52),
                        age = c(12,16,12,16,13))


#print data frame
print("Data Frame")
print(data_frame)

#subsetting the data frame with marks >35 and age <15
subset_df = subset(data_frame,marks>35 & age<15,select = c(id,names))
print("Subset Data Frame")
print(subset_df)
Output
[1] "Data Frame"
  id names marks age
1  1  Amma    23  12
2  2  baba    54  16
3  3 cathy    32  12
4  4 daddy    44  16
5  5  emma    52  13
[1] "Subset Data Frame"
  id names
5  5  emma
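The | operator works the same way, except that a row is kept when at least one of the conditions is TRUE. The following sketch reuses the same data; the thresholds and the variable name either_df are illustrative assumptions chosen for this example.

```r
# Same data frame as in the example above
data_frame = data.frame(id = c(1:5),
                        names = c("Amma","baba","cathy","daddy","emma"),
                        marks = c(23,54,32,44,52),
                        age = c(12,16,12,16,13))

# subsetting the data frame with marks > 50 or age < 13
either_df = subset(data_frame, marks > 50 | age < 13, select = c(id, names))
print("Subset Data Frame")
print(either_df)
```

Here the rows with id 2 and 5 satisfy the marks condition and the rows with id 1 and 3 satisfy the age condition, so four rows are returned.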
