How to Remove Duplicates from DataFrame in Julia?

To remove duplicate rows from a DataFrame in Julia, you can use the unique function, which the DataFrames package extends to work on data frames. Here is an example of how to use it:

Remove Duplicates from DataFrame in Julia Example

using DataFrames

# create a sample dataframe
df = DataFrame(x = [1, 2, 2, 3, 3, 3], y = [4, 5, 5, 6, 6, 6])
6×2 DataFrame
 Row │ x      y     
     │ Int64  Int64 
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     2      5
   4 │     3      6
   5 │     3      6
   6 │     3      6
# remove duplicates
df_unique = unique(df)
3×2 DataFrame
 Row │ x      y     
     │ Int64  Int64 
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     3      6
# display the resulting dataframe
df_unique
3×2 DataFrame
 Row │ x      y     
     │ Int64  Int64 
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     3      6

This creates a new DataFrame, df_unique, that contains only the unique rows of df; df itself is left unchanged. Two rows count as duplicates when they hold equal values in every column, and the first occurrence of each row is the one that is kept. If you want to judge uniqueness on a subset of the columns instead, you can pass those columns as the second argument to unique, like this:

df_unique = unique(df, [:x])

This removes duplicates based on the values in the x column only: for each distinct value of x, the first matching row is kept, and that row keeps all of its columns.
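In the sample data above, x and y always repeat together, so deduplicating on x alone gives the same result as deduplicating on every column. The sketch below uses a hypothetical DataFrame, df2, whose values are made up for illustration, to show what happens when rows share an x value but differ in y:

```julia
using DataFrames

# Hypothetical data: x repeats in rows 2 and 3, but y differs between them
df2 = DataFrame(x = [1, 2, 2, 3], y = [10, 20, 21, 30])

# Deduplicate on x only; a single column can be passed as :x or as [:x]
by_x = unique(df2, :x)

# The first row for each x value survives, so the row (2, 21) is dropped
println(by_x)
```

Because only x is compared, the row with y = 21 is discarded even though it is not an exact duplicate of any other row. This is worth keeping in mind whenever the remaining columns carry information you need.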

You can also use the unique! function to remove duplicates in place, modifying the original dataframe.

unique!(df)
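Since unique! discards rows from the original DataFrame, it can be useful to check first which rows would be removed. One way to do that, sketched here on the same sample data, is the nonunique function from DataFrames, which returns a Boolean vector marking every row that repeats an earlier one. The variable name dup_mask is just an illustrative choice:

```julia
using DataFrames

df = DataFrame(x = [1, 2, 2, 3, 3, 3], y = [4, 5, 5, 6, 6, 6])

# true marks a row that duplicates an earlier row; first occurrences are false
dup_mask = nonunique(df)

# Preview the rows that unique! would drop, without modifying df
println(df[dup_mask, :])

# Remove the duplicates in place; df now holds only the first occurrences
unique!(df)
println(nrow(df))  # 3
```

Like unique, unique! also accepts a column selection as a second argument, for example unique!(df, :x).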
