How to Remove Duplicates from DataFrame in Julia?

To remove duplicate rows from a dataframe in Julia, call the unique function on it. unique is a Base Julia function, and the DataFrames package adds a method for it that works on dataframes. Here is an example of how to use it:

Remove Duplicates from DataFrame in Julia Example

using DataFrames

# create a sample dataframe
df = DataFrame(x = [1, 2, 2, 3, 3, 3], y = [4, 5, 5, 6, 6, 6])
6×2 DataFrame
 Row │ x      y     
     │ Int64  Int64 
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     2      5
   4 │     3      6
   5 │     3      6
   6 │     3      6

# remove duplicates
df_unique = unique(df)
3×2 DataFrame
 Row │ x      y     
     │ Int64  Int64 
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     3      6

# display the resulting dataframe
df_unique
3×2 DataFrame
 Row │ x      y     
     │ Int64  Int64 
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     3      6

This creates a new dataframe, df_unique, that contains only the unique rows of df; df itself is left unchanged. By default, unique compares rows across all columns, so two rows count as duplicates only if they match in every column, and the first occurrence of each row is kept. To determine uniqueness from a subset of the columns instead, pass those columns as the second argument:

df_unique = unique(df, [:x])

This keeps the first row for each distinct value in the x column. All columns are retained in the result; only the comparison is restricted to x.
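
The difference between comparing all columns and comparing a subset is easiest to see when a column repeats but the rest of the row does not. The following is a minimal sketch using made-up sample data (the values and the names all_cols and by_x are assumptions for illustration, not part of the example above):

```julia
using DataFrames

# Hypothetical sample data: x repeats, but y differs between the repeats
df = DataFrame(x = [1, 2, 2, 3], y = [10, 20, 21, 30])

# No two rows match in both columns, so nothing is dropped
all_cols = unique(df)

# Comparing on x only keeps the first row seen for each distinct x
by_x = unique(df, :x)

println(nrow(all_cols))  # 4
println(by_x.y)          # [10, 20, 30]
```

Note that the row with y = 21 is discarded in by_x because an earlier row already had x = 2, so subsetting the comparison can silently drop data that differs in the other columns.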

You can also use the unique! function to remove duplicates in place, modifying the original dataframe.
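
As a short sketch of the in-place variant, reusing the sample data from the first example (the copy named original is an assumption added here only to show the before and after sizes):

```julia
using DataFrames

df = DataFrame(x = [1, 2, 2, 3, 3, 3], y = [4, 5, 5, 6, 6, 6])

# Keep an untouched copy for comparison, since unique! mutates its argument
original = copy(df)

# Drop duplicate rows from df itself; no reassignment is needed
unique!(df)

println(nrow(original))  # 6
println(nrow(df))        # 3
```

unique! also accepts a column selection as its second argument, for example unique!(df, :x), in the same way as unique.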


