To remove duplicate rows from a dataframe in Julia, you can use the unique function, which the DataFrames package extends to work on dataframes. Here is an example of how to use it:
Remove Duplicates from DataFrame in Julia Example
    julia> using DataFrames

    # create a sample dataframe
    julia> df = DataFrame(x = [1, 2, 2, 3, 3, 3], y = [4, 5, 5, 6, 6, 6])
    6×2 DataFrame
     Row │ x      y
         │ Int64  Int64
    ─────┼──────────────
       1 │     1      4
       2 │     2      5
       3 │     2      5
       4 │     3      6
       5 │     3      6
       6 │     3      6

    # remove duplicates
    julia> df_unique = unique(df)
    3×2 DataFrame
     Row │ x      y
         │ Int64  Int64
    ─────┼──────────────
       1 │     1      4
       2 │     2      5
       3 │     3      6

    # display the resulting dataframe
    julia> df_unique
    3×2 DataFrame
     Row │ x      y
         │ Int64  Int64
    ─────┼──────────────
       1 │     1      4
       2 │     2      5
       3 │     3      6
This creates a new dataframe, df_unique, that contains only the unique rows of df; df itself is left unchanged. By default, unique compares the values in every column, so two rows count as duplicates only if they match in all columns, and the first occurrence of each row is kept. To determine uniqueness from a subset of the columns instead, pass those columns as the second argument to unique:
df_unique = unique(df, [:x])
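This removes duplicates based on the values in the x column only. For each distinct value of x, the first row with that value is kept, together with its values in the other columns.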
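In the sample dataframe above, x and y always repeat together, so both calls give the same rows. The sketch below uses a small hypothetical dataframe (df2, not part of the original example) in which the difference shows up, and also calls nonunique, which DataFrames provides for flagging duplicate rows without removing them:

```julia
using DataFrames

# Hypothetical data: x repeats, but y differs between the repeated rows
df2 = DataFrame(x = [1, 1, 2], y = [10, 20, 30])

# No two rows match in every column, so nothing is dropped
all_cols = unique(df2)

# Comparing on x only keeps the first row for each distinct x
by_x = unique(df2, :x)

# nonunique marks rows that duplicate an earlier row (here, judged on x)
dup_mask = nonunique(df2, :x)
```

Here all_cols still has three rows, while by_x keeps only the rows (1, 10) and (2, 30); the row (1, 20) is dropped because its x value was already seen.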
You can also use the unique! function to remove duplicates in place, modifying the original dataframe:
unique!(df)
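As a rough illustration of the difference between the two functions, the following sketch rebuilds the sample dataframe and checks its row count before and after the in-place call. The variable names df_copy, rows_before, and rows_after are only illustrative:

```julia
using DataFrames

df = DataFrame(x = [1, 2, 2, 3, 3, 3], y = [4, 5, 5, 6, 6, 6])

# unique returns a new dataframe; df itself is untouched
df_copy = unique(df)
rows_before = nrow(df)

# unique! drops the duplicate rows from df itself
unique!(df)
rows_after = nrow(df)
```

After running this, rows_before is 6 and rows_after is 3. Like unique, unique! also accepts a column selection as a second argument, for example unique!(df, :x).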