To remove duplicate rows from a dataframe in Julia, you can use the unique function from the DataFrames package. Here is an example of how to use it:
using DataFrames
# create a sample dataframe
df = DataFrame(x = [1, 2, 2, 3, 3, 3], y = [4, 5, 5, 6, 6, 6])
6×2 DataFrame
 Row │ x      y
     │ Int64  Int64
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     2      5
   4 │     3      6
   5 │     3      6
   6 │     3      6
# remove duplicates
df_unique = unique(df)
3×2 DataFrame
 Row │ x      y
     │ Int64  Int64
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     3      6
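To see which rows unique drops, you can use the related nonunique function from DataFrames. It returns a Bool vector that marks every row repeating an earlier row, with the first occurrence marked false. The sketch below assumes DataFrames.jl 1.x and reuses the sample data. Filtering df with the negated mask should give the same rows as unique(df).

```julia
using DataFrames

# same sample data as in the example above
df = DataFrame(x = [1, 2, 2, 3, 3, 3], y = [4, 5, 5, 6, 6, 6])

# true for each row that repeats an earlier row; here rows 3, 5 and 6 are flagged
dup_mask = nonunique(df)

# keeping only the unflagged rows reproduces the result of unique(df)
df_kept = df[.!dup_mask, :]
println(df_kept == unique(df))  # true
```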
This creates a new dataframe, df_unique, that contains only the unique rows of df, keeping the first occurrence of each; df itself is left unchanged. By default, unique compares entire rows, so two rows count as duplicates only if they are equal in every column. To determine uniqueness from a subset of the columns instead, pass those columns as the second argument to unique:
df_unique = unique(df, [:x])
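This removes every row whose x value has already appeared and keeps the first row for each distinct x. The result still contains all columns of df (here both x and y), not only x.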
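In the sample data every x value always pairs with the same y value, so unique(df) and unique(df, [:x]) happen to return the same rows. The minimal sketch below uses hypothetical data where y differs within one x value, assuming DataFrames.jl 1.x, to make the difference visible.

```julia
using DataFrames

# hypothetical data: rows 1 and 2 share x = 1 but have different y values
df = DataFrame(x = [1, 1, 2, 2], y = [10, 20, 30, 30])

# whole-row comparison: only row 4 (an exact copy of row 3) is dropped
all_cols = unique(df)
println(nrow(all_cols))  # 3

# comparison on x only: the first row for each x is kept, with all of its columns
by_x = unique(df, [:x])
println(by_x.y)       # [10, 30]
println(names(by_x))  # ["x", "y"]
```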
To remove duplicates in place instead, modifying the original dataframe rather than creating a new one, use the unique! function:
unique!(df)
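The sketch below, assuming DataFrames.jl 1.x, checks that unique! shrinks the original dataframe and returns that same object rather than a copy. It also shows that unique! accepts a column selection the same way unique does. The second dataframe is hypothetical data.

```julia
using DataFrames

df = DataFrame(x = [1, 2, 2, 3, 3, 3], y = [4, 5, 5, 6, 6, 6])
println(nrow(df))  # 6

# unique! deletes the duplicate rows from df itself and returns that same object
result = unique!(df)
println(nrow(df))       # 3
println(result === df)  # true

# in-place deduplication based on a single column (hypothetical data)
df2 = DataFrame(x = [1, 1, 2], y = [10, 20, 30])
unique!(df2, :x)
println(df2.y)  # [10, 30]
```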