Using Left Join on DataFrames in Julia

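In Julia, the leftjoin function from the DataFrames.jl package performs a left join on two data frames: every row of the left table is kept, and columns from the right table are attached wherever the join key matches. The examples below show how to use it.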

Left Join on DataFrames in Julia Examples

using DataFrames

# Define the two dataframes to join
df1 = DataFrame(id=[1, 2, 3], name=["Alice", "Bob", "Charlie"])
df2 = DataFrame(id=[1, 2, 3, 4], score=[90, 80, 70, 60])

# Perform the left join
result = leftjoin(df1, df2, on=:id)

# Print the resulting dataframe
println(result)

This will output the following dataframe:

3×3 DataFrame
 Row │ id     name     score  
     │ Int64  String   Int64? 
─────┼────────────────────────
   1 │     1  Alice        90
   2 │     2  Bob          80
   3 │     3  Charlie      70

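Note that the score column in the resulting dataframe has type Int64?, which is shorthand for Union{Missing, Int64}. A left join keeps every row of df1, and any row without a match in df2 gets missing in the columns that come from df2, so leftjoin makes those columns allow missing values. Here every id in df1 has a match, so no missing values actually appear, but the column type still permits them. The extra row in df2 (id 4) has no counterpart in df1 and is dropped.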
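To see a missing value actually appear, here is a minimal sketch with made-up data in which one row of the left table has no match. The names people, scores, joined, and filled are illustrative only, and the default of 0 used to fill the gap is an arbitrary choice.

```julia
using DataFrames

# Hypothetical data: id 3 appears in `people` but not in `scores`
people = DataFrame(id=[1, 2, 3], name=["Alice", "Bob", "Charlie"])
scores = DataFrame(id=[1, 2], score=[90, 80])

# The unmatched row is kept; its score is `missing`
joined = leftjoin(people, scores, on=:id)
println(joined)

# Replace missing scores with a default value (0 is an arbitrary choice)
filled = coalesce.(joined.score, 0)
println(filled)
```

Whether to fill, drop, or keep missing values depends on what the data means; coalesce is just one option.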

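You can also specify the makeunique keyword argument, which controls what happens when both dataframes contain a non-key column with the same name. For example:

result = leftjoin(df1, df2, on=:id, makeunique=true)

With makeunique=true, the clashing column from the right table is renamed with a numeric suffix instead of raising an error. It does not remove duplicate rows. In this example the two tables share only the id column, so the argument has no effect and the result is unchanged.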
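As a small sketch of makeunique in a case where it matters, assume two hypothetical tables that both carry a score column in addition to the join key. The table names and values below are invented for illustration.

```julia
using DataFrames

# Hypothetical data: both tables have a non-key column named `score`
midterm = DataFrame(id=[1, 2], score=[90, 80])
final = DataFrame(id=[1, 2], score=[55, 65])

# Without makeunique=true this call raises an error about duplicate column names
result = leftjoin(midterm, final, on=:id, makeunique=true)

# The right-hand column is renamed with a numeric suffix
println(names(result))
```

The column names of the result are id, score, and score_1, with score_1 holding the values from the right table.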

To join on more than one key, pass a vector of column names to the on argument. In the following example a row of df1 matches a row of df2 only when both id and year are equal:

using DataFrames

# Define the two dataframes to join
df1 = DataFrame(id=[1, 2, 3], name=["Alice", "Bob", "Charlie"], city=["New York", "Chicago", "Los Angeles"], year=[2020, 2021, 2020])
df2 = DataFrame(id=[1, 2, 3, 4], score=[90, 80, 70, 60], year=[2020, 2021, 2020, 2021])

# Perform the left join
result = leftjoin(df1, df2, on=[:id, :year])

# Print the resulting dataframe
println(result)

This will output the following dataframe:

3×5 DataFrame
 Row │ id     name     city         year   score  
     │ Int64  String   String       Int64  Int64? 
─────┼────────────────────────────────────────────
   1 │     1  Alice    New York      2020      90
   2 │     2  Bob      Chicago       2021      80
   3 │     3  Charlie  Los Angeles   2020      70

Alternatively, you can join on :id alone. In the version below df1 has no year column, so year is brought in from df2 as an ordinary column rather than a join key and, like score, has type Int64? in the result:

using DataFrames

# Define the two dataframes to join
df1 = DataFrame(id=[1, 2, 3], name=["Alice", "Bob", "Charlie"], city=["New York", "Chicago", "Los Angeles"])
df2 = DataFrame(id=[1, 2, 3, 4], score=[90, 80, 70, 60], year=[2020, 2021, 2020, 2021])

# Perform the left join
result = leftjoin(df1, df2, on=:id)

# Print the resulting dataframe
println(result)

This will output the following dataframe:

3×5 DataFrame
 Row │ id     name     city         score   year   
     │ Int64  String   String       Int64?  Int64? 
─────┼─────────────────────────────────────────────
   1 │     1  Alice    New York         90    2020
   2 │     2  Bob      Chicago          80    2021
   3 │     3  Charlie  Los Angeles      70    2020
