Using Inner Join on DataFrames in Julia

To perform an inner join of two dataframes in Julia, you can use the innerjoin function from the DataFrames package.

An inner join returns only the rows whose key values appear in both dataframes. The resulting dataframe contains the key column once, followed by the remaining columns of each input dataframe.
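As a minimal sketch of this behavior, the snippet below joins two small tables and records which rows and columns survive. The names `left`, `right`, and `score` are hypothetical and used only for illustration; they are not part of the examples that follow.

```julia
using DataFrames

# Hypothetical sample tables, for illustration only
left  = DataFrame(id = [1, 2, 3], name = ["Ann", "Ben", "Cy"])
right = DataFrame(id = [2, 3, 4], score = [7.5, 9.0, 6.0])

# Keep only the rows whose id appears in both tables
joined = innerjoin(left, right, on = :id)

# Sorted so the check does not depend on the row order of the result
matched_ids = sort(joined.id)

# Column names of the result, as a vector of strings
column_names = names(joined)
```

Here `matched_ids` should hold only the ids shared by both tables, and `column_names` should list the key column together with the non-key columns of both inputs.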

Inner Join on DataFrames in Julia Example

Here is the syntax for an inner join of two dataframes, df1 and df2, on the column "id":

using DataFrames

# Inner join df1 and df2 on the "id" column
df_inner = innerjoin(df1, df2, on = :id)

Here is a complete example of how to use innerjoin with sample data:

using DataFrames

# Create two dataframes
df1 = DataFrame(id = [1, 2, 3, 4], name = ["Alice", "Bob", "Charlie", "Dave"])
df2 = DataFrame(id = [2, 3, 4, 5], age = [30, 40, 50, 60])

# Inner join df1 and df2 on the "id" column
df_inner = innerjoin(df1, df2, on = :id)

# Print the resulting dataframe
println(df_inner)

The output of this code will be:

3×3 DataFrame
 Row │ id     name     age   
     │ Int64  String   Int64 
─────┼───────────────────────
   1 │     2  Bob         30
   2 │     3  Charlie     40
   3 │     4  Dave        50

The innerjoin function also accepts more than two dataframes. The following example performs an inner join on three dataframes:

using DataFrames

# Create three dataframes
df1 = DataFrame(id = [1, 2, 3, 4], name = ["Alice", "Bob", "Charlie", "Dave"])
df2 = DataFrame(id = [2, 3, 4, 5], age = [30, 40, 50, 60])
df3 = DataFrame(id = [3, 4, 5, 6], country = ["USA", "Canada", "Mexico", "Brazil"])

# Inner join df1, df2, and df3 on the "id" column
df_inner = innerjoin(df1, df2, df3, on = :id)

# Print the resulting dataframe
println(df_inner)

The output of this code will be:

2×4 DataFrame
 Row │ id     name     age    country 
     │ Int64  String   Int64  String  
─────┼────────────────────────────────
   1 │     3  Charlie     40  USA
   2 │     4  Dave        50  Canada

In this example, the inner join is performed directly on all three dataframes, and the resulting dataframe is stored in the variable df_inner.

Passing all three dataframes in a single call is mainly a convenience. DataFrames.jl applies the joins from left to right, so innerjoin(df1, df2, df3, on = :id) behaves like innerjoin(innerjoin(df1, df2, on = :id), df3, on = :id), and an intermediate result is still built internally. The single call is more compact, but it may be harder to follow if you are not familiar with the data contained in the three dataframes.
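To see how the single call relates to sequential joins, the sketch below reuses the sample data from the example above and compares the two forms. The variable names `direct`, `step1`, `chained`, and `same_result` are assumptions introduced here for illustration.

```julia
using DataFrames

df1 = DataFrame(id = [1, 2, 3, 4], name = ["Alice", "Bob", "Charlie", "Dave"])
df2 = DataFrame(id = [2, 3, 4, 5], age = [30, 40, 50, 60])
df3 = DataFrame(id = [3, 4, 5, 6], country = ["USA", "Canada", "Mexico", "Brazil"])

# Three-way join in a single call
direct = innerjoin(df1, df2, df3, on = :id)

# The same join written as two chained two-way joins
step1 = innerjoin(df1, df2, on = :id)
chained = innerjoin(step1, df3, on = :id)

# Sort both results so the comparison does not depend on row order
same_result = sort(direct, :id) == sort(chained, :id)
```

With this data, both forms should produce the same rows and columns, so choosing between them is largely a matter of readability.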
