Selecting Data from Multiple Tables

Learn how to do a SQL Join

Anil Batra
December 01, 2023

As you already know, GA4 creates multiple tables in BigQuery. We have been focused on the Events table so far.

However, GA4 also creates two more tables along with the events table.

Additionally, as you dive into BigQuery and explore its complete capabilities, you might bring in data from other data sources such as CRM, Email automation tools, etc.

As a result, you will often need to combine the data from various tables to get the right data set.

In this post, I will show you how to achieve that.

Note: If you are already familiar with table joins, you can skip this lesson.

Since we haven’t yet looked at other tables or brought in other data sources, most of this post is going to be theoretical but will revisit it as needed in the future using GA4

Let’s assume you have two tables in BigQuery with the following structure

Now if we want the result set that includes that data from the “Customer” table combined with the data from the “State” table, so that we can know the actual name of the state along with customer information then we will need to “join” the two tables.

To join the two tables, you have to first identify a column in both tables that you will use to join them. In our case, that seems to be the “StateCode” column in both tables.

Now I am going to explain the SQL and then show you the result that you will get.

Here is the SQL:

Select * from Customer
Join State
On Customer.StateCode = State.StateCode

In the above SQL, we are asking BigQuery to return the result from the “Customer” table, by joining it to the “State” table, followed by the “On” SQL command, followed by the names of the columns used to join the two tables.

Note:
SQL uses dot notation, just like other programming languages to identify a child object. In this example, the table is the parent, and the column name is the child object. So, “Customer.StateCode” means the “StateCode” column from the “Customer” table, and “State.StateCode” means the “StateCode” column from the “State” table.

So the above SQL will find the matching records from both the tables by matching (joining) them on the “StateCode” columns in both the tables and return the following result set.

The first three columns are from the “Customer” table and the last two are from the “State” table. As you will see above, the “StateCode” column from the “Customer” table matches the “StateCode” column from the “State” table.

Now, let’s take a look at another scenario.

Let’s say one of the tables only contains some matching data but not all. For example, in the screen below, the “State “table now only contains one row instead of the three we saw above.

Now, when the same SQL will return the following result set because it only finds matching records from both tables. Since “WA” is the only value of the “StateCode” column that is common in both tables, it will only return the data where it finds “WA” in both tables.

Advanced Notes

Subscribe to Premium Membership to read the rest.

Become a paying subscriber of Premium Membership to get access to this post and other subscriber-only content.

Upgrade

Already a paying subscriber? Sign In

A subscription gets you:

Advanced Content

Readymade SQL Scripts

Subscribe to keep reading

This content is free, but you must be subscribed to GA4 BigQuery For Marketing Analytics to continue reading.

Already a subscriber?Sign In.Not now

Join the conversation

or to participate.