Introduction to CTEs in BigQuery with GA4 Data: A Beginner's Guide

GA4 data with BigQuery

In this post, I will explain Common Table Expressions, also called CTEs.

A Common Table Expression is a temporary result set that you can reference within any SELECT, INSERT, UPDATE, or DELETE SQL statements.

CTE is defined by selecting a subset of data and providing it a name. You can refer to the CTE name in another query, like any other table.

First, let’s take a look at the basic syntax of CTE and then I will explain it.

Basic Syntax of CTE:

WITH
  subQ1 AS (
            SELECT user_pseudo_id
            FROM  
            `your-project.your-dataset.events_######` 
            WHERE event_name = 'first_visit'
            )
SELECT DISTINCT * FROM subQ2;

In this above SQL, CTE is defined using the WITH clause, followed by the name of the CTE. In our case the name of CTE is subQ1. The name is then followed by the AS statement and the SQL statement for CTE in parenthesis.

In the above statement, I am selecting all the values of “user_pseudo_id” for all the records where event_name = ‘first_visit’.

Now once, I have this list, I am simply writing a SQL statement to select distinct values from subQ1 CTE, just like I would do from another table. The resultset will be distinct values of user_pseudo_id’s.

This is a very simple example but this comes in handy when you have to write complex SQL statements, as we will see in future lessons.

Recap:

  • CTE starts with a WITH clause

  • subQ1 is the name that I provided to refer to this result set (CTE). You can provide any name you like.

  • AS statement is required followed by a parenthesis

  • Inside the parenthesis, you write the SQL to pull the data that you need to store in CTE.

  • After that, you can use the CTE just like any other table, using the name you provided.

You can create multiple CTEs in a query by separating them with a comma.

WITH
  first_visit_ids AS (
            SELECT user_pseudo_id
            FROM  
            `your-project.your-dataset.events_######` 
            WHERE event_name = 'first_visit'
            ),
  lead_ids AS (
            SELECT user_pseudo_id
            FROM  
            `your-project.your-dataset.events_######` 
            WHERE event_name = 'generate_lead'
            )
SELECT [rest of your query using the above CTEs]

Note: I cover and example in the Advanced Notes.

Advanced Notes

This section is available for Premium Members. If you are ready to learn to go to the next level then become a Premium Member and support this newsletter.

As always, practice these queries to get a good handle on BigQuery and SQL.

If you have any questions then do not hesitate to reach out to me.

Thank you,

Anil Batra, Optizent.com

Join the conversation

or to participate.