- GA4 BigQuery For Marketing Analytics
- Posts
- Two GA4 User Tables in BigQuery Demystified
Two GA4 User Tables in BigQuery Demystified
Learn BigQuery with GA4
When you export GA4 data to BigQuery, you will see one or two user tables, depending on your GA4 implementation. These tables are
Pseudonymous Users table - This table contains information about anonymous users. All users on your site or app are anonymous by default. The naming format of this table is pseudonymous_users_YYYYMMDD. Each day of data creates a new table. These tables are listed as pseudonymous_users_(##) and just like other tables the name of this table also contains the date.
Key things about this table:
It contains a row for every pseudonymous identifier.
This table is updated when there is a change to data in any one of the fields.
Users who have not given consent, are not exported to this table.
Known User IDs are not exported to this table (see the User ID table below for this information)
The last active timestamp is exported to this table.
Users - This table contains information about known users. This table will show up if you enable User ID integration with GA4. The naming format of this table is users_YYYYMMDD and is listed under users_(##).
Key things about this table:
It contains a row for every user ID. D
This table is updated when there is a change to data in any one of the fields.
Users who have not given consent can be exported to this table if they include a user ID.
Pseudouser IDs are not exported to this table, they are contained in the Pseudonymous Users table
The last active timestamp is exported to this table.
Data Exported in Pseudonymous Users and User Tables
The daily export in the above tables includes any new users and all the users whose data has changed that day. If you need some examples to see which users will be included in the daily export then check the “Advanced Notes” section below.
Tables Schema
The schema for both tables is very similar, with just a minor difference.
pseudonymous_users table
user table
The data in these tables can be grouped into a few major buckets, as listed below:
User ID
User Info
Privacy Info
User Properties
Device
Geo
Audiences
User Lifetime
Predictions
Audit
Let’s take a look at what type of data is contained in the major buckets.
User ID
User ID information is stored in three different columns.
user_id, STRING (date type)
ID for the User-ID namespace in reporting identity (only available in the User table)
pseudo_user_id, STRING
ID for the Pseudonymous namespace (only available in the Pseudonymous Users table only)
stream_id, INTEGER
Data-stream ID (only available in the Pseudonymous Users table)
User info
This data contains some important timelines of the user and is stored in the “user_info” column as a record data type. This type of column is called a record type column (or record data type) since it contains a record of information instead of just a single value. I will cover dealing with “records” in future posts. This column contains a record with the following information:
user_info.last_active_timestamp_micros, INTEGER
Contains the date of the user's last activity, stored as timestamp in microseconds.
user_info.user_first_touch_timestamp_micros, INTEGER
Contains the date of the user's first_open or first_visit event, whichever is earlier (timestamp in microseconds)
user_info.first_purchase_date, STRING
Contains the date of the user's first purchase stored in YYYYMMDD format.
Privacy info
The privacy setting of the user is stored in the “privcacy_info” column. This is also of type record and contains the following information:
privacy_info.is_ads_personalization_allowed, STRING
If a user is eligible for ads personalization, this value is set to 'true'. If a user is not eligible for ads personalization, this value is set to 'false'. isAdsPersonalizationAllowed returns '(not set)' if Google Analytics is not currently able to return whether this user is eligible for ads personalization.
privacy_info.is_limited_ad_tracking, STRING
The device's Limit Ad Tracking setting. Possible values include: 'true', 'false', and '(not set)'. isLimitedAdTracking returns '(not set)' if Google Analytics is not currently able to return this device's Limit Ad Tracking setting.
Audiences
This contains information about the audiences that the user belongs to. The data is stored in the “audiences” column. Since the user can belong to multiple GA4 audiences, each of them is listed in this column. This is also of type record and contains the following information:
audiences.id, INTEGER
This is an internal ID that GA assigns to each “Audience” that you create in the interface.
audiences.name, STRING
The name of the audience that you assign in GA4
audiences.membership_start_timestamp_microsINTEGER
This field contains the timestamp when the user was first included in the audience.
audiences.membership_expiry_timestamp_micros, INTEGER
The timestamp when the user's audience membership will expire. Membership duration is reset when a new activity requalifies the user for the audience
audience.npa, BOOLEAN
This field stores true or false based on your NPA settings for events and user-scoped custom dimensions included in your audience definition (This is a GA4 topic that will covered as required in this series).
User Properties
The data is stored in the “user_properties” column of the type record. It contains the following information.
user_properties.key, STRING
This contains the key of the “user property” (dimension) that you send in GA4. The value of this key is stored in the next column.
user_properties.value.string_value, STRING
This contains the value of the user-property dimension
user_properties.value.set_timestamp_micros, INTEGER
This field contains the timestamp when the value was last set.
Device
The information about the device the user used to visit your site or app is stored in the “devices” column. This data is also of type record and contains the following information:
device.operating_system, STRING
the operating system of the device used by the user.
device.category, STRING
The category of the device the user. The possible values are mobile, tablet, and desktop.
device.mobile_brand_name, STRING
The brand name of the user’s device.
device.mobile_model_name, STRING
The model name of the user’s device
device.unified_screen_name, STRING
The screen name of the user’s device
Geo
This field, “geo”, contains all the information about the user’s geolocation. This field is also of type record and stores the following information
geo.city, STRING
the city from which events were reported
geo.country, STRING
Contains the country from which events were reported
geo.continent, STRING
the continent from which events were reported
geo.region, STRING
the region from which events were reported
Lifetime
The column called “user_ltv” contains the user’s lifetime value data. This field is also of type record and store the following information:
user_ltv.revenue_in_usd, DOUBLE
The lifetime total revenue generated from this user (in USD)
user_ltv.sessions, INTEGER
Lifetime total number of sessions
user_ltv.engagement_time_millis, INTEGER
Lifetime total engagement time (in milliseconds)
user_ltv.purchases, INTEGER
Total number of purchases by this user
user_ltv.engaged_sessions, INTEGER
Lifetime total number of engaged sessions
user_ltv.session_duration_microsINTEGERLifetime total session duration (in milliseconds)
Predictions
The column “predictions” contains the values based on machine learning (ML) predictions about Google’s model. It is of type record and contains the following values.
predictions.in_app_purchase_score_7d, DOUBLE
The probability that a user who was active in the last 28 days will purchase (log an in_app_purchase event) within the next 7 days
predictions.purchase_score_7d, DOUBLE
The probability that a user who was active in the last 28 days will purchase (log a purchase event) within the next 7 days
predictions.churn_score_7d, DOUBLE
Contains the probability that a user who was active on your app or site within the last 7 days will not be active within the next 7 days.
predictions.revenue_28d_in_usd, FLOAT
Contains expected revenue, in USD, from all purchase events within the next 28 days from a user who was active in the last 28 days
Audit
At the very end of the table, there are two separate columns containing the dates of the changes in the record. These columns are used for auditing purposes.
occurrence_date, STRING
Date when the record change was triggered
last_updated_date, STRING
Date when the record was updated in the table
Record Type Data in BigQuery Import of GA4
Many of the fields, as mentioned above, are Record type fields. This field contains the entire record in one column. Think of it as a table (record) within a table.
Each record has a key value containing the name of the value (e.g. event_params.key) and the value containing the actual value.
Actual values are contained in a column that matches the data type of the value. There are four possible data types, String, Integer, Float, or Double, and hence you will see four columns but only one of the columns contains the value while the other columns are null.
For example, for the event parameters, there is a column called, event_params.key. Let’s say this key value is session_number. This means the value column corresponding to this event parameter will contain the session number of the user, which is an integer value. So event_params.value.int_value column is populated with the value while other columns are empty.
Don’t worry if this concept is a little difficult to understand. It will become clear in future posts. However, if you want to speed up your learning then join my BigQuery for Marketing
If you have any questions then do not hesitate to reach out to me. I will start answering some of the questions in future editions.
Help me promote this newsletter, invite your friends and colleagues, and earn rewards.
Join the conversation