Two GA4 User Tables in BigQuery Demystified

Learn BigQuery with GA4

When you export GA4 data to BigQuery, you will see one or two user tables, depending on your GA4 implementation. These tables are

  1. Pseudonymous Users table - This table contains information about anonymous users. All users on your site or app are anonymous by default. The naming format of this table is pseudonymous_users_YYYYMMDD. Each day of data creates a new table. These tables are listed as pseudonymous_users_(##) and just like other tables the name of this table also contains the date.

    Key things about this table:

    • It contains a row for every pseudonymous identifier.

    • This table is updated when there is a change to data in any one of the fields.

    • Users who have not given consent, are not exported to this table.

    • Known User IDs are not exported to this table (see the User ID table below for this information)

    • The last active timestamp is exported to this table.

  2. Users - This table contains information about known users. This table will show up if you enable User ID integration with GA4. The naming format of this table is users_YYYYMMDD and is listed under users_(##).

    Key things about this table:

    • It contains a row for every user ID. D

    • This table is updated when there is a change to data in any one of the fields.

    • Users who have not given consent can be exported to this table if they include a user ID.

    • Pseudouser IDs are not exported to this table, they are contained in the Pseudonymous Users table

    • The last active timestamp is exported to this table.

Data Exported in Pseudonymous Users and User Tables

The daily export in the above tables includes any new users and all the users whose data has changed that day. If you need some examples to see which users will be included in the daily export then check the “Advanced Notes” section below.

Tables Schema

The schema for both tables is very similar, with just a minor difference.

pseudonymous_users table

user table

The data in these tables can be grouped into a few major buckets, as listed below:

  1. User ID

  2. User Info

  3. Privacy Info

  4. User Properties

  5. Device

  6. Geo

  7. Audiences

  8. User Lifetime

  9. Predictions

  10. Audit

Let’s take a look at what type of data is contained in the major buckets.

User ID

User ID information is stored in three different columns.

  • user_id, STRING (date type)

    ID for the User-ID namespace in reporting identity (only available in the User table)

  • pseudo_user_id, STRING

    ID for the Pseudonymous namespace (only available in the Pseudonymous Users table only)

  • stream_id, INTEGER

    Data-stream ID (only available in the Pseudonymous Users table)

User info

This data contains some important timelines of the user and is stored in the “user_info” column as a record data type. This type of column is called a record type column (or record data type) since it contains a record of information instead of just a single value. I will cover dealing with “records” in future posts. This column contains a record with the following information:

  • user_info.last_active_timestamp_micros, INTEGER

    Contains the date of the user's last activity, stored as timestamp in microseconds.

  • user_info.user_first_touch_timestamp_micros, INTEGER

    Contains the date of the user's first_open or first_visit event, whichever is earlier (timestamp in microseconds)

  • user_info.first_purchase_date, STRING

    Contains the date of the user's first purchase stored in YYYYMMDD format.

 Privacy info

The privacy setting of the user is stored in the “privcacy_info” column. This is also of type record and contains the following information:

  • privacy_info.is_ads_personalization_allowed, STRING

    If a user is eligible for ads personalization, this value is set to 'true'. If a user is not eligible for ads personalization, this value is set to 'false'. isAdsPersonalizationAllowed returns '(not set)' if Google Analytics is not currently able to return whether this user is eligible for ads personalization.

  • privacy_info.is_limited_ad_tracking, STRING

    The device's Limit Ad Tracking setting. Possible values include: 'true', 'false', and '(not set)'. isLimitedAdTracking returns '(not set)' if Google Analytics is not currently able to return this device's Limit Ad Tracking setting.

Audiences

This contains information about the audiences that the user belongs to. The data is stored in the “audiences” column. Since the user can belong to multiple GA4 audiences, each of them is listed in this column. This is also of type record and contains the following information:

  • audiences.id, INTEGER

    This is an internal ID that GA assigns to each “Audience” that you create in the interface.

  • audiences.name, STRING

    The name of the audience that you assign in GA4

  • audiences.membership_start_timestamp_microsINTEGER

    This field contains the timestamp when the user was first included in the audience.

  • audiences.membership_expiry_timestamp_micros, INTEGER

    The timestamp when the user's audience membership will expire. Membership duration is reset when a new activity requalifies the user for the audience

  • audience.npa, BOOLEAN

    This field stores true or false based on your NPA settings for events and user-scoped custom dimensions included in your audience definition (This is a GA4 topic that will covered as required in this series).

User Properties

The data is stored in the “user_properties” column of the type record. It contains the following information.

  • user_properties.key, STRING

    This contains the key of the “user property” (dimension) that you send in GA4. The value of this key is stored in the next column.

  • user_properties.value.string_value, STRING

    This contains the value of the user-property dimension

  • user_properties.value.set_timestamp_micros, INTEGER

    This field contains the timestamp when the value was last set.

 Device

The information about the device the user used to visit your site or app is stored in the “devices” column. This data is also of type record and contains the following information:

  • device.operating_system, STRING

    the operating system of the device used by the user.

  • device.category, STRING

    The category of the device the user. The possible values are mobile, tablet, and desktop.

  • device.mobile_brand_name, STRING

    The brand name of the user’s device.

  • device.mobile_model_name, STRING

    The model name of the user’s device

  • device.unified_screen_name, STRING

    The screen name of the user’s device

Geo

This field, “geo”, contains all the information about the user’s geolocation. This field is also of type record and stores the following information

  • geo.city, STRING

    the city from which events were reported

  • geo.country, STRING

    Contains the country from which events were reported

  • geo.continent, STRING

    the continent from which events were reported

  • geo.region, STRING

    the region from which events were reported

 Lifetime

The column called “user_ltv” contains the user’s lifetime value data. This field is also of type record and store the following information:

  • user_ltv.revenue_in_usd, DOUBLE

    The lifetime total revenue generated from this user (in USD)

  • user_ltv.sessions, INTEGER

    Lifetime total number of sessions

  • user_ltv.engagement_time_millis, INTEGER

    Lifetime total engagement time (in milliseconds)

  • user_ltv.purchases, INTEGER

    Total number of purchases by this user

  • user_ltv.engaged_sessions, INTEGER

    Lifetime total number of engaged sessions

  • user_ltv.session_duration_microsINTEGERLifetime total session duration (in milliseconds)

 Predictions

The column “predictions” contains the values based on machine learning (ML) predictions about Google’s model. It is of type record and contains the following values.

  • predictions.in_app_purchase_score_7d, DOUBLE

    The probability that a user who was active in the last 28 days will purchase (log an in_app_purchase event) within the next 7 days

  • predictions.purchase_score_7d, DOUBLE

    The probability that a user who was active in the last 28 days will purchase (log a purchase event) within the next 7 days

  • predictions.churn_score_7d, DOUBLE

    Contains the probability that a user who was active on your app or site within the last 7 days will not be active within the next 7 days.

  • predictions.revenue_28d_in_usd, FLOAT

    Contains expected revenue, in USD, from all purchase events within the next 28 days from a user who was active in the last 28 days

Audit

At the very end of the table, there are two separate columns containing the dates of the changes in the record. These columns are used for auditing purposes.

  • occurrence_date, STRING

    Date when the record change was triggered

  • last_updated_date, STRING

    Date when the record was updated in the table

Record Type Data in BigQuery Import of GA4

Many of the fields, as mentioned above, are Record type fields. This field contains the entire record in one column. Think of it as a table (record) within a table. 

Each record has a key value containing the name of the value (e.g. event_params.key) and the value containing the actual value.

Actual values are contained in a column that matches the data type of the value.  There are four possible data types, String, Integer, Float, or Double, and hence you will see four columns but only one of the columns contains the value while the other columns are null.

For example, for the event parameters, there is a column called, event_params.key. Let’s say this key value is session_number. This means the value column corresponding to this event parameter will contain the session number of the user, which is an integer value.  So event_params.value.int_value column is populated with the value while other columns are empty.

Don’t worry if this concept is a little difficult to understand. It will become clear in future posts.  However, if you want to speed up your learning then join my BigQuery for Marketing

If you have any questions then do not hesitate to reach out to me. I will start answering some of the questions in future editions.

Help me promote this newsletter, invite your friends and colleagues, and earn rewards.

Advanced Notes

Subscribe to Premium Membership to read the rest.

Become a paying subscriber of Premium Membership to get access to this post and other subscriber-only content.

Already a paying subscriber? Sign In

A subscription gets you:
Advanced Content
Readymade SQL Scripts

Subscribe to keep reading

This content is free, but you must be subscribed to GA4 BigQuery For Marketing Analytics to continue reading.

Already a subscriber?Sign In.Not now

Join the conversation

or to participate.