How to Build a Sessions Table from Raw GA4 Data in BigQuery
This is the first in a series of articles talking about BigQuery and the common questions we are asked.
For anyone who has started using BigQuery with GA4 data, there’s one main misunderstanding we come across: sessions. We’ve had sessions for years. We understand them, and even when GA4 came along built on events, we still had sessions in the interface. But BigQuery doesn’t deal with sessions, it’s purely events.
GA4’s BigQuery export gives you a flat table of events. Every page view, scroll, click and purchase is a separate row. That’s useful, but there’s no sessions table out of the box.
That means if you want to work with sessions in BigQuery, you need to build them yourself.
What is a session in GA4
A session is a group of events fired by the same user within the same visit. GA4 creates a new session when a user arrives on the site, and will start a new one if there’s a period of inactivity or if the user comes back via a different marketing channel.
When a new session begins, GA4 fires a session_start event. Each session gets a ga_session_id a number GA4 generates at that point. Combined with user_pseudo_id, which identifies the device or browser, you have everything you need to group events into sessions.
Why not just count session_start events?
You could just do this:
SELECT COUNT(*) AS total_sessions FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*` WHERE event_name = 'session_start' AND _TABLE_SUFFIX BETWEEN '20201101' AND '20210131'
And you’d get a count of session_start events. But that’s all you’d get: a number with no context. You can’t join it to anything, you can’t see which sessions came from paid search, and you can’t tell which ones resulted in a purchase.
There’s also an accuracy problem. The session_start event doesn’t always fire once per session.
If your site is running GA4 Consent Mode (and if you’re not, why not?) GA4 will fire events for users who haven’t given consent, but without a persistent cookie to identify them. That means each page they visit can look like a new session, because there’s nothing to stitch their journey together. The result is inflated session_start counts in BigQuery that are very difficult to spot if you’re just counting rows.
When you build a sessions table using user_pseudo_id and ga_session_id, duplicates are removed, as well as those where consent wasn’t given.
That’s what we’re building here.
What does the raw data look like?
If you run a simple query against the GA4 events table in BigQuery, the results can look a bit confusing at first. Each event isn’t a single clean row with all its data in columns. Instead, the parameters for each event are stored in a field called event_params, which means each parameter sits on its own row.
A single session_start event looks like this in the raw data:
| event_name | event_params key | event_params value |
| session_start | ga_session_id | 9819679542 |
| session_start | page_location | https://shop.googlemerchandise… |
| session_start | page_referrer | null |
| session_start | ga_session_number | 2 |
| session_start | page_title | Home |
| session_start | session_engaged | 1 |
| session_start | engaged_session_event | 1 |
Extracting the data with UNNEST
To extract a specific parameter from event_params you use a subquery with UNNEST. It looks like this:
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id
This unpacks the event_params array, finds the row where the key is ga_session_id, and returns the value as a single column. You’ll use this pattern repeatedly when working with GA4 data in BigQuery.
Note that ga_session_id is stored as an integer, so we use value.int_value. Source and medium are strings, so those use value.string_value. If you get that wrong (and you will), you’ll get a lot of null results.
The SQL
Here’s the full query to build a clean sessions table:
SELECT user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id, TIMESTAMP_MICROS(event_timestamp) AS session_start_time FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*` WHERE event_name = 'session_start' AND _TABLE_SUFFIX BETWEEN '20201101' AND '20210131'
What each field means
| Field | What it is |
| user_pseudo_id | A unique identifier GA4 assigns to each device or browser. It’s cookie-based, not tied to a logged-in user |
| session_id | A number GA4 generates when a new session starts. Not unique on its own, two different users can have the same value |
| user_pseudo_id + session_id | Combined, these give you a truly unique session key. Always use both together when joining or grouping session data |
| session_start_time | When the session started, converted from GA4’s raw format into a readable timestamp |
| page_title | The title of the page_location of this event |
| session_engaged | A GA4 flag that marks whether the session met GA4’s definition of an engaged session. That means the user either spent more than 10 seconds on the site, visited more than one page, or triggered a conversion event. |
| engaged_session_event | This is the event parameter that gets set to 1 when an engaged session is recorded. It’s essentially the same concept as session_engaged but at the event level. |
A note on timestamps
GA4 stores all timestamps as microseconds, a 16-digit number that looks something like 1716556800000000, which is no use to anyone.
TIMESTAMP_MICROS() converts that into a standard timestamp like 2026-05-24 09:00:00 UTC. You’ll use this pattern constantly when working with GA4 data in BigQuery – always wrap event_timestamp in TIMESTAMP_MICROS() if you want a date or time you can actually use.
This is how the that query would appear. These are your individual sessions:
| user_pseudo_id | session_id | session_start_time |
| 1013488.1622938896 | 6798788849 | 2020-12-24 08:23:55.448219 UTC |
| 1017530.2001578558 | 4504003660 | 2020-12-24 09:52:21.746864 UTC |
| 1023700.4069698118 | 415861123 | 2020-12-24 15:07:10.384832 UTC |
| 1025058.8151789728 | 4989161801 | 2020-12-24 14:27:39.171057 UTC |
| 1066151.7946216838 | 9972822153 | 2020-12-24 01:23:25.359576 UTC |
| 1086504.3866945056 | 2995937820 | 2020-12-24 08:30:50.142091 UTC |
| 1093320.7859109427 | 7544406193 | 2020-12-24 06:26:27.535701 UTC |
| 1094309.5476149422 | 9957962386 | 2020-12-24 14:50:09.686105 UTC |
| 1095079.1491121596 | 4628174210 | 2020-12-24 00:40:09.113465 UTC |
| 1102669.7186342617 | 9077621333 | 2020-12-24 12:47:16.938713 UTC |
You would obviously join that up to monthly sessions (below) or split further:
SELECT
FORMAT_TIMESTAMP('%Y-%m', session_start_time) AS month,
COUNT(*) AS total_sessions
FROM (
SELECT
user_pseudo_id,
(SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'ga_session_id') AS session_id,
TIMESTAMP_MICROS(event_timestamp) AS session_start_time
FROM
`bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
WHERE
event_name = 'session_start'
AND _TABLE_SUFFIX BETWEEN '20201101' AND '20210131'
)
GROUP BY month
ORDER BY month
| month | total_sessions |
| 2020-11 | 106585 |
| 2020-12 | 131836 |
| 2021-01 | 116549 |
Summary
BigQuery doesn’t give you sessions, it gives you events. To do anything useful with your GA4 data at session level, you need to build that data yourself.
The key points:
- Filter to session_start events only, it’s cleaner than scanning the full events table
- Always use user_pseudo_id and session_id together, neither is unique on its own
- Use UNNEST to extract parameters from event_params
- Wrap event_timestamp in TIMESTAMP_MICROS() to get a readable timestamp
- Don’t just count session_start events, consent mode can inflate that number and you’d never know
Once you have a clean sessions table, you can start doing the useful stuff: conversion rates, channel performance, engagement analysis. We’ll cover those in the next articles.
If you’ve got questions about working with GA4 data in BigQuery, or want help building out your analytics setup, get in touch with our team or explore more Insights below.