How to Build a Sessions Table from Raw GA4 Data in BigQuery

This is the first in a series of articles talking about BigQuery and the common questions we are asked.

For anyone who has started using BigQuery with GA4 data, there’s one main misunderstanding we come across: sessions. We’ve had sessions for years. We understand them, and even when GA4 came along built on events, we still had sessions in the interface. But BigQuery doesn’t deal with sessions,  it’s purely events.

GA4’s BigQuery export gives you a flat table of events. Every page view, scroll, click and purchase is a separate row. That’s useful, but there’s no sessions table out of the box.

That means if you want to work with sessions in BigQuery, you need to build them yourself.

What is a session in GA4

A session is a group of events fired by the same user within the same visit. GA4 creates a new session when a user arrives on the site, and will start a new one if there’s a period of inactivity or if the user comes back via a different marketing channel.

When a new session begins, GA4 fires a session_start event. Each session gets a ga_session_id  a number GA4 generates at that point. Combined with user_pseudo_id, which identifies the device or browser, you have everything you need to group events into sessions.

Why not just count session_start events?

You could just do this:

SELECT 
COUNT(*) AS total_sessions
FROM
`bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
WHERE
event_name = 'session_start'
AND _TABLE_SUFFIX BETWEEN '20201101' AND '20210131'

And you’d get a count of session_start events. But that’s all you’d get: a number with no context. You can’t join it to anything, you can’t see which sessions came from paid search, and you can’t tell which ones resulted in a purchase.

There’s also an accuracy problem. The session_start event doesn’t always fire once per session.

If your site is running GA4 Consent Mode (and if you’re not, why not?) GA4 will fire events for users who haven’t given consent, but without a persistent cookie to identify them. That means each page they visit can look like a new session, because there’s nothing to stitch their journey together. The result is inflated session_start counts in BigQuery that are very difficult to spot if you’re just counting rows.

When you build a sessions table using user_pseudo_id and ga_session_id, duplicates are removed, as well as those where consent wasn’t given.

That’s what we’re building here.

What does the raw data look like?

If you run a simple query against the GA4 events table in BigQuery, the results can look a bit confusing at first. Each event isn’t a single clean row with all its data in columns. Instead, the parameters for each event are stored in a field called event_params, which means each parameter sits on its own row.

A single session_start event looks like this in the raw data:

event_name event_params key event_params value
session_start ga_session_id 9819679542
session_start page_location https://shop.googlemerchandise
session_start page_referrer null
session_start ga_session_number 2
session_start page_title Home
session_start session_engaged 1
session_start engaged_session_event 1

Extracting the data with UNNEST

To extract a specific parameter from event_params you use a subquery with UNNEST. It looks like this:

(SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'ga_session_id') AS session_id

This unpacks the event_params array, finds the row where the key is ga_session_id, and returns the value as a single column. You’ll use this pattern repeatedly when working with GA4 data in BigQuery.

Note that ga_session_id is stored as an integer, so we use value.int_value. Source and medium are strings, so those use value.string_value. If you get that wrong (and you will), you’ll get a lot of null results.

The SQL

Here’s the full query to build a clean sessions table:

SELECT
user_pseudo_id,
(SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'ga_session_id') AS session_id,
TIMESTAMP_MICROS(event_timestamp) AS session_start_time
FROM
`bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
WHERE
event_name = 'session_start'
AND _TABLE_SUFFIX BETWEEN '20201101' AND '20210131'

What each field means

Field What it is
user_pseudo_id A unique identifier GA4 assigns to each device or browser. It’s cookie-based, not tied to a logged-in user
session_id A number GA4 generates when a new session starts. Not unique on its own, two different users can have the same value
user_pseudo_id + session_id Combined, these give you a truly unique session key. Always use both together when joining or grouping session data
session_start_time When the session started, converted from GA4’s raw format into a readable timestamp
page_title The title of the page_location of this event
session_engaged A GA4 flag that marks whether the session met GA4’s definition of an engaged session. That means the user either spent more than 10 seconds on the site, visited more than one page, or triggered a conversion event.
engaged_session_event This is the event parameter that gets set to 1 when an engaged session is recorded. It’s essentially the same concept as session_engaged but at the event level.

A note on timestamps

GA4 stores all timestamps as microseconds, a 16-digit number that looks something like 1716556800000000, which is no use to anyone.

TIMESTAMP_MICROS() converts that into a standard timestamp like 2026-05-24 09:00:00 UTC. You’ll use this pattern constantly when working with GA4 data in BigQuery – always wrap event_timestamp in TIMESTAMP_MICROS() if you want a date or time you can actually use.

This is how the that query would appear. These are your individual sessions:

user_pseudo_id session_id session_start_time
1013488.1622938896 6798788849 2020-12-24 08:23:55.448219 UTC
1017530.2001578558 4504003660 2020-12-24 09:52:21.746864 UTC
1023700.4069698118 415861123 2020-12-24 15:07:10.384832 UTC
1025058.8151789728 4989161801 2020-12-24 14:27:39.171057 UTC
1066151.7946216838 9972822153 2020-12-24 01:23:25.359576 UTC
1086504.3866945056 2995937820 2020-12-24 08:30:50.142091 UTC
1093320.7859109427 7544406193 2020-12-24 06:26:27.535701 UTC
1094309.5476149422 9957962386 2020-12-24 14:50:09.686105 UTC
1095079.1491121596 4628174210 2020-12-24 00:40:09.113465 UTC
1102669.7186342617 9077621333 2020-12-24 12:47:16.938713 UTC

You would obviously join that up to monthly sessions (below) or split further:

SELECT
FORMAT_TIMESTAMP('%Y-%m', session_start_time) AS month,
COUNT(*) AS total_sessions
FROM (
SELECT
user_pseudo_id,
(SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'ga_session_id') AS session_id,
TIMESTAMP_MICROS(event_timestamp) AS session_start_time
FROM
`bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
WHERE
event_name = 'session_start'
AND _TABLE_SUFFIX BETWEEN '20201101' AND '20210131'
)
GROUP BY month
ORDER BY month

 

month total_sessions
2020-11 106585
2020-12 131836
2021-01 116549

Summary

BigQuery doesn’t give you sessions, it gives you events. To do anything useful with your GA4 data at session level, you need to build that data yourself.

The key points:

  • Filter to session_start events only, it’s cleaner than scanning the full events table
  • Always use user_pseudo_id and session_id together, neither is unique on its own
  • Use UNNEST to extract parameters from event_params
  • Wrap event_timestamp in TIMESTAMP_MICROS() to get a readable timestamp
  • Don’t just count session_start events, consent mode can inflate that number and you’d never know

Once you have a clean sessions table, you can start doing the useful stuff: conversion rates, channel performance, engagement analysis. We’ll cover those in the next articles.

If you’ve got questions about working with GA4 data in BigQuery, or want help building out your analytics setup, get in touch with our team or explore more Insights below.

Insights