A question that pops up a lot is: I’ve added experiment events in GA4, now what? How can I tie these events to conversions?
This question is rather fundamental when you rephrase it like this:
I want to know…
- when a certain event happened for a user
- what happens next, and how often?
In the GA4 interface, you can create an explore report of type Funnel and you will have a convenient user interface to click to an answer.
But.. if you want un-modeled, un-thresholded and exactly counted data, you will need to resort to Big Query for an answer, and you will need SQL.
Here’s a query that will help you get started answering that question:
The basic and effective approach
The simplest approach works like this:
- Give me all users that were in the experiment (or: where event X happened), along with the following information:
- the timestamp: when did they become part of the experiment
- the variant: which treatment did they receive
- Give me all users that converted (or: where event Y happened), along with the following information:
- the timestamp of the conversion event
- optionally: the conversion value (e.g. revenue)
- Compare the users from group 1 and 2
- the user needs to be the same (obviously)
- the conversion timestamp needs to be after the experiment
- Aggregate the results into something that can be entered into a calculator
- sample ratio mismatch calculator
- significance calculator
- an R or Python notebook
When you nail this approach into a query, there’s some depth you can add. Like
- Breakdown by device
- Removing outliers (super large orders, people who convert 100 times, etc)
- adding multiple conversions, even multiple steps
But, let’s KISS – keep it simple, stupid and look at how this translates into SQL:
The SQL
In order to make the query flexible, I’m adding 3 flexible variables. Two for dates, and 1 for experiment ID.
The assumption is that the experiment ID is in an event parameter of type STRING, called ‘exp_id’, and the variants are in a STING parameter named ‘exp_variant’.
declare start_date STRING default '20230621'; -- YYYYMMDD: when did the experiment start declare end_date STRING default '20230626'; -- YYYYMMDD: and end declare experiment_id STRING default 'Experiment 001'; -- the exp_id parameter you want to filter
Let’s start the query with (see above – step 1) – get met the users that are in the experiment:
with in_experiment as ( SELECT user_pseudo_id as cookie, if( count(distinct (select value.string_value from unnest(event_params) where key = 'exp_variant')) > 1, '_multiple_variations_seen', max((select value.string_value from unnest(event_params) where key = 'exp_variant')) ) as variant, min(event_timestamp) as first_exposure FROM `production-proj.analytics_12345.events_*` where _table_suffix between start_date and end_date and (select value.string_value from unnest(event_params) where key = 'exp_id') like experiment_id group by 1 )
This will output 1 row per user (cookie) that was part of the experiment.
Note the variant line: this checks if the user has been exposed to multiple variations. If that is the case, we assign the “_multiple_variations_seen” value to that user. Don’t count them twice, don’t include those users in the results.
Now, move on to step 2 and 3:
- start with the users in the experiment
- add conversion columns via a left join, based on
- same user
- conversion happened on a later point in time
, raw_results as ( SELECT exp.cookie, exp.variant, countif(event_name = 'add_to_cart') as add_to_carts, countif(event_name = 'purchase') as purchases, max(if(event_name = 'add_to_cart', 1, 0)) as add_to_cart_users, max(if(event_name = 'purchase', 1, 0)) as purchase_users, from in_experiment as exp left join `production-proj.analytics_12345.events_*` as events on exp.cookie = events.user_pseudo_id and exp.first_exposure <= events.event_timestamp where events._table_suffix between start_date and end_date group by 1,2 )
You see I count 2 conversion events: add_to_cart and purchase.
And I count them twice: a raw counter (the number of times per user) and a user counter (either 1 or 0 per user).
The join condition is restricted to “same user” and “conversion event must be later than the experiment exposure”.
Summarizing it together
The next bit is just presenting everything in a nice table:
- The summary CTE aggregates the counters per variant.
- The final query adds rates and does some nice formatting
, summary as ( select variant, count(distinct cookie) as users, sum(add_to_carts) as add_to_carts, sum(add_to_cart_users) as add_to_cart_users, sum(purchases) as purchases, sum(purchase_users) as purchase_users, from raw_results group by 1 ) select *, round(100 * add_to_cart_users / users, 2) || '%' as add_to_cart_rate, round(100 * purchase_users / users, 2) || '%' as purchase_rate, from summary order by variant
The Result
A sample result can look like this:
The metrics ending with “users” can be plugged into a statistical calculator (for a simple binomial yes/no) result.
And of course, you can use the “raw_results” CTE output to read into a notebook or raw results calculator.
What’s next?
Using this query, you will have a starting point to summarize you AB test results quickly and easily.
When you know what you’re doing, you can start adding drilldowns, custom conversions, etc.
Can you even be more clever? For sure! Using a WINDOW function in stead of a JOIN will open some more doors to tweak your result set (conversion windows come to mind).
For now: thank you for reading. If you have found this useful: please share, comment, spread the word!
If you want to know more: you know where to find me :)
Leave a Reply