Last update: 2020-07-31: Added fallback to old version of performance timing, to cover more browsers
In this post, I’ll guide you through all the steps needed to collect, process, and analyse the Navigation Timing results of all your web site visitors’ page views.
In the end, you’ll be able to make (hopefully nicer looking) dashboards like this:
This is going to be a long read, so grab a coffee, take a deep breath and read on. I’ll try to make the article as readable as possible.
A summary of what you will get
In this article I will provide you with
- all variables and code to implement in Google Tag Manager, so you won’t have to think (and click 632 times to create all the variables)
- the SQL to nicely model the data into a tidy, non-nested table
- a step-by-step guide on how to set things up and use the data
On a high level, you will get:
- insight into your web page performance (trends and on a granular level)
- the ability to spot problems and bottlenecks per page
- the ability to see if a new design or site release affected performance
For the high level stuff, you need to use your own judgement and put some effort in, but you will have the data needed to get these insights.
The why and the what
Page speed is incredibly important. Every millisecond counts, and you will lose customers, leads and SEO ranking if you have a slow site.
But how much business are you losing? Should you make your website faster? Do you have a business case? Is it worth spending $10,000 to make your site load faster?
It’s impossible to tell without having data.
So: I’m giving you data. Data from your web site visitors. In your own database. With free tools. The only thing you have to do is set it up.
Enter: the Navigation Timing API
Your web site visitor’s browser stores microsecond-level data about its own performance: how “heavy” your page is, when it connects, how long it takes before it’s done loading, and every step in between.
The image below shows you what timings are stored when a page loads in the browser.
These metrics can be read via a browser API using simple Javascript. And in this post, I’ll describe how to tap into this data and get it for every page that is visited on your website.
Note: This method uses the Navigation Timing API version 2, which is currently not supported in Safari.
For Safari users, the code falls back to version 1, which is a bit less complete. (Updated 2020-07-31)
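To give you an idea of what the measurement boils down to, here’s a minimal sketch (not the actual container code, but the same idea) of reading these timings in the browser, including the version 1 fallback:

```javascript
// Minimal sketch of reading navigation timings, with a v1 fallback.
function getNavigationTiming() {
  // Navigation Timing Level 2: one PerformanceNavigationTiming entry per page
  // load, with all timings in msec relative to the start of the navigation
  var entries = performance.getEntriesByType('navigation');
  if (entries.length > 0) {
    var nav = entries[0];
    return {
      version: 2,
      type: nav.type,                  // 'navigate', 'reload' or 'back_forward'
      transferSize: nav.transferSize,  // octets, headers included
      domInteractive: nav.domInteractive,
      domComplete: nav.domComplete,
      loadEventEnd: nav.loadEventEnd
    };
  }
  // Level 1 fallback (Safari, at the time of writing): absolute epoch
  // timestamps, so subtract navigationStart to get comparable msec values
  var t = performance.timing;
  return {
    version: 1,
    domInteractive: t.domInteractive - t.navigationStart,
    domComplete: t.domComplete - t.navigationStart,
    loadEventEnd: t.loadEventEnd - t.navigationStart
    // transferSize and friends don't exist in version 1
  };
}
```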
Steps to take and tools you’ll need
I will focus on the Google stack for this, since the tools are basically free, and widely used. You will need..
- access to Google Tag Manager for your website, or the ability to place a Tag Manager container and publish tags
- access to Firebase Analytics, or a Google Analytics App+Web property
- access to a Google Cloud project (with billing enabled) and enough rights to manage Big Query Tables
- Optional: a data modeling tool (I use Dataform) to schedule queries and manage tables
- Optional: a data visualisation tool like Google Data Studio, or R and ggplot, to query your data and make nice graphs
Steps involved
The exact order is somewhat flexible, but the following is the most logical. You might be able to skip some steps.
- Set up a Google Cloud project and enable billing
- Create a Google Analytics App+Web property (via a Firebase Analytics project)
- Enable the streaming of Firebase data to Big Query
- Create / import a Google Tag Manager container and include it on your web site
- Set up data collection tag + variables in Google Tag Manager
- Next day: see if the data is in your Big Query events table
- Attach the event parameters to the event in the App+Web reporting user interface
- Use Big Query to create a usable dataset to work with
- Schedule the Query so you have daily fresh data
- Use the data to make graphs and find insights.
Let’s start!
Here, I’m going to redirect you to other people’s pages. Not that I don’t want to guide you through it, but this post would otherwise become way too long.
And the nice folks I link to explain it very well. You’re in good hands.
If you’re well into the Google Stack, you can probably skip most steps here.
Step 1: create a Google Cloud Project and enable billing
A Big Query database lives in the Google Cloud, and needs a project to run under. This is not free per se.
The free tiers are generous, but if you’re going to query more than 1 TB per month, or store lots of gigabytes of data, it’ll cost you some money.
Start here: https://console.cloud.google.com/cloud-resource-manager
If this is your first project, you’re probably eligible for $300 worth of free credits, which is a lot of querying and storing. You need billing enabled on your project, too. This is needed for Firebase later on, for the ‘pay as you go’ plan (which has, again, a very generous free tier).
You will now have a Google Cloud project with billing enabled.
Step 2 and 3: Create an App + Web Analytics property (via Firebase) and enable streaming to Big Query
As is often the case, the prolific blogger Simo Ahava has written a nice how-to that is better than the official documentation.
Follow the steps in his post carefully, and come back when you’re done.
Done? Awesome! I hope you managed to do it in a reasonable amount of time.
If all went well, you now have
- a Cloud Project
- a Big Query Analytics dataset (without tables, since you’re not collecting data yet)
- datasets generated by Firebase
Time to populate the daily tables, let’s create the tags!
Step 4: Install a Google Tag Manager container on your web site
You’re probably already using Google Tag Manager on your site to manage tags. If that’s the case, make sure you have publish rights.
If you’re not.. you can start now. It’s fairly easy. Follow these steps: https://support.google.com/tagmanager/answer/6103696?hl=en
Import my container with the full configuration
I’ve made a pre-built container with
- An HTML tag that pushes the performance data into the dataLayer
- An App + Web configuration Tag
- An App + Web event tag that collects the data
- 24 variables, most of them performance timing variables
First, download the container file directly: raw JSON file or via the GitHub Gist page of the container.
Then, import it into your existing container, via the Admin section of GTM, like this:
Make sure to Merge the imported data into your container. If you choose Overwrite, your existing tags will be deleted. That’s probably not what you want.
Step 5: Set up data collection tag + variables in Google Tag Manager
In order for the collection to work, you need to send the data to the correct App+Web property. Remember? It’s the one you created in step 2.
The Measurement ID looks like G-123123 and can be found via Admin > Data Streams > Web in the Google Analytics interface.
Details here: https://support.google.com/analytics/answer/9310895
Now, in Tag Manager, open the `const` variable, paste the ID in the value box, and click save.
This will send the data to the property you have created. When you publish your container, it will start collecting data. You’ve now done most of the work, congratulations!
Before you publish, a bit of background on how it works
The tags, triggers and variables work like this:
On every page load (the `window.load` event), a script tag is executed that collects the following information into the dataLayer, like this:

    window.dataLayer.push( {{cjs - performance navigation event for dl}} );

- it signals the GTM event `performance navigation`
- it pushes the `navigation performance` object to the dataLayer
- it pushes data of `first_paint` and `first_contentful_paint`
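In case you’re curious what that custom HTML tag roughly looks like, here’s a simplified sketch (the real tag in the container also handles the version 1 fallback and may use versioned event names):

```javascript
// Simplified sketch of the custom HTML tag; the container version also
// handles the Navigation Timing v1 fallback
window.addEventListener('load', function () {
  // wait a tick, so loadEventEnd is populated in the entry
  setTimeout(function () {
    var nav = performance.getEntriesByType('navigation')[0];
    var fp  = performance.getEntriesByName('first-paint')[0];
    var fcp = performance.getEntriesByName('first-contentful-paint')[0];
    window.dataLayer = window.dataLayer || [];
    window.dataLayer.push({
      event: 'performance navigation', // the container uses versioned names (v1/v2)
      'navigation performance': nav ? nav.toJSON() : null,
      first_paint: fp ? Math.round(fp.startTime) : null,
      first_contentful_paint: fcp ? Math.round(fcp.startTime) : null
    });
  }, 0);
});
```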
The actual App+Web event tag triggers on this `performance navigation v1 or v2` event, and sends the data key/value pairs like this:
| PARAMETER NAME | CONTENT |
| --- | --- |
| type | navigation type (navigate, reload, back_forward) |
| transferSize | INT – in octets, the complete transfer size, headers included |
| encodedBodySize | INT – in octets (8-bit blocks), the encoded payload size |
| decodedBodySize | INT – in octets (8-bit blocks), the decoded payload size |
| redirectCount | the number of redirects since the last non-redirect navigation in the current browsing context |
| redirectEnd | time after receiving the last byte of the response of the last redirect (msec) |
| duration | difference between loadEventEnd and startTime (msec) |
| domComplete | time to DOM complete/ready (msec) |
| domContentLoadedEventStart | time to DOM content loaded event start (msec) |
| domContentLoadedEventEnd | time to DOM content loaded event end (msec) |
| domInteractive | time to DOM interactive (msec) |
| loadEventEnd | time when the load event of the current document is completed (msec) |
| loadEventStart | time when the load event of the current document starts (msec) |
| requestStart | time immediately before the user agent starts requesting the current document from the server (msec) |
| responseStart | time immediately after the user agent receives the first byte of the response (msec) |
| responseEnd | time immediately after receiving the last byte of the response (msec) |
| unloadEventEnd | zero, or time to unload event end (msec) |
| unloadEventStart | zero, or time to unload event start (msec) |
| redirectStart | time before the user agent starts to fetch the resource that initiates the redirect (msec) |
| connectStart | the same value as fetchStart, or the time of the connection to the server to retrieve the resource (msec) |
| fetchStart | time to fetch start (msec) |
| domainLookupStart | time immediately before the browser starts the domain name lookup, or the time of retrieval from the domain information cache (msec) |
| first_contentful_paint | the time when the first paint happened that is visible/colourful (msec) |
| first_paint | the time when the user agent first rendered after navigation (msec) |
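Most of these parameters are points on a single timeline (msec since the start of the navigation), so subtracting one from another gives you the duration of a phase. A few illustrative examples (which pairs you subtract is up to you):

```javascript
// Phase durations derived from the timing marks above (all msec).
// 'nav' is the PerformanceNavigationTiming entry from the earlier sketch.
var phases = {
  redirect: nav.redirectEnd - nav.redirectStart,
  request:  nav.responseStart - nav.requestStart,  // roughly "time to first byte"
  response: nav.responseEnd - nav.responseStart,   // download time
  dom:      nav.domComplete - nav.responseEnd,     // DOM processing
  load:     nav.loadEventEnd - nav.loadEventStart  // load event handlers
};
```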
A note about the data collection: since the data is sent on window load, after some scripts have executed, your data will suffer from survivorship bias. That is: your performance metrics are complete ONLY for people who were patient enough for the load event to complete.
Preview and publish.. your data will arrive in a day.
The data will be sent immediately to Google Analytics App + Web. However.. the Big Query dataset won’t be available yet. You’ll have to wait a day.
The App + Web interface can be quite slow to update, too, so don’t hold your breath while waiting for the events to pop up.
To check if the events are sent from within the browser, you can check the console logging.
I use the excellent Adswerve dataLayer inspector for this. You can add this to Chrome or Edge and debug like an analytics pro.
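If you’d rather not install anything, a quick check in the browser console works too. This sketch assumes the container’s event names start with “performance navigation”:

```javascript
// Quick console check: show the performance events pushed to the dataLayer
window.dataLayer
  .filter(function (e) { return e.event && e.event.indexOf('performance navigation') === 0; })
  .forEach(function (e) { console.log(e); });
```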
If you want to know what the request parameters to the Google Analytics endpoint mean, check out David Vallejo’s measurement protocol v2 guide. He has pre-emptively documented this for you.
Step 6: Next day: see if the data is in your Big Query events table
Good morning! Hope you slept well.
If you’ve set everything up to this point: kudos! Time to open a tab and see if your Big Query dataset has data.
Go to the Big Query web interface and select the correct project. Then look at the data via Preview of the table or via SQL. Something like this will work:
    SELECT
      e.key AS key,
      ROUND(AVG(e.value.int_value)) AS avg_value_integers,
      ROUND(AVG(e.value.double_value)) AS avg_value_doubles
    FROM
      `PROJECTNAME.analytics_YOURNUMBERHERE.events_*`,
      UNNEST(event_params) AS e
    WHERE
      event_name = 'navigation_performance'
      AND _table_suffix > '20200412' -- replace this by a later date
    GROUP BY 1
The output will look something like this:
Note that some values are integers (INT64), and some are doubles (FLOAT). There are also string values (like the `type` key).
In the Firebase or App+Web interface, you can attach the event keys and set the reporting type to match what is sent (string, integer, time, size, etc).
On to the next step!
Step 7: Attach the event parameters to the event in the App+Web reporting user interface
To make use of the reporting interface of Firebase Analytics, you can tie the event keys to the `navigation_performance` event, and set them to the correct type. This way, the reporting widgets treat numbers as numbers, and milliseconds as time.
First, go to either the App+Web interface under Events, or to the Firebase console under Analytics -> Events. Select the navigation_performance event, click the three-dotted icon that appears, and choose Edit parameter reporting.
You will see an interface where you can add the parameters, choose their type, and, for numbers, the unit of measurement.
Nice little task before getting coffee: add all 24 of them, set the types, and click save.
Note: the interface sometimes chokes. I suggest saving in between, or else you’ll have to re-click everything.
Step 8: Use Big Query to create a usable dataset to work with
The Big Query database that stores the data is very nested. Very Big Query-like.
This is actually super elegant, since the whole “capture all events” schema that it uses is, in fact, really simple: for every hit (event), it stores a list of key/value pairs, which can be un-nested when needed.
This elegance, however, makes the tables not very flat. And.. sometimes you want to actually look at your data. And human brains are not made for consuming nested data.
Time to transform the data to fit where it’s needed.
Sidestep: Nested data, wide data, long data
Nested data is good for storing data. After all, it represents the data structure, and you can add data without changing the schema.
There are two other ways to store data: wide and long.
This is wide data. If you’re an Excel user, this format looks familiar.
And this is long data: it’s almost nested data, but with duplication. Every row repeats the event ID.
Every form has its use. In general:
- for calculations or grouping, you need wide data:
  - domComplete – domInteractive = … msec
  - average loading time per navigation type = …
- for Data Studio, you need wide data (precalculated or even raw metrics)
- for a breakdown, you will need long data
- for easy graphing in `ggplot`, long data is often the format you need (see the sketch below)
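To make the three shapes concrete, here is the same (made-up) event in each form:

```javascript
// The same event in three shapes (values are made up)

// Nested, as Big Query stores it: one row per event, parameters as key/value pairs
var nested = {
  event_name: 'navigation_performance',
  event_params: [
    { key: 'type',           value: { string_value: 'navigate' } },
    { key: 'domInteractive', value: { int_value: 1200 } },
    { key: 'domComplete',    value: { int_value: 2100 } }
  ]
};

// Wide: one row per event, one column per parameter
var wide = { event_id: 1, type: 'navigate', domInteractive: 1200, domComplete: 2100 };

// Long: one row per event/metric pair -- the event_id is duplicated
var long = [
  { event_id: 1, metric: 'domInteractive', value: 1200 },
  { event_id: 1, metric: 'domComplete',    value: 2100 }
];
```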
Widen the data, Query 1.
The Wide Query can be found here:
https://gist.github.com/zjuul/0ae677de25a7be4760813b45696b4934
And this produces the wide dataset. You can use this in Data Studio to plot all metrics and break them down by
- device
- page path
- hostname
- page title
- device type
- user
- country
- navigation type
Furthermore, you can create new metrics, like intervals between two timings.
You cannot stack 2 metrics in Data Studio, however. For this, you need long data.
“Melt” / “pivot_long” the data: make it long
The query takes the wide dataset and makes it long, by phase of the request.
The query is here:
https://gist.github.com/zjuul/e56cd50b759315c654031eeeaac5cfb6
And you can do stuff with it like this, to see performance over time:
Step 9: Schedule the Queries so you have daily fresh data
Now that you have nice tables, you’ll want them filled with fresh data every day, without typing the queries in every time.
Couple of options here:
Use the built-in “Schedule Query” option in Big Query, or even: create a View (always fresh, slower, but it might be more cost effective if you don’t use it often). Pick one.
Use a data modeling tool
Now that you’re busy modeling data, you can opt for a specialised external tool.
My favourite one at the moment is Dataform. It has version control, schedules, automatic dependency mapping, and documentation generation. And it has a free 1-person tier. Try it! (No affiliation, I just love the tool.)
Step 10: Use the data to make graphs and find insights
Wow, step 10 already.
- you have data
- you have modeled it
- now use it! That’s what it’s for, after all..
Data Studio to see insights
We all love and hate Data Studio. It can do so much for free!
You can add the long and wide datasets to a report by choosing Big Query and then navigate to the datasets.
Then, you can spend hours tweaking graphs and making pretty designs to impress your boss, clients and coworkers.
Example report using a 100% stacked bar, per page, broken down by phase.
Add filters for device, navigation type, country.. the possibilities are endless.
Use R to calculate the impact of speed on conversion
You don’t have to stay in the Google stack. You can pull the data into R to model page speed in relation to web site behaviour.
After all, you have a user-id and client-id in the data, so you can join the data to outcomes.
Maybe you’ll end up with a great business case to justify spending money on improving the speed of your site.
And make your users happy.. they all like speedy sites!
Finally..
This turned out to be quite a lengthy blog post. I hope it is useful to you. If so: let me know. Tweet me @zjuul, leave a comment, or find me in real life (after the lockdown).
Thank you for your attention, and happy optimizing!