Recent articles and items of interest, mostly related to the Google analytics stack (GA4, GTM, Looker (Studio), BigQuery, Colab), from Two Octobers’ Head of Analytics and co-founder Nico Brooks.
Dark Thoughts on Backed-up Universal Analytics Data
We spent a fair amount of time in June setting up and babysitting Universal Analytics data exports for our clients. During this process, I found myself in a dark place. My inner dialogue went like this:
- “Have you ever looked at a Google Analytics property and taken the data at face value?”
- “Well, no, I haven’t. I generally start with the assumption that tracking is messed up, and look for evidence of cross-domain tracking issues, duplicate tags, etc. Nine times out of ten, I find them.”
- “So without the ability to understand the context of archived data, what good is it?”
I’ll spare you the rest of my internal back and forth, but the long and short of it is that context is everything in analytics. I am confident in the configuration of properties that my team audited and optimized, but even then, if I looked at a traffic surge from ten years ago, I couldn’t tell you whether it was legitimate without doing a bit of data forensics.
This realization was a helpful reminder for me that data is not truth. The page views, traffic sources, etc. we archived will come in handy, I’m sure, but it isn’t really feasible to uncover and record all of the context that would give me full confidence in the stories it can tell.
Improved Google Ads Attribution in GA4
Google fixed a long-standing bug that caused Google Ads conversions to be misattributed to Google organic traffic. The fix only just rolled out, but I compared conversion rates across a dozen GA4 properties that are linked to Google Ads, setting the last seven days against the last seven days of May.
- Two properties showed a decrease in conversion rate
- Three properties stayed about the same
- Seven increased
- The median change was +14%. For clarity, that is the percentage improvement in conversion rate. So, for example, if the conversion rate went from 2% to 3%, that would be a change of +50%.
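To make the arithmetic in that last bullet concrete, here is a minimal Python sketch of the difference between a relative change and a percentage-point change. The function names are hypothetical, and the only numbers used are the 2% to 3% example from the bullet above.

```python
def relative_change(before, after):
    """Relative change in a rate, e.g. 0.02 -> 0.03 is +0.5 (that is, +50%)."""
    if before == 0:
        raise ValueError("relative change is undefined when the starting rate is 0")
    return (after - before) / before


def point_change(before, after):
    """Change in percentage points, e.g. 0.02 -> 0.03 is 1 point."""
    return (after - before) * 100


# The example from the text: conversion rate goes from 2% to 3%.
before, after = 0.02, 0.03
print(f"relative change: {relative_change(before, after):+.0%}")
print(f"percentage-point change: {point_change(before, after):+.1f} points")
```

The +14% median in the list above is the first kind of number: a relative improvement, not a 14-point jump in conversion rate.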
I’m going to call that inconclusive, but promising. It’s also possible that the change hasn’t rolled out to all of the properties I looked at, so it will be interesting to see how July numbers look.
A Five-Minute GA4/GTM Audit That Doesn’t Work (yet)
Google introduced a potentially handy, but not-yet-ready-for-prime-time Tag Diagnosis feature for the Google Tag. It displays with the Tag Settings in GA4. To see it, go to Admin > Data Stream > Configure tag settings. It hasn’t rolled out to all properties yet, but if it is enabled, you’ll see a Tag quality rating above your tag’s settings like this:
The issues I’ve found with this feature so far are:
- It may say ‘Some of your pages are not tagged’ when they are in fact tagged.
- It may tell you that ‘Additional domains detected for configuration’ when you are only tracking one domain.
I’m excited for the kinks to be worked out because the issues it is meant to diagnose are some of the first things I look for when I audit a GA4 property. Once the feature is reliable, auditing tasks that currently take 30 minutes to an hour should take less than five minutes.
A Couple of New Looker Studio Features I Love
- Bin custom fields: you can now add a custom field to a data source that groups numeric values into ranges without using a CASE statement. So, for example, you can take Google Search Console ranking data and group it into 1-10, 11-20, etc. to better visualize how your presence in search results has changed over time.
- Group “other” values: if you set a limit on the number of dimension values you want to visualize on a chart, you can now automatically group the remaining values as ‘Others’. I usually limit dimension values on line charts, stacked area charts, pie charts, and bubble charts. It makes the charts easier to interpret, but until now you couldn’t tell how much data you were leaving out. Now you can!
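For anyone who wants to see the logic behind those two features spelled out, here is a rough Python sketch. It is only an illustration of the idea, not how Looker Studio implements either feature. The function names and the sample counts are hypothetical, and the binning assumes whole-number ranking positions.

```python
def position_bin(position, width=10):
    """Label a 1-based whole-number ranking position as '1-10', '11-20', etc."""
    if position < 1:
        raise ValueError("ranking positions start at 1")
    # Positions 1..10 land in bucket 0, 11..20 in bucket 1, and so on.
    bucket = (position - 1) // width
    low = bucket * width + 1
    high = low + width - 1
    return f"{low}-{high}"


def group_others(counts, limit):
    """Keep the `limit` largest values and roll the remainder into 'Others'."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    kept = dict(ranked[:limit])
    remainder = sum(value for _, value in ranked[limit:])
    if remainder:
        kept["Others"] = remainder
    return kept


# Hypothetical session counts by channel, for illustration only.
channel_sessions = {"Organic": 50, "Paid": 30, "Email": 15, "Referral": 5}
print(position_bin(7), position_bin(14))
print(group_others(channel_sessions, limit=2))
```

The point of the 'Others' bucket is that the chart total still adds up, so you can see how much traffic sits outside the values you chose to show.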
GA4 in BigQuery: We Got To Get Up and Sessionize!
I’ve been actively working with GA4 data in BigQuery for several years now, but it was only a few weeks ago that I fully grokked why sessionization is so important. I plan on doing a more detailed article on this topic, but here is the short version:
- Imagine you want to take the BigQuery raw event data and show a trendline of sessions by month by channel (like the visual above)
- You write a query that derives the Session default channel grouping from Session source, medium and campaign and sums sessions by day
- So far, so good. The session counts on your chart match GA4 nearly perfectly. Then you add a scorecard showing the total number of sessions for the time period. For some reason, the scorecard is a lot higher than the total in GA4. Why?
The answer is that GA4 counts a session that spans midnight as a single session, while your query counts sessions by day. So let’s say session 1234 starts at 11:45 pm and ends at 12:15 am the next day. Your query will count two sessions, one for each day. This problem is small for a local business that gets very little traffic at midnight, but it can be huge for a multi-national business that gets traffic around the clock. A similar, but worse problem happens with user counts. And any metrics that are based on sessions or users are similarly flawed.
The solution is to roll up metrics by session, and preserve user and session IDs in your sessionized table (user_pseudo_id and ga_session_id for anyone who wants to try this at home). You then create calculated fields on your data source in Looker Studio that do session and user counts and all of their derivations. In addition to solving the issue of overcounting sessions and users, this approach also means you can have one multi-purpose table that addresses most reporting needs.
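Here is a small Python sketch of the midnight problem and the sessionized fix, using three made-up events. It is a toy illustration under stated assumptions, not the query I actually run: in the real export, ga_session_id lives inside event_params and timestamps are stored in microseconds, whereas this sketch uses flat dictionaries and a hypothetical `ts` datetime field to stay readable. The function names are also hypothetical.

```python
from datetime import datetime

# Hypothetical events. Session 1234 spans midnight, as in the example above.
events = [
    {"user_pseudo_id": "u1", "ga_session_id": 1234, "ts": datetime(2024, 6, 1, 23, 45)},
    {"user_pseudo_id": "u1", "ga_session_id": 1234, "ts": datetime(2024, 6, 2, 0, 15)},
    {"user_pseudo_id": "u2", "ga_session_id": 5678, "ts": datetime(2024, 6, 2, 9, 0)},
]


def session_key(event):
    # ga_session_id is only unique per user, so the pair identifies a session.
    return (event["user_pseudo_id"], event["ga_session_id"])


def sessions_summed_by_day(events):
    """The flawed approach: count sessions per day, then add the days up."""
    per_day = {}
    for event in events:
        per_day.setdefault(event["ts"].date(), set()).add(session_key(event))
    return sum(len(keys) for keys in per_day.values())


def sessionize(events):
    """Roll events up to one row per session, keeping the user and session IDs."""
    sessions = {}
    for event in events:
        row = sessions.setdefault(session_key(event), {
            "user_pseudo_id": event["user_pseudo_id"],
            "ga_session_id": event["ga_session_id"],
            "session_start": event["ts"],
            "session_end": event["ts"],
            "event_count": 0,
        })
        row["session_start"] = min(row["session_start"], event["ts"])
        row["session_end"] = max(row["session_end"], event["ts"])
        row["event_count"] += 1
    return list(sessions.values())


rows = sessionize(events)
print("summed by day:", sessions_summed_by_day(events))
print("sessionized:", len(rows))
print("users:", len({row["user_pseudo_id"] for row in rows}))
```

Summing by day counts session 1234 once on each date, so the total comes out one too high. Counting rows in the sessionized table gives one row per session no matter how many days it touches, and a distinct count on user_pseudo_id gives users the same protection.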
This article helped show me the light on sessionization, and this GitHub repository should go a long way towards helping you build your own sessionized goodness. I’m also a giddy fanboy of Johan van de Werken, but more on that in my comprehensive post.
Meta is Importing Data From GA4?
Wait what?!? I can’t believe this is getting so little attention. Apparently, Meta has rolled out a GA4 import function to select clients with the goal of releasing it to general availability soon. If, like me, you’d rather chew your own arms off than try to navigate the hellscape of Meta Business Managers, ad accounts and company Pages to set up website conversions, this is really big news. It also means that your website conversion attribution would be based on a first-party GA4 cookie, versus the doomed third-party Meta cookie.
Content We’ve Published
- I did a series for Root & Branch on Google Search Console reporting in Looker Studio. The last in the series was published this month: How to Visualize Topic Clusters and Trends in Looker Studio. Parts one and two.
- I’ve also spent too much time this year hunting for bots in GA4. If you, too, are a reluctant bot hunter, this blog post series is for you. I’ve broken it down into three key parts: identifying bot traffic, filtering it in reports, and filtering it so it’s not logged in GA4 in the first place.
Articles/Videos That Made Me Smarter
- Introducing the 3-30-300 Rule for Better Reports
A lot of treatises on data visualization boil down to “keep it simple,” but that advice is hard to follow when your stakeholders’ needs range from the general to the specific. This article does an excellent job of explaining why and how to build better reports and dashboards.
- How to Join GA4 and Search Console data
This is a beautiful explanation of a complex topic. Hats off to Dominic Woodman.
- Marketing Analytics Data is Wrong. Can It Be Fixed?
Dana DiTomaso details the increasing challenges of tracking people online and offers up some solutions. Her pie-chart-within-a-pie-chart in particular has me questioning my general disdain for pie charts.
- How To Think About Your (GA4) Data (in BigQuery): There’s Levels To This
I’ve found that dimension and metric scope is one of the most confusing aspects of GA4. Krisztián Korpa does a nice job of walking us through the messy details.