If you’ve ever had to consolidate a bunch of messy data into a few stem patterns, read on.
I recently found myself needing to consolidate a bunch of Google Analytics page view data into a shortened list. The client had not made use of the ‘Exclude URL Query Parameters’ feature, so the page views included session IDs and other dynamic parameters that turned a few pages into a list of 1,500 unique URLs. I knew I could do this with regular expressions, and I thought to check if Sheets has a way to do it since Google includes support for regular expressions in many of their products. I was happy to find that Sheets was no exception. If you are not familiar with regular expressions, they are instances of a standard syntax for pattern-matching text. Check out this post for some basic how-tos: Regular Expressions 101.
The find and replace function in Google Sheets has a checkbox ‘Search using regular expressions’. Next to the checkbox, there is a link to their help page. Dive in there if you want to get into the real nitty-gritty. I’m going to describe one simple use of this functionality that doesn’t require in-depth knowledge of regular expressions.
This is a sample of the data I started with:
Page | Pageviews |
---|---|
/products/calendar_new?arrival=2017-02-04&nights;=4&session;=Nb1lmU0p | 1 |
/products/calendar_new?arrival=28+Jan+2017&nights;=4&session;=M0tpd0I0 | 1 |
/products/calendar_new?arrival=19+Jan+2017&nights;=6&session;=KZjV0cksz | 1 |
/products/calendar_new?arrival=21+Oct+2016&nights;=2&session;=b3I2SzUE | 1 |
/products/new?arrival=11+Oct+2016&nights;=4&session;=EVjNMZn4 | 1 |
/products/new?arrival=1+Jan+2017&nights;=6&session;=MR1lCR09aV | 1 |
/products/new?arrival=1+Nov+2016&nights;=4&session;=JeHFpZ2o | 1 |
/products/new?arrival=1+Oct+2016&nights;=7&session;=GRVV2SU9 | 1 |
To replace the “arrival” value in the URL, I use a regular expression that looks like this:
arrival=.+?&
Here’s a breakdown of what that says:
One thing you might be wondering: if ? is a special character, how do you match a ? in text? To do this, use a front slash to escape the ?, like this \?. There are a number of other special characters that need to be escaped. For example, periods and plus signs need to be escaped if you are searching for them as text, e.g. \. and \+
If you want to test out a regular expression to see if it matches what you are looking for, www.regexr.com is a great, free tool. It also has a concise reference on how to write regular expressions, including more detail on escape characters.
Here’s what my pattern looks like in the Find and replace box:
Notice that I am replacing it with “arrival=&” – I didn’t want to remove the parameter entirely, I just wanted to get rid of the dynamic values. You could also replace with nothing to get rid of the parameter completely. Next, I do the following replacements:
Once I’m done replacing, I run a pivot to get summary totals on the consolidated URLs:
Page | Pageviews |
---|---|
/products/calendar_new?arrival=&nights;=&session;= | 4 |
/products/new?arrival=&nights;=&session;= | 4 |
How cool is that? With Excel, I used to do this kind of thing with ridiculously complicated text formulas. Did I mention how much I love Google Sheets?
The latest developments in digital marketing include Meta feature updates, Google downgrading ranking on vastly…
ChatGPT traffic in the GA house! Plus new features in GA4 and understanding GTM first-party…
This article details the process of building two BigQuery tables for path analysis, with a…