Data Engineering

Google Analytics 4 & BigQuery: Create custom Channel Groupings in a reusable SQL function

Make your life easier and define custom Channel Groupings in a reusable function (UDF) when working with Google Analytics 4 data in BigQuery. The channel grouping definition based on GA4 parameters source, medium and campaign can be reused with ease, without duplicating all the channel grouping logic in every query (win!)

  • The function is saved into BigQuery as a persistent User Defined Function (UDF)
  • The channel grouping logic can be edited in one place and changes will directly affect all the queries where the UDF function is used.
  • Examples provided for both the Universal Analytics as the Google Analytics 4 default channel grouping definitions, which you can customize to your needs.

Update 2021-11-17: Google released a new Default Channel Grouping definition for Google Analytics 4. I've tried to capture this new GA4 definition in a BigQuery equivalent.

Defining Channel Groupings

Channel groupings are a logical combination of acquisition parameters (like source, medium, campaign) to classify your traffic into channels. In Google Analytics, the parameters mentioned are populated based on URL parameters (like ?utm_x=...) or the auto tagging link between GA and Google Ads or Campaign. More information about channel groupings in the Google documentation.

default-channel-grouping-paths.PNG

The definitions of the channels should depend on your specific business needs. Next to the Default Channel Grouping definition of Google Analytics, you can create your own custom channel grouping(s). As mentioned, mappings can be made based on dimensions source, medium, campaign, but you can add more parameters to the function easily.

In the first SQL example below, we've provided a generic definition which looks a lot like the Default Channel Grouping of Google Universal Analytics.

  • See this page for the Default Channel Grouping definition within the GA interface.
  • Google Analytics 4 does not provide the option (yet) to create custom channels groupings in the interface. Universal Analytics does, but it doesn't export these groupings to BigQuery (and if they did, you can't alter historical data).
  • The Google Analytics 4 Default Channel Grouping definition differs from Universal Analytics.
  • The functions below can be used for both the Google Analytics 4 and Universal Analytics 360 exports.

We highly recommend to customize your channel grouping based on your business needs. Therefore it's important to get your UTM tagging in order, so you can add more depth in to your mappings.

To illustrate, you could create the following additional rules in your groupings:

  • Paid search: branded vs. non-branded campaigns (based on the campaign parameter for example)
  • Social: organic social vs. paid social (make sure you ?utm= tag your social media campaigns accordingly)
  • Email: Transactional emails vs. marketing related emails

You could also throw the traffic source into the mix. This can be useful when you want to slim down all the unique "source / medium" combinations. Of course, you can create multiple functions for different levels of depth.

The BigQuery functions

Let's talk action. You can run the SQL function below directly in the BigQuery interface and it will automatically create the function within the dataset your specified in the CREATE OR REPLACE FUNCTION. It will show up in the tree on the left side of your screen;

user-defined-function-channel-grouping.PNG

Example 1 - Default Channel Grouping UA

This is the BigQuery equivalent of the Google Universal Analytics default channel grouping definition.

View code on GitHub

1create or replace function `<your-project>.<your-dataset>.channel_grouping`(tsource string, medium string, campaign string) as (
2    case
3        when (tsource = '(direct)' or tsource is null) 
4            and (regexp_contains(medium, r'^(\(not set\)|\(none\))$') or medium is null) 
5            then 'direct'
6        when regexp_contains(medium, r'^(social|social_advertising|social-advertising|social_network|social-network|social_media|social-media)$') 
7            then 'social'
8        when regexp_contains(medium, r'^(email|mail)$') 
9            then 'email'
10        when regexp_contains(medium, r'^(affiliate|affiliates)$') 
11            then 'affiliate'
12        when regexp_contains(medium, r'^(cpc|ppc|paidsearch)$') 
13            then 'search_paid'
14        when regexp_contains(medium, r'^(display|cpm|banner)$') 
15            then 'display'
16        when medium = 'organic'
17            then 'search_organic'
18        when medium = 'referral'
19            then 'referral'
20        else '(other)'
21    end
22);

Example 2 - Default Channel Grouping GA4

2021-11-17: Google recently updated the default channel grouping definition of Google Analytics 4, so I've replaced the old advanced example with the GA4 example, since there is a lot of overlap (the GA4 definition has mush more dept than the Universal Analytics one).

Be aware that not all the rules applied in GA4 can be used in BigQuery since we do not have the exact definitions of rules like "Source matches a list of social sites". Please let me know if you have any suggestions to further tweak the function.

View code on GitHub

1create or replace function `<your-project>.<your-dataset>.channel_grouping`(tsource string, medium string, campaign string) as (
2    case
3        when (tsource = 'direct' or tsource is null) 
4            and (regexp_contains(medium, r'^(\(not set\)|\(none\))$') or medium is null) 
5            then 'direct'
6        when regexp_contains(campaign, r'^(.*shop.*)$') 
7            and regexp_contains(medium, r'^(.*cp.*|ppc|paid.*)$') 
8            then 'shopping_paid'
9        when regexp_contains(tsource, r'^(google|bing)$') 
10            and regexp_contains(medium, r'^(.*cp.*|ppc|paid.*)$') 
11            then 'search_paid'
12        when regexp_contains(tsource, r'^(twitter|facebook|fb|instagram|ig|linkedin|pinterest)$')
13            and regexp_contains(medium, r'^(.*cp.*|ppc|paid.*|social_paid)$') 
14            then 'social_paid'
15        when regexp_contains(tsource, r'^(youtube)$')
16            and regexp_contains(medium, r'^(.*cp.*|ppc|paid.*)$') 
17            then 'video_paid'
18        when regexp_contains(medium, r'^(display|banner|expandable|interstitial|cpm)$') 
19            then 'display'
20        when regexp_contains(medium, r'^(.*cp.*|ppc|paid.*)$') 
21            then 'other_paid'
22        when regexp_contains(medium, r'^(.*shop.*)$') 
23            then 'shopping_organic'
24        when regexp_contains(tsource, r'^.*(twitter|t\.co|facebook|instagram|linkedin|lnkd\.in|pinterest).*') 
25            or regexp_contains(medium, r'^(social|social_advertising|social-advertising|social_network|social-network|social_media|social-media|sm|social-unpaid|social_unpaid)$') 
26            then 'social_organic'
27        when regexp_contains(medium, r'^(.*video.*)$') 
28            then 'video_organic'
29        when regexp_contains(tsource, r'^(google|bing|yahoo|baidu|duckduckgo|yandex|ask)$') 
30            or medium = 'organic'
31            then 'search_organic'
32        when regexp_contains(tsource, r'^(email|mail|e-mail|e_mail|e mail|mail\.google\.com)$') 
33            or regexp_contains(medium, r'^(email|mail|e-mail|e_mail|e mail)$') 
34            then 'email'
35        when regexp_contains(medium, r'^(affiliate|affiliates)$') 
36            then 'affiliate'
37        when medium = 'referral'
38            then 'referral'
39        when medium = 'audio' 
40            then 'audio'
41        when medium = 'sms'
42            then 'sms'
43        when ends_with(medium, 'push')
44            or regexp_contains(medium, r'.*(mobile|notification).*') 
45            then 'mobile_push'
46        else '(other)'
47    end
48);

How to use and test the function

When the function is saved, you can use the function this way:

1select
2    <your-dataset>.channel_grouping(source, medium, campaign) aschannel_grouping
3    -- or
4    `<your-project>.<your-dataset>.channel_grouping`(source, medium, campaign) as channel_grouping
5from
6     <your-table>

Make sure you test your function when you are creating the grouping rules. You could use the following query to check what traffic parameters are mapped to which channel:

Google Analytics 4 currently has some quirks and unknowns to deal with. Related to traffic acquisition parameters, sessions are not reset when a user is switching from channel within an active session. Next to that it's not clear how GA4 is generating / attributing the Traffic Acquisition reports in the interface. Also, Google Analytics 4 uses conversion modelling to attribute conversions (due to tracking mechanisms such as Apple ITP). So currently it's impossible to align your BigQuery data with these interface reports. However, this is not a problem when defining your channels.

1with
2-- select session in last 30 days
3sessions as (
4    select
5        user_pseudo_id as ga_client_id,
6        concat(user_pseudo_id,'.',(select cast(value.int_value as string) from unnest(event_params) where key = 'ga_session_id')) as session_id, -- combine user_pseudo_id and session_id for a unique session-id
7        timestamp_micros(min(event_timestamp)) as session_start,
8        array_agg(
9            if(event_name in('page_view','user_engagement','scroll'), struct(
10                event_timestamp,
11                lower((select value.string_value from unnest(event_params) where key = 'source')) as source,
12                lower((select value.string_value from unnest(event_params) where key = 'medium')) as medium,
13                lower((select value.string_value from unnest(event_params) where key = 'campaign')) as campaign,
14                (select value.int_value from unnest(event_params) where key = 'entrances') as is_entrance,
15                (select value.int_value from unnest(event_params) where key = 'ignore_referrer') as ignore_referrer
16            ), null) 
17        ignore nulls) as channels_in_session,
18        countif(event_name = 'purchase') as conversions,
19        sum(ecommerce.purchase_revenue) as conversion_value
20    from
21        `<your-project>.analytics_<your-dataset>.events_*`
22    where
23        _table_suffix between 
24            format_date('%Y%m%d', date_sub(current_date(), interval 30 day))
25            and format_date('%Y%m%d', date_sub(current_date(), interval 1 day))
26    group by
27        user_pseudo_id,
28        session_id
29),
30-- get first campaign parameters from session and aggegrated metrics
31traffic_acquisition as (
32    select
33        (select t.source from unnest(channels_in_session) as t where t.ignore_referrer is null order by t.event_timestamp asc limit 1) as source,
34        (select t.medium from unnest(channels_in_session) as t where t.ignore_referrer is null order by t.event_timestamp asc limit 1) as medium,
35        count(distinct session_id) as sessions,
36        sum(conversions) as conversions,
37        ifnull(sum(conversion_value), 0) as conversion_value
38    from
39        sessions
40    group by
41        1, 2
42)
43
44-- map source / medium to channel grouping using a user defined function (ignore campaign)
45select
46    *,
47    <your-dataset>.channel_grouping(source, medium, null) as channel_grouping
48from
49    traffic_acquisition

channel-grouping-test.PNG Check your rules.

Some last tips

  • Create a dedicated dataset for your UDF functions
  • When you rerun the SQL, the UDF function will update.
  • Create different channel grouping functions, based on the needs (using different levels of granularity for example)

For more information on UDFs within BigQuery, see the Google Documentation.

Did you like this article? Stay hydrated and get notified when new articles, tutorials or other interesting updates are published. No spam and you can unsubscribe at any time. You can also check my Twitter account for updates and other interesting material.