Google Analytics 4 & BigQuery: Create custom Channel Groupings in a reusable SQL function
Make your life easier and define custom Channel Groupings in a reusable function (UDF) when working with Google Analytics 4 data in BigQuery. The channel grouping definition based on GA4 parameters source, medium and campaign can be reused with ease, without duplicating all the channel grouping logic in every query (win!)
- The function is saved into BigQuery as a persistent User Defined Function (UDF)
- The channel grouping logic can be edited in one place and changes will directly affect all the queries where the UDF function is used.
- Examples provided for both the Universal Analytics as the Google Analytics 4 default channel grouping definitions, which you can customize to your needs.
Update 2021-11-17: Google released a new Default Channel Grouping definition for Google Analytics 4. I've tried to capture this new GA4 definition in a BigQuery equivalent.
Defining Channel Groupings
Channel groupings are a logical combination of acquisition parameters (like
campaign) to classify your traffic into channels. In Google Analytics, the parameters mentioned are populated based on URL parameters (like
?utm_x=...) or the auto tagging link between GA and Google Ads or Campaign. More information about channel groupings in the Google documentation.
The definitions of the channels should depend on your specific business needs. Next to the Default Channel Grouping definition of Google Analytics, you can create your own custom channel grouping(s). As mentioned, mappings can be made based on dimensions
campaign, but you can add more parameters to the function easily.
In the first SQL example below, we've provided a generic definition which looks a lot like the Default Channel Grouping of Google Universal Analytics.
- See this page for the Default Channel Grouping definition within the GA interface.
- Google Analytics 4 does not provide the option (yet) to create custom channels groupings in the interface. Universal Analytics does, but it doesn't export these groupings to BigQuery (and if they did, you can't alter historical data).
- The Google Analytics 4 Default Channel Grouping definition differs from Universal Analytics.
- The functions below can be used for both the Google Analytics 4 and Universal Analytics 360 exports.
We highly recommend to customize your channel grouping based on your business needs. Therefore it's important to get your UTM tagging in order, so you can add more depth in to your mappings.
To illustrate, you could create the following additional rules in your groupings:
- Paid search: branded vs. non-branded campaigns (based on the
campaignparameter for example)
- Social: organic social vs. paid social (make sure you ?utm= tag your social media campaigns accordingly)
- Email: Transactional emails vs. marketing related emails
You could also throw the
traffic source into the mix. This can be useful when you want to slim down all the unique "source / medium" combinations. Of course, you can create multiple functions for different levels of depth.
The BigQuery functions
Let's talk action. You can run the SQL function below directly in the BigQuery interface and it will automatically create the function within the dataset your specified in the
CREATE OR REPLACE FUNCTION. It will show up in the tree on the left side of your screen;
Example 1 - Default Channel Grouping UA
This is the BigQuery equivalent of the Google Universal Analytics default channel grouping definition.
1create or replace function `<your-project>.<your-dataset>.channel_grouping`(tsource string, medium string, campaign string) as ( 2 case 3 when (tsource = '(direct)' or tsource is null) 4 and (regexp_contains(medium, r'^(\(not set\)|\(none\))$') or medium is null) 5 then 'direct' 6 when regexp_contains(medium, r'^(social|social_advertising|social-advertising|social_network|social-network|social_media|social-media)$') 7 then 'social' 8 when regexp_contains(medium, r'^(email|mail)$') 9 then 'email' 10 when regexp_contains(medium, r'^(affiliate|affiliates)$') 11 then 'affiliate' 12 when regexp_contains(medium, r'^(cpc|ppc|paidsearch)$') 13 then 'search_paid' 14 when regexp_contains(medium, r'^(display|cpm|banner)$') 15 then 'display' 16 when medium = 'organic' 17 then 'search_organic' 18 when medium = 'referral' 19 then 'referral' 20 else '(other)' 21 end 22);
Example 2 - Default Channel Grouping GA4
2021-11-17: Google recently updated the default channel grouping definition of Google Analytics 4, so I've replaced the old advanced example with the GA4 example, since there is a lot of overlap (the GA4 definition has mush more dept than the Universal Analytics one).
Be aware that not all the rules applied in GA4 can be used in BigQuery since we do not have the exact definitions of rules like "Source matches a list of social sites". Please let me know if you have any suggestions to further tweak the function.
1create or replace function `<your-project>.<your-dataset>.channel_grouping`(tsource string, medium string, campaign string) as ( 2 case 3 when (tsource = 'direct' or tsource is null) 4 and (regexp_contains(medium, r'^(\(not set\)|\(none\))$') or medium is null) 5 then 'direct' 6 when regexp_contains(campaign, r'^(.*shop.*)$') 7 and regexp_contains(medium, r'^(.*cp.*|ppc|paid.*)$') 8 then 'shopping_paid' 9 when regexp_contains(tsource, r'^(google|bing)$') 10 and regexp_contains(medium, r'^(.*cp.*|ppc|paid.*)$') 11 then 'search_paid' 12 when regexp_contains(tsource, r'^(twitter|facebook|fb|instagram|ig|linkedin|pinterest)$') 13 and regexp_contains(medium, r'^(.*cp.*|ppc|paid.*|social_paid)$') 14 then 'social_paid' 15 when regexp_contains(tsource, r'^(youtube)$') 16 and regexp_contains(medium, r'^(.*cp.*|ppc|paid.*)$') 17 then 'video_paid' 18 when regexp_contains(medium, r'^(display|banner|expandable|interstitial|cpm)$') 19 then 'display' 20 when regexp_contains(medium, r'^(.*cp.*|ppc|paid.*)$') 21 then 'other_paid' 22 when regexp_contains(medium, r'^(.*shop.*)$') 23 then 'shopping_organic' 24 when regexp_contains(tsource, r'^.*(twitter|t\.co|facebook|instagram|linkedin|lnkd\.in|pinterest).*') 25 or regexp_contains(medium, r'^(social|social_advertising|social-advertising|social_network|social-network|social_media|social-media|sm|social-unpaid|social_unpaid)$') 26 then 'social_organic' 27 when regexp_contains(medium, r'^(.*video.*)$') 28 then 'video_organic' 29 when regexp_contains(tsource, r'^(google|bing|yahoo|baidu|duckduckgo|yandex|ask)$') 30 or medium = 'organic' 31 then 'search_organic' 32 when regexp_contains(tsource, r'^(email|mail|e-mail|e_mail|e mail|mail\.google\.com)$') 33 or regexp_contains(medium, r'^(email|mail|e-mail|e_mail|e mail)$') 34 then 'email' 35 when regexp_contains(medium, r'^(affiliate|affiliates)$') 36 then 'affiliate' 37 when medium = 'referral' 38 then 'referral' 39 when medium = 'audio' 40 then 'audio' 41 when medium = 'sms' 42 then 'sms' 43 when ends_with(medium, 'push') 44 or regexp_contains(medium, r'.*(mobile|notification).*') 45 then 'mobile_push' 46 else '(other)' 47 end 48);
How to use and test the function
When the function is saved, you can use the function this way:
1select 2 <your-dataset>.channel_grouping(source, medium, campaign) aschannel_grouping 3 -- or 4 `<your-project>.<your-dataset>.channel_grouping`(source, medium, campaign) as channel_grouping 5from 6 <your-table>
Make sure you test your function when you are creating the grouping rules. You could use the following query to check what traffic parameters are mapped to which channel:
Google Analytics 4 currently has some quirks and unknowns to deal with. Related to traffic acquisition parameters, sessions are not reset when a user is switching from channel within an active session. Next to that it's not clear how GA4 is generating / attributing the Traffic Acquisition reports in the interface. Also, Google Analytics 4 uses conversion modelling to attribute conversions (due to tracking mechanisms such as Apple ITP). So currently it's impossible to align your BigQuery data with these interface reports. However, this is not a problem when defining your channels.
1with 2-- select session in last 30 days 3sessions as ( 4 select 5 user_pseudo_id as ga_client_id, 6 concat(user_pseudo_id,'.',(select cast(value.int_value as string) from unnest(event_params) where key = 'ga_session_id')) as session_id, -- combine user_pseudo_id and session_id for a unique session-id 7 timestamp_micros(min(event_timestamp)) as session_start, 8 array_agg( 9 if(event_name in('page_view','user_engagement','scroll'), struct( 10 event_timestamp, 11 lower((select value.string_value from unnest(event_params) where key = 'source')) as source, 12 lower((select value.string_value from unnest(event_params) where key = 'medium')) as medium, 13 lower((select value.string_value from unnest(event_params) where key = 'campaign')) as campaign, 14 (select value.int_value from unnest(event_params) where key = 'entrances') as is_entrance, 15 (select value.int_value from unnest(event_params) where key = 'ignore_referrer') as ignore_referrer 16 ), null) 17 ignore nulls) as channels_in_session, 18 countif(event_name = 'purchase') as conversions, 19 sum(ecommerce.purchase_revenue) as conversion_value 20 from 21 `<your-project>.analytics_<your-dataset>.events_*` 22 where 23 _table_suffix between 24 format_date('%Y%m%d', date_sub(current_date(), interval 30 day)) 25 and format_date('%Y%m%d', date_sub(current_date(), interval 1 day)) 26 group by 27 user_pseudo_id, 28 session_id 29), 30-- get first campaign parameters from session and aggegrated metrics 31traffic_acquisition as ( 32 select 33 (select t.source from unnest(channels_in_session) as t where t.ignore_referrer is null order by t.event_timestamp asc limit 1) as source, 34 (select t.medium from unnest(channels_in_session) as t where t.ignore_referrer is null order by t.event_timestamp asc limit 1) as medium, 35 count(distinct session_id) as sessions, 36 sum(conversions) as conversions, 37 ifnull(sum(conversion_value), 0) as conversion_value 38 from 39 sessions 40 group by 41 1, 2 42) 43 44-- map source / medium to channel grouping using a user defined function (ignore campaign) 45select 46 *, 47 <your-dataset>.channel_grouping(source, medium, null) as channel_grouping 48from 49 traffic_acquisition
Check your rules.
Some last tips
- Create a dedicated dataset for your UDF functions
- When you rerun the SQL, the UDF function will update.
- Create different channel grouping functions, based on the needs (using different levels of granularity for example)
For more information on UDFs within BigQuery, see the Google Documentation.