How to build a marketing data warehouse with BigQuery?
While data warehouses used to be a luxury, expensive to put in place and time consuming to maintain, cloud based solutions such as Google BigQuery or Amazon S3 have changed the game: nowadays big companies or agencies are no longer the only ones which can afford them. Let's see what's the opportunity from a digital marketing reporting point of view.
In this article, we are going to take the business case of a company running international campaigns with multiple ad accounts across different platforms (paid search, display, paid social) which need to report on their paid media performance globally.
Once we have created our project in BigQuery, we want to collect the data from these ad accounts and store it inside tables. There are different ways to achieve this.
This approach is a bit complicated because you'll need to write a script for each ad platform and ensure that each one does the job properly. For more efficiency, I recommend using a tool like Funnel.io connector to automate these tasks for you. It will act as a bridge between your ad platforms and your data warehouse in BigQuery.
No matter what approach, we want to end up with an aggregate of our raw data grouped by date. And to make it more organized, we are going to split it by creating one table per month. Each table will contain the dimensions and metrics that have been pulled from our ad accounts (campaign names, number of impressions, clicks, total spend, conversions...).
From this point onward, everything we are going to build will consist in virtual tables (also called 'views'). Now, if our campaigns have been following a consitent naming convention (see previous article Why your digital advertising strategy should include a naming convention?), let's leverage it by exctracting additionnal dimensions.
Considering our campaigns names follow this conventions:
At this stage our campaign column concatenates all this fields:
So we need to write a SQL query in order to separate them. Here is how it looks:
REGEXP_EXTRACT(campaign,r'(.*?)-') AS country,
REGEXP_EXTRACT(campaign,r'(?:.*?)-(.*?)-') AS channel,
REGEXP_EXTRACT(campaign,r'(?:.*?)-(?:.*?)-(.*?)-') AS subchannel,
REGEXP_EXTRACT(campaign,r'(?:.*?)-(?:.*?)-(?:.*?)-(.*?)-') AS theme,
REGEXP_EXTRACT(campaign,r'(?:.*?)-(?:.*?)-(?:.*?)-(?:.*?)-(.*?)-') AS activity_type,
REGEXP_EXTRACT(campaign,r'(?:.*?)-(?:.*?)-(?:.*?)-(?:.*?)-(?:.*?)-(.*?)-') AS strategy
Running this query for each type of data source (ad platform), we'll end up with new tables where each of these fields is a a dimension with its own column.
We'll save these results as SQL views (virtual tables), creating one view per ad platform in order to make troubleshooting easier in the future. Then we'll combine all platforms views unsing an UNION ALL statement as follows:
SELECT /* for Google Ads */
SELECT /* for Facebook */
SELECT /* for LinkedIn */
Here is the result or this query:
This new virtual table in itself already constitutes an interesting data set that can be used for data visualization. It shall be used as the foundation of the other virtual tables you will need to meet the need of your teams.