Nomad Studio TL - Imagination is your only limit - Est. 1986

> nomadstud.io / blog / blogging about online marketing and advertising

How to build a marketing data warehouse with BigQuery?

While data warehouses used to be a luxury, expensive to put in place and time consuming to maintain, cloud based solutions such as Google BigQuery or Amazon S3 have changed the game: nowadays big companies or agencies are no longer the only ones which can afford them. Let's see what's the opportunity from a digital marketing reporting point of view.

In this article, we are going to take the business case of a company running international campaigns with multiple ad accounts across different platforms (paid search, display, paid social) which need to report on their paid media performance globally.

Once we have created our project in BigQuery, we want to collect the data from these ad accounts and store it inside tables. There are different ways to achieve this.

One can consist in writting custom API (in JavaScript for example) that will request performance reports from the ad platforms on a daily basis and store the results in Google Sheets. These GSheets can then be used as data sources in BigQuery (it's possible to create tables based on that) or we can complete the script with an instruction to send the data from the GSheet to a BigQuery table directly.

This approach is a bit complicated because you'll need to write a script for each ad platform and ensure that each one does the job properly. For more efficiency, I recommend using a tool like Funnel.io connector to automate these tasks for you. It will act as a bridge between your ad platforms and your data warehouse in BigQuery.

No matter what approach, we want to end up with an aggregate of our raw data grouped by date. And to make it more organized, we are going to split it by creating one table per month. Each table will contain the dimensions and metrics that have been pulled from our ad accounts (campaign names, number of impressions, clicks, total spend, conversions...).

From this point onward, everything we are going to build will consist in virtual tables (also called 'views'). Now, if our campaigns have been following a consitent naming convention (see previous article Why your digital advertising strategy should include a naming convention?), let's leverage it by exctracting additionnal dimensions.

Considering our campaigns names follow this conventions:
Country-Channel-Subchannel-Theme-ActivityType-Strategy-StartDate

At this stage our campaign column concatenates all this fields:

campaign impressions clicks
FR-PPC-Google-SportShoes-AW-P-201110 233459 16210
FR-SMA-Facebook-SportShoes-AW-P-201110 334409 10223
FR-SMA-LinkedIn-SportShoes-AW-P-201110 400640 22034

So we need to write a SQL query in order to separate them. Here is how it looks:

REGEXP_EXTRACT(campaign,r'(.*?)-') AS country,
REGEXP_EXTRACT(campaign,r'(?:.*?)-(.*?)-') AS channel,
REGEXP_EXTRACT(campaign,r'(?:.*?)-(?:.*?)-(.*?)-') AS subchannel,
REGEXP_EXTRACT(campaign,r'(?:.*?)-(?:.*?)-(?:.*?)-(.*?)-') AS theme,
REGEXP_EXTRACT(campaign,r'(?:.*?)-(?:.*?)-(?:.*?)-(?:.*?)-(.*?)-') AS activity_type,
REGEXP_EXTRACT(campaign,r'(?:.*?)-(?:.*?)-(?:.*?)-(?:.*?)-(?:.*?)-(.*?)-') AS strategy

Running this query for each type of data source (ad platform), we'll end up with new tables where each of these fields is a a dimension with its own column.

country channel subchannel theme activity_type strategy
FR PPC Google SportShoes AW P

We'll save these results as SQL views (virtual tables), creating one view per ad platform in order to make troubleshooting easier in the future. Then we'll combine all platforms views unsing an UNION ALL statement as follows:

SELECT /* for Google Ads */
country,
channel,
subchannel,
theme,
activity_type,
strategy

FROM bigquery-project.dataset_name.platform_googleads
UNION ALL

SELECT /* for Facebook */
country,
channel,
subchannel,
theme,
activity_type,
strategy

FROM bigquery-project.dataset_name.platform_facebook
UNION ALL

SELECT /* for LinkedIn */
country,
channel,
subchannel,
theme,
activity_type,
strategy

FROM bigquery-project.dataset_name.platform_linkedin

Here is the result or this query:

country channel subchannel theme activity_type strategy
FR PPC Google SportShoes AW P
FR SMA Facebook SportShoes AW P
FR SMA LinkedIn SportShoes AW P

This new virtual table in itself already constitutes an interesting data set that can be used for data visualization. It shall be used as the foundation of the other virtual tables you will need to meet the need of your teams.