1. Introduction
Some Context Before We Start
If you are an iOS app developer, you have likely heard of the iOS 14.5+ privacy updates. To measure meaningful conversion actions post install, Apple provides the SKAd Network API, which allows you to measure the success of your ad campaigns while respecting user privacy. Based on your business needs, you can decide on the most effective way to leverage SKAd Network to capture meaningful insights about your campaigns. In this codelab, we examine an example methodology that uses your GA4F data in BigQuery to group post-install revenue into buckets, which you can then use to set up your schema with your app attribution partner. While this codelab uses a revenue-based approach, you can also use event- or funnel-based approaches for SKAN measurement. Please refer to this help center for more detailed guidance. This is just an example, not an official Google recommendation; you can design your own schema based on your specific business needs.
What we intend to cover
- Explore GA4F data in BigQuery
- Find revenue data for users who converted within 0-2 days
- Group revenue data into buckets
- Understand user distribution in each bucket
- Implement the buckets in AppsFlyer SKAN Conversion Studio
Prerequisites
- GA4F SDK integrated in your iOS app, with all revenue events logged (in_app_purchase or ad-funded revenue)
- Firebase to BigQuery export enabled
- An app attribution partner that is also recording all revenue events
2. Accessing BigQuery Export
Navigate to the Google Cloud Dataset
Navigate to the dataset in GA4F by visiting Project Settings > Integrations > BigQuery. The toggle needs to be enabled first, and once it's enabled, it takes around 48 hours for the dataset to become available. You can click on the link shown below, and it will take you to BigQuery.
Run some queries
Now that you are in BigQuery, you should see the daily tables generated. In the example screenshot below, we see 64 daily tables, so the export has been running for 64 days. If you are accessing it for the first time, you might see just one daily table with the previous day's data. On the right, you can see the table schema. You can refer to more details on the fields here.
To start writing your query, you can click on Query > In new tab
You can then try running the sample query in the new tab
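As a minimal sketch of what a first query could look like (assuming your export lives in project_name.dataset_name; replace the project, dataset, and date with your own), you could count the events recorded on a single day

SELECT
  event_name,
  COUNT(*) AS event_count
FROM `project_name.dataset_name.events_20230101`
GROUP BY event_name
ORDER BY event_count DESC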
3. Analyze revenue data
Fetching install data
Now, in order to start building the revenue buckets, we first have to look at data for users who installed the app within the last 24 to 72 hours. SKAd Network 4.0 lets you view data for 0-2 days, while SKAd Network 3.5 allows 24 hours by default. (Depending on the capabilities of your app attribution partner, you might be able to modify this activity window, though generally to no more than 72 hours.) When users install the app and open it for the first time, the first_open event is fired by the SDK and recorded in BigQuery.
The identifier that you can use in BigQuery is user_pseudo_id (also called the app instance ID), so you can use the query below to find these users
SELECT
user_pseudo_id,
event_name,
event_date,
event_timestamp
FROM `project_name.dataset_name.events_2023*`
WHERE
event_name = 'first_open'
AND platform = 'IOS'
A few things to note about this query
- Please replace the table name with your Analytics export table. You can use wildcards to query multiple daily tables. For example, events_2023* will query all data in 2023
- If you have a lot of users, you can also query just the last 30 days for faster processing
- We filter on platform = 'IOS'. If you have multiple iOS apps in your Firebase project, you can also add a filter on app_info.firebase_app_id to get data for a specific app (both of these refinements are sketched just after this list)
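As a minimal sketch combining both refinements, the query below restricts the wildcard scan to the last 30 days via _TABLE_SUFFIX and filters on a single app. The firebase_app_id shown is a placeholder, not a real value; you can find yours in Firebase Project Settings.

SELECT
  user_pseudo_id,
  event_name,
  event_date,
  event_timestamp
FROM `project_name.dataset_name.events_*`
WHERE
  #scan only the last 30 daily tables
  _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)) AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
  AND event_name = 'first_open'
  AND platform = 'IOS'
  #placeholder app ID; replace with your own
  AND app_info.firebase_app_id = '1:1234567890:ios:0a1b2c3d4e5f'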
Fetching revenue data
Now, let's look at a query to find revenue for your users. In this case, we assume that your revenue events are in_app_purchase and ad_impression. The revenue from in_app_purchase is available in event_value_in_usd, while for ad_impression, the revenue is available in the value parameter within the event parameters. If you are not familiar with event parameters in BigQuery, we recommend checking the definition here, and you can try out this sample query in our official reference, which also covers extracting the value from event_params
SELECT
user_pseudo_id,
event_name,
EXTRACT(DATE FROM PARSE_DATETIME('%Y%m%d', event_date)) AS event_date,
(
SELECT COALESCE(value.int_value, value.float_value, value.double_value, NULL)
FROM UNNEST(event_params)
WHERE
KEY = 'value'
AND event_name = 'ad_impression'
) AS ad_funded_revenue,
(
SELECT value.string_value
FROM UNNEST(event_params)
WHERE
KEY = 'currency'
AND event_name = 'ad_impression'
) AS ad_revenue_currency,
(
CASE
WHEN event_name = 'in_app_purchase' THEN event_value_in_usd
ELSE 0
END) AS iap_revenue_usd,
FROM `project_name.dataset_name.events_2023*`
WHERE
platform = 'IOS'
AND event_name IN (
'in_app_purchase',
'ad_impression')
Let's understand what this query is doing. Here are the things you would notice
- In the WHERE clause, we are filtering for the revenue events, since we are only interested in those, and like last time, we are looking for iOS data
- Now, in the SELECT clause, we are taking the value as well as the currency for the ad revenue event (ad_impression), and we are taking the event_value_in_usd when the event is in_app_purchase
- If you are sending multiple currencies, you will first need to align them to a single currency for this analysis; one approach is sketched after the example output below. For the purposes of this example, we assume that the currency for ad-funded revenue is also USD
The output would look something like the below (the user_pseudo_id column is redacted here).
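If you do need to align currencies, here is one hypothetical sketch: it assumes you maintain your own exchange-rate table (called fx_rates here, with currency and usd_rate columns; it is not part of the GA4F export) and treats the revenue query above as a CTE named revenue_table, as we do in the next step.

#hypothetical currency alignment; fx_rates is your own table, not part of the export
SELECT
  rt.user_pseudo_id,
  rt.ad_funded_revenue * fx.usd_rate AS ad_funded_revenue_usd
FROM revenue_table rt
LEFT JOIN `project_name.dataset_name.fx_rates` fx
  ON rt.ad_revenue_currency = fx.currency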
Combining this data
So far, we have run two queries: one to find the data for users who installed and opened the app, and another to find the revenue for those users. Now, let's remember what we discussed about SKAd Network limitations: the attribution window is only available within 0-2 days post installation. Hence, we need to check the event timestamps for install and revenue, and only take the revenue if it happens within that time frame. Let's now combine these into a single query that provides the total revenue for each user within two days of app installation
#creating the install table
WITH
install_table AS (
SELECT
user_pseudo_id,
event_name,
event_date,
event_timestamp
FROM `project_name.dataset_name.events_2023*`
WHERE
event_name = 'first_open'
AND platform = 'IOS'
),
#creating the revenue table
revenue_table AS (
SELECT
user_pseudo_id,
event_name,
event_timestamp,
EXTRACT(DATE FROM PARSE_DATETIME('%Y%m%d', event_date)) AS event_date,
(
SELECT COALESCE(value.int_value, value.float_value, value.double_value, NULL)
FROM UNNEST(event_params)
WHERE
KEY = 'value'
AND event_name = 'ad_impression'
) AS ad_funded_revenue,
(
SELECT value.string_value
FROM UNNEST(event_params)
WHERE
KEY = 'currency'
AND event_name = 'ad_impression'
) AS ad_revenue_currency,
(
CASE
WHEN event_name = 'in_app_purchase' THEN event_value_in_usd
ELSE 0
END) AS iap_revenue_usd,
FROM `project_name.dataset_name.events_2023*`
WHERE
platform = 'IOS'
AND event_name IN (
'in_app_purchase',
'ad_impression')
)
SELECT
it.user_pseudo_id AS user_pseudo_id,
#combine ad revenue and IAP revenue, assuming both are in same currency
sum(ifnull(rt.iap_revenue_usd,0) + ifnull(rt.ad_funded_revenue,0)) AS total_revenue,
FROM install_table it
INNER JOIN revenue_table rt
ON it.user_pseudo_id = rt.user_pseudo_id
WHERE
rt.event_timestamp >= it.event_timestamp
AND rt.event_timestamp
<= it.event_timestamp + 86400000000 * 2 #86400000000 microseconds = 24 hours, multiplied by 2 for the 2-day window
GROUP BY 1
This query joins the install data and the revenue data on the user_pseudo_id field, and then ensures that the revenue timestamp falls within 2 days of the install. If you are using SKAd Network 3.5, the default window is 24 hours, so you can change the condition to include only 1 day's data, as shown below
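A minimal sketch of that change, with everything else in the query staying the same

WHERE
  rt.event_timestamp >= it.event_timestamp
  AND rt.event_timestamp <= it.event_timestamp + 86400000000 #24 hours in microseconds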
Grouping the revenue into buckets
After the previous query, you will have the user_pseudo_id and the total revenue for each user
We now need to combine this into buckets that we can use for our conversion value ranges. For this purpose, we will use the approx_quantiles function in BigQuery, which automatically creates these ranges for you. Let's assume for the purposes of this example that we want to create 5 ranges, so we can just use SELECT approx_quantiles(total_revenue, 5) AS buckets
With that, let's incorporate this into our overall query
#creating the install table
WITH
install_table AS (
SELECT
user_pseudo_id,
event_name,
event_date,
event_timestamp
FROM `project_name.dataset_name.events_2023*`
WHERE
event_name = 'first_open'
AND platform = 'IOS'
),
#creating the revenue table
revenue_table AS (
SELECT
user_pseudo_id,
event_name,
event_timestamp,
EXTRACT(DATE FROM PARSE_DATETIME('%Y%m%d', event_date)) AS event_date,
(
SELECT COALESCE(value.int_value, value.float_value, value.double_value, NULL)
FROM UNNEST(event_params)
WHERE
KEY = 'value'
AND event_name = 'ad_impression'
) AS ad_funded_revenue,
(
SELECT value.string_value
FROM UNNEST(event_params)
WHERE
KEY = 'currency'
AND event_name = 'ad_impression'
) AS ad_revenue_currency,
(
CASE
WHEN event_name = 'in_app_purchase' THEN event_value_in_usd
ELSE 0
END) AS iap_revenue_usd,
FROM `project_name.dataset_name.events_2023*`
WHERE
platform = 'IOS'
AND event_name IN (
'in_app_purchase',
'ad_impression')
),
total_revenue_table AS (
SELECT
it.user_pseudo_id AS user_pseudo_id,
#combine ad revenue and IAP revenue, assuming both are in same currency
sum(ifnull(rt.iap_revenue_usd,0) + ifnull(rt.ad_funded_revenue,0)) AS total_revenue,
FROM install_table it
INNER JOIN revenue_table rt
ON it.user_pseudo_id = rt.user_pseudo_id
WHERE
rt.event_timestamp >= it.event_timestamp
AND rt.event_timestamp
<= it.event_timestamp + 86400000000 * 2 #86400000000 microseconds = 24 hours, multiplied by 2 for the 2-day window
GROUP BY 1
)
SELECT approx_quantiles(total_revenue, 5) AS buckets FROM total_revenue_table
This query divides the revenue into 5 buckets, with BigQuery trying to maintain a consistent percentile distribution (roughly the same number of users per bucket). Note that approx_quantiles(total_revenue, 5) returns an array of 6 values: the approximate minimum (0 in our case) followed by 5 boundaries, the last of which is the approximate maximum. Those 5 boundaries are the values we work with in the next step
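If you would rather read the boundaries as rows than as a single array, here is a small sketch using UNNEST with an offset (run as the final SELECT of the same WITH statement)

SELECT
  boundary_index,
  boundary
FROM (
  SELECT approx_quantiles(total_revenue, 5) AS buckets
  FROM total_revenue_table
), UNNEST(buckets) AS boundary WITH OFFSET AS boundary_index
ORDER BY boundary_index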
Analyze user distribution with these buckets
This is an optional step if you would like to understand the distribution of your users across buckets. For our example, the bucket boundaries returned by the previous query are
- 0.1
- 0.5
- 2
- 2.5
- 5 [the last value should not be used in the range configuration]
For the final ranges, we will ignore the last boundary, 5, as that is generally the maximum value, and we can treat 2.5 as the upper end of the last range. This is because app attribution providers tend to calculate ROAS using the mean of each range, so the outlier must be excluded for a more uniform calculation.
We will now look at the number of users for each date across all the ranges, so that we can understand the daily volume of users in each bucket. We can do that using this sample query, where you can replace the bucket values with your actual data. The query would look something like this
#creating the install table
WITH
install_table AS (
SELECT
user_pseudo_id,
event_name,
event_date,
event_timestamp
FROM `project_name.dataset_name.events_2023*`
WHERE
event_name = 'first_open'
AND platform = 'IOS'
),
#creating the revenue table
revenue_table AS (
SELECT
user_pseudo_id,
event_name,
event_timestamp,
EXTRACT(DATE FROM PARSE_DATETIME('%Y%m%d', event_date)) AS event_date,
(
SELECT COALESCE(value.int_value, value.float_value, value.double_value, NULL)
FROM UNNEST(event_params)
WHERE
KEY = 'value'
AND event_name = 'ad_impression'
) AS ad_funded_revenue,
(
SELECT value.string_value
FROM UNNEST(event_params)
WHERE
KEY = 'currency'
AND event_name = 'ad_impression'
) AS ad_revenue_currency,
(
CASE
WHEN event_name = 'in_app_purchase' THEN event_value_in_usd
ELSE 0
END) AS iap_revenue_usd,
FROM `project_name.dataset_name.events_2023*`
WHERE
platform = 'IOS'
AND event_name IN (
'in_app_purchase',
'ad_impression')
),
total_revenue_table AS (
SELECT
it.user_pseudo_id AS user_pseudo_id,
rt.event_date,
#combine ad revenue and IAP revenue, assuming both are in same currency
sum(ifnull(rt.iap_revenue_usd,0) + ifnull(rt.ad_funded_revenue,0)) AS total_revenue,
FROM install_table it
INNER JOIN revenue_table rt
ON it.user_pseudo_id = rt.user_pseudo_id
WHERE
rt.event_timestamp >= it.event_timestamp
AND rt.event_timestamp
<= it.event_timestamp + 86400000000 * 2 #86400000000 microseconds = 24 hours, multiplied by 2 for the 2-day window
GROUP BY 1, 2
)
SELECT
event_date,
#half-open ranges so each user is counted in exactly one bucket
sum(CASE WHEN total_revenue >= 0 AND total_revenue < 0.1 THEN 1 ELSE 0 END) AS Bucket1,
sum(CASE WHEN total_revenue >= 0.1 AND total_revenue < 0.5 THEN 1 ELSE 0 END) AS Bucket2,
sum(CASE WHEN total_revenue >= 0.5 AND total_revenue < 2 THEN 1 ELSE 0 END) AS Bucket3,
sum(CASE WHEN total_revenue >= 2 AND total_revenue <= 2.5 THEN 1 ELSE 0 END) AS Bucket4,
sum(CASE WHEN total_revenue > 2.5 THEN 1 ELSE 0 END) AS Bucket5
FROM total_revenue_table
GROUP BY 1 ORDER BY 1 DESC
This will return the number of users in each revenue range for each day, like below. If you see very low numbers in any bucket, or a generally uneven distribution, you might want to adjust the number of buckets and rerun the query.
A quick word on SKAd Network 4.0
SKAd Network 4.0 provides multiple conversion windows: 0-2 days, 3-7 days, and 8-35 days. In the approach above, you can easily change the window to analyze data for these additional scenarios as well. Coarse-grained values of LOW, MEDIUM, and HIGH are also available. If you want to use this approach, you can think of these as 3 buckets, so by changing the number of buckets to 3, you can get the thresholds for LOW, MEDIUM, and HIGH, as sketched below
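A minimal sketch of that change, reusing the total_revenue_table CTE from the earlier query

#3 buckets for the SKAN 4.0 coarse-grained values
SELECT approx_quantiles(total_revenue, 3) AS coarse_buckets
FROM total_revenue_table
#the returned array has 4 values: the approximate minimum, the LOW/MEDIUM boundary, the MEDIUM/HIGH boundary, and the approximate maximum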
4. Deployment with your attribution provider
Depending on the specific platform, this guidance might change, so please work with the platform representatives for the most up-to-date information. For the purposes of this example, we will look at how to currently deploy this on AppsFlyer
In the query that we ran previously, the final ranges that we received as output were as follows
- Range 1: 0 to 0.1
- Range 2: 0.1 to 0.5
- Range 3: 0.5 to 2
- Range 4: 2 to 2.5
Remember that we decided to ignore the last revenue range, as it would be an outlier and would skew the average calculations for your app attribution provider.
AppsFlyer offers SKAN Conversion Studio, where it is quite simple to input this directly into the UI. You can either use 4.0 directly, or use "Custom" mode if you are on 3.5, and add a "Revenue" measurement. You can then add the revenue ranges that you calculated from the earlier analysis.
Best Practices and Learnings on Google Ads
We would like to leave you with some recommendations if you are running campaigns on Google Ads and measuring the impact through a SKAd Network conversion value schema
- Make sure that the conversion window you are using on Google Ads matches the activity window that you have specified on your app attribution platform. For SKAd Network 3.5, this is likely to be within 1-3 days, so you can adjust it accordingly on Google Ads by following the steps listed here
- If you are using AppsFlyer, the default event counter is currently 1, which means it does not account for multiple events per user. If you are using an event-based model for SKAN measurement and comparing with tCPA campaigns on Google Ads, you can choose to customize this by following this guidance from AppsFlyer
5. Congratulations
Congratulations, you've successfully set up your SKAd Network conversion value schema. You can now monitor the data in your Google Ads SKAd Network report to check conversion values for your Google Ads campaigns once this is live.
You've learned
- How to explore the rich raw data from GA4F in BigQuery
- An analytical approach to calculating revenue buckets for your business
- How to deploy the schema with AppsFlyer