Welcome to the Realtime on-device in-app-purchase optimization codelab. In this codelab you'll learn how to use TensorFlow Lite and Firebase to train and deploy a custom personalization model to your app.
This tutorial shows how to build a machine learning model for personalization, in particular one that predicts the optimal in-app-purchase offering given a state the current user is in. This is an example of a contextual bandit problem, an important and widely applicable kind of machine learning problem.
What you'll learn
- Collect analytics data via Firebase analytics
- Preprocess analytics data using BigQuery
- Train a simple ML model for on-device optimization of in-app-purchases
- Deploy TFLite models to Firebase ML and access them from your app
- Measure and experiment with different models via Firebase A/B Testing
- Train and deploy new models using latest data on a recurring cadence
What you'll need
- Android Studio version 3.4+.
- Sample code.
- A test device with Android 2.3+ and Google Play services 9.8 or later, or an Emulator with Google Play services 9.8 or later
- If using a device, a connection cable.
2. Problem Statement
Let's say you are a game developer who wants to show personalized in-app-purchase (IAP) suggestions at the end of each level. You can only show a limited number of IAP options each time, and you don't know which ones will have the best conversion. Given each user and each session is different, how do we go about finding the IAP offer that gives the highest expected reward?
3. Get the sample code
Clone the GitHub repository from the command line. This repo will contain:
- A Jupyter notebook that trains the personalization model and packages it into a TFLite model.
- Sample Kotlin app that uses the TFLite model to make predictions on-device.
Alternatively, download a zip archive that contains the source code for this codelab and extract it on your local machine.
4. Run the app with Firebase
In this codelab, we will work on optimizing the in-app-purchases of our fictional game app - Flappy Sparky. The game is a side-scroller where the player controls a Sparky, attempting to fly between columns of walls without hitting them. At the beginning of the level, the user is presented with an IAP offer that will give them a powerup. We'll only be implementing the IAP optimization portion of the app in this codelab.
You will be able to apply what you learn here to your own app that's connected to a Firebase project. If you need help getting started with Firebase, please see our tutorials on this topic (Android and iOS).
5. Collect analytics events in your app
We'll start by collecting analytics events that will be used to train our model. The model might learn, for example, that users who have been playing for longer are more likely to prefer certain types of content. We therefore need to collect this information about the user, as well as which types of content they prefer.
In this example, we want to know how long the user has been playing the game, how far they usually get, how many coins they have spent, and so on.
Download sample data (Optional)
You can skip the data collection part and jump to the "Train the optimization model" section of this codelab and use our sample data to follow along.
Collect Data with Firebase Analytics SDK
We will use Firebase Analytics to help collect these analytics events. The SDK automatically captures a number of events and user properties and also allows you to define your own custom events to measure the things that uniquely matter to your app.
Installing Firebase Analytics SDK
You can get started with Firebase Analytics by following these tutorials:
Log custom events
After setting up Firebase Analytics SDK, we can start instrumenting the events we need to train our model.
It's important to set a user ID in the analytics event, so we can associate analytics data for that user with their existing data in the app.
In this case, we need to log each in-app-purchase (IAP) offer presented to the user and whether the user clicks on that offer. This gives us two analytics events: one logged when an offer is presented, and "accepted_offers", logged when the user accepts it. We'll attach a unique offer_id to both events so we can later join them and see whether each offer was accepted.
6. Preprocess data in BigQuery
At this point we have events recording which IAP offer was presented to the user and which IAP offer the user clicked on. To give our model a complete picture to learn from, we need to combine these events with data about the user.
To do this we will need to start by exporting the analytics events to BigQuery.
Link your Firebase project to BigQuery
To link your Firebase project and its apps to BigQuery:
- Sign in to Firebase.
- Click the settings (gear) icon, then select Project Settings.
- On the Project Settings page, click the Integrations tab.
- On the BigQuery card, click Link.
(Optional) Export your Firestore collections to BigQuery
If you have data about the user in Firestore, such as the user's signup date, in-app purchases made, levels reached in the game, coin balance, or any other attributes that might be useful in training the model, you can combine that data with the analytics events and use it in your personalization model.
To export your Firestore collections to BigQuery, you can install the Firestore BigQuery Export Extension.
Preparing data in BigQuery
For our model to learn which IAP offer to present based on the user and the game state, we need to organize the data about the user, the game state, the offer presented, and whether it was clicked on into a single row of a table. In the next few steps, we will use BigQuery to transform our raw analytics data into data usable for training our model.
BigQuery allows creating "views" to keep your query organized. A view is a virtual table defined by a SQL query. When you create a view, you query it in the same way you query a table. Using this we can first clean our analytics data.
To see whether each in-app-purchase offer was clicked on, we need to join the iap_offers and accepted_offers events that we logged in the previous step.
all_offers_joined - BigQuery view
SELECT
  iap_offers.*,
  CASE
    WHEN accepted_offers.accepted IS NULL THEN FALSE
    ELSE TRUE
  END AS is_clicked
FROM `iap-optimization.ml_sample.accepted_offers` AS accepted_offers
RIGHT JOIN `iap-optimization.ml_sample.iap_offers` AS iap_offers
  ON accepted_offers.offer_id = iap_offers.offer_id;
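To make the join concrete, here is a minimal Python sketch of the same logic on a few made-up records. Only the names offer_id, accepted, and is_clicked come from the query above; the helper join_offers, the sample rows, and the "extra_life" powerup are illustrative assumptions. The sketch also assumes at most one accepted row per offer.

```python
def join_offers(iap_offers, accepted_offers):
    """Keep every presented offer and flag the ones that have an accepted row."""
    accepted_ids = {
        row["offer_id"]
        for row in accepted_offers
        if row.get("accepted") is not None
    }
    return [
        {**offer, "is_clicked": offer["offer_id"] in accepted_ids}
        for offer in iap_offers
    ]


# Made-up sample rows, for illustration only.
iap_offers = [
    {"offer_id": "a1", "user_id": "u1", "presented_powerup": "sparky_armor"},
    {"offer_id": "a2", "user_id": "u2", "presented_powerup": "extra_life"},
]
accepted_offers = [{"offer_id": "a1", "accepted": True}]

joined = join_offers(iap_offers, accepted_offers)
for row in joined:
    print(row["offer_id"], row["is_clicked"])
```

Like the RIGHT JOIN in the view, every presented offer survives the join; offers without a matching accepted row simply get is_clicked set to false.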
all_offers_with_user_data - BigQuery view
SELECT
  offers.is_clicked,
  offers.presented_powerup,
  offers.last_run_end_reason,
  offers.event_timestamp,
  users.*
FROM `iap-optimization.ml_sample.all_offers_joined` AS offers
LEFT JOIN `iap-optimization.ml_sample.all_users` AS users
  ON users.user_id = offers.user_id;
Export BigQuery dataset to Google Cloud Storage
Lastly, we can export the BigQuery dataset to Google Cloud Storage (GCS) so we can use it in our model training.
7. Train the optimization model
Sample data can be found here. This is also the data we will be using for training the model in this section.
Before we start training the model, let's spend some time defining our contextual bandits problem.
Contextual bandits explained
At the beginning of each level in Flappy Sparky, the user is presented with an IAP offer that will give them a powerup. We can only show one IAP option each time, and we don't know which ones will have the best conversion. Given each user and each session is different, how do we go about finding the IAP offer that gives the highest expected reward?
In this case, let's make the reward 0 if the user doesn't accept the IAP offer, and the IAP value if they do. To maximize our reward, we can use our historical data to train a model that predicts the expected reward for each action given a user, and then pick the action with the highest expected reward.
The following is what we will use in the prediction:
- State: information about the user and their current session
- Action: IAP offers we can choose to show
- Reward: value of the IAP offer if the user accepts it, otherwise 0
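As a rough illustration of these three pieces, the sketch below computes the reward for a logged offer and picks the action with the highest predicted reward. The function names, the offers "offer_b" and "offer_c", the price, and the predicted values are all invented for the example; in the codelab the predictions would come from the trained model.

```python
def reward(offer_value, accepted):
    """Reward is the IAP value if the offer was accepted, otherwise 0."""
    return offer_value if accepted else 0.0


def best_action(expected_rewards):
    """Return the action whose predicted expected reward is highest."""
    return max(expected_rewards, key=expected_rewards.get)


# Hypothetical model output for one state: action -> expected reward.
predicted = {"sparky_armor": 0.42, "offer_b": 0.17, "offer_c": 0.31}

print(reward(0.99, accepted=True))   # 0.99
print(reward(0.99, accepted=False))  # 0.0
print(best_action(predicted))        # sparky_armor
```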
Exploitation vs Exploration
In any multi-armed bandit problem, the agent needs to balance exploration (gathering more data to learn which action gives the optimal result) against exploitation (using the best known action to obtain the highest reward).
In our version of the problem, we simplify this by training the model only periodically in the cloud and using the model on the user's device only for predictions (as opposed to also training on the device). To make sure we keep collecting sufficient training data once the model is in use, we need to show randomized offers to our app users some of the time (e.g. 30% of the time). This strategy of balancing exploration and exploitation is called epsilon-greedy.
Training the model
You can use the training script provided with the codelab to get started. Our goal is to train a model that predicts the expected reward for each action given a state, and then to choose the action with the highest expected reward.
The easiest way to get started with training your own model is to make a copy of the notebook in the code sample for this codelab.
You don't need a GPU for this codelab, but if you need a more powerful machine to explore your own data and train your own model, you can get an AI Platform Notebook instance to speed up your training.
In the training script provided, we created an iterator that generates training data from the CSV files we exported from BigQuery. Then we used the data to start training our model with Keras. Details of how to train the model can be found in the comments of the Python notebook.
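The notebook contains the actual data pipeline; as a simplified sketch of the idea, a generator that yields batches of training examples from a CSV file could look like this. The function name batches_from_csv and the tiny in-memory CSV are assumptions for illustration, and the column names are borrowed from the queries and test input elsewhere in this codelab.

```python
import csv
import io


def batches_from_csv(csv_file, batch_size, label_column="is_clicked"):
    """Yield (features, labels) batches from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    features, labels = [], []
    for row in reader:
        # CSV values are strings, so convert the label to a boolean.
        labels.append(row.pop(label_column).lower() == "true")
        features.append(row)
        if len(features) == batch_size:
            yield features, labels
            features, labels = [], []
    if features:  # emit the final, possibly smaller, batch
        yield features, labels


# Tiny in-memory CSV standing in for a file exported from BigQuery.
sample = io.StringIO(
    "is_clicked,presented_powerup,coins_spent\n"
    "true,sparky_armor,2048\n"
    "false,sparky_armor,12\n"
    "false,sparky_armor,300\n"
)
for batch_features, batch_labels in batches_from_csv(sample, batch_size=2):
    print(len(batch_features), batch_labels)
```

Reading lazily like this keeps memory use flat regardless of how large the exported files are; a real pipeline would also convert the feature strings into numeric tensors before passing them to Keras.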
Measure the model performance
While training the model, we compare it against a random agent that selects IAP offers at random, to see whether our model is actually learning. This logic lives under
At the end of training, we use data in test.csv to test our model again. The model has never seen these data before, so we can be confident the result is not due to overfitting. In this case, the model performs 28% better than the random agent.
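One common way to make this kind of comparison on logged data is a replay-style estimate: average the rewards of the logged rows where the policy would have chosen the same offer that was actually shown, and compare that with the average reward over all rows. The sketch below illustrates the idea on made-up data. It is not necessarily how the notebook computes its numbers, it is only meaningful if the logged offers were shown at random, and every name and value in it is hypothetical.

```python
def replay_average_reward(logged, policy):
    """Average reward over the rows where `policy` agrees with the logged action."""
    matched = [row["reward"] for row in logged if policy(row["state"]) == row["action"]]
    return sum(matched) / len(matched) if matched else 0.0


def random_agent_average_reward(logged):
    """With randomly logged offers, the overall average is the random agent's reward."""
    return sum(row["reward"] for row in logged) / len(logged)


def toy_policy(state):
    """Stand-in for the trained model: a hand-written rule on one feature."""
    return "sparky_armor" if state["game_day"] >= 5 else "offer_b"


# Made-up log of (state, offer shown, reward received).
logged = [
    {"state": {"game_day": 10}, "action": "sparky_armor", "reward": 1.0},
    {"state": {"game_day": 10}, "action": "offer_b", "reward": 0.0},
    {"state": {"game_day": 1}, "action": "offer_b", "reward": 1.0},
    {"state": {"game_day": 1}, "action": "sparky_armor", "reward": 0.0},
    {"state": {"game_day": 7}, "action": "sparky_armor", "reward": 0.0},
    {"state": {"game_day": 2}, "action": "sparky_armor", "reward": 0.0},
]

model_avg = replay_average_reward(logged, toy_policy)
random_avg = random_agent_average_reward(logged)
print(f"model: {model_avg:.3f}, random: {random_avg:.3f}")
print(f"relative improvement: {(model_avg / random_avg - 1) * 100:.0f}%")
```

The toy numbers here are unrelated to the 28% figure reported above; they only show how such a relative improvement can be computed.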
Export the TFLite model
We now have a trained model ready to use, except that it's currently in a TensorFlow format. We need to export the model in TFLite format so it can run on mobile devices.
import tensorflow as tf

# Convert the trained Keras model to the TFLite format.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Write the converted model to disk.
with tf.io.gfile.GFile('iap-optimizer.tflite', 'wb') as f:
    f.write(tflite_model)
From here, you can download the model and bundle the model with your app.
For a production app, we recommend that you deploy the model to Firebase ML and have Firebase host your model. This is useful for two main reasons:
- We can keep the app install size small and only download the model if needed
- The model can be updated regularly and with a different release cycle than the entire app
To learn how to deploy the model to Firebase ML, you can follow this codelab here. You have the option of deploying using the Firebase console or the Python API.
8. Making predictions on-device
The next step is to make predictions using the model on-device. You can find an example app that downloads a model from Firebase ML and uses it to perform inference with some client-side data.
Because we applied some preprocessing during model training, we will need to apply the same preprocessing to the model input when running on-device. A simple way to do this is to use a platform and language independent format such as a JSON file containing a map of every feature to metadata about how the preprocessing is done. You can find more detail on how this is done in the example app.
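To illustrate the idea, here is a small sketch of metadata-driven preprocessing. It is written in Python to match the training code; the app would implement the same logic in Kotlin. The JSON schema (the "type", "mean", "std", and "vocabulary" keys), the statistics, and the function name preprocess are assumptions made for this example, and the example app's actual file format may differ.

```python
import json

# Hypothetical metadata; the real file's schema and values may differ.
METADATA_JSON = """
{
  "coins_spent": {"type": "numeric", "mean": 1000.0, "std": 500.0},
  "device_os": {"type": "categorical", "vocabulary": ["ANDROID", "IOS"]}
}
"""


def preprocess(raw_input, metadata):
    """Turn raw feature values into a flat list of floats for the model."""
    vector = []
    for name, spec in metadata.items():
        value = raw_input[name]
        if spec["type"] == "numeric":
            # Standardize using statistics saved at training time.
            vector.append((float(value) - spec["mean"]) / spec["std"])
        else:
            # One-hot encode; unknown categories become all zeros.
            vector.extend(1.0 if value == item else 0.0 for item in spec["vocabulary"])
    return vector


metadata = json.loads(METADATA_JSON)
print(preprocess({"coins_spent": 2048, "device_os": "ANDROID"}, metadata))
```

Because the training script and the app both read the same file, the two sides cannot drift apart when a feature's statistics or vocabulary change.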
Next, we give the model a test input as follows:
val testInput = mapOf(
    "coins_spent" to 2048f,
    "distance_avg" to 1234f,
    "device_os" to "ANDROID",
    "game_day" to 10f,
    "geo_country" to "Canada",
    "last_run_end_reason" to "laser"
)
The model suggests "sparky_armor" is the best IAP powerup for this particular user.
Measure model accuracy
To measure our model's accuracy, we can use Firebase Analytics to keep track of the IAP offers predicted by our model and whether they are clicked on. You can use this together with Firebase A/B Testing to measure the actual performance of the model. Taking it one step further, you can also run A/B tests on different iterations of the model.
9. (Optional): Updating model regularly with new data
If you need to update your model as new data comes in, you can set up a pipeline to retrain your model on a recurring basis. To do this, you first need to make sure you keep collecting new training data by using the epsilon-greedy strategy we mentioned above (e.g. using the model's prediction 70% of the time and a random offer 30% of the time).
In this codelab, you learned how to train and deploy an on-device TFLite model for optimizing in-app-purchases using Firebase. To learn more about TFLite and Firebase, take a look at other TFLite samples and the Firebase getting started guides.
If you have any questions, you can leave them at Stack Overflow #firebase-machine-learning.
What we've covered
- TensorFlow Lite
- Firebase ML
- Firebase Analytics
- Train and deploy an optimizer model for your app.