Use a TensorFlow Lite model for inference with ML Kit on iOS

You can use ML Kit to perform on-device inference with a TensorFlow Lite model.

ML Kit can use TensorFlow Lite models only on devices running iOS 9 and newer.

See the ML Kit quickstart sample on GitHub for an example of this API in use.

Before you begin

  1. If you have not already added Firebase to your app, do so by following the steps in the getting started guide.
  2. Include the ML Kit libraries in your Podfile:
    pod 'Firebase/Core'
    pod 'Firebase/MLModelInterpreter'
    After you install or update your project's Pods, be sure to open your Xcode project using its .xcworkspace.
  3. In your app, import Firebase:


    import Firebase


    @import Firebase;
  4. Also, import the FirebaseMLCommon module:


    import FirebaseMLCommon


    @import FirebaseMLCommon;
  5. Convert the TensorFlow model you want to use to TensorFlow Lite format. See TOCO: TensorFlow Lite Optimizing Converter.

Host or bundle your model

Before you can use a TensorFlow Lite model for inference in your app, you must make the model available to ML Kit. ML Kit can use TensorFlow Lite models hosted remotely using Firebase, bundled with the app binary, or both.

By hosting a model on Firebase, you can update the model without releasing a new app version, and you can use Remote Config and A/B Testing to dynamically serve different models to different sets of users.

If you choose to only provide the model by hosting it with Firebase, and not bundle it with your app, you can reduce the initial download size of your app. Keep in mind, though, that if the model is not bundled with your app, any model-related functionality will not be available until your app downloads the model for the first time.

By bundling your model with your app, you can ensure your app's ML features still work when the Firebase-hosted model isn't available.

Host models on Firebase

To host your TensorFlow Lite model on Firebase:

  1. In the ML Kit section of the Firebase console, click the Custom tab.
  2. Click Add custom model (or Add another model).
  3. Specify a name that will be used to identify your model in your Firebase project, then upload the TensorFlow Lite model file (usually ending in .tflite or .lite).

After you add a custom model to your Firebase project, you can reference the model in your apps using the name you specified. At any time, you can upload a new TensorFlow Lite model, and your app will download the new model and start using it when the app next restarts. You can define the device conditions required for your app to attempt to update the model (see below).

Bundle models with an app

To bundle your TensorFlow Lite model with your app, add the model file (usually ending in .tflite or .lite) to your Xcode project, taking care to select Copy bundle resources when you do so. The model file will be included in the app bundle and available to ML Kit.

Load the model

To use your TensorFlow Lite model in your app, first configure ML Kit with the locations where your model is available: remotely using Firebase, in local storage, or both. If you specify both a local and remote model, ML Kit will use the remote model if it is available, and fall back to the locally-stored model if the remote model isn't available.

Configure a Firebase-hosted model

If you hosted your model with Firebase, register a RemoteModel object, specifying the name you assigned the model when you uploaded it, and the conditions under which ML Kit should download the model initially and when updates are available.


let initialConditions = ModelDownloadConditions(
  allowsCellularAccess: true,
  allowsBackgroundDownloading: true
let updateConditions = ModelDownloadConditions(
  allowsCellularAccess: false,
  allowsBackgroundDownloading: true
let remoteModel = RemoteModel(
  name: "my_remote_model",
  allowsModelUpdates: true,
  initialConditions: initialConditions,
  updateConditions: updateConditions
let isRegistered = ModelManager.modelManager().register(remoteModel)


FIRModelDownloadConditions *initialConditions =
    [[FIRModelDownloadConditions alloc] initWithAllowsCellularAccess:YES
FIRModelDownloadConditions *updateConditions =
    [[FIRModelDownloadConditions alloc] initWithAllowsCellularAccess:NO
FIRRemoteModel *remoteModel = [[FIRRemoteModel alloc] initWithName:@"my_remote_model"
  BOOL isRegistered =
      [[FIRModelManager modelManager] registerRemoteModel:remoteModel];

Configure a local model

If you bundled the model with your app, register a LocalModel object, specifying the filename of the TensorFlow Lite model and assigning the model a name you will use in the next step.


guard let modelPath = Bundle.main.path(forResource: "my_model", ofType: "tflite")
    else {
        // Invalid model path
let localModel = LocalModel(name: "my_local_model", path: modelPath)
let registrationSuccessful = ModelManager.modelManager().register(localModel)


NSString *modelPath = [NSBundle.mainBundle pathForResource:@"my_model"
FIRLocalModel *localModel = [[FIRLocalModel alloc] initWithName:@"my_local_model"
BOOL registrationSuccess =
    [[FIRModelManager modelManager] registerLocalModel:localModel];

Create an interpreter from your model

After you configure your model locations, create a ModelOptions object with the remote model, the local model, or both, and use it to get an instance of ModelInterpreter. If you only have one model, specify nil for the type you don't use.


let options = ModelOptions(remoteModelName: "my_remote_model",
                           localModelName: "my_local_model")
let interpreter = ModelInterpreter.modelInterpreter(options: options)


FIRModelOptions *options = [[FIRModelOptions alloc] initWithRemoteModelName:@"my_remote_model"
FIRModelInterpreter *interpreter = [FIRModelInterpreter modelInterpreterWithOptions:options];

Specify the model's input and output

Next, configure the model interpreter's input and output formats.

A TensorFlow Lite model takes as input and produces as output one or more multidimensional arrays. These arrays contain either byte, int, long, or float values. You must configure ML Kit with the number and dimensions ("shape") of the arrays your model uses.

If you don't know the shape and data type of your model's input and output, you can use the TensorFlow Lite Python interpreter to inspect your model. For example:

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="my_model.tflite")

# Print input shape and type
print(interpreter.get_input_details()[0]['shape'])  # Example: [1 224 224 3]
print(interpreter.get_input_details()[0]['dtype'])  # Example: <class 'numpy.float32'>

# Print output shape and type
print(interpreter.get_output_details()[0]['shape'])  # Example: [1 1000]
print(interpreter.get_output_details()[0]['dtype'])  # Example: <class 'numpy.float32'>

After you determine the format of your model's input and output, configure your app' model interpreter by creating a ModelInputOutputOptions object.

For example, a floating-point image classification model might take as input an Nx224x224x3 array of Float values, representing a batch of N 224x224 three-channel (RGB) images, and produce as output a list of 1000 Float values, each representing the probability the image is a member of one of the 1000 categories the model predicts.

For such a model, you would configure the model interpreter's input and output as shown below:


let ioOptions = ModelInputOutputOptions()
do {
    try ioOptions.setInputFormat(index: 0, type: .float32, dimensions: [1, 224, 224, 3])
    try ioOptions.setOutputFormat(index: 0, type: .float32, dimensions: [1, 1000])
} catch let error as NSError {
    print("Failed to set input or output format with error: \(error.localizedDescription)")


FIRModelInputOutputOptions *ioOptions = [[FIRModelInputOutputOptions alloc] init];
NSError *error;
[ioOptions setInputFormatForIndex:0
                       dimensions:@[@1, @224, @224, @3]
if (error != nil) { return; }
[ioOptions setOutputFormatForIndex:0
                        dimensions:@[@1, @1000]
if (error != nil) { return; }

Perform inference on input data

Finally, to perform inference using the model, get your input data, perform any transformations on the data that might be necessary for your model, and build a Data object that contains the data.

For example, if your model processes images, and your model has input dimensions of [BATCH_SIZE, 224, 224, 3] floating-point values, you might have to scale the image's color values to a floating-point range as in the following example:


let image: CGImage = // Your input image
guard let context = CGContext(
  data: nil,
  width: image.width, height: image.height,
  bitsPerComponent: 8, bytesPerRow: image.width * 4,
  space: CGColorSpaceCreateDeviceRGB(),
  bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue
) else {
  return false

context.draw(image, in: CGRect(x: 0, y: 0, width: image.width, height: image.height))
guard let imageData = else { return false }

let inputs = ModelInputs()
var inputData = Data()
do {
  for row in 0 ..< 224 {
    for col in 0 ..< 224 {
      let offset = 4 * (col * context.width + row)
      // (Ignore offset 0, the unused alpha channel)
      let red = imageData.load(fromByteOffset: offset+1, as: UInt8.self)
      let green = imageData.load(fromByteOffset: offset+2, as: UInt8.self)
      let blue = imageData.load(fromByteOffset: offset+3, as: UInt8.self)

      // Normalize channel values to [0.0, 1.0]. This requirement varies
      // by model. For example, some models might require values to be
      // normalized to the range [-1.0, 1.0] instead, and others might
      // require fixed-point values or the original bytes.
      var normalizedRed = Float32(red) / 255.0
      var normalizedGreen = Float32(green) / 255.0
      var normalizedBlue = Float32(blue) / 255.0

      // Append normalized values to Data object in RGB order.
      let elementSize = MemoryLayout.size(ofValue: normalizedRed)
      var bytes = [UInt8](repeating: 0, count: elementSize)
      memcpy(&bytes, &normalizedRed, elementSize)
      inputData.append(&bytes, count: elementSize)
      memcpy(&bytes, &normalizedGreen, elementSize)
      inputData.append(&bytes, count: elementSize)
      memcpy(&ammp;bytes, &normalizedBlue, elementSize)
      inputData.append(&bytes, count: elementSize)
  try inputs.addInput(inputData)
} catch let error {
  print("Failed to add input: \(error)")


CGImageRef image = // Your input image
long imageWidth = CGImageGetWidth(image);
long imageHeight = CGImageGetHeight(image);
CGContextRef context = CGBitmapContextCreate(nil,
                                             imageWidth, imageHeight,
                                             imageWidth * 4,
CGContextDrawImage(context, CGRectMake(0, 0, imageWidth, imageHeight), image);
UInt8 *imageData = CGBitmapContextGetData(context);

FIRModelInputs *inputs = [[FIRModelInputs alloc] init];
NSMutableData *inputData = [[NSMutableData alloc] initWithCapacity:0];

for (int row = 0; row < 224; row++) {
  for (int col = 0; col < 224; col++) {
    long offset = 4 * (col * imageWidth + row);
    // Normalize channel values to [0.0, 1.0]. This requirement varies
    // by model. For example, some models might require values to be
    // normalized to the range [-1.0, 1.0] instead, and others might
    // require fixed-point values or the original bytes.
    // (Ignore offset 0, the unused alpha channel)
    Float32 red = imageData[offset+1] / 255.0f;
    Float32 green = imageData[offset+2] / 255.0f;
    Float32 blue = imageData[offset+3] / 255.0f;

    [inputData appendBytes:&red length:sizeof(red)];
    [inputData appendBytes:&green length:sizeof(green)];
    [inputData appendBytes:&blue length:sizeof(blue)];

[inputs addInput:inputData error:&error];
if (error != nil) { return nil; }

After you prepare your model input, pass the input and input/output options to your model interpreter's run(inputs:options:) method.

Swift inputs, options: ioOptions) { outputs, error in
    guard error == nil, let outputs = outputs else { return }
    // Process outputs
    // ...


[interpreter runWithInputs:inputs
                completion:^(FIRModelOutputs * _Nullable outputs,
                             NSError * _Nullable error) {
  if (error != nil || outputs == nil) {
  // Process outputs
  // ...

You can get the output by calling the output(index:) method of the object that is returned. For example:


// Get first and only output of inference with a batch size of 1
let output = try? outputs.output(index: 0) as? [[NSNumber]]
let probabilities = output??[0]


// Get first and only output of inference with a batch size of 1
NSError *outputError;
NSArray *probabilites = [outputs outputAtIndex:0 error:&outputError][0];

How you use the output depends on the model you are using.

For example, if you are performing classification, as a next step, you might map the indexes of the result to the labels they represent. Suppose you had a text file with label strings for each of your model's categories; you could map the label strings to the output probabilities by doing something like the following:


guard let labelPath = Bundle.main.path(forResource: "retrained_labels", ofType: "txt") else { return }
let fileContents = try? String(contentsOfFile: labelPath)
guard let labels = fileContents?.components(separatedBy: "\n") else { return }

for i in 0 ..< labels.count {
  if let probability = probabilities?[i] {
    print("\(labels[i]): \(probability)")


NSError *labelReadError = nil;
NSString *labelPath = [NSBundle.mainBundle pathForResource:@"retrained_labels"
NSString *fileContents = [NSString stringWithContentsOfFile:labelPath
if (labelReadError != nil || fileContents == NULL) { return; }
NSArray<NSString *> *labels = [fileContents componentsSeparatedByString:@"\n"];
for (int i = 0; i < labels.count; i++) {
    NSString *label = labels[i];
    NSNumber *probability = probabilites[i];
    NSLog(@"%@: %f", label, probability.floatValue);

Appendix: Model security

Regardless of how you make your TensorFlow Lite models available to ML Kit, ML Kit stores them in the standard serialized protobuf format in local storage.

In theory, this means that anybody can copy your model. However, in practice, most models are so application-specific and obfuscated by optimizations that the risk is similar to that of competitors disassembling and reusing your code. Nevertheless, you should be aware of this risk before you use a custom model in your app.