Build hybrid experiences in Apple apps with on-device and cloud-hosted models

You can build AI-powered Apple apps and features with hybrid inference using Firebase AI Logic. Hybrid inference enables running inference using on-device models (specifically Apple's Foundation Models framework) when available and seamlessly falling back to cloud-hosted Google models otherwise (and vice versa).

This page describes how to get started using the client SDK and shows additional configuration options and capabilities, like temperature.

Note that on-device inference via Firebase AI Logic is supported for Apple apps using Firebase AI Logic SDK v12.13.0+ and running on devices with Apple Intelligence enabled. It's governed by the Acceptable use requirements for Apple's Foundation Models framework.

Recommended use cases

  • Using an on-device model for inference offers:

    • Enhanced privacy
    • Inference at no cost
    • Offline functionality
  • Using hybrid functionality lets you:

    • Provide all customers with a similar app experience regardless of the end-user's device
    • Improve availability of generative AI features, regardless of internet connectivity, quota limitations, or device capabilities

Supported capabilities, APIs, and devices

Before you implement hybrid and on-device inference using Firebase AI Logic, review this section to understand what's supported for Apple apps.

Supported capabilities and features for on-device inference

On-device inference only supports text generation, specifically the following text-generation capabilities:

Make sure to review the detailed list of features not yet supported for hybrid or on-device inference at the bottom of this page.

Supported APIs and devices

Get started

Make sure that you've reviewed the section above describing supported capabilities, APIs, and devices.

These get started steps describe the required general setup for any supported prompt request that you want to send.

Step 1: Set up a Firebase project and connect your app to Firebase

  1. Sign in to the Firebase console, and then select your Firebase project.

  2. In the Firebase console, go to AI Services > AI Logic.

  3. Click Get started to launch a guided workflow that helps you set up the required APIs and resources for your project.

  4. Set up your project to use a "Gemini API" provider.

    We recommend getting started using the Gemini Developer API. At any point, you can always set up the Vertex AI Gemini API (and its requirement for billing).

    For the Gemini Developer API, the console will enable the required APIs and create a Gemini API key in your project.
    Do not add this Gemini API key into your app's codebase. Learn more.

  5. If prompted in the console's workflow, follow the on-screen instructions to register your app and connect it to Firebase.

  6. Continue to the next step in this guide to add the SDK to your app.

Step 2: Add the required SDKs

Use Swift Package Manager (SPM) to install and manage Xcode dependencies. Hybrid support is only available when using SPM.

The Firebase AI Logic library provides access to the APIs for interacting with generative models. The library is included as part of the Firebase SDK for Apple platforms (firebase-ios-sdk).

If you're already using Firebase, then make sure your Firebase package is v12.13.0 or later.

  1. In Xcode, with your app project open, navigate to File > Add Package Dependencies.

  2. When prompted, add the Firebase Apple platforms SDK repository:

    https://github.com/firebase/firebase-ios-sdk
    
  3. Select the latest SDK version.

  4. Select the FirebaseAILogic library.

When finished, Xcode will automatically begin resolving and downloading your dependencies in the background.
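
If your code lives in a Swift package rather than directly in an Xcode app target, the same dependency can be declared in a Package.swift manifest. The following is a minimal sketch, not an official template: the package and target name MyAppFeatures is hypothetical, and the tools version and platform are assumptions that you should adjust to your project. The repository URL, the minimum version 12.13.0, and the FirebaseAILogic library name come from the steps above.

```swift
// swift-tools-version: 5.9
// Sketch of a manifest for a hypothetical package named "MyAppFeatures".
import PackageDescription

let package = Package(
  name: "MyAppFeatures",
  // Assumption: adjust to your app's deployment target.
  platforms: [.iOS(.v15)],
  dependencies: [
    // Hybrid support requires v12.13.0 or later of the Firebase SDK.
    .package(url: "https://github.com/firebase/firebase-ios-sdk", from: "12.13.0")
  ],
  targets: [
    .target(
      name: "MyAppFeatures",
      dependencies: [
        // The library that provides the generative model APIs.
        .product(name: "FirebaseAILogic", package: "firebase-ios-sdk")
      ]
    )
  ]
)
```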

Step 3: Initialize the service and create a model session instance


Set up the following before you send a prompt request to the model.

  1. Initialize the service for your chosen Gemini API provider.

  2. Create a GenerativeModelSession instance with a HybridModel.

  3. Set the primary and secondary models based on your preferences. You can set the order of attempted inference:

    • Attempt on-device inference first, but allow fallback to cloud: set primary to a "system" model and secondary to a cloud model.

    • Attempt in-cloud inference first, but allow fallback to on-device: set primary to a cloud model and secondary to a "system" model.

    Note that the SDK also supports setting only a single model, in which case it attempts only on-device inference or only in-cloud inference, with no fallback. For a hybrid experience, create a HybridModel and set both the primary and secondary models.

    Learn more about the behavior of "inference modes" (the order of attempted inference) in Configuration options.

The following example shows how to attempt on-device inference first, but allow fallback to the cloud-hosted model:

import FirebaseAILogic

// Initialize the Gemini Developer API backend service
let ai = FirebaseAI.firebaseAI(backend: .googleAI())

// Initialize a cloud model that supports your use case
let cloudModel = ai.geminiModel(name: "GEMINI_MODEL_NAME")
// Initialize an on-device model that supports your use case
let systemModel = FirebaseAI.SystemLanguageModel.default

// Create a HybridModel
// Provide your preferred model as `primary` and your fallback model as `secondary`
// In this example, attempt to use the on-device model; otherwise, fall back to cloud.
let hybridModel = HybridModel(
  primary: systemModel,
  secondary: cloudModel
)

// Create a GenerativeModelSession with the HybridModel created earlier,
// using the same `ai` service instance initialized above.
let session = ai.generativeModelSession(
  model: hybridModel
)
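
To attempt in-cloud inference first and fall back to on-device inference, swap the two arguments. The sketch below reuses only the calls shown in the example above. The import line assumes the module is named after the FirebaseAILogic library, and the names cloudFirstModel and cloudFirstSession are illustrative.

```swift
import FirebaseAILogic  // Assumption: module name matches the FirebaseAILogic library

// Initialize the Gemini Developer API backend service
let ai = FirebaseAI.firebaseAI(backend: .googleAI())

// Initialize the cloud-hosted and on-device models
let cloudModel = ai.geminiModel(name: "GEMINI_MODEL_NAME")
let systemModel = FirebaseAI.SystemLanguageModel.default

// Cloud model is attempted first; the "system" model is the fallback
let cloudFirstModel = HybridModel(
  primary: cloudModel,
  secondary: systemModel
)

// Create a session that uses the cloud-first ordering
let cloudFirstSession = ai.generativeModelSession(
  model: cloudFirstModel
)
```

The rest of this guide works the same way with either ordering, because prompts are sent through the session rather than directly to a model.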

Step 4: Send a prompt request to a model

This section shows you how to generate text from text-only input, either as a complete response or as a stream of partial results.

Generate text from text-only input

Before trying this sample, make sure that you've completed the Get started section of this guide.

To generate text from a prompt that contains text, use respond(to:) like so:

// Imports + initialization of Gemini API backend service + creation of model session

// Provide a prompt that contains text
let prompt = "Write a story about a magic backpack."

// To generate text output, call `respond(to:)` with the text input
let response = try await session.respond(to: prompt)
print(response.content)

Stream text from text-only input

Before trying this sample, make sure that you've completed the Get started section of this guide.

You can achieve faster interactions by not waiting for the model to generate the entire result, and instead using streaming to handle partial results. To stream generated text from a prompt that contains text, use streamResponse(to:) like so:

// Imports + initialization of Gemini API backend service + creation of model session

// Provide a prompt that contains text
let prompt = "Write a story about a magic backpack."

// To stream generated text output, call `streamResponse(to:)` with the text input
let stream = session.streamResponse(to: prompt)
for try await snapshot in stream {
  print(snapshot.content)
}

What else can you do?

You can use various additional configuration options and capabilities for your hybrid experiences:

Features not yet supported for hybrid or on-device inference

Because this is an experimental release, not all the capabilities of Firebase AI Logic or cloud-hosted models are supported.

  • The following are not supported for hybrid or on-device implementations: Imagen models, the Gemini Live API, and prompt templates. Also, don't rely on counting tokens: the count differs between cloud-hosted and on-device models, so there's no intuitive fallback.

  • The following features are not yet supported for on-device inference. If you want to use any of these features, then we recommend using only a cloud-hosted model for a more consistent experience.

    • Generating text from multimodal inputs, like images, audio, video, and documents (PDFs)

    • Generating media, like images, audio, or video

    • Sending requests that exceed 4096 tokens (or approximately 3000 English words)

    • Providing the on-device model with built-in tools to help it generate its response (like code execution, URL context, and Grounding with Google Search)

  • AI monitoring in the Firebase console does not show any data for on-device inference (including on-device logs). However, any inference that uses a cloud-hosted model can be monitored just like other inference via Firebase AI Logic.
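
Because token counting isn't reliable across on-device and cloud-hosted models, one way to respect the 4096-token request limit listed above is a rough word-based estimate made before you choose a model. The following is a minimal sketch that uses only the Swift standard library. It is a heuristic built on this page's ratio of about 3000 English words to 4096 tokens, not a real tokenizer, and the names onDeviceTokenLimit, estimatedTokenCount(for:), likelyFitsOnDevice(_:safetyMargin:), and the 0.8 default margin are hypothetical choices rather than part of the SDK.

```swift
/// On-device request limit quoted on this page.
let onDeviceTokenLimit = 4096

/// Rough token estimate based on the ratio 4096 tokens ≈ 3000 English words.
/// This is a heuristic, not a tokenizer; other languages and code may differ a lot.
func estimatedTokenCount(for text: String) -> Int {
  // Count whitespace-separated words; empty pieces are skipped by default.
  let words = text.split(whereSeparator: { $0.isWhitespace }).count
  // Integer ceiling of words * 4096 / 3000.
  return (words * 4096 + 2999) / 3000
}

/// Returns true if the prompt is likely small enough for on-device inference.
/// `safetyMargin` (hypothetical default 0.8) leaves headroom for estimation error.
func likelyFitsOnDevice(_ prompt: String, safetyMargin: Double = 0.8) -> Bool {
  let budget = Int(Double(onDeviceTokenLimit) * safetyMargin)
  return estimatedTokenCount(for: prompt) <= budget
}
```

If likelyFitsOnDevice returns false, consider sending that request with only a cloud-hosted model, in line with the recommendation above.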

Additional limitations

In addition to the above, on-device inference has the following limitations:

