Configuration options for hybrid experiences in Apple apps

This page describes the following configuration options for hybrid and on-device experiences:

Set an "inference mode".
Check if the on-device model is available.
Determine whether on-device or in-cloud inference was used.
Use model configuration to control responses (like temperature).

Make sure that you've completed the getting started guide for building hybrid experiences.

Set an "inference mode"

The examples in the getting started guide attempt on-device inference first and then fall back to the cloud-hosted model. This is only one of the "inference modes" that you can implement.

Hybrid inference

Prefer on-device inference: set primary to a "system" model and secondary to a cloud model.

Attempt to use the on-device model if it's available and supports the type of request. Otherwise, log an error on the device and then automatically fall back to the cloud-hosted model.

// Imports + initialization of Gemini API backend service
// ...

// Initialize a cloud model that supports your use case
let cloudModel = ai.geminiModel(name: "GEMINI_MODEL_NAME")
// Initialize an on-device model that supports your use case
let systemModel = FirebaseAI.SystemLanguageModel.default

// Create a GenerativeModelSession with a hybrid model.
// Provide your preferred model as `primary` and your fallback model as `secondary`
// Attempt to use the on-device model; otherwise, fall back to the cloud-hosted model.
let session = ai.generativeModelSession(
  model: .hybridModel(primary: systemModel, secondary: cloudModel)
)

Prefer in-cloud inference: set primary to a cloud model and secondary to a "system" model.

Attempt to use the cloud-hosted model if the device is online and if the model is available. If the device is offline, fall back to the on-device model. In all other failure cases, throw an exception.

// Imports + initialization of Gemini API backend service
// ...

// Initialize a cloud model that supports your use case
let cloudModel = ai.geminiModel(name: "GEMINI_MODEL_NAME")
// Initialize an on-device model that supports your use case
let systemModel = FirebaseAI.SystemLanguageModel.default

// Create a GenerativeModelSession with a hybrid model.
// Provide your preferred model as `primary` and your fallback model as `secondary`
// Attempt to use the cloud-hosted model; otherwise, fall back to the on-device model.
let session = ai.generativeModelSession(
  model: .hybridModel(primary: cloudModel, secondary: systemModel)
)

Only on-device or only in-cloud inference

You can also set a single model, in which case the SDK attempts only on-device inference or only in-cloud inference. You don't create a HybridModel for this use case. A hybrid experience, by contrast, requires a HybridModel with both primary and secondary models set (as described above).

Only on-device inference: set model to a "system" model. You don't create a HybridModel for this use case.

Attempt to use the on-device model if it's available and supports the type of request. Otherwise, throw an exception.

// Imports + initialization of Gemini API backend service
// ...

// Initialize an on-device model that supports your use case
let systemModel = FirebaseAI.SystemLanguageModel.default

// Create a GenerativeModelSession with the on-device model.
let session = ai.generativeModelSession(
  model: systemModel
)

Only in-cloud inference: set model to a cloud model. You don't create a HybridModel for this use case.

Attempt to use the cloud-hosted model if the device is online and if the model is available. Otherwise, throw an exception.

// Imports + initialization of Gemini API backend service
// ...

// Initialize a cloud model that supports your use case
let cloudModel = ai.geminiModel(name: "GEMINI_MODEL_NAME")

// Create a GenerativeModelSession with a cloud model.
let session = ai.generativeModelSession(
  model: cloudModel
)
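The four inference modes above differ only in which kind of model fills the primary and secondary roles. The following is a minimal sketch of that mapping in plain Swift. `InferenceMode`, `ModelKind`, `ModelPlan`, and `modelPlan(for:)` are hypothetical names introduced for illustration and are not part of the SDK; an app could use something like this to drive which session it creates from a user setting.

```swift
/// Hypothetical app-level setting; not an SDK type.
enum InferenceMode {
  case preferOnDevice
  case preferInCloud
  case onlyOnDevice
  case onlyInCloud
}

/// The two kinds of model described above.
enum ModelKind: String {
  case system  // on-device model
  case cloud   // cloud-hosted model
}

/// Which kind of model to use in each role.
/// A nil `secondary` means a single model is set and no HybridModel is created.
struct ModelPlan: Equatable {
  let primary: ModelKind
  let secondary: ModelKind?

  var needsHybridModel: Bool { secondary != nil }
}

/// Maps each inference mode to the roles described in this section.
func modelPlan(for mode: InferenceMode) -> ModelPlan {
  switch mode {
  case .preferOnDevice:
    return ModelPlan(primary: .system, secondary: .cloud)
  case .preferInCloud:
    return ModelPlan(primary: .cloud, secondary: .system)
  case .onlyOnDevice:
    return ModelPlan(primary: .system, secondary: nil)
  case .onlyInCloud:
    return ModelPlan(primary: .cloud, secondary: nil)
  }
}
```

When `needsHybridModel` is true, you would create the session with a hybrid model as shown earlier; otherwise, you would pass the single model directly.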

Check if the on-device model is available

Manual checks for on-device availability are only necessary if you want to surface that information to the user or ask end users to take action to download the on-device model. If the on-device model is unavailable (and you've set primary to an on-device model and secondary to a cloud model), then the SDK automatically falls back to the cloud-hosted model.

To manually check whether the on-device model is actually usable, inspect the isAvailable property:

if FirebaseAI.SystemLanguageModel.default.isAvailable {
  // The on-device model is ready to use.
} else {
  // The on-device model is unavailable.
}

To find out why the on-device model is unavailable, inspect the availability property:

switch FirebaseAI.SystemLanguageModel.default.availability {
case .available:
  // The on-device model is ready to use.
  break
case .unavailable(.deviceNotEligible):
  // This device does not support Apple Intelligence.
  break
case .unavailable(.appleIntelligenceNotEnabled):
  // The user has not enabled Apple Intelligence in Settings.
  break
case .unavailable(.modelNotReady):
  // The model is still being downloaded.
  break
case let .unavailable(reason):
  // The model is unavailable due to the specified `reason`.
  break
}
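If you surface availability to the user, one option is to translate each case into a short message. The sketch below assumes a local stand-in enum, `OnDeviceStatus`, so that it compiles without the SDK; the enum, the `userFacingMessage(for:)` helper, and the message strings are all hypothetical. In a real app, you would map the cases of the `availability` switch above onto it.

```swift
/// Hypothetical stand-in for the availability cases shown above.
/// It is not an SDK type; it only mirrors the cases in the switch statement.
enum OnDeviceStatus {
  case available
  case deviceNotEligible
  case appleIntelligenceNotEnabled
  case modelNotReady
  case unavailableForOtherReason
}

/// Returns a message to show the user, or nil when no action is needed.
func userFacingMessage(for status: OnDeviceStatus) -> String? {
  switch status {
  case .available:
    return nil
  case .deviceNotEligible:
    return "This device doesn't support on-device AI. Requests will use the cloud."
  case .appleIntelligenceNotEnabled:
    return "Turn on Apple Intelligence in Settings to use on-device AI."
  case .modelNotReady:
    return "The on-device model is still downloading. Try again later."
  case .unavailableForOtherReason:
    return "On-device AI is currently unavailable."
  }
}
```

Returning nil for the available case keeps the call site simple: show a banner only when the helper returns a message.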

Determine whether on-device or in-cloud inference was used

If you use a HybridModel (and set both primary and secondary models), then it might be helpful to know which model was used for a given request. This information is provided by the modelVersion property of rawResponse in each response.

When you access this property, the returned value will be one of the following:

Cloud-hosted model used: the model name, for example gemini-3.1-flash-lite
On-device model used: apple-foundation-models-system-language-model

// let response = try await session.respond(to: ...

print("You used: \(response.rawResponse.modelVersion)")

print(response.content)
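If you want to branch on where a response came from (for example, for analytics), you could classify the model version string. The following is a minimal sketch in plain Swift. `InferenceSource` and `inferenceSource(forModelVersion:)` are hypothetical names, and the helper accepts an optional string as an assumption, so that it works whether or not the SDK property is optional. The on-device identifier is the value listed above.

```swift
/// Where a response was generated, inferred from the reported model version.
/// Hypothetical helper type; not part of the SDK.
enum InferenceSource: Equatable {
  case onDevice
  case cloud(modelName: String)
  case unknown
}

/// Identifier reported when the on-device model is used, as listed above.
let onDeviceModelVersion = "apple-foundation-models-system-language-model"

/// Classifies a model version string.
/// A missing or empty value is treated as unknown rather than guessed.
func inferenceSource(forModelVersion modelVersion: String?) -> InferenceSource {
  guard let version = modelVersion, !version.isEmpty else {
    return .unknown
  }
  if version == onDeviceModelVersion {
    return .onDevice
  }
  // Any other value is taken to be the name of a cloud-hosted model.
  return .cloud(modelName: version)
}
```

With a response in hand, you might call `inferenceSource(forModelVersion: response.rawResponse.modelVersion)` and switch over the result.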

Use model configuration to control responses

In each request to a model, you can send a model configuration to control how the model generates a response. Cloud-hosted models and on-device models offer different configuration options, so each has its own configuration type:

Cloud-hosted models: set their configuration in a GenerationConfig.
On-device models: set their configuration within FirebaseAI.GenerationOptions.

These options are configured for each request to the model.

Here's an example that sets the configurations for the cloud-hosted and on-device models for hybrid inference:

// ...

let response = try await session.respond(
  to: "Why is the sky blue?",
  options: .hybrid(
    // Config for cloud-hosted model
    gemini: GenerationConfig(
      temperature: 0.8,
      topP: 0.9,
      thinkingConfig: ThinkingConfig(thinkingLevel: .high)
    ),
    // Config for on-device model
    foundationModels: FirebaseAI.GenerationOptions(
      sampling: .random(probabilityThreshold: 0.9),
      temperature: 0.8
    )
  )
)

// ...
