Even a basic implementation of the Live API lets you build compelling, powerful interactive experiences for your users. You can also customize the experience further with the following configuration options:
Response voice and language
Specify the response voice
The Live API uses Chirp 3 to synthesize spoken responses in HD voices.
If you don't specify a response voice, the default voice is Puck.
To specify a response voice, set the voice name in the speechConfig object as part of the model configuration.
Swift
// ...
let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
modelName: "gemini-2.5-flash-native-audio-preview-09-2025",
// Configure the model to use a specific voice for its audio response
generationConfig: LiveGenerationConfig(
responseModalities: [.audio],
speech: SpeechConfig(voiceName: "VOICE_NAME")
)
)
// ...
Kotlin
// ...
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
modelName = "gemini-2.5-flash-native-audio-preview-09-2025",
// Configure the model to use a specific voice for its audio response
generationConfig = liveGenerationConfig {
responseModality = ResponseModality.AUDIO
speechConfig = SpeechConfig(voice = Voice("VOICE_NAME"))
}
)
// ...
Java
// ...
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
"gemini-2.5-flash-native-audio-preview-09-2025",
// Configure the model to use a specific voice for its audio response
new LiveGenerationConfig.Builder()
.setResponseModality(ResponseModality.AUDIO)
.setSpeechConfig(new SpeechConfig(new Voice("VOICE_NAME")))
.build()
);
// ...
Web
// ...
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });
const liveModel = getLiveGenerativeModel(ai, {
model: "gemini-2.5-flash-native-audio-preview-09-2025",
// Configure the model to use a specific voice for its audio response
generationConfig: {
responseModalities: [ResponseModality.AUDIO],
speechConfig: {
voiceConfig: {
prebuiltVoiceConfig: { voiceName: "VOICE_NAME" },
},
},
},
});
// ...
Dart
// ...
final _liveModel = FirebaseAI.googleAI().liveGenerativeModel(
model: 'gemini-2.5-flash-native-audio-preview-09-2025',
// Configure the model to use a specific voice for its audio response
liveGenerationConfig: LiveGenerationConfig(
responseModalities: [ResponseModalities.audio],
speechConfig: SpeechConfig(voiceName: 'VOICE_NAME'),
),
);
// ...
Unity
// ...
var liveModel = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
modelName: "gemini-2.5-flash-native-audio-preview-09-2025",
// Configure the model to use a specific voice for its audio response
liveGenerationConfig: new LiveGenerationConfig(
responseModalities: new[] { ResponseModality.Audio },
speechConfig: SpeechConfig.UsePrebuiltVoice("VOICE_NAME")
)
);
// ...
Influence the response language
Live API models automatically choose an appropriate language for their responses.
If you want the model to respond in a language other than English, or to always respond in a specific language, you can use system instructions to influence its responses. For example:
To emphasize to the model that a non-English language may be more appropriate:
Listen to the speaker carefully. If you detect a non-English language, respond in the language you hear from the speaker. You must respond unmistakably in the speaker's language.
To require the model to always respond in a specific language:
RESPOND IN LANGUAGE. YOU MUST RESPOND UNMISTAKABLY IN LANGUAGE.
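For example, here's a minimal sketch with the Web SDK, assuming that getLiveGenerativeModel accepts a systemInstruction field alongside generationConfig (as the SDK's other model constructors do); the instruction text is the second example above, with LANGUAGE left as a placeholder:
Web
// ...
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });
const liveModel = getLiveGenerativeModel(ai, {
  model: "gemini-2.5-flash-native-audio-preview-09-2025",
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
  },
  // Steer the response language with a system instruction
  // (assumed parameter; LANGUAGE is a placeholder for the language you want)
  systemInstruction: {
    role: "system",
    parts: [{ text: "RESPOND IN LANGUAGE. YOU MUST RESPOND UNMISTAKABLY IN LANGUAGE." }],
  },
});
// ...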
Transcripts of the audio input and output
The model's responses can include transcripts of both the speech input and the model's spoken replies. You configure this as part of the model configuration.
To transcribe the audio input, add inputAudioTranscription. To transcribe the model's audio responses, add outputAudioTranscription.
Notes:
- You can configure the model to return transcripts of both the input and the output (as shown in the examples below), or of only one of them.
- Transcripts are streamed along with the audio, so it's best to accumulate them the same way you collect the text parts of each turn.
- The transcription language is inferred from the audio input and from the model's audio response.
Swift
// ...
let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
modelName: "gemini-2.5-flash-native-audio-preview-09-2025",
// Configure the model to return transcriptions of the audio input and output
generationConfig: LiveGenerationConfig(
responseModalities: [.audio],
inputAudioTranscription: AudioTranscriptionConfig(),
outputAudioTranscription: AudioTranscriptionConfig()
)
)
var inputTranscript: String = ""
var outputTranscript: String = ""
do {
let session = try await liveModel.connect()
for try await response in session.responses {
if case let .content(content) = response.payload {
if let inputText = content.inputAudioTranscription?.text {
// Handle transcription text of the audio input
inputTranscript += inputText
}
if let outputText = content.outputAudioTranscription?.text {
// Handle transcription text of the audio output
outputTranscript += outputText
}
if content.isTurnComplete {
// Log the transcripts after the current turn is complete
print("Input audio: \(inputTranscript)")
print("Output audio: \(outputTranscript)")
// Reset the transcripts for the next turn
inputTranscript = ""
outputTranscript = ""
}
}
}
} catch {
// Handle error
}
// ...
Kotlin
// ...
val liveModel = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
modelName = "gemini-2.5-flash-native-audio-preview-09-2025",
// Configure the model to return transcriptions of the audio input and output
generationConfig = liveGenerationConfig {
responseModality = ResponseModality.AUDIO
inputAudioTranscription = AudioTranscriptionConfig()
outputAudioTranscription = AudioTranscriptionConfig()
}
)
val liveSession = liveModel.connect()
fun handleTranscription(input: Transcription?, output: Transcription?) {
input?.text?.let { text ->
// Handle transcription text of the audio input
println("Input Transcription: $text")
}
output?.text?.let { text ->
// Handle transcription text of the audio output
println("Output Transcription: $text")
}
}
liveSession.startAudioConversation(null, ::handleTranscription)
// ...
Java
// ...
ExecutorService executor = Executors.newFixedThreadPool(1);
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
"gemini-2.5-flash-native-audio-preview-09-2025",
// Configure the model to return transcriptions of the audio input and output
new LiveGenerationConfig.Builder()
.setResponseModality(ResponseModality.AUDIO)
.setInputAudioTranscription(new AudioTranscriptionConfig())
.setOutputAudioTranscription(new AudioTranscriptionConfig())
.build()
);
LiveModelFutures liveModel = LiveModelFutures.from(lm);
ListenableFuture<LiveSessionFutures> sessionFuture = liveModel.connect();
Futures.addCallback(sessionFuture, new FutureCallback<LiveSessionFutures>() {
@Override
public void onSuccess(LiveSessionFutures ses) {
LiveSessionFutures session = ses;
session.startAudioConversation((Transcription input, Transcription output) -> {
if (input != null) {
// Handle transcription text of the audio input
System.out.println("Input Transcription: " + input.getText());
}
if (output != null) {
// Handle transcription text of the audio output
System.out.println("Output Transcription: " + output.getText());
}
return null;
});
}
@Override
public void onFailure(Throwable t) {
// Handle exceptions
t.printStackTrace();
}
}, executor);
// ...
Web
// ...
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });
const liveModel = getLiveGenerativeModel(ai, {
model: 'gemini-2.5-flash-native-audio-preview-09-2025',
// Configure the model to return transcriptions of the audio input and output
generationConfig: {
responseModalities: [ResponseModality.AUDIO],
inputAudioTranscription: {},
outputAudioTranscription: {},
},
});
const liveSession = await liveModel.connect();
liveSession.sendAudioRealtime({ data, mimeType: "audio/pcm" });
const messages = liveSession.receive();
for await (const message of messages) {
  switch (message.type) {
    case 'serverContent':
      if (message.inputTranscription) {
        // Handle transcription text of the audio input
        console.log(`Input transcription: ${message.inputTranscription.text}`);
      }
      if (message.outputTranscription) {
        // Handle transcription text of the audio output
        console.log(`Output transcription: ${message.outputTranscription.text}`);
      }
      // Handle other server content (modelTurn, turnComplete, interruption)
      break;
    default:
      // Handle other message types (toolCall, toolCallCancellation)
      break;
  }
}
// ...
Dart
// ...
final _liveModel = FirebaseAI.googleAI().liveGenerativeModel(
model: 'gemini-2.5-flash-native-audio-preview-09-2025',
// Configure the model to return transcriptions of the audio input and output
liveGenerationConfig: LiveGenerationConfig(
responseModalities: [ResponseModalities.audio],
inputAudioTranscription: AudioTranscriptionConfig(),
outputAudioTranscription: AudioTranscriptionConfig(),
),
);
final LiveSession _session = await _liveModel.connect();
await for (final response in _session.receive()) {
final message = response.message;
if (message is LiveServerContent) {
  if (message.inputTranscription?.text case final inputText?) {
    // Handle transcription text of the audio input
    print('Input: $inputText');
  }
  if (message.outputTranscription?.text case final outputText?) {
    // Handle transcription text of the audio output
    print('Output: $outputText');
  }
}
}
// ...
Unity
// ...
var liveModel = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
modelName: "gemini-2.5-flash-native-audio-preview-09-2025",
// Configure the model to return transcriptions of the audio input and output
liveGenerationConfig: new LiveGenerationConfig(
responseModalities: new[] { ResponseModality.Audio },
inputAudioTranscription: new AudioTranscriptionConfig(),
outputAudioTranscription: new AudioTranscriptionConfig()
)
);
try
{
var session = await liveModel.ConnectAsync();
var stream = session.ReceiveAsync();
await foreach (var response in stream) {
if (response.Message is LiveSessionContent sessionContent) {
if (!string.IsNullOrEmpty(sessionContent.InputTranscription?.Text)) {
// handle transcription text of input audio
}
if (!string.IsNullOrEmpty(sessionContent.OutputTranscription?.Text)) {
// handle transcription text of output audio
}
}
}
}
catch (Exception e)
{
// Handle error
}
// ...
Voice activity detection (VAD)
The model automatically performs voice activity detection (VAD) on a continuous audio input stream. VAD is enabled by default.
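Because the model performs VAD on the incoming stream, your app only needs to keep sending audio chunks; it doesn't have to mark where a user's turn starts or ends. Here's a rough sketch with the Web SDK, reusing the sendAudioRealtime and receive calls shown above; micIsOpen and captureNextPcmChunk are hypothetical placeholders for your own audio-capture code:
Web
// ...
const liveSession = await liveModel.connect();
// Stream microphone audio continuously; the built-in VAD segments it into turns.
async function streamMicrophone() {
  while (micIsOpen) {                         // hypothetical flag from your capture code
    const data = await captureNextPcmChunk(); // hypothetical helper returning PCM audio
    liveSession.sendAudioRealtime({ data, mimeType: "audio/pcm" });
  }
}
streamMicrophone();
// The model replies whenever VAD decides the speaker has finished a turn.
for await (const message of liveSession.receive()) {
  if (message.type === 'serverContent') {
    // Play back or log the model's response for the detected turn
  }
}
// ...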
Session management
Learn about the following topics related to sessions:
- Advanced capabilities
- Limits related to sessions, including connection and session duration limits, session context window limits, and rate limits
Note: Firebase AI Logic doesn't yet support the following session management capabilities. Check back soon!
- Handling interruptions
- Extending the session duration
- Resuming a session
- Maintaining context across sessions and requests
- Compressing the context window