Get started with the Gemini Live API using Firebase AI Logic


The Gemini Live API enables bidirectional interactions with Gemini models through low-latency, real-time voice and video conversations.

The Live API and its dedicated family of models process continuous streams of audio, video, or text and deliver immediate, human-like spoken responses, creating a natural conversational experience for your users.

This page describes how to get started with the most common capability, streaming audio input and output, but the Live API supports many other features and configuration options.

The Live API is a stateful API that creates a WebSocket connection to establish a session between the client and the Gemini server. For details, see the Live API reference documentation (Gemini Developer API | Vertex AI Gemini API).

Jump to the code samples

View useful resources

Before you begin

If you haven't already, work through the getting started guide, which describes how to set up your Firebase project, connect your app to Firebase, add the SDK, initialize the backend service for your chosen Gemini API provider, and create a LiveModel instance.

You can experiment with prompts and the Live API in Google AI Studio or Vertex AI Studio.

Models that support this capability

The Gemini 2.5 Flash Live models are the native-audio models that support the Gemini Live API. Although the model names differ between Gemini API providers, the models' behavior and capabilities are the same.

  • Gemini Developer API

    • gemini-2.5-flash-native-audio-preview-12-2025
    • gemini-2.5-flash-native-audio-preview-09-2025

    Although these are preview models, they're available in the free tier of the Gemini Developer API.

  • Vertex AI Gemini API

    • gemini-live-2.5-flash-native-audio (released December 2025)
    • gemini-live-2.5-flash-preview-native-audio-09-2025

    When using the Vertex AI Gemini API, the Live API models support the global location (see the sketch after this list).
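
For example, here's a minimal Kotlin sketch of how the backend service and model name pair up for each provider, assuming the Firebase setup from the getting started guide; the platform-specific sections later on this page show the full setup:


// Gemini Developer API backend with its Developer API model name
val developerApiModel = Firebase.ai(backend = GenerativeBackend.googleAI())
    .liveModel(modelName = "gemini-2.5-flash-native-audio-preview-12-2025")

// Vertex AI Gemini API backend with its Vertex AI model name,
// served from the global location
val vertexAiModel = Firebase.ai(backend = GenerativeBackend.vertexAI(location = "global"))
    .liveModel(modelName = "gemini-live-2.5-flash-native-audio")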

Stream audio input and output

Click your Gemini API provider to view provider-specific content and code on this page.

The following samples show a basic implementation for sending streaming audio input and receiving streaming audio output.

To learn about the other options and capabilities of the Live API, see the "What else can you do?" section later on this page.

Swift

To use the Live API, create a LiveModel instance and set the response modality to audio.


import FirebaseAILogic

// Initialize the Gemini Developer API backend service
// Create a `liveModel` instance with a model that supports the Live API
let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to respond with audio
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio]
  )
)

do {
  let session = try await liveModel.connect()

  // Load the audio file, or capture audio from the microphone
  guard let audioFile = NSDataAsset(name: "audio.pcm") else {
    fatalError("Failed to load audio file")
  }

  // Provide the audio data
  await session.sendAudioRealtime(audioFile.data)

  var outputText = ""
  for try await message in session.responses {
    if case let .content(content) = message.payload {
      content.modelTurn?.parts.forEach { part in
        if let part = part as? InlineDataPart, part.mimeType.starts(with: "audio/pcm") {
          // Handle 16-bit PCM audio data at 24kHz
          playAudio(part.data)
        }
      }
      // Optional: close the session if you don't need to send more requests.
      if content.isTurnComplete {
        await session.close()
      }
    }
  }
} catch {
  fatalError(error.localizedDescription)
}

Kotlin

To use the Live API, create a LiveModel instance and set the response modality to AUDIO.


// Initialize the Gemini Developer API backend service
// Create a `liveModel` instance with a model that supports the Live API
val liveModel = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to respond with audio
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
    }
)

val session = liveModel.connect()

// This is the recommended approach.
// However, you can create your own recorder and handle the stream.
session.startAudioConversation()
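
When the conversation is over, stop the audio conversation and, if you're done, close the session. A minimal sketch (verify the exact session-lifecycle calls against the Kotlin SDK reference):


// Later, when the user ends the call, stop capturing microphone audio
// and playing back the model's audio
session.stopAudioConversation()

// Close the session once you no longer need it
session.close()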

Java

To use the Live API, create a LiveModel instance and set the response modality to AUDIO.


ExecutorService executor = Executors.newFixedThreadPool(1);
// Initialize the Gemini Developer API backend service
// Create a `liveModel` instance with a model that supports the Live API
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
        "gemini-2.5-flash-native-audio-preview-12-2025",
        // Configure the model to respond with audio
        new LiveGenerationConfig.Builder()
                .setResponseModality(ResponseModality.AUDIO)
                .build()
);
LiveModelFutures liveModel = LiveModelFutures.from(lm);

ListenableFuture<LiveSession> sessionFuture = liveModel.connect();

Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
        LiveSessionFutures session = LiveSessionFutures.from(ses);
        session.startAudioConversation();
    }
    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);

Web

To use the Live API, create a LiveGenerativeModel instance and set the response modality to AUDIO.


import { initializeApp } from "firebase/app";
import { getAI, getLiveGenerativeModel, GoogleAIBackend, ResponseModality, startAudioConversation } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `LiveGenerativeModel` instance with a model that supports the Live API
const liveModel = getLiveGenerativeModel(ai, {
  model: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to respond with audio
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
  },
});

const session = await liveModel.connect();

// Start the audio conversation
const audioConversationController = await startAudioConversation(session);

// ... Later, to stop the audio conversation
// await audioConversationController.stop()

Dart

To use the Live API, create a LiveGenerativeModel instance and set the response modality to audio.


import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';
import 'package:your_audio_recorder_package/your_audio_recorder_package.dart';

late LiveModelSession _session;
final _audioRecorder = YourAudioRecorder();

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `liveGenerativeModel` instance with a model that supports the Live API
final liveModel = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  // Configure the model to respond with audio
  liveGenerationConfig: LiveGenerationConfig(
    responseModalities: [ResponseModalities.audio],
  ),
);

_session = await liveModel.connect();

final audioRecordStream = _audioRecorder.startRecordingStream();
// Map the Uint8List stream to InlineDataPart stream
final mediaChunkStream = audioRecordStream.map((data) {
  return InlineDataPart('audio/pcm', data);
});
await _session.startMediaStream(mediaChunkStream);

// In a separate thread, receive the audio response from the model
await for (final message in _session.receive()) {
   // Process the received message
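   // For example: check for server content that carries inline audio parts,
   // buffer and play the PCM data, and stop listening once the model
   // reports that its turn is complete.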
}

Unity

To use the Live API, create a LiveModel instance and set the response modality to Audio.


using System.Collections;
using System.Collections.Generic;
using System.Threading.Tasks;
using Firebase;
using Firebase.AI;
using UnityEngine;

async Task SendAndReceiveAudio() {
  // Initialize the Gemini Developer API backend service
  // Create a `LiveModel` instance with a model that supports the Live API
  var liveModel = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
      modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
      // Configure the model to respond with audio
      liveGenerationConfig: new LiveGenerationConfig(
          responseModalities: new[] { ResponseModality.Audio })
    );

  LiveSession session = await liveModel.ConnectAsync();

  // Start a coroutine to send audio from the Microphone
  var recordingCoroutine = StartCoroutine(SendAudio(session));

  // Start receiving the response
  await ReceiveAudio(session);
}

IEnumerator SendAudio(LiveSession liveSession) {
  string microphoneDeviceName = null;
  int recordingFrequency = 16000;
  int recordingBufferSeconds = 2;

  var recordingClip = Microphone.Start(microphoneDeviceName, true,
                                       recordingBufferSeconds, recordingFrequency);

  int lastSamplePosition = 0;
  while (true) {
    if (!Microphone.IsRecording(microphoneDeviceName)) {
      yield break;
    }

    int currentSamplePosition = Microphone.GetPosition(microphoneDeviceName);

    if (currentSamplePosition != lastSamplePosition) {
      // The Microphone uses a circular buffer, so we need to check if the
      // current position wrapped around to the beginning, and handle it
      // accordingly.
      int sampleCount;
      if (currentSamplePosition > lastSamplePosition) {
        sampleCount = currentSamplePosition - lastSamplePosition;
      } else {
        sampleCount = recordingClip.samples - lastSamplePosition + currentSamplePosition;
      }

      if (sampleCount > 0) {
        // Get the audio chunk
        float[] samples = new float[sampleCount];
        recordingClip.GetData(samples, lastSamplePosition);

        // Send the data, discarding the resulting Task to avoid the warning
        _ = liveSession.SendAudioAsync(samples);

        lastSamplePosition = currentSamplePosition;
      }
    }

    // Wait for a short delay before reading the next sample from the Microphone
    const float MicrophoneReadDelay = 0.5f;
    yield return new WaitForSeconds(MicrophoneReadDelay);
  }
}

Queue<float> audioBuffer = new();

async Task ReceiveAudio(LiveSession liveSession) {
  int sampleRate = 24000;
  int channelCount = 1;

  // Create a looping AudioClip to fill with the received audio data
  int bufferSamples = (int)(sampleRate * channelCount);
  AudioClip clip = AudioClip.Create("StreamingPCM", bufferSamples, channelCount,
                                    sampleRate, true, OnAudioRead);

  // Attach the clip to an AudioSource and start playing it
  AudioSource audioSource = GetComponent<AudioSource>();
  audioSource.clip = clip;
  audioSource.loop = true;
  audioSource.Play();

  // Start receiving the response
  await foreach (var message in liveSession.ReceiveAsync()) {
    // Process the received message
    foreach (float[] pcmData in message.AudioAsFloat) {
      lock (audioBuffer) {
        foreach (float sample in pcmData) {
          audioBuffer.Enqueue(sample);
        }
      }
    }
  }
}

// This method is called by the AudioClip to load audio data.
private void OnAudioRead(float[] data) {
  int samplesToProvide = data.Length;
  int samplesProvided = 0;

  lock(audioBuffer) {
    while (samplesProvided < samplesToProvide && audioBuffer.Count > 0) {
      data[samplesProvided] = audioBuffer.Dequeue();
      samplesProvided++;
    }
  }

  while (samplesProvided < samplesToProvide) {
    data[samplesProvided] = 0.0f;
    samplesProvided++;
  }
}



Pricing and counting tokens

For pricing information about the Live API models, see the documentation for your chosen Gemini API provider: Gemini Developer API | Vertex AI Gemini API.

Regardless of which Gemini API provider you use, the Live API supports the Count Tokens API.



What else can you do?

  • Review the full list of Live API capabilities, such as streaming different input modalities (audio, text, or video + audio).

  • Customize your implementation with the various configuration options, such as adding transcriptions or setting the voice of the response (see the hypothetical sketch after this list).

  • Enhance your implementation by giving the model access to tools, like function calling and Google Search. Official documentation about using tools with the Live API is coming soon!

  • Learn about the limitations and specifications for using the Live API, such as session length, rate limits, supported languages, and more.
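
As an illustration of the configuration surface, here's a hypothetical Kotlin sketch that extends the earlier setup with a system instruction and a named response voice. The systemInstruction, speechConfig, and Voice names are assumptions, not confirmed API; check the Firebase AI Logic reference documentation for the exact options.


// Hypothetical sketch: `systemInstruction`, `speechConfig`, and `Voice`
// are assumed names; consult the SDK reference for the actual options.
val configuredModel = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.5-flash-native-audio-preview-12-2025",
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        // Assumed option for selecting the reply voice ("AOEDE" is one of
        // the documented Gemini voice names)
        speechConfig = SpeechConfig(voice = Voice("AOEDE"))
    },
    // Assumed option for steering the model's overall behavior
    systemInstruction = content { text("You are a friendly, concise voice assistant.") },
)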