Live API configuration options

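Even a basic implementation of the Live API lets you build engaging, powerful interactions for your users. If needed, you can customize the experience further with the following configuration options:

- Response voice and language
- Audio input and output transcription
- Voice activity detection (VAD)
- Session management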

Response voice and language

You can have the model respond in a specific voice or in different languages.

Specify a response voice

The Live API uses Chirp 3 to support synthesized speech responses in HD voices.

If you don't specify a response voice, the default is Puck.

Response voice options

For a demo of each voice, see Chirp 3: HD voices.

Zephyr -- Bright
Kore -- Firm
Orus -- Firm
Autonoe -- Bright
Umbriel -- Easy-going
Erinome -- Clear
Laomedeia -- Upbeat
Schedar -- Even
Achird -- Friendly
Sadachbia -- Lively
Puck -- Upbeat
Fenrir -- Excitable
Aoede -- Breezy
Enceladus -- Breathy
Algieba -- Smooth
Algenib -- Gravelly
Achernar -- Soft
Gacrux -- Mature
Zubenelgenubi -- Casual
Sadaltager -- Knowledgeable
Charon -- Informative
Leda -- Youthful
Callirrhoe -- Easy-going
Iapetus -- Clear
Despina -- Smooth
Rasalgethi -- Informative
Alnilam -- Firm
Pulcherrima -- Forward
Vindemiatrix -- Gentle
Sulafat -- Warm

To specify a response voice, set the voice name in the speechConfig object as part of the model configuration.

Swift


// ...

let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.5-flash-native-audio-preview-09-2025",
  // Configure the model to use a specific voice for its audio response
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio],
    speech: SpeechConfig(voiceName: "VOICE_NAME")
  )
)

// ...

Kotlin


// ...

val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.5-flash-native-audio-preview-09-2025",
    // Configure the model to use a specific voice for its audio response
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        speechConfig = SpeechConfig(voice = Voice("VOICE_NAME"))
    }
)

// ...

Java


// ...

LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
    "gemini-2.5-flash-native-audio-preview-09-2025",
    // Configure the model to use a specific voice for its audio response
    new LiveGenerationConfig.Builder()
        .setResponseModality(ResponseModality.AUDIO)
        .setSpeechConfig(new SpeechConfig(new Voice("VOICE_NAME")))
        .build()
);

// ...

Web


// ...

const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

const liveModel = getLiveGenerativeModel(ai, {
  model: "gemini-2.5-flash-native-audio-preview-09-2025",
  // Configure the model to use a specific voice for its audio response
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: { voiceName: "VOICE_NAME" },
      },
    },
  },
});

// ...

Dart


// ...

final _liveModel = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.5-flash-native-audio-preview-09-2025',
  // Configure the model to use a specific voice for its audio response
  liveGenerationConfig: LiveGenerationConfig(
    responseModalities: [ResponseModalities.audio],
    speechConfig: SpeechConfig(voiceName: 'VOICE_NAME'),
  ),
);

// ...

Unity


// ...

var liveModel = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.5-flash-native-audio-preview-09-2025",
    // Configure the model to use a specific voice for its audio response
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Audio },
        speechConfig: SpeechConfig.UsePrebuiltVoice("VOICE_NAME")
    )
);

// ...
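The valid voice names and the Puck default described above can also be enforced on the client before a session is created. The following TypeScript sketch is illustrative only: buildSpeechConfig, isLiveApiVoice, and LIVE_API_VOICES are hypothetical names, not part of the Firebase SDK, and the sketch assumes voice names must match the documented list exactly (case-sensitive). It returns the same speechConfig shape used in the Web example above.

```typescript
// Voice names listed above for Live API responses (Chirp 3: HD voices).
// Hypothetical constant; keep it in sync with the documented list.
const LIVE_API_VOICES = [
  "Zephyr", "Puck", "Charon", "Kore", "Fenrir", "Leda", "Orus", "Aoede",
  "Callirrhoe", "Autonoe", "Enceladus", "Iapetus", "Umbriel", "Algieba",
  "Despina", "Erinome", "Algenib", "Rasalgethi", "Laomedeia", "Achernar",
  "Alnilam", "Schedar", "Gacrux", "Pulcherrima", "Achird", "Zubenelgenubi",
  "Vindemiatrix", "Sadachbia", "Sadaltager", "Sulafat",
] as const;

type LiveApiVoice = (typeof LIVE_API_VOICES)[number];

// The documented default when no response voice is specified.
const DEFAULT_VOICE: LiveApiVoice = "Puck";

// Same shape as the speechConfig object in the Web example.
interface SpeechConfigShape {
  voiceConfig: { prebuiltVoiceConfig: { voiceName: string } };
}

// Exact, case-sensitive match against the list (an assumption).
function isLiveApiVoice(name: string): name is LiveApiVoice {
  return (LIVE_API_VOICES as readonly string[]).includes(name);
}

// Hypothetical helper: validates the name and falls back to the default voice.
function buildSpeechConfig(voiceName?: string): SpeechConfigShape {
  const name = voiceName ?? DEFAULT_VOICE;
  if (!isLiveApiVoice(name)) {
    throw new Error(`Unknown Live API voice: ${name}`);
  }
  return { voiceConfig: { prebuiltVoiceConfig: { voiceName: name } } };
}
```

For example, you could pass speechConfig: buildSpeechConfig("Kore") inside generationConfig. The design choice here is to fail fast on a typo rather than rely on the backend, since this page doesn't say how an unknown voice name is handled.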

Influence the response language

Live API models automatically choose an appropriate language for their responses.

Supported languages

Language	BCP-47 code
Arabic (Egypt)	ar-EG
German (Germany)	de-DE
English (US)	en-US
Spanish (US)	es-US
French (France)	fr-FR
Hindi (India)	hi-IN
Indonesian (Indonesia)	id-ID
Italian (Italy)	it-IT
Japanese (Japan)	ja-JP
Korean (Korea)	ko-KR
Portuguese (Brazil)	pt-BR
Russian (Russia)	ru-RU
Dutch (Netherlands)	nl-NL
Polish (Poland)	pl-PL
Thai (Thailand)	th-TH
Turkish (Turkey)	tr-TR
Vietnamese (Vietnam)	vi-VN
Romanian (Romania)	ro-RO
Ukrainian (Ukraine)	uk-UA
Bengali (Bangladesh)	bn-BD
English (India)	en-IN and hi-IN bundle
Marathi (India)	mr-IN
Tamil (India)	ta-IN
Telugu (India)	te-IN

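If you want the model to respond in a language other than English, or to always respond in a specific language, you can influence its responses with system instructions, as in the following examples.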

Make the model aware that a non-English language might be appropriate

Listen to the speaker carefully. If you detect a non-English language, respond
in the language you hear from the speaker. You must respond unmistakably in the
speaker's language.

Instruct the model to always respond in a specific language (replace LANGUAGE with the target language)
```
RESPOND IN LANGUAGE. YOU MUST RESPOND UNMISTAKABLY IN LANGUAGE.
```
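The "always respond in a specific language" template above can be filled in from the supported-language table. In this TypeScript sketch, alwaysRespondIn and SUPPORTED_LANGUAGES are hypothetical names; using uppercase English language names for LANGUAGE is an assumption (the page doesn't say which form of the name works best), and en-IN is simply mapped to English. How you pass the resulting string as a system instruction depends on your SDK and isn't shown on this page.

```typescript
// Supported languages from the table above, keyed by BCP-47 code (hypothetical map).
const SUPPORTED_LANGUAGES: Record<string, string> = {
  "ar-EG": "Arabic", "de-DE": "German", "en-US": "English", "es-US": "Spanish",
  "fr-FR": "French", "hi-IN": "Hindi", "id-ID": "Indonesian", "it-IT": "Italian",
  "ja-JP": "Japanese", "ko-KR": "Korean", "pt-BR": "Portuguese", "ru-RU": "Russian",
  "nl-NL": "Dutch", "pl-PL": "Polish", "th-TH": "Thai", "tr-TR": "Turkish",
  "vi-VN": "Vietnamese", "ro-RO": "Romanian", "uk-UA": "Ukrainian", "bn-BD": "Bengali",
  "en-IN": "English", "mr-IN": "Marathi", "ta-IN": "Tamil", "te-IN": "Telugu",
};

// The template from the example above; LANGUAGE is the placeholder.
const ALWAYS_RESPOND_TEMPLATE =
  "RESPOND IN LANGUAGE. YOU MUST RESPOND UNMISTAKABLY IN LANGUAGE.";

// Hypothetical helper: rejects codes outside the table, then fills in every LANGUAGE.
function alwaysRespondIn(bcp47: string): string {
  const name = SUPPORTED_LANGUAGES[bcp47];
  if (name === undefined) {
    throw new Error(`Language not in the supported list: ${bcp47}`);
  }
  return ALWAYS_RESPOND_TEMPLATE.replace(/LANGUAGE/g, name.toUpperCase());
}
```

For example, alwaysRespondIn("ja-JP") produces "RESPOND IN JAPANESE. YOU MUST RESPOND UNMISTAKABLY IN JAPANESE."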

Audio input and output transcription

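You can receive transcriptions of the audio input and of the model's audio response as part of the model's response. You set this up as part of the model configuration:

- To transcribe the audio input, add inputAudioTranscription.
- To transcribe the model's audio response, add outputAudioTranscription.

Note the following:

- You can configure the model to return both input and output transcriptions (as in the following examples), or only one of them.
- Transcriptions are streamed along with the audio, so we recommend collecting them per turn, the same way you collect the text parts of each turn.
- The transcription language is inferred from the audio input and from the model's audio response.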
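Because each direction is enabled independently, a small helper can build only the fields you need. This TypeScript sketch mirrors the Web example, where an empty object ({}) turns a transcription direction on; transcriptionFields and TranscriptionFields are hypothetical names, not SDK types.

```typescript
// Mirrors the Web example: an empty object enables that transcription direction.
interface TranscriptionFields {
  inputAudioTranscription?: Record<string, never>;
  outputAudioTranscription?: Record<string, never>;
}

// Hypothetical helper: enables input and/or output transcription independently.
function transcriptionFields(opts: { input: boolean; output: boolean }): TranscriptionFields {
  const fields: TranscriptionFields = {};
  if (opts.input) fields.inputAudioTranscription = {};
  if (opts.output) fields.outputAudioTranscription = {};
  return fields;
}
```

You would spread the result into your generationConfig, for example { responseModalities: [ResponseModality.AUDIO], ...transcriptionFields({ input: true, output: false }) } to request only input transcriptions.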

Swift


// ...

let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.5-flash-native-audio-preview-09-2025",
  // Configure the model to return transcriptions of the audio input and output
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio],
    inputAudioTranscription: AudioTranscriptionConfig(),
    outputAudioTranscription: AudioTranscriptionConfig()
  )
)

var inputTranscript: String = ""
var outputTranscript: String = ""

do {
  let session = try await liveModel.connect()
  for try await response in session.responses {
    if case let .content(content) = response.payload {
      if let inputText = content.inputAudioTranscription?.text {
        // Handle transcription text of the audio input
        inputTranscript += inputText
      }

      if let outputText = content.outputAudioTranscription?.text {
        // Handle transcription text of the audio output
        outputTranscript += outputText
      }

      if content.isTurnComplete {
        // Log the transcripts after the current turn is complete
        print("Input audio: \(inputTranscript)")
        print("Output audio: \(outputTranscript)")

        // Reset the transcripts for the next turn
        inputTranscript = ""
        outputTranscript = ""
      }
    }
  }


} catch {
  // Handle error
}

// ...

Kotlin


// ...

val liveModel = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.5-flash-native-audio-preview-09-2025",
    // Configure the model to return transcriptions of the audio input and output
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        inputAudioTranscription = AudioTranscriptionConfig()
        outputAudioTranscription = AudioTranscriptionConfig()
    }
)

val liveSession = liveModel.connect()

fun handleTranscription(input: Transcription?, output: Transcription?) {
    input?.text?.let { text ->
        // Handle transcription text of the audio input
        println("Input Transcription: $text")
    }
    output?.text?.let { text ->
        // Handle transcription text of the audio output
        println("Output Transcription: $text")
    }
}

liveSession.startAudioConversation(null, ::handleTranscription)

// ...

Java


// ...

ExecutorService executor = Executors.newFixedThreadPool(1);

LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
    "gemini-2.5-flash-native-audio-preview-09-2025",
    // Configure the model to return transcriptions of the audio input and output
    new LiveGenerationConfig.Builder()
            .setResponseModality(ResponseModality.AUDIO)
            .setInputAudioTranscription(new AudioTranscriptionConfig())
            .setOutputAudioTranscription(new AudioTranscriptionConfig())
            .build()
);

LiveModelFutures liveModel = LiveModelFutures.from(lm);
ListenableFuture<LiveSessionFutures> sessionFuture = liveModel.connect();

Futures.addCallback(sessionFuture, new FutureCallback<LiveSessionFutures>() {
    @Override
    public void onSuccess(LiveSessionFutures session) {
        session.startAudioConversation((Transcription input, Transcription output) -> {
            if (input != null) {
                // Handle transcription text of the audio input
                System.out.println("Input Transcription: " + input.getText());
            }
            if (output != null) {
                // Handle transcription text of the audio output
                System.out.println("Output Transcription: " + output.getText());
            }
            return null;
        });
    }

    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
        t.printStackTrace();
    }
}, executor);

// ...

Web


// ...

const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

const liveModel = getLiveGenerativeModel(ai, {
  model: 'gemini-2.5-flash-native-audio-preview-09-2025',
  // Configure the model to return transcriptions of the audio input and output
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
    inputAudioTranscription: {},
    outputAudioTranscription: {},
  },
});

const liveSession = await liveModel.connect();

// `data` is a placeholder for a chunk of PCM audio captured by your app
liveSession.sendAudioRealtime({ data, mimeType: "audio/pcm" });

const messages = liveSession.receive();
for await (const message of messages) {
  switch (message.type) {
    case 'serverContent':
      if (message.inputTranscription) {
        // Handle transcription text of the audio input
        console.log(`Input transcription: ${message.inputTranscription.text}`);
      }
      if (message.outputTranscription) {
        // Handle transcription text of the audio output
        console.log(`Output transcription: ${message.outputTranscription.text}`);
      } else {
        // Handle other server content (modelTurn, turnComplete, interruption)
      }
      break;
    default:
      // Handle other message types (toolCall, toolCallCancellation)
      break;
  }
}

// ...

Dart


// ...

final _liveModel = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.5-flash-native-audio-preview-09-2025',
  // Configure the model to return transcriptions of the audio input and output
  liveGenerationConfig: LiveGenerationConfig(
    responseModalities: [ResponseModalities.audio],
    inputAudioTranscription: AudioTranscriptionConfig(),
    outputAudioTranscription: AudioTranscriptionConfig(),
  ),
);

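final LiveSession _session = await _liveModel.connect();

await for (final response in _session.receive()) {
  final message = response.message;
  // Only server content carries transcriptions; skip other message types
  if (message is! LiveServerContent) continue;
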
  if (message.inputTranscription?.text case final inputText?) {
    // Handle transcription text of the audio input
    print('Input: $inputText');
  }

  if (message.outputTranscription?.text case final outputText?) {
    // Handle transcription text of the audio output
    print('Output: $outputText');
  }
}

// ...

Unity


// ...

var liveModel = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.5-flash-native-audio-preview-09-2025",
    // Configure the model to return transcriptions of the audio input and output
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Audio },
        inputAudioTranscription: new AudioTranscriptionConfig(),
        outputAudioTranscription: new AudioTranscriptionConfig()
    )
);

try
{
    var session = await liveModel.ConnectAsync();
    var stream = session.ReceiveAsync();
    await foreach (var response in stream) {
        if (response.Message is LiveSessionContent sessionContent) {
            if (!string.IsNullOrEmpty(sessionContent.InputTranscription?.Text)) {
                // Handle transcription text of the audio input
            }

            if (!string.IsNullOrEmpty(sessionContent.OutputTranscription?.Text)) {
                // Handle transcription text of the audio output
            }
        }
    }
}
catch (Exception e)
{
    // Handle error
}

// ...
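The Swift example above accumulates transcription fragments and resets them when a turn completes, while the other examples only log each fragment. The following TypeScript sketch factors that per-turn collection into a reusable class. ServerContentLike, TurnTranscript, and TurnTranscriber are hypothetical and declared locally so the snippet is self-contained; the field names mirror the Web example (inputTranscription, outputTranscription) plus turnComplete, which that example mentions in a comment. Verify them against the SDK's type definitions before relying on them.

```typescript
// Minimal local shapes (assumption): only the fields this sketch reads.
interface TranscriptionLike { text?: string }
interface ServerContentLike {
  type: "serverContent";
  inputTranscription?: TranscriptionLike;
  outputTranscription?: TranscriptionLike;
  turnComplete?: boolean;
}

interface TurnTranscript { input: string; output: string }

// Accumulates streamed transcription fragments and emits one transcript per turn.
class TurnTranscriber {
  private input = "";
  private output = "";

  // Appends any fragments; returns the finished turn when turnComplete is set.
  handle(message: ServerContentLike): TurnTranscript | undefined {
    this.input += message.inputTranscription?.text ?? "";
    this.output += message.outputTranscription?.text ?? "";
    if (!message.turnComplete) return undefined;

    const turn = { input: this.input, output: this.output };
    // Reset for the next turn, as the Swift example does.
    this.input = "";
    this.output = "";
    return turn;
  }
}
```

Inside the for await loop of the Web example, you would call handle(message) for each serverContent message and act on the returned transcript whenever it isn't undefined.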

Voice activity detection (VAD)

The model automatically performs voice activity detection (VAD) on a continuous stream of audio input. VAD is enabled by default.
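Because VAD runs on the model side, the client's job is simply to keep streaming audio. The TypeScript sketch below prepares browser microphone samples for that. It assumes, without confirmation from this page, that the Live API expects mono 16-bit little-endian PCM at 16 kHz; check the Live API reference for the exact input format. floatTo16BitPcm, chunkPcm, and SAMPLE_RATE_HZ are hypothetical names, and the chunk size is your choice, not an SDK requirement.

```typescript
// Assumed input format: mono, 16-bit signed little-endian PCM at 16 kHz.
const SAMPLE_RATE_HZ = 16_000;

// Converts Float32 samples in [-1, 1] (as produced by the Web Audio API)
// into 16-bit little-endian PCM bytes, clamping out-of-range values.
function floatTo16BitPcm(samples: Float32Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale to -32768, positive values to 32767.
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return bytes;
}

// Splits PCM bytes into fixed-duration chunks for continuous streaming;
// the final chunk may be shorter.
function chunkPcm(pcm: Uint8Array, chunkMs: number): Uint8Array[] {
  const bytesPerChunk = Math.floor((SAMPLE_RATE_HZ * chunkMs) / 1000) * 2;
  if (bytesPerChunk <= 0) {
    throw new Error("chunkMs is too small to hold a single sample");
  }
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < pcm.length; offset += bytesPerChunk) {
    chunks.push(pcm.subarray(offset, offset + bytesPerChunk));
  }
  return chunks;
}
```

For example, chunkPcm(pcm, 20) yields 640-byte chunks (320 samples each). Each chunk would then be encoded (presumably as base64) and sent as data with sendAudioRealtime, as in the Web transcription example. Trimming silence on the client is deliberately left out so that the model's VAD sees the continuous stream.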

Session management

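Session-related topics include the following:
- Advanced features:
  - Updating system instructions mid-session
  - Adding incremental content updates
- Session-related limits, such as connection and session length limits, session context window limits, and rate limits

Firebase AI Logic doesn't yet support the following session management features. Check back soon.
- Handling interruptions
- Extending session length
- Resuming a session
- Maintaining context across sessions and requests
- Context window compression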
