The latest Gemini models, like Gemini 3.1 Flash Image (Nano Banana 2), are available to use with Firebase AI Logic on all platforms!

Gemini 2.0 Flash and Flash-Lite models will be retired on June 1, 2026. To avoid service disruption, update to a newer model like gemini-2.5-flash-lite. Also, Gemini 3 Pro Preview (gemini-3-pro-preview) will be retired on March 9, 2026 (update to Gemini 3.1 Pro Preview: gemini-3.1-pro-preview).

Configuration options for the Live API

Even with a basic implementation of the Live API, you can build engaging and powerful interactions for your users. You can optionally customize the experience further with the following configuration options:

Response voice and language
Transcriptions for audio input and output
Voice activity detection (VAD)
Session management

Response voice and language

You can have the model respond with a specific voice, and you can influence which language it responds in.

Specify a response voice

The Live API uses Chirp 3 to support synthesized speech responses in HD voices.

If you don't specify a response voice, the default is Puck.

Response voice options

To hear a sample of each voice, see Chirp 3: HD voices.

Zephyr -- Bright
Kore -- Firm
Orus -- Firm
Autonoe -- Bright
Umbriel -- Easy-going
Erinome -- Clear
Laomedeia -- Upbeat
Schedar -- Even
Achird -- Friendly
Sadachbia -- Lively
Puck -- Upbeat
Fenrir -- Excitable
Aoede -- Breezy
Enceladus -- Breathy
Algieba -- Smooth
Algenib -- Gravelly
Achernar -- Soft
Gacrux -- Mature
Zubenelgenubi -- Casual
Sadaltager -- Knowledgeable
Charon -- Informative
Leda -- Youthful
Callirrhoe -- Easy-going
Iapetus -- Clear
Despina -- Smooth
Rasalgethi -- Informative
Alnilam -- Firm
Pulcherrima -- Forward
Vindemiatrix -- Gentle
Sulafat -- Warm

To specify a response voice, set the voice name within the speechConfig object as part of the model configuration. In the following examples, replace VOICE_NAME with one of the voice names listed above.

Swift


// ...

let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to use a specific voice for its audio response
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio],
    speech: SpeechConfig(voiceName: "VOICE_NAME")
  )
)

// ...

Kotlin


// ...

val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to use a specific voice for its audio response
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        speechConfig = SpeechConfig(voice = Voice("VOICE_NAME"))
    }
)

// ...

Java


// ...

LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
    "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to use a specific voice for its audio response
    new LiveGenerationConfig.Builder()
        .setResponseModality(ResponseModality.AUDIO)
        .setSpeechConfig(new SpeechConfig(new Voice("VOICE_NAME")))
        .build()
);

// ...

Web


// ...

const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

const liveModel = getLiveGenerativeModel(ai, {
  model: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to use a specific voice for its audio response
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: { voiceName: "VOICE_NAME" },
      },
    },
  },
});

// ...

Dart


// ...

final _liveModel = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  // Configure the model to use a specific voice for its audio response
  liveGenerationConfig: LiveGenerationConfig(
    responseModalities: [ResponseModalities.audio],
    speechConfig: SpeechConfig(voiceName: 'VOICE_NAME'),
  ),
);

// ...

Unity


// ...

var liveModel = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to use a specific voice for its audio response
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Audio },
        speechConfig: SpeechConfig.UsePrebuiltVoice("VOICE_NAME")
    )
);

// ...
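Because every sample above passes the voice name as a plain string, it can help to check the name before you connect. The following TypeScript sketch assumes you build the Web-style speechConfig object shown in the Web sample (speechConfig.voiceConfig.prebuiltVoiceConfig.voiceName). It checks a name against the voice list on this page and, when no name is given, sets Puck explicitly, matching the documented default. The buildSpeechConfig function and the SpeechConfigShape type are hypothetical helpers, not part of the Firebase SDK.

```typescript
// Hypothetical helper, not part of the Firebase SDK: builds the Web-style
// speechConfig object used in the Web sample above.

// Voice names from the list on this page (matched case-sensitively here).
const RESPONSE_VOICES = [
  "Zephyr", "Kore", "Orus", "Autonoe", "Umbriel", "Erinome",
  "Laomedeia", "Schedar", "Achird", "Sadachbia", "Puck", "Fenrir",
  "Aoede", "Enceladus", "Algieba", "Algenib", "Achernar", "Gacrux",
  "Zubenelgenubi", "Sadaltager", "Charon", "Leda", "Callirrhoe",
  "Iapetus", "Despina", "Rasalgethi", "Alnilam", "Pulcherrima",
  "Vindemiatrix", "Sulafat",
] as const;

type VoiceName = (typeof RESPONSE_VOICES)[number];

// Mirrors the nesting in the Web sample's generationConfig.speechConfig.
interface SpeechConfigShape {
  voiceConfig: { prebuiltVoiceConfig: { voiceName: VoiceName } };
}

// The Live API defaults to Puck; this helper makes that choice explicit.
const DEFAULT_VOICE: VoiceName = "Puck";

function isVoiceName(name: string): name is VoiceName {
  return (RESPONSE_VOICES as readonly string[]).includes(name);
}

// Returns a speechConfig for the given voice, or throws if the name is unknown.
function buildSpeechConfig(voiceName?: string): SpeechConfigShape {
  const name = voiceName ?? DEFAULT_VOICE;
  if (!isVoiceName(name)) {
    throw new Error(`Unknown response voice: ${name}`);
  }
  return { voiceConfig: { prebuiltVoiceConfig: { voiceName: name } } };
}
```

In the Web sample, you could then write speechConfig: buildSpeechConfig("Kore") in place of the inline object literal.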

Influence the response language

Live API models automatically choose an appropriate language for their responses.

Supported languages

Language	BCP-47 code	Language	BCP-47 code
Arabic (Egypt)	ar-EG	German (Germany)	de-DE
English (US)	en-US	Spanish (US)	es-US
French (France)	fr-FR	Hindi (India)	hi-IN
Indonesian (Indonesia)	id-ID	Italian (Italy)	it-IT
Japanese (Japan)	ja-JP	Korean (South Korea)	ko-KR
Portuguese (Brazil)	pt-BR	Russian (Russia)	ru-RU
Dutch (Netherlands)	nl-NL	Polish (Poland)	pl-PL
Thai (Thailand)	th-TH	Turkish (Turkey)	tr-TR
Vietnamese (Vietnam)	vi-VN	Romanian (Romania)	ro-RO
Ukrainian (Ukraine)	uk-UA	Bengali (Bangladesh)	bn-BD
English (India)	en-IN and hi-IN bundle	Marathi (India)	mr-IN
Tamil (India)	ta-IN	Telugu (India)	te-IN
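If your app lets users pick a response language by code, a lookup built from the table above can turn a BCP-47 code into the language name you would write into a system instruction. This is a minimal TypeScript sketch; SUPPORTED_LANGUAGES and languageNameFor are hypothetical names, and the lookup expects the exact casing used in the table.

```typescript
// Hypothetical lookup built from the supported-languages table on this page.
const SUPPORTED_LANGUAGES: Readonly<Record<string, string>> = {
  "ar-EG": "Arabic (Egypt)",
  "de-DE": "German (Germany)",
  "en-US": "English (US)",
  "es-US": "Spanish (US)",
  "fr-FR": "French (France)",
  "hi-IN": "Hindi (India)",
  "id-ID": "Indonesian (Indonesia)",
  "it-IT": "Italian (Italy)",
  "ja-JP": "Japanese (Japan)",
  "ko-KR": "Korean (South Korea)",
  "pt-BR": "Portuguese (Brazil)",
  "ru-RU": "Russian (Russia)",
  "nl-NL": "Dutch (Netherlands)",
  "pl-PL": "Polish (Poland)",
  "th-TH": "Thai (Thailand)",
  "tr-TR": "Turkish (Turkey)",
  "vi-VN": "Vietnamese (Vietnam)",
  "ro-RO": "Romanian (Romania)",
  "uk-UA": "Ukrainian (Ukraine)",
  "bn-BD": "Bengali (Bangladesh)",
  "en-IN": "English (India)", // bundled with hi-IN, per the table
  "mr-IN": "Marathi (India)",
  "ta-IN": "Tamil (India)",
  "te-IN": "Telugu (India)",
};

// Returns the language name for a supported code, or undefined otherwise.
// hasOwnProperty avoids matching inherited keys such as "toString".
function languageNameFor(code: string): string | undefined {
  return Object.prototype.hasOwnProperty.call(SUPPORTED_LANGUAGES, code)
    ? SUPPORTED_LANGUAGES[code]
    : undefined;
}
```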

To have the model respond in a language other than English, or always in one specific language, you can constrain its responses with system instructions, such as the following examples.

Reinforce to the model that a non-English language might be appropriate:

Listen to the speaker carefully. If you detect a non-English language, respond
in the language you hear from the speaker. You must respond unmistakably in the
speaker's language.

Tell the model to always respond in a specific language (replace LANGUAGE with the language you want):
```
RESPOND IN LANGUAGE. YOU MUST RESPOND UNMISTAKABLY IN LANGUAGE.
```
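Both prompts above are plain strings, so one way to use them is to choose between them in code: the first when the model should follow the speaker, the second, with LANGUAGE filled in, when it should stick to one language. This TypeScript sketch assumes you then pass the returned string to the model as its system instruction through your platform's SDK; the responseLanguageInstruction function is hypothetical.

```typescript
// Hypothetical helper: picks one of the two system-instruction prompts on this page.

// Prompt 1: respond in whatever language the speaker uses.
const FOLLOW_SPEAKER_INSTRUCTION =
  "Listen to the speaker carefully. If you detect a non-English language, respond " +
  "in the language you hear from the speaker. You must respond unmistakably in the " +
  "speaker's language.";

// Prompt 2: always respond in one language; LANGUAGE is the placeholder.
const FIXED_LANGUAGE_TEMPLATE =
  "RESPOND IN LANGUAGE. YOU MUST RESPOND UNMISTAKABLY IN LANGUAGE.";

// With no language given, follow the speaker; otherwise pin the response language.
function responseLanguageInstruction(language?: string): string {
  if (language === undefined || language.trim() === "") {
    return FOLLOW_SPEAKER_INSTRUCTION;
  }
  // Replace every LANGUAGE placeholder, upper-cased to match the prompt's style.
  return FIXED_LANGUAGE_TEMPLATE.split("LANGUAGE").join(language.trim().toUpperCase());
}
```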

Transcriptions for audio input and output

You can receive transcriptions of both the audio input and the model's audio response as part of the model's response. You enable this in the model configuration:

To transcribe the audio input, add inputAudioTranscription.
To transcribe the model's audio response, add outputAudioTranscription.

Note the following:

You can configure the model to return transcriptions of both the input and the output (as shown in the following examples), or of only one of them.
Transcriptions are streamed along with the audio, so accumulate the transcription text for each turn, just as you would text parts.
The transcription language is inferred from the audio input and from the model's audio response.

Swift


// ...

let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to return transcriptions of the audio input and output
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio],
    inputAudioTranscription: AudioTranscriptionConfig(),
    outputAudioTranscription: AudioTranscriptionConfig()
  )
)

var inputTranscript: String = ""
var outputTranscript: String = ""

do {
  let session = try await liveModel.connect()
  for try await response in session.responses {
    if case let .content(content) = response.payload {
      if let inputText = content.inputAudioTranscription?.text {
        // Handle transcription text of the audio input
        inputTranscript += inputText
      }

      if let outputText = content.outputAudioTranscription?.text {
        // Handle transcription text of the audio output
        outputTranscript += outputText
      }

      if content.isTurnComplete {
        // Log the transcripts after the current turn is complete
        print("Input audio: \(inputTranscript)")
        print("Output audio: \(outputTranscript)")

        // Reset the transcripts for the next turn
        inputTranscript = ""
        outputTranscript = ""
      }
    }
  }


} catch {
  // Handle error
}

// ...

Kotlin


// ...

val liveModel = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to return transcriptions of the audio input and output
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        inputAudioTranscription = AudioTranscriptionConfig()
        outputAudioTranscription = AudioTranscriptionConfig()
    }
)

val liveSession = liveModel.connect()

fun handleTranscription(input: Transcription?, output: Transcription?) {
    input?.text?.let { text ->
        // Handle transcription text of the audio input
        println("Input Transcription: $text")
    }
    output?.text?.let { text ->
        // Handle transcription text of the audio output
        println("Output Transcription: $text")
    }
}

liveSession.startAudioConversation(null, ::handleTranscription)

// ...

Java


// ...

ExecutorService executor = Executors.newFixedThreadPool(1);

LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
    "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to return transcriptions of the audio input and output
    new LiveGenerationConfig.Builder()
            .setResponseModality(ResponseModality.AUDIO)
            .setInputAudioTranscription(new AudioTranscriptionConfig())
            .setOutputAudioTranscription(new AudioTranscriptionConfig())
            .build()
    );

LiveModelFutures liveModel = LiveModelFutures.from(lm);
ListenableFuture<LiveSession> sessionFuture = liveModel.connect();

Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
        LiveSessionFutures session = LiveSessionFutures.from(ses);
        session.startAudioConversation((Transcription input, Transcription output) -> {
            if (input != null) {
                // Handle transcription text of the audio input
                System.out.println("Input Transcription: " + input.getText());
            }
            if (output != null) {
                // Handle transcription text of the audio output
                System.out.println("Output Transcription: " + output.getText());
            }
            return null;
        });
    }

    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
        t.printStackTrace();
    }
}, executor);

// ...

Web


// ...

const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

const liveModel = getLiveGenerativeModel(ai, {
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  // Configure the model to return transcriptions of the audio input and output
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
    inputAudioTranscription: {},
    outputAudioTranscription: {},
  },
});

const liveSession = await liveModel.connect();

// `data` is not defined in this snippet: it stands for a chunk of
// PCM audio captured by your app
liveSession.sendAudioRealtime({ data, mimeType: "audio/pcm" });

const messages = liveSession.receive();
for await (const message of messages) {
  switch (message.type) {
    case 'serverContent':
      if (message.inputTranscription) {
        // Handle transcription text of the audio input
        console.log(`Input transcription: ${message.inputTranscription.text}`);
      }
      if (message.outputTranscription) {
        // Handle transcription text of the audio output
        console.log(`Output transcription: ${message.outputTranscription.text}`);
      }
      // Handle the other server content here (modelTurn, turnComplete, interruption)
      break;
    default:
      // Handle other message types (toolCall, toolCallCancellation)
      break;
  }
}

// ...
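The Web sample sends a data value that it doesn't define. As a hedged sketch, assuming your app captures microphone audio as Float32Array samples in the range [-1, 1] (as the Web Audio API does), that the samples are already at the sample rate the service expects (resampling is not shown), and that data is a base64-encoded string of little-endian 16-bit PCM, the following TypeScript converts one chunk into that string. encodePcm16Base64 and toBase64 are hypothetical helpers; the base64 encoder is written out so the sketch doesn't depend on browser-only or Node-only globals.

```typescript
// Hypothetical helpers: Float32 samples -> little-endian 16-bit PCM -> base64.

const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard base64 (RFC 4648) with '=' padding.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const triple = (b0 << 16) | (b1 << 8) | b2;
    out += BASE64_ALPHABET[(triple >> 18) & 63];
    out += BASE64_ALPHABET[(triple >> 12) & 63];
    out += i + 1 < bytes.length ? BASE64_ALPHABET[(triple >> 6) & 63] : "=";
    out += i + 2 < bytes.length ? BASE64_ALPHABET[triple & 63] : "=";
  }
  return out;
}

function encodePcm16Base64(samples: Float32Array): string {
  const view = new DataView(new ArrayBuffer(samples.length * 2));
  samples.forEach((sample, index) => {
    // Clamp, then scale so that -1 maps to -32768 and 1 maps to 32767.
    const s = Math.max(-1, Math.min(1, sample));
    const value = s < 0 ? s * 0x8000 : s * 0x7fff;
    view.setInt16(index * 2, Math.round(value), true); // true = little-endian
  });
  return toBase64(new Uint8Array(view.buffer));
}
```

With this helper, the Web sample's call could become liveSession.sendAudioRealtime({ data: encodePcm16Base64(chunk), mimeType: "audio/pcm" }), where chunk is a hypothetical Float32Array from your capture code.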

Dart


// ...

final _liveModel = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  // Configure the model to return transcriptions of the audio input and output
  liveGenerationConfig: LiveGenerationConfig(
    responseModalities: [ResponseModalities.audio],
    inputAudioTranscription: AudioTranscriptionConfig(),
    outputAudioTranscription: AudioTranscriptionConfig(),
  ),
);

final LiveSession _session = await _liveModel.connect();

await for (final response in _session.receive()) {
  final message = response.message;
  // Transcriptions arrive as part of server content messages
  if (message is! LiveServerContent) continue;

  if (message.inputTranscription?.text case final inputText?) {
    // Handle transcription text of the audio input
    print('Input: $inputText');
  }

  if (message.outputTranscription?.text case final outputText?) {
    // Handle transcription text of the audio output
    print('Output: $outputText');
  }
}

// ...

Unity


// ...

var liveModel = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to return transcriptions of the audio input and output
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Audio },
        inputAudioTranscription: new AudioTranscriptionConfig(),
        outputAudioTranscription: new AudioTranscriptionConfig()
    )
);

try
{
    var session = await liveModel.ConnectAsync();
    var stream = session.ReceiveAsync();
    await foreach (var response in stream) {
        if (response.Message is LiveSessionContent sessionContent) {
            if (!string.IsNullOrEmpty(sessionContent.InputTranscription?.Text)) {
                // Handle transcription text of the audio input
            }

            if (!string.IsNullOrEmpty(sessionContent.OutputTranscription?.Text)) {
                // Handle transcription text of the audio output
            }
        }
    }
}
catch (Exception e)
{
    // Handle error
}

// ...
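The notes above say transcription text is streamed in fragments alongside the audio, and the Swift sample joins those fragments until the turn completes. The same bookkeeping, pulled out of any SDK, might look like the TypeScript class below. It assumes a message shape modeled on the Web sample's server content (inputTranscription, outputTranscription, and a turnComplete flag); the TranscriptAccumulator class and the ServerContentLike interface are hypothetical, so map the field names to your platform's SDK.

```typescript
// Minimal stand-in for a server content message; field names are modeled on
// the Web sample above, not imported from the Firebase SDK.
interface ServerContentLike {
  inputTranscription?: { text?: string };
  outputTranscription?: { text?: string };
  turnComplete?: boolean;
}

interface TurnTranscript {
  input: string;
  output: string;
}

// Hypothetical helper: collects streamed transcription fragments per turn.
class TranscriptAccumulator {
  private input = "";
  private output = "";

  // Appends any fragments in the message. Returns the whole turn's transcript
  // when the message completes the turn, otherwise undefined.
  add(message: ServerContentLike): TurnTranscript | undefined {
    this.input += message.inputTranscription?.text ?? "";
    this.output += message.outputTranscription?.text ?? "";
    if (!message.turnComplete) {
      return undefined;
    }
    const turn: TurnTranscript = { input: this.input, output: this.output };
    // Reset for the next turn, as the Swift sample does.
    this.input = "";
    this.output = "";
    return turn;
  }
}
```

Returning the finished turn from add, rather than logging inside the class, leaves the choice of what to do with each transcript (display, store, or discard) to the caller.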

Voice activity detection (VAD)

The model automatically performs voice activity detection (VAD) on a continuous audio input stream. VAD is enabled by default.

Session management

See the following for information about session-related topics:
- Advanced capabilities, including:
  - Updating system instructions mid-session
  - Adding incremental content updates
- Session-related limits, including connection and session-length limits, session context window limits, and rate limits

Firebase AI Logic doesn't yet support the following session management features. Check back later.
- Interruption handling
- Session length extension
- Session resumption
- Preserving context across sessions and requests
- Context window compression