Configuration options for the Live API

Even with a basic implementation of the Live API, you can build engaging and powerful interactions for your users. You can customize the experience further by using the following configuration options:

Response voice and language
Transcriptions for audio input and output
Voice activity detection (VAD)
Session management

Response voice and language

You can instruct the model to respond in a specific voice and influence it to respond in different languages.

Specify the response voice

The Live API uses Chirp 3 to support synthesized speech responses with HD voices.

If you don't specify a response voice, the default voice is Puck.

List of available response voices

To hear an audio sample of each voice, see Chirp 3: HD voices.

Zephyr: Bright
Kore: Firm
Orus: Firm
Autonoe: Bright
Umbriel: Easy-going
Erinome: Clear
Laomedeia: Upbeat
Schedar: Even
Achird: Friendly
Sadachbia: Lively
Puck: Upbeat
Fenrir: Excitable
Aoede: Breezy
Enceladus: Breathy
Algieba: Smooth
Algenib: Gravelly
Achernar: Soft
Gacrux: Mature
Zubenelgenubi: Casual
Sadaltager: Knowledgeable
Charon: Informative
Leda: Youthful
Callirrhoe: Easy-going
Iapetus: Clear
Despina: Smooth
Rasalgethi: Informative
Alnilam: Firm
Pulcherrima: Forward
Vindemiatrix: Gentle
Sulafat: Warm

To specify a response voice, set the voice name within the speechConfig object as part of the model configuration.

Swift


// ...

let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to use a specific voice for its audio response
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio],
    speech: SpeechConfig(voiceName: "VOICE_NAME")
  )
)

// ...

Kotlin


// ...

val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to use a specific voice for its audio response
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        speechConfig = SpeechConfig(voice = Voice("VOICE_NAME"))
    }
)

// ...

Java


// ...

LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
    "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to use a specific voice for its audio response
    new LiveGenerationConfig.Builder()
        .setResponseModality(ResponseModality.AUDIO)
        .setSpeechConfig(new SpeechConfig(new Voice("VOICE_NAME")))
        .build()
);

// ...

Web


// ...

const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

const liveModel = getLiveGenerativeModel(ai, {
  model: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to use a specific voice for its audio response
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: { voiceName: "VOICE_NAME" },
      },
    },
  },
});

// ...

Dart


// ...

final _liveModel = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  // Configure the model to use a specific voice for its audio response
  liveGenerationConfig: LiveGenerationConfig(
    responseModalities: [ResponseModalities.audio],
    speechConfig: SpeechConfig(voiceName: 'VOICE_NAME'),
  ),
);

// ...

Unity


// ...

var liveModel = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to use a specific voice for its audio response
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Audio },
        speechConfig: SpeechConfig.UsePrebuiltVoice("VOICE_NAME")
    )
);

// ...
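
Because a voice name is passed as a plain string, a typo may only surface when the request is sent. The following is a minimal sketch, not part of the SDK, of checking a name against the voice list above before building the Web-style speechConfig object. The names PREBUILT_VOICES, DEFAULT_VOICE, and buildSpeechConfig are hypothetical helpers introduced for illustration; only the shape of the returned object is taken from the Web sample above.

```javascript
// Hypothetical helper (not an SDK API): validate a voice name locally.
// Voice names are copied from the list in this section.
const PREBUILT_VOICES = new Set([
  "Zephyr", "Kore", "Orus", "Autonoe", "Umbriel", "Erinome", "Laomedeia",
  "Schedar", "Achird", "Sadachbia", "Puck", "Fenrir", "Aoede", "Enceladus",
  "Algieba", "Algenib", "Achernar", "Gacrux", "Zubenelgenubi", "Sadaltager",
  "Charon", "Leda", "Callirrhoe", "Iapetus", "Despina", "Rasalgethi",
  "Alnilam", "Pulcherrima", "Vindemiatrix", "Sulafat",
]);

// The documented default when no voice is specified.
const DEFAULT_VOICE = "Puck";

// Returns an object shaped like the `speechConfig` value in the Web sample.
// Throws a RangeError for names that are not in the list.
function buildSpeechConfig(voiceName = DEFAULT_VOICE) {
  if (!PREBUILT_VOICES.has(voiceName)) {
    throw new RangeError(`Unknown voice name: ${voiceName}`);
  }
  return { voiceConfig: { prebuiltVoiceConfig: { voiceName } } };
}
```

The result could then be passed as the speechConfig value inside generationConfig, for example `speechConfig: buildSpeechConfig("Kore")`. Falling back to Puck explicitly is a design choice of this sketch; omitting speechConfig entirely has the same documented effect.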

Influence the response language

Live API models automatically choose the appropriate language for their responses.

List of supported languages

Language	BCP-47 code	Language	BCP-47 code
Arabic (Egypt)	ar-EG	German (Germany)	de-DE
English (US)	en-US	Spanish (US)	es-US
French (France)	fr-FR	Hindi (India)	hi-IN
Indonesian (Indonesia)	id-ID	Italian (Italy)	it-IT
Japanese (Japan)	ja-JP	Korean (South Korea)	ko-KR
Portuguese (Brazil)	pt-BR	Russian (Russia)	ru-RU
Dutch (Netherlands)	nl-NL	Polish (Poland)	pl-PL
Thai (Thailand)	th-TH	Turkish (Turkey)	tr-TR
Vietnamese (Vietnam)	vi-VN	Romanian (Romania)	ro-RO
Ukrainian (Ukraine)	uk-UA	Bengali (Bangladesh)	bn-BD
English (India)	en-IN & hi-IN bundle	Marathi (India)	mr-IN
Tamil (India)	ta-IN	Telugu (India)	te-IN

If you want the model to respond in a non-English language, or to always respond in a specific language, you can influence the model's responses by using system instructions like the following examples:

Reinforce to the model that a non-English language may be appropriate

Listen to the speaker carefully. If you detect a non-English language, respond
in the language you hear from the speaker. You must respond unmistakably in the
speaker's language.

Tell the model to always respond in a specific language
```
RESPOND IN LANGUAGE. YOU MUST RESPOND UNMISTAKABLY IN LANGUAGE.
```
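
The second template can be filled in programmatically when an app already knows the user's locale. The sketch below assumes a hypothetical helper, buildLanguageInstruction, and a hypothetical lookup table, LANGUAGE_NAMES, that holds only a subset of the supported-languages table above. Substituting the upper-cased language name for the LANGUAGE placeholder is also an assumption about how the template is meant to be used.

```javascript
// Hypothetical lookup: a subset of the supported languages listed above,
// keyed by BCP-47 code.
const LANGUAGE_NAMES = new Map([
  ["de-DE", "German"],
  ["es-US", "Spanish"],
  ["fr-FR", "French"],
  ["ja-JP", "Japanese"],
  ["pt-BR", "Portuguese"],
  ["vi-VN", "Vietnamese"],
]);

// Fills the "always respond in a specific language" template with the
// language that matches the given BCP-47 code.
function buildLanguageInstruction(bcp47Code) {
  const name = LANGUAGE_NAMES.get(bcp47Code);
  if (name === undefined) {
    throw new RangeError(`No language name known for code: ${bcp47Code}`);
  }
  const upper = name.toUpperCase();
  return `RESPOND IN ${upper}. YOU MUST RESPOND UNMISTAKABLY IN ${upper}.`;
}
```

For example, `buildLanguageInstruction("vi-VN")` returns the string `RESPOND IN VIETNAMESE. YOU MUST RESPOND UNMISTAKABLY IN VIETNAMESE.`, which can then be supplied as the system instruction text.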

Transcriptions for audio input and output

As part of the model's response, you can receive transcriptions of the audio input and of the model's audio response. You set this up as part of the model configuration.

To transcribe the audio input, add inputAudioTranscription.
To transcribe the model's audio response, add outputAudioTranscription.

Note the following:

You can configure the model to return transcriptions of both the input and the output (as shown in the following example), or you can configure it to return only one or the other.
Transcripts are streamed along with the audio, so it's best to collect them the same way you collect text parts in each turn.
The transcription language is inferred from the audio input and from the model's audio response.

Swift


// ...

let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to return transcriptions of the audio input and output
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio],
    inputAudioTranscription: AudioTranscriptionConfig(),
    outputAudioTranscription: AudioTranscriptionConfig()
  )
)

var inputTranscript: String = ""
var outputTranscript: String = ""

do {
  let session = try await liveModel.connect()
  for try await response in session.responses {
    if case let .content(content) = response.payload {
      if let inputText = content.inputAudioTranscription?.text {
        // Handle transcription text of the audio input
        inputTranscript += inputText
      }

      if let outputText = content.outputAudioTranscription?.text {
        // Handle transcription text of the audio output
        outputTranscript += outputText
      }

      if content.isTurnComplete {
        // Log the transcripts after the current turn is complete
        print("Input audio: \(inputTranscript)")
        print("Output audio: \(outputTranscript)")

        // Reset the transcripts for the next turn
        inputTranscript = ""
        outputTranscript = ""
      }
    }
  }
} catch {
  // Handle error
}

// ...

Kotlin


// ...

val liveModel = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to return transcriptions of the audio input and output
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        inputAudioTranscription = AudioTranscriptionConfig()
        outputAudioTranscription = AudioTranscriptionConfig()
    }
)

val liveSession = liveModel.connect()

fun handleTranscription(input: Transcription?, output: Transcription?) {
    input?.text?.let { text ->
        // Handle transcription text of the audio input
        println("Input Transcription: $text")
    }
    output?.text?.let { text ->
        // Handle transcription text of the audio output
        println("Output Transcription: $text")
    }
}

liveSession.startAudioConversation(null, ::handleTranscription)

// ...

Java


// ...

ExecutorService executor = Executors.newFixedThreadPool(1);

LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
    "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to return transcriptions of the audio input and output
    new LiveGenerationConfig.Builder()
            .setResponseModality(ResponseModality.AUDIO)
            .setInputAudioTranscription(new AudioTranscriptionConfig())
            .setOutputAudioTranscription(new AudioTranscriptionConfig())
            .build()
    );

LiveModelFutures liveModel = LiveModelFutures.from(lm);
ListenableFuture<LiveSessionFutures> sessionFuture = liveModel.connect();

Futures.addCallback(sessionFuture, new FutureCallback<LiveSessionFutures>() {
    @Override
    public void onSuccess(LiveSessionFutures ses) {
        LiveSessionFutures session = ses;
        session.startAudioConversation((Transcription input, Transcription output) -> {
            if (input != null) {
                // Handle transcription text of the audio input
                System.out.println("Input Transcription: " + input.getText());
            }
            if (output != null) {
                // Handle transcription text of the audio output
                System.out.println("Output Transcription: " + output.getText());
            }
            return null;
        });
    }

    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
        t.printStackTrace();
    }
}, executor);

// ...

Web


// ...

const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

const liveModel = getLiveGenerativeModel(ai, {
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  // Configure the model to return transcriptions of the audio input and output
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
    inputAudioTranscription: {},
    outputAudioTranscription: {},
  },
});

const liveSession = await liveModel.connect();

liveSession.sendAudioRealtime({ data, mimeType: "audio/pcm" });

const messages = liveSession.receive();
for await (const message of messages) {
  switch (message.type) {
    case 'serverContent':
      if (message.inputTranscription) {
        // Handle transcription text of the audio input
        console.log(`Input transcription: ${message.inputTranscription.text}`);
      }
      if (message.outputTranscription) {
        // Handle transcription text of the audio output
        console.log(`Output transcription: ${message.outputTranscription.text}`);
      } else {
        // Handle other server content (modelTurn, turnComplete, interruption)
      }
      break;
    default:
      // Handle other message types (toolCall, toolCallCancellation)
  }
}

// ...

Dart


// ...

final _liveModel = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  // Configure the model to return transcriptions of the audio input and output
  liveGenerationConfig: LiveGenerationConfig(
    responseModalities: [ResponseModalities.audio],
    inputAudioTranscription: AudioTranscriptionConfig(),
    outputAudioTranscription: AudioTranscriptionConfig(),
  ),
);

final LiveSession _session = await _liveModel.connect();

await for (final response in _session.receive()) {
  LiveServerContent message = response.message;
  if (message.inputTranscription?.text case final inputText?) {
    // Handle transcription text of the audio input
    print('Input: $inputText');
  }

  if (message.outputTranscription?.text case final outputText?) {
    // Handle transcription text of the audio output
    print('Output: $outputText');
  }
}

// ...

Unity


// ...

var liveModel = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to return transcriptions of the audio input and output
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Audio },
        inputAudioTranscription: new AudioTranscriptionConfig(),
        outputAudioTranscription: new AudioTranscriptionConfig()
    )
);

try
{
    var session = await liveModel.ConnectAsync();
    var stream = session.ReceiveAsync();
    await foreach (var response in stream) {
        if (response.Message is LiveSessionContent sessionContent) {
            if (!string.IsNullOrEmpty(sessionContent.InputTranscription?.Text)) {
              // handle transcription text of input audio
            }

            if (!string.IsNullOrEmpty(sessionContent.OutputTranscription?.Text)) {
              // handle transcription text of output audio
            }
        }
    }
}
catch (Exception e)
{
    // Handle error
}

// ...
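
Since transcripts arrive in chunks alongside the audio, a Web app may want to accumulate them per turn, much as the Swift sample does. The following is a minimal sketch under stated assumptions: createTranscriptCollector is a hypothetical helper, and it assumes that server content messages expose a boolean turnComplete field (the Web sample above only names turnComplete in a comment). The inputTranscription and outputTranscription fields are used as they appear in that sample.

```javascript
// Hypothetical helper (not an SDK API): collects streamed transcription
// chunks and returns the full transcripts once a turn completes.
function createTranscriptCollector() {
  let input = "";
  let output = "";

  return {
    // Pass each message received from the session to this method.
    // Returns { input, output } when the turn is complete, otherwise null.
    handle(message) {
      if (message.type !== "serverContent") {
        return null;
      }
      if (message.inputTranscription?.text) {
        input += message.inputTranscription.text;
      }
      if (message.outputTranscription?.text) {
        output += message.outputTranscription.text;
      }
      // Assumption: `turnComplete` is set on the last message of a turn.
      if (!message.turnComplete) {
        return null;
      }
      const turn = { input, output };
      // Reset the buffers for the next turn.
      input = "";
      output = "";
      return turn;
    },
  };
}
```

Inside the `for await` loop from the Web sample, each message would be passed to `collector.handle(message)`, and a non-null result would be logged or stored as the transcript of that turn.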

Voice activity detection (VAD)

The model automatically performs voice activity detection (VAD) on a continuous audio input stream. VAD is enabled by default.

Session management

Learn about the following session-related topics:

Advanced capabilities, including:
- Updating system instructions mid-session
- Sending incremental content updates
Session-related limits, including limits on connections and session length, limits on the session context window, and rate limits.
Options for handling session limits, including:
- Context window compression
- Session resumption
