即使是 Live API 的基本实现,您也可以为用户构建富有吸引力且强大的互动。 您还可以使用以下配置选项进一步自定义体验:
回答语音和语言
指定回答语音
|
点击您的 Gemini API 提供商,以查看此页面上特定于提供商的内容和代码。 |
Live API 使用 Chirp 3 来支持采用高清语音的合成语音回答。
如果您未指定回答语音,则默认为 Puck。
如需指定回答语音,请在 speechConfig 对象中设置语音名称,作为模型配置的一部分。
Swift
// ...
let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
modelName: "gemini-2.5-flash-native-audio-preview-09-2025",
// Configure the model to use a specific voice for its audio response
generationConfig: LiveGenerationConfig(
responseModalities: [.audio],
speech: SpeechConfig(voiceName: "VOICE_NAME")
)
)
// ...
Kotlin
// ...
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
modelName = "gemini-2.5-flash-native-audio-preview-09-2025",
// Configure the model to use a specific voice for its audio response
generationConfig = liveGenerationConfig {
responseModality = ResponseModality.AUDIO
speechConfig = SpeechConfig(voice = Voice("VOICE_NAME"))
}
)
// ...
Java
// ...
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
"gemini-2.5-flash-native-audio-preview-09-2025",
// Configure the model to use a specific voice for its audio response
new LiveGenerationConfig.Builder()
.setResponseModality(ResponseModality.AUDIO)
.setSpeechConfig(new SpeechConfig(new Voice("VOICE_NAME")))
.build()
);
// ...
Web
// ...
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });
const liveModel = getLiveGenerativeModel(ai, {
model: "gemini-2.5-flash-native-audio-preview-09-2025",
// Configure the model to use a specific voice for its audio response
generationConfig: {
responseModalities: [ResponseModality.AUDIO],
speechConfig: {
voiceConfig: {
prebuiltVoiceConfig: { voiceName: "VOICE_NAME" },
},
},
},
});
// ...
Dart
// ...
final _liveModel = FirebaseAI.googleAI().liveGenerativeModel(
model: 'gemini-2.5-flash-native-audio-preview-09-2025',
// Configure the model to use a specific voice for its audio response
liveGenerationConfig: LiveGenerationConfig(
responseModalities: [ResponseModalities.audio],
speechConfig: SpeechConfig(voiceName: 'VOICE_NAME'),
),
);
// ...
Unity
// ...
var liveModel = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
modelName: "gemini-2.5-flash-native-audio-preview-09-2025",
// Configure the model to use a specific voice for its audio response
liveGenerationConfig: new LiveGenerationConfig(
responseModalities: new[] { ResponseModality.Audio },
speechConfig: SpeechConfig.UsePrebuiltVoice("VOICE_NAME")
)
);
// ...
影响回答语言
Live API 模型会自动选择合适的语言来生成回答。
如果您希望模型以非英语语言回答,或者始终以特定语言回答,可以使用系统指令来影响模型的回答,如以下示例所示:
向模型强调非英语语言可能适合
Listen to the speaker carefully. If you detect a non-English language, respond in the language you hear from the speaker. You must respond unmistakably in the speaker's language.让模型始终以特定语言回答
RESPOND IN LANGUAGE. YOU MUST RESPOND UNMISTAKABLY IN LANGUAGE.
音频输入和输出的转写
|
点击您的 Gemini API 提供商,以查看此页面上特定于提供商的内容和代码。 |
作为模型回答的一部分,您可以收到音频输入和模型音频回答的转写内容。您可以将此配置设置为模型配置的一部分。
如需转写音频输入,请添加
inputAudioTranscription。如需转写模型的音频回答,请添加
outputAudioTranscription。
请注意以下几点:
您可以将模型配置为返回输入和输出的转写内容(如以下示例所示),也可以将其配置为仅返回其中一种。
转写内容会与音频一起进行流式传输,因此最好像收集每次对话的文本部分一样收集转写内容。
转写语言是从音频输入和模型的音频回答中推断出来的。
Swift
// ...
let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
modelName: "gemini-2.5-flash-native-audio-preview-09-2025",
// Configure the model to return transcriptions of the audio input and output
generationConfig: LiveGenerationConfig(
responseModalities: [.audio],
inputAudioTranscription: AudioTranscriptionConfig(),
outputAudioTranscription: AudioTranscriptionConfig()
)
)
var inputTranscript: String = ""
var outputTranscript: String = ""
do {
let session = try await liveModel.connect()
for try await response in session.responses {
if case let .content(content) = response.payload {
if let inputText = content.inputAudioTranscription?.text {
// Handle transcription text of the audio input
inputTranscript += inputText
}
if let outputText = content.outputAudioTranscription?.text {
// Handle transcription text of the audio output
outputTranscript += outputText
}
if content.isTurnComplete {
// Log the transcripts after the current turn is complete
print("Input audio: \(inputTranscript)")
print("Output audio: \(outputTranscript)")
// Reset the transcripts for the next turn
inputTranscript = ""
outputTranscript = ""
}
}
}
} catch {
// Handle error
}
// ...
Kotlin
// ...
val liveModel = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
modelName = "gemini-2.5-flash-native-audio-preview-09-2025",
// Configure the model to return transcriptions of the audio input and output
generationConfig = liveGenerationConfig {
responseModality = ResponseModality.AUDIO
inputAudioTranscription = AudioTranscriptionConfig()
outputAudioTranscription = AudioTranscriptionConfig()
}
)
val liveSession = liveModel.connect()
fun handleTranscription(input: Transcription?, output: Transcription?) {
input?.text?.let { text ->
// Handle transcription text of the audio input
println("Input Transcription: $text")
}
output?.text?.let { text ->
// Handle transcription text of the audio output
println("Output Transcription: $text")
}
}
liveSession.startAudioConversation(null, ::handleTranscription)
// ...
Java
// ...
ExecutorService executor = Executors.newFixedThreadPool(1);
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
"gemini-2.5-flash-native-audio-preview-09-2025",
// Configure the model to return transcriptions of the audio input and output
new LiveGenerationConfig.Builder()
.setResponseModality(ResponseModality.AUDIO)
.setInputAudioTranscription(new AudioTranscriptionConfig())
.setOutputAudioTranscription(new AudioTranscriptionConfig())
.build()
);
LiveModelFutures liveModel = LiveModelFutures.from(lm);
ListenableFuture sessionFuture = liveModel.connect();
Futures.addCallback(sessionFuture, new FutureCallback() {
@Override
public void onSuccess(LiveSessionFutures ses) {
LiveSessionFutures session = ses;
session.startAudioConversation((Transcription input, Transcription output) -> {
if (input != null) {
// Handle transcription text of the audio input
System.out.println("Input Transcription: " + input.getText());
}
if (output != null) {
// Handle transcription text of the audio output
System.out.println("Output Transcription: " + output.getText());
}
return null;
});
}
@Override
public void onFailure(Throwable t) {
// Handle exceptions
t.printStackTrace();
}
}, executor);
// ...
Web
// ...
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });
const liveModel = getLiveGenerativeModel(ai, {
model: 'gemini-2.5-flash-native-audio-preview-09-2025',
// Configure the model to return transcriptions of the audio input and output
generationConfig: {
responseModalities: [ResponseModality.AUDIO],
inputAudioTranscription: {},
outputAudioTranscription: {},
},
});
const liveSession = await liveModel.connect();
liveSession.sendAudioRealtime({ data, mimeType: "audio/pcm" });
const messages = liveSession.receive();
for await (const message of messages) {
switch (message.type) {
case 'serverContent':
if (message.inputTranscription) {
// Handle transcription text of the audio input
console.log(`Input transcription: ${message.inputTranscription.text}`);
}
if (message.outputTranscription) {
// Handle transcription text of the audio output
console.log(`Output transcription: ${message.outputTranscription.text}`);
} else {
// Handle other message types (modelTurn, turnComplete, interruption)
}
default:
// Handle other message types (toolCall, toolCallCancellation)
}
}
// ...
Dart
// ...
final _liveModel = FirebaseAI.googleAI().liveGenerativeModel(
model: 'gemini-2.5-flash-native-audio-preview-09-2025',
// Configure the model to return transcriptions of the audio input and output
liveGenerationConfig: LiveGenerationConfig(
responseModalities: [ResponseModalities.audio],
inputAudioTranscription: AudioTranscriptionConfig(),
outputAudioTranscription: AudioTranscriptionConfig(),
),
);
final LiveSession _session = _liveModel.connect();
await for (final response in _session.receive()) {
LiveServerContent message = response.message;
if (message.inputTranscription?.text case final inputText?) {
// Handle transcription text of the audio input
print('Input: $inputText');
}
if (message.outputTranscription?.text case final outputText?) {
// Handle transcription text of the audio output
print('Output: $outputText');
}
}
// ...
Unity
// ...
var liveModel = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
modelName: "gemini-2.5-flash-native-audio-preview-09-2025",
// Configure the model to return transcriptions of the audio input and output
liveGenerationConfig: new LiveGenerationConfig(
responseModalities: new[] { ResponseModality.Audio },
inputAudioTranscription: new AudioTranscriptionConfig(),
outputAudioTranscription: new AudioTranscriptionConfig()
)
);
try
{
var session = await liveModel.ConnectAsync();
var stream = session.ReceiveAsync();
await foreach (var response in stream) {
if (response.Message is LiveSessionContent sessionContent) {
if (!string.IsNullOrEmpty(sessionContent.InputTranscription?.Text)) {
// handle transcription text of input audio
}
if (!string.IsNullOrEmpty(sessionContent.OutputTranscription?.Text)) {
// handle transcription text of output audio
}
}
}
}
catch (Exception e)
{
// Handle error
}
// ...
语音活动检测 (VAD)
模型会自动对连续的音频输入流执行语音活动检测 (VAD)。VAD 默认处于启用状态。
会话管理
了解以下与会话相关的主题:
高级功能,包括:
与会话相关的限制,包括连接和会话时长限制、会话上下文窗口限制和速率限制。
Firebase AI Logic 尚不支持以下会话管理功能。请过段时间再来查看!
- 处理中断
- 延长会话时长
- 恢复会话
- 在会话和请求之间保持上下文
- 压缩上下文窗口