Configuration options for the Live API

Even with a basic Live API implementation, you can build engaging and powerful interactions for your users. You can optionally customize the experience further with the following configuration options:

Response voice and language
Transcriptions for audio input and output
Voice activity detection (VAD)
Session management

Response voice and language

You can make the model respond in a specific voice and influence it to respond in different languages.

Specify the response voice

The Live API uses Chirp 3 to provide synthesized speech responses in HD voices.

If you don't specify a response voice, the default voice is Puck.

Available response voices

To hear what each voice sounds like, see Chirp 3: HD voices.

Zephyr – Bright
Kore – Firm
Orus – Firm
Autonoe – Bright
Umbriel – Easy-going
Erinome – Clear
Laomedeia – Upbeat
Schedar – Even
Achird – Friendly
Sadachbia – Lively
Puck – Upbeat
Fenrir – Excitable
Aoede – Breezy
Enceladus – Breathy
Algieba – Smooth
Algenib – Gravelly
Achernar – Soft
Gacrux – Mature
Zubenelgenubi – Casual
Sadaltager – Knowledgeable
Charon – Informative
Leda – Youthful
Callirrhoe – Easy-going
Iapetus – Clear
Despina – Smooth
Rasalgethi – Informative
Alnilam – Firm
Pulcherrima – Forward
Vindemiatrix – Gentle
Sulafat – Warm

To specify the response voice, set the voice name in the speechConfig object as part of the model configuration.

Swift


// ...

let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to use a specific voice for its audio response
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio],
    speech: SpeechConfig(voiceName: "VOICE_NAME")
  )
)

// ...

Kotlin


// ...

val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to use a specific voice for its audio response
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        speechConfig = SpeechConfig(voice = Voice("VOICE_NAME"))
    }
)

// ...

Java


// ...

LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
    "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to use a specific voice for its audio response
    new LiveGenerationConfig.Builder()
        .setResponseModality(ResponseModality.AUDIO)
        .setSpeechConfig(new SpeechConfig(new Voice("VOICE_NAME")))
        .build()
);

// ...

Web


// ...

const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

const liveModel = getLiveGenerativeModel(ai, {
  model: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to use a specific voice for its audio response
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: { voiceName: "VOICE_NAME" },
      },
    },
  },
});

// ...

Dart


// ...

final _liveModel = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  // Configure the model to use a specific voice for its audio response
  liveGenerationConfig: LiveGenerationConfig(
    responseModalities: [ResponseModalities.audio],
    speechConfig: SpeechConfig(voiceName: 'VOICE_NAME'),
  ),
);

// ...

Unity


// ...

var liveModel = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to use a specific voice for its audio response
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Audio },
        speechConfig: SpeechConfig.UsePrebuiltVoice("VOICE_NAME")
    )
);

// ...
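
When the voice name comes from user settings or remote configuration rather than a hard-coded string, it can help to check it against the list of voices before building the model configuration. The following is a minimal sketch, not part of the Firebase SDK: `VOICE_NAMES`, `resolveVoiceName`, and `buildSpeechConfig` are hypothetical names, and the case-insensitive matching and the fallback to Puck are assumptions that mirror the documented default. The returned object follows the `speechConfig` shape used in the Web example.

```typescript
// Voice names copied from the list of response voices; "Puck" is the documented default.
const VOICE_NAMES = [
  "Zephyr", "Puck", "Charon", "Kore", "Fenrir", "Leda", "Orus", "Aoede",
  "Callirrhoe", "Autonoe", "Enceladus", "Iapetus", "Umbriel", "Algieba",
  "Despina", "Erinome", "Algenib", "Rasalgethi", "Laomedeia", "Achernar",
  "Alnilam", "Schedar", "Gacrux", "Pulcherrima", "Achird", "Zubenelgenubi",
  "Vindemiatrix", "Sadachbia", "Sadaltager", "Sulafat",
] as const;

type VoiceName = (typeof VOICE_NAMES)[number];

const DEFAULT_VOICE: VoiceName = "Puck";

// Hypothetical helper: returns a known voice name (matched case-insensitively),
// or the default voice when the requested name is missing or unknown.
function resolveVoiceName(requested?: string | null): VoiceName {
  const wanted = (requested ?? "").trim().toLowerCase();
  const match = VOICE_NAMES.find((name) => name.toLowerCase() === wanted);
  return match ?? DEFAULT_VOICE;
}

// Hypothetical helper: builds the speechConfig fragment in the shape
// used by the Web example (voiceConfig.prebuiltVoiceConfig.voiceName).
function buildSpeechConfig(requested?: string | null) {
  return {
    voiceConfig: {
      prebuiltVoiceConfig: { voiceName: resolveVoiceName(requested) },
    },
  };
}
```

In the Web example, the result of `buildSpeechConfig(userChoice)` could stand in for the literal `speechConfig` object inside `generationConfig`.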

Influence the response language

Live API models automatically choose the appropriate language for their responses.

Supported languages

Language	BCP-47 code	Language	BCP-47 code
Arabic (Egyptian)	ar-EG	German (Germany)	de-DE
English (US)	en-US	Spanish (US)	es-US
French (France)	fr-FR	Hindi (India)	hi-IN
Indonesian (Indonesia)	id-ID	Italian (Italy)	it-IT
Japanese (Japan)	ja-JP	Korean (Korea)	ko-KR
Portuguese (Brazil)	pt-BR	Russian (Russia)	ru-RU
Dutch (Netherlands)	nl-NL	Polish (Poland)	pl-PL
Thai (Thailand)	th-TH	Turkish (Turkey)	tr-TR
Vietnamese (Vietnam)	vi-VN	Romanian (Romania)	ro-RO
Ukrainian (Ukraine)	uk-UA	Bengali (Bangladesh)	bn-BD
English (India)	en-IN & hi-IN bundle	Marathi (India)	mr-IN
Tamil (India)	ta-IN	Telugu (India)	te-IN

If you want the model to respond in a non-English language, or always in one specific language, you can influence its responses with system instructions such as the following examples:

Reinforce to the model that a non-English language may be appropriate:

Listen to the speaker carefully. If you detect a non-English language, respond
in the language you hear from the speaker. You must respond unmistakably in the
speaker's language.

Tell the model to always respond in a specific language (replace LANGUAGE with the language name):

RESPOND IN LANGUAGE. YOU MUST RESPOND UNMISTAKABLY IN LANGUAGE.
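
If the target language is chosen at runtime, the instruction text can be generated from a BCP-47 code instead of being written by hand. The sketch below is one possible approach under stated assumptions: `SUPPORTED_LANGUAGES`, `DETECT_LANGUAGE_INSTRUCTION`, and `languageInstruction` are hypothetical names, the short language names are simplified from the supported-languages table, "English (India)" is left out because the table lists it as a bundle rather than a single code, and falling back to the "respond in the speaker's language" prompt for an unknown code is a design choice rather than documented behavior.

```typescript
// Language names keyed by BCP-47 code, simplified from the supported-languages table.
const SUPPORTED_LANGUAGES = new Map<string, string>([
  ["ar-EG", "Arabic"], ["de-DE", "German"], ["en-US", "English"],
  ["es-US", "Spanish"], ["fr-FR", "French"], ["hi-IN", "Hindi"],
  ["id-ID", "Indonesian"], ["it-IT", "Italian"], ["ja-JP", "Japanese"],
  ["ko-KR", "Korean"], ["pt-BR", "Portuguese"], ["ru-RU", "Russian"],
  ["nl-NL", "Dutch"], ["pl-PL", "Polish"], ["th-TH", "Thai"],
  ["tr-TR", "Turkish"], ["vi-VN", "Vietnamese"], ["ro-RO", "Romanian"],
  ["uk-UA", "Ukrainian"], ["bn-BD", "Bengali"], ["mr-IN", "Marathi"],
  ["ta-IN", "Tamil"], ["te-IN", "Telugu"],
]);

// The first example prompt: follow whatever language the speaker uses.
const DETECT_LANGUAGE_INSTRUCTION =
  "Listen to the speaker carefully. If you detect a non-English language, " +
  "respond in the language you hear from the speaker. You must respond " +
  "unmistakably in the speaker's language.";

// Hypothetical helper: returns the "always respond in LANGUAGE" prompt for a
// supported code, or the detect-language prompt when the code is missing or unknown.
function languageInstruction(bcp47Code?: string): string {
  const language = bcp47Code ? SUPPORTED_LANGUAGES.get(bcp47Code) : undefined;
  if (!language) return DETECT_LANGUAGE_INSTRUCTION;
  const name = language.toUpperCase();
  return `RESPOND IN ${name}. YOU MUST RESPOND UNMISTAKABLY IN ${name}.`;
}
```

The returned string is intended to be supplied as the model's system instruction when the live model is created.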

Transcriptions for audio input and output

As part of the model's response, you can receive transcriptions of the audio input and of the model's audio response. You set this up as part of the model configuration.

For a transcription of the audio input, add inputAudioTranscription.
For a transcription of the model's audio response, add outputAudioTranscription.

Note the following:

You can configure the model to return transcriptions of both input and output (as shown in the following examples), or of only one of them.
Transcriptions are streamed along with the audio, so it's best to collect them turn by turn, the same way you collect text parts.
The transcription language is inferred from the audio input and from the model's audio response.

Swift


// ...

let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to return transcriptions of the audio input and output
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio],
    inputAudioTranscription: AudioTranscriptionConfig(),
    outputAudioTranscription: AudioTranscriptionConfig()
  )
)

var inputTranscript: String = ""
var outputTranscript: String = ""

do {
  let session = try await liveModel.connect()
  for try await response in session.responses {
    if case let .content(content) = response.payload {
      if let inputText = content.inputAudioTranscription?.text {
        // Handle transcription text of the audio input
        inputTranscript += inputText
      }

      if let outputText = content.outputAudioTranscription?.text {
        // Handle transcription text of the audio output
        outputTranscript += outputText
      }

      if content.isTurnComplete {
        // Log the transcripts after the current turn is complete
        print("Input audio: \(inputTranscript)")
        print("Output audio: \(outputTranscript)")

        // Reset the transcripts for the next turn
        inputTranscript = ""
        outputTranscript = ""
      }
    }
  }
} catch {
  // Handle error
}

// ...

Kotlin


// ...

val liveModel = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to return transcriptions of the audio input and output
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        inputAudioTranscription = AudioTranscriptionConfig()
        outputAudioTranscription = AudioTranscriptionConfig()
    }
)

val liveSession = liveModel.connect()

fun handleTranscription(input: Transcription?, output: Transcription?) {
    input?.text?.let { text ->
        // Handle transcription text of the audio input
        println("Input Transcription: $text")
    }
    output?.text?.let { text ->
        // Handle transcription text of the audio output
        println("Output Transcription: $text")
    }
}

liveSession.startAudioConversation(null, ::handleTranscription)

// ...

Java


// ...

ExecutorService executor = Executors.newFixedThreadPool(1);

LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
    "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to return transcriptions of the audio input and output
    new LiveGenerationConfig.Builder()
            .setResponseModality(ResponseModality.AUDIO)
            .setInputAudioTranscription(new AudioTranscriptionConfig())
            .setOutputAudioTranscription(new AudioTranscriptionConfig())
            .build()
);

LiveModelFutures liveModel = LiveModelFutures.from(lm);
ListenableFuture<LiveSession> sessionFuture = liveModel.connect();

Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
        // Wrap the session so it can be used through the Java-friendly futures API
        LiveSessionFutures session = LiveSessionFutures.from(ses);
        session.startAudioConversation((Transcription input, Transcription output) -> {
            if (input != null) {
                // Handle transcription text of the audio input
                System.out.println("Input Transcription: " + input.getText());
            }
            if (output != null) {
                // Handle transcription text of the audio output
                System.out.println("Output Transcription: " + output.getText());
            }
            return null;
        });
    }

    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
        t.printStackTrace();
    }
}, executor);

// ...

Web


// ...

const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

const liveModel = getLiveGenerativeModel(ai, {
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  // Configure the model to return transcriptions of the audio input and output
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
    inputAudioTranscription: {},
    outputAudioTranscription: {},
  },
});

const liveSession = await liveModel.connect();

liveSession.sendAudioRealtime({ data, mimeType: "audio/pcm" });

const messages = liveSession.receive();
for await (const message of messages) {
  switch (message.type) {
    case 'serverContent':
      if (message.inputTranscription) {
        // Handle transcription text of the audio input
        console.log(`Input transcription: ${message.inputTranscription.text}`);
      }
      if (message.outputTranscription) {
        // Handle transcription text of the audio output
        console.log(`Output transcription: ${message.outputTranscription.text}`);
      }
      // Handle other server content (modelTurn, turnComplete, interruption)
      break;
    default:
      // Handle other message types (toolCall, toolCallCancellation)
      break;
  }
}

// ...

Dart


// ...

final _liveModel = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  // Configure the model to return transcriptions of the audio input and output
  liveGenerationConfig: LiveGenerationConfig(
    responseModalities: [ResponseModalities.audio],
    inputAudioTranscription: AudioTranscriptionConfig(),
    outputAudioTranscription: AudioTranscriptionConfig(),
  ),
);

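// connect() is asynchronous, so await the session before using it
final LiveSession _session = await _liveModel.connect();

await for (final response in _session.receive()) {
  // Transcriptions arrive on server content messages; skip other message types
  final message = response.message;
  if (message is! LiveServerContent) continue;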
  if (message.inputTranscription?.text case final inputText?) {
    // Handle transcription text of the audio input
    print('Input: $inputText');
  }

  if (message.outputTranscription?.text case final outputText?) {
    // Handle transcription text of the audio output
    print('Output: $outputText');
  }
}

// ...

Unity


// ...

var liveModel = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to return transcriptions of the audio input and output
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Audio },
        inputAudioTranscription: new AudioTranscriptionConfig(),
        outputAudioTranscription: new AudioTranscriptionConfig()
    )
);

try
{
    var session = await liveModel.ConnectAsync();
    var stream = session.ReceiveAsync();
    await foreach (var response in stream) {
        if (response.Message is LiveSessionContent sessionContent) {
            if (!string.IsNullOrEmpty(sessionContent.InputTranscription?.Text)) {
                // Handle transcription text of the audio input
            }

            if (!string.IsNullOrEmpty(sessionContent.OutputTranscription?.Text)) {
                // Handle transcription text of the audio output
            }
        }
    }
}
catch (Exception e)
{
    // Handle error
}

// ...
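
The Swift example buffers transcription chunks and emits them once a turn completes, while several of the other examples log each chunk as it arrives. The sketch below shows the same per-turn buffering in TypeScript, independent of the SDK. It assumes a simplified message shape: the `inputTranscription` and `outputTranscription` field names follow the Web example, `turnComplete` is taken from that example's comment, and the `TranscriptMessage`, `TurnTranscript`, and `TranscriptCollector` types are hypothetical.

```typescript
// Assumed minimal shape of a server content message (not the SDK's own type).
interface TranscriptMessage {
  inputTranscription?: { text?: string };
  outputTranscription?: { text?: string };
  turnComplete?: boolean;
}

// Full transcript of one conversational turn.
interface TurnTranscript {
  input: string;
  output: string;
}

// Hypothetical helper that buffers streamed transcription chunks per turn.
class TranscriptCollector {
  private input = "";
  private output = "";

  // Adds one message. Returns the finished turn's transcript when the message
  // marks the turn as complete, otherwise null. Buffers reset after each turn.
  add(message: TranscriptMessage): TurnTranscript | null {
    this.input += message.inputTranscription?.text ?? "";
    this.output += message.outputTranscription?.text ?? "";
    if (!message.turnComplete) return null;

    const turn: TurnTranscript = { input: this.input, output: this.output };
    this.input = "";
    this.output = "";
    return turn;
  }
}
```

In a receive loop like the one in the Web example, each server content message could be passed to `add`, and a non-null result logged or stored as the completed turn.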

Voice activity detection (VAD)

The model automatically performs voice activity detection (VAD) on a continuous audio input stream. VAD is enabled by default.

Session management

Learn more about the following session-related topics:

Advanced capabilities, including:
- Updating system instructions mid-session
- Adding incremental content updates
Session limits, including connection and session-length limits, session context window limits, and rate limits.
Options for handling session limits, including:
- Compressing the context window
- Resuming sessions
