Gemini 2.0 Flash and Flash-Lite models will shut down on June 1, 2026. To avoid service disruption, update to a newer model like gemini-3.1-flash-lite.

All Imagen models will shut down on June 24, 2026. Learn about migrating your apps to use Nano Banana.

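Configuration options for the Live API

Even a basic implementation of the Live API lets you build engaging, powerful interactions for your users. To customize the experience further, you can use the following configuration options:

Response voice and language
Transcriptions for audio input and output
Voice activity detection (VAD)
Session management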

Response voice and language

You can have the model respond in a specific voice, and you can influence which language the model responds in.

Specify a response voice

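The Live API uses Chirp 3 to support synthesized speech responses in HD voices.

If you don't specify a response voice, the default is Puck.

The available response voices are listed below. For a demo of each voice, see Chirp 3: HD voices.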

Zephyr -- Bright
Kore -- Firm
Orus -- Firm
Autonoe -- Bright
Umbriel -- Easy-going
Erinome -- Clear
Laomedeia -- Upbeat
Schedar -- Even
Achird -- Friendly
Sadachbia -- Lively
Puck -- Upbeat
Fenrir -- Excitable
Aoede -- Breezy
Enceladus -- Breathy
Algieba -- Smooth
Algenib -- Gravelly
Achernar -- Soft
Gacrux -- Mature
Zubenelgenubi -- Casual
Sadaltager -- Knowledgeable
Charon -- Informative
Leda -- Youthful
Callirrhoe -- Easy-going
Iapetus -- Clear
Despina -- Smooth
Rasalgethi -- Informative
Alnilam -- Firm
Pulcherrima -- Forward
Vindemiatrix -- Gentle
Sulafat -- Warm
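If your app lets users pick a voice, you may want to check the choice against the list above before building the model configuration. The following TypeScript sketch does a case-insensitive lookup and falls back to Puck, the documented default. The helper names are hypothetical and are not part of any Firebase SDK. The returned object mirrors the speechConfig shape used in this section's Web example.

```typescript
// The 30 prebuilt voice names listed above (Chirp 3: HD voices).
const PREBUILT_VOICES = [
  "Zephyr", "Kore", "Orus", "Autonoe", "Umbriel", "Erinome",
  "Laomedeia", "Schedar", "Achird", "Sadachbia", "Puck", "Fenrir",
  "Aoede", "Enceladus", "Algieba", "Algenib", "Achernar", "Gacrux",
  "Zubenelgenubi", "Sadaltager", "Charon", "Leda", "Callirrhoe", "Iapetus",
  "Despina", "Rasalgethi", "Alnilam", "Pulcherrima", "Vindemiatrix", "Sulafat",
] as const;

type PrebuiltVoice = (typeof PREBUILT_VOICES)[number];

// Documented default when no response voice is specified.
const DEFAULT_VOICE: PrebuiltVoice = "Puck";

// Mirrors the speechConfig object shape from the Web example.
interface SpeechConfigShape {
  voiceConfig: { prebuiltVoiceConfig: { voiceName: PrebuiltVoice } };
}

// Hypothetical helper: case-insensitive match against the list above,
// falling back to the default voice for missing or unknown names.
function resolveVoice(requested?: string): PrebuiltVoice {
  if (requested === undefined) return DEFAULT_VOICE;
  const needle = requested.trim().toLowerCase();
  for (const voice of PREBUILT_VOICES) {
    if (voice.toLowerCase() === needle) return voice; // canonical spelling
  }
  return DEFAULT_VOICE; // unknown name: use the documented default
}

// Hypothetical helper: builds the speechConfig value for the model config.
function buildSpeechConfig(requested?: string): SpeechConfigShape {
  return {
    voiceConfig: { prebuiltVoiceConfig: { voiceName: resolveVoice(requested) } },
  };
}
```

Falling back silently is a design choice. You could instead report the unknown name to the user, because this sketch makes no assumption about how the service handles unrecognized voice names.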

To specify a response voice, set the voice name in the model configuration as part of the speechConfig object.

Swift


// ...

let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to use a specific voice for its audio response
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio],
    speech: SpeechConfig(voiceName: "VOICE_NAME")
  )
)

// ...

Kotlin


// ...

val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to use a specific voice for its audio response
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        speechConfig = SpeechConfig(voice = Voice("VOICE_NAME"))
    }
)

// ...

Java


// ...

LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
    "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to use a specific voice for its audio response
    new LiveGenerationConfig.Builder()
        .setResponseModality(ResponseModality.AUDIO)
        .setSpeechConfig(new SpeechConfig(new Voice("VOICE_NAME")))
        .build()
);

// ...

Web


// ...

const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

const liveModel = getLiveGenerativeModel(ai, {
  model: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to use a specific voice for its audio response
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: { voiceName: "VOICE_NAME" },
      },
    },
  },
});

// ...

Dart


// ...

final _liveModel = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  // Configure the model to use a specific voice for its audio response
  liveGenerationConfig: LiveGenerationConfig(
    responseModalities: [ResponseModalities.audio],
    speechConfig: SpeechConfig(voiceName: 'VOICE_NAME'),
  ),
);

// ...

Unity


// ...

var liveModel = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to use a specific voice for its audio response
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Audio },
        speechConfig: SpeechConfig.UsePrebuiltVoice("VOICE_NAME")
    )
);

// ...

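Influence the response language

Live API models automatically choose an appropriate language for their responses.

Supported languages:

Language	BCP-47 code	Language	BCP-47 code
Arabic (Egypt)	ar-EG	German (Germany)	de-DE
English (US)	en-US	Spanish (US)	es-US
French (France)	fr-FR	Hindi (India)	hi-IN
Indonesian (Indonesia)	id-ID	Italian (Italy)	it-IT
Japanese (Japan)	ja-JP	Korean (South Korea)	ko-KR
Portuguese (Brazil)	pt-BR	Russian (Russia)	ru-RU
Dutch (Netherlands)	nl-NL	Polish (Poland)	pl-PL
Thai (Thailand)	th-TH	Turkish (Turkey)	tr-TR
Vietnamese (Vietnam)	vi-VN	Romanian (Romania)	ro-RO
Ukrainian (Ukraine)	uk-UA	Bengali (Bangladesh)	bn-BD
English (India)	en-IN and hi-IN bundle	Marathi (India)	mr-IN
Tamil (India)	ta-IN	Telugu (India)	te-IN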
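The model picks the response language on its own. An app might still want to know whether the user's device locale appears in the table above, for example to decide whether to add one of the system instructions described next. The TypeScript sketch below shows one hypothetical way to check. It prefers an exact match on the BCP-47 code and otherwise falls back to the first table entry with the same primary language subtag. The function name and the matching policy are assumptions, not SDK behavior.

```typescript
// BCP-47 codes from the supported-languages table above.
const SUPPORTED_LANGUAGE_CODES: readonly string[] = [
  "ar-EG", "de-DE", "en-US", "es-US", "fr-FR", "hi-IN", "id-ID", "it-IT",
  "ja-JP", "ko-KR", "pt-BR", "ru-RU", "nl-NL", "pl-PL", "th-TH", "tr-TR",
  "vi-VN", "ro-RO", "uk-UA", "bn-BD", "en-IN", "mr-IN", "ta-IN", "te-IN",
];

// Hypothetical helper: maps a device locale such as "ko", "pt_BR", or
// "en-GB" to a supported code, or returns undefined if none fits.
function matchSupportedLanguage(locale: string): string | undefined {
  const normalized = locale.trim().replace(/_/g, "-").toLowerCase();
  const primary = normalized.split("-")[0];
  let samePrimary: string | undefined;
  for (const code of SUPPORTED_LANGUAGE_CODES) {
    const lower = code.toLowerCase();
    if (lower === normalized) return code; // exact match wins
    if (samePrimary === undefined && lower.split("-")[0] === primary) {
      samePrimary = code; // remember the first entry with the same language
    }
  }
  return samePrimary;
}
```

With this policy, "en-GB" resolves to en-US because en-US is the first English entry in the table. If you need a different regional preference, reorder the list or add an explicit mapping.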

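If you want the model to respond in a non-English language, or to always respond in a specific language, you can influence its responses with system instructions such as the following examples.

Reinforce to the model that a non-English language might be appropriate: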

Listen to the speaker carefully. If you detect a non-English language, respond
in the language you hear from the speaker. You must respond unmistakably in the
speaker's language.

Instruct the model to always respond in a specific language (replace LANGUAGE with the target language):

RESPOND IN LANGUAGE. YOU MUST RESPOND UNMISTAKABLY IN LANGUAGE.
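Both instructions are plain strings, so an app can generate them. The TypeScript sketch below returns either the speaker-following instruction or the fixed-language instruction with the LANGUAGE placeholder filled in. The type and function names are hypothetical. How you pass the result to the model depends on your platform's SDK. One option is the model's systemInstruction parameter, assuming your SDK version exposes one for Live models; check your SDK reference.

```typescript
// The speaker-following instruction, verbatim from the example above.
const FOLLOW_SPEAKER_INSTRUCTION =
  "Listen to the speaker carefully. If you detect a non-English language, respond " +
  "in the language you hear from the speaker. You must respond unmistakably in the " +
  "speaker's language.";

// Hypothetical mode type: follow the speaker, or pin one language.
type LanguageMode =
  | { kind: "followSpeaker" }
  | { kind: "fixed"; language: string };

function languageSystemInstruction(mode: LanguageMode): string {
  if (mode.kind === "followSpeaker") {
    return FOLLOW_SPEAKER_INSTRUCTION;
  }
  // Fill the LANGUAGE placeholder, keeping the template's all-caps style.
  const lang = mode.language.trim().toUpperCase();
  return `RESPOND IN ${lang}. YOU MUST RESPOND UNMISTAKABLY IN ${lang}.`;
}
```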

Transcriptions for audio input and output

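You can receive transcriptions of both the audio input and the model's audio response as part of the model's response. You enable this in the model configuration:

To transcribe the audio input, add inputAudioTranscription.
To transcribe the model's audio response, add outputAudioTranscription.

Note the following:

You can configure the model to return transcriptions of both the input and the output (as in the following example), or of only one of them.
Transcriptions are streamed along with the audio, so collect them over each turn, the same way you collect text parts.
The transcription language is inferred from the audio input and from the model's audio response.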
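Transcriptions arrive as fragments spread over several messages, so it helps to buffer them until the turn completes. The Swift example below does this with inputTranscript and outputTranscript. The following TypeScript class sketches the same idea independently of any SDK. The TranscriptUpdate shape is an assumption; you would fill it from whatever fields your SDK's messages actually carry.

```typescript
// Hypothetical shape of one streamed update, filled from SDK messages.
interface TranscriptUpdate {
  inputText?: string;     // fragment of the input-audio transcription
  outputText?: string;    // fragment of the output-audio transcription
  turnComplete?: boolean; // true when the server marks the turn complete
}

interface TurnTranscript {
  input: string;
  output: string;
}

// Appends streamed fragments and records one full transcript per turn.
class TranscriptAccumulator {
  private input = "";
  private output = "";
  private readonly turns: TurnTranscript[] = [];

  push(update: TranscriptUpdate): void {
    if (update.inputText) this.input += update.inputText;
    if (update.outputText) this.output += update.outputText;
    if (update.turnComplete) {
      this.turns.push({ input: this.input, output: this.output });
      this.input = ""; // reset buffers for the next turn
      this.output = "";
    }
  }

  completedTurns(): readonly TurnTranscript[] {
    return this.turns;
  }
}
```

Fragments are appended before the turnComplete check. This way, text that arrives in the same message that ends the turn is still included.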

Swift


// ...

let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to return transcriptions of the audio input and output
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio],
    inputAudioTranscription: AudioTranscriptionConfig(),
    outputAudioTranscription: AudioTranscriptionConfig()
  )
)

var inputTranscript: String = ""
var outputTranscript: String = ""

do {
  let session = try await liveModel.connect()
  for try await response in session.responses {
    if case let .content(content) = response.payload {
      if let inputText = content.inputAudioTranscription?.text {
        // Handle transcription text of the audio input
        inputTranscript += inputText
      }

      if let outputText = content.outputAudioTranscription?.text {
        // Handle transcription text of the audio output
        outputTranscript += outputText
      }

      if content.isTurnComplete {
        // Log the transcripts after the current turn is complete
        print("Input audio: \(inputTranscript)")
        print("Output audio: \(outputTranscript)")

        // Reset the transcripts for the next turn
        inputTranscript = ""
        outputTranscript = ""
      }
    }
  }


} catch {
  // Handle error
}

// ...

Kotlin


// ...

val liveModel = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to return transcriptions of the audio input and output
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        inputAudioTranscription = AudioTranscriptionConfig()
        outputAudioTranscription = AudioTranscriptionConfig()
    }
)

val liveSession = liveModel.connect()

fun handleTranscription(input: Transcription?, output: Transcription?) {
    input?.text?.let { text ->
        // Handle transcription text of the audio input
        println("Input Transcription: $text")
    }
    output?.text?.let { text ->
        // Handle transcription text of the audio output
        println("Output Transcription: $text")
    }
}

liveSession.startAudioConversation(null, ::handleTranscription)

// ...

Java


// ...

ExecutorService executor = Executors.newFixedThreadPool(1);

LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
    "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to return transcriptions of the audio input and output
    new LiveGenerationConfig.Builder()
        .setResponseModality(ResponseModality.AUDIO)
        .setInputAudioTranscription(new AudioTranscriptionConfig())
        .setOutputAudioTranscription(new AudioTranscriptionConfig())
        .build()
);

LiveModelFutures liveModel = LiveModelFutures.from(lm);
ListenableFuture<LiveSessionFutures> sessionFuture = liveModel.connect();

Futures.addCallback(sessionFuture, new FutureCallback<LiveSessionFutures>() {
    @Override
    public void onSuccess(LiveSessionFutures ses) {
        LiveSessionFutures session = ses;
        session.startAudioConversation((Transcription input, Transcription output) -> {
            if (input != null) {
                // Handle transcription text of the audio input
                System.out.println("Input Transcription: " + input.getText());
            }
            if (output != null) {
                // Handle transcription text of the audio output
                System.out.println("Output Transcription: " + output.getText());
            }
            return null;
        });
    }

    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
        t.printStackTrace();
    }
}, executor);

// ...

Web


// ...

const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

const liveModel = getLiveGenerativeModel(ai, {
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  // Configure the model to return transcriptions of the audio input and output
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
    inputAudioTranscription: {},
    outputAudioTranscription: {},
  },
});

const liveSession = await liveModel.connect();

// data holds a base64-encoded chunk of PCM audio (audio capture not shown)
liveSession.sendAudioRealtime({ data, mimeType: "audio/pcm" });

const messages = liveSession.receive();
for await (const message of messages) {
  switch (message.type) {
    case 'serverContent':
      if (message.inputTranscription) {
        // Handle transcription text of the audio input
        console.log(`Input transcription: ${message.inputTranscription.text}`);
      }
      if (message.outputTranscription) {
        // Handle transcription text of the audio output
        console.log(`Output transcription: ${message.outputTranscription.text}`);
      } else {
        // Handle other message types (modelTurn, turnComplete, interruption)
      }
      break;
    default:
      // Handle other message types (toolCall, toolCallCancellation)
      break;
  }
}

// ...

Dart


// ...

final _liveModel = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  // Configure the model to return transcriptions of the audio input and output
  liveGenerationConfig: LiveGenerationConfig(
    responseModalities: [ResponseModalities.audio],
    inputAudioTranscription: AudioTranscriptionConfig(),
    outputAudioTranscription: AudioTranscriptionConfig(),
  ),
);

final LiveSession _session = await _liveModel.connect();

await for (final response in _session.receive()) {
  final message = response.message;
  // Transcriptions arrive as part of server content messages
  if (message is LiveServerContent) {
    if (message.inputTranscription?.text case final inputText?) {
      // Handle transcription text of the audio input
      print('Input: $inputText');
    }

    if (message.outputTranscription?.text case final outputText?) {
      // Handle transcription text of the audio output
      print('Output: $outputText');
    }
  }
}

// ...

Unity


// ...

var liveModel = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to return transcriptions of the audio input and output
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Audio },
        inputAudioTranscription: new AudioTranscriptionConfig(),
        outputAudioTranscription: new AudioTranscriptionConfig()
    )
);

try
{
    var session = await liveModel.ConnectAsync();
    var stream = session.ReceiveAsync();
    await foreach (var response in stream) {
        if (response.Message is LiveSessionContent sessionContent) {
            if (!string.IsNullOrEmpty(sessionContent.InputTranscription?.Text)) {
                // Handle transcription text of input audio
            }

            if (!string.IsNullOrEmpty(sessionContent.OutputTranscription?.Text)) {
                // Handle transcription text of output audio
            }
        }
    }
}
catch (Exception e)
{
    // Handle error
}

// ...
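The Web example passes a data value to sendAudioRealtime without showing where it comes from. One way to produce it is to convert Float32 microphone samples, as the Web Audio API provides them, into 16-bit little-endian PCM and then base64-encode the bytes. The TypeScript sketch below does this under two assumptions. First, it assumes the input format that the Gemini Live API documents, raw 16-bit little-endian PCM. Second, it assumes the samples are already at the expected input sample rate (16 kHz); resampling is not shown. The function names are hypothetical. The base64 encoder is hand-rolled only to keep the sketch dependency-free.

```typescript
// Converts Float32 samples in [-1, 1] to 16-bit little-endian PCM bytes.
function floatTo16BitPcm(samples: Float32Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp out-of-range input
    // Negative values scale by 0x8000 and positive by 0x7fff to use the full range.
    const value = s < 0 ? s * 0x8000 : s * 0x7fff;
    view.setInt16(i * 2, Math.round(value), true); // true = little-endian
  }
  return bytes;
}

const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard base64 (RFC 4648) with "=" padding.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const triple = (b0 << 16) | (b1 << 8) | b2; // 24 bits -> four 6-bit digits
    out += BASE64_ALPHABET[(triple >> 18) & 63];
    out += BASE64_ALPHABET[(triple >> 12) & 63];
    out += i + 1 < bytes.length ? BASE64_ALPHABET[(triple >> 6) & 63] : "=";
    out += i + 2 < bytes.length ? BASE64_ALPHABET[triple & 63] : "=";
  }
  return out;
}
```

In a browser, a call might then look like liveSession.sendAudioRealtime({ data: toBase64(floatTo16BitPcm(chunk)), mimeType: "audio/pcm" }). Here chunk is a hypothetical Float32Array from your audio capture pipeline.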

Voice activity detection (VAD)

The model automatically performs voice activity detection (VAD) on the continuous audio input stream. VAD is enabled by default.
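The Live API runs VAD inside the service, and this page doesn't describe its algorithm. To illustrate what a VAD decides, the TypeScript sketch below uses a deliberately simple technique: an energy threshold on each 16-bit PCM frame, plus a short hangover so that brief pauses don't end an utterance. This is not how the model's VAD is implemented. The function names, the threshold, and the hangover length are arbitrary assumptions.

```typescript
// Root-mean-square energy of one frame of 16-bit PCM samples, scaled to [0, 1].
function frameRms(frame: Int16Array): number {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) {
    const s = frame[i] / 32768;
    sumSquares += s * s;
  }
  return Math.sqrt(sumSquares / frame.length);
}

// Labels each frame as speech (true) or silence (false). A frame above
// `threshold` counts as speech; up to `hangoverFrames` quiet frames after
// speech are still labeled speech so short pauses don't split an utterance.
function detectVoiceActivity(
  frames: Int16Array[],
  threshold = 0.02,
  hangoverFrames = 2,
): boolean[] {
  const labels: boolean[] = [];
  let quietRun = hangoverFrames + 1; // start in the "silence" state
  for (const frame of frames) {
    if (frameRms(frame) > threshold) {
      quietRun = 0; // speech energy detected
    } else {
      quietRun++;
    }
    labels.push(quietRun <= hangoverFrames);
  }
  return labels;
}
```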

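Session management

Learn about the following session-related topics:

Advanced capabilities, such as:
- Updating system instructions mid-session
- Adding incremental content updates
Session-related limits, including connection and session length limits, session context window limits, and rate limits
Options for handling session limits, such as:
- Context window compression
- Session resumption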
