Configuration options for the Live API

Even a basic Live API implementation lets you build engaging and effective interactions for your users. You can customize the experience further with the following configuration options:

Response voice and language
Transcriptions for audio input and output
Voice activity detection (VAD)
Session management

Response voice and language

You can ask the model to respond in a specific voice, and you can influence it to respond in different languages.

Specify a response voice

The Live API uses Chirp 3 to provide synthesized speech responses in HD voices.

If you don't specify a response voice, the default voice is Puck.

Available response voices

To hear samples of each voice, see Chirp 3: HD voices.

Zephyr -- Bright
Kore -- Firm
Orus -- Firm
Autonoe -- Bright
Umbriel -- Easy-going
Erinome -- Clear
Laomedeia -- Upbeat
Schedar -- Even
Achird -- Friendly
Sadachbia -- Lively
Puck -- Upbeat
Fenrir -- Excitable
Aoede -- Breezy
Enceladus -- Breathy
Algieba -- Smooth
Algenib -- Gravelly
Achernar -- Soft
Gacrux -- Mature
Zubenelgenubi -- Casual
Sadaltager -- Knowledgeable
Charon -- Informative
Leda -- Youthful
Callirrhoe -- Easy-going
Iapetus -- Clear
Despina -- Smooth
Rasalgethi -- Informative
Alnilam -- Firm
Pulcherrima -- Forward
Vindemiatrix -- Gentle
Sulafat -- Warm

To specify a response voice, set the voice name within the speechConfig object as part of the model configuration.

Swift


// ...

let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to use a specific voice for its audio response
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio],
    speech: SpeechConfig(voiceName: "VOICE_NAME")
  )
)

// ...

Kotlin


// ...

val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to use a specific voice for its audio response
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        speechConfig = SpeechConfig(voice = Voice("VOICE_NAME"))
    }
)

// ...

Java


// ...

LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
    "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to use a specific voice for its audio response
    new LiveGenerationConfig.Builder()
        .setResponseModality(ResponseModality.AUDIO)
        .setSpeechConfig(new SpeechConfig(new Voice("VOICE_NAME")))
        .build()
);

// ...

Web


// ...

const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

const liveModel = getLiveGenerativeModel(ai, {
  model: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to use a specific voice for its audio response
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: { voiceName: "VOICE_NAME" },
      },
    },
  },
});

// ...

Dart


// ...

final _liveModel = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  // Configure the model to use a specific voice for its audio response
  liveGenerationConfig: LiveGenerationConfig(
    responseModalities: [ResponseModalities.audio],
    speechConfig: SpeechConfig(voiceName: 'VOICE_NAME'),
  ),
);

// ...

Unity


// ...

var liveModel = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to use a specific voice for its audio response
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Audio },
        speechConfig: SpeechConfig.UsePrebuiltVoice("VOICE_NAME")
    )
);

// ...
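
Because the voice name is a plain string, a typo is easy to miss until the session starts. The following is a minimal sketch, not part of the Firebase SDK, of a helper that checks a name against the voices listed earlier and returns an object in the speechConfig shape used by the Web sample above. The names `VOICE_NAMES`, `DEFAULT_VOICE`, and `buildSpeechConfig` are hypothetical and exist only for illustration.

```javascript
// Voice names copied from the list of response voices on this page.
const VOICE_NAMES = new Set([
  "Zephyr", "Kore", "Orus", "Autonoe", "Umbriel", "Erinome", "Laomedeia",
  "Schedar", "Achird", "Sadachbia", "Puck", "Fenrir", "Aoede", "Enceladus",
  "Algieba", "Algenib", "Achernar", "Gacrux", "Zubenelgenubi", "Sadaltager",
  "Charon", "Leda", "Callirrhoe", "Iapetus", "Despina", "Rasalgethi",
  "Alnilam", "Pulcherrima", "Vindemiatrix", "Sulafat",
]);

// The page states that Puck is used when no voice is specified.
const DEFAULT_VOICE = "Puck";

// Hypothetical helper (an assumption, not an SDK function): validates the
// voice name and returns a speechConfig object shaped like the Web sample.
function buildSpeechConfig(voiceName) {
  const name = voiceName ?? DEFAULT_VOICE;
  if (!VOICE_NAMES.has(name)) {
    throw new RangeError(`Unknown voice name: ${name}`);
  }
  return {
    voiceConfig: {
      prebuiltVoiceConfig: { voiceName: name },
    },
  };
}
```

In the Web sample, the result of `buildSpeechConfig("Kore")` could be assigned to the `speechConfig` field of `generationConfig` in place of the inline object.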

Influence the response language

Live API models automatically choose the appropriate language for their responses.

Supported languages

Language	BCP-47 code	Language	BCP-47 code
Arabic (Egyptian)	ar-EG	German (Germany)	de-DE
English (US)	en-US	Spanish (US)	es-US
French (France)	fr-FR	Hindi (India)	hi-IN
Indonesian (Indonesia)	id-ID	Italian (Italy)	it-IT
Japanese (Japan)	ja-JP	Korean (Korea)	ko-KR
Portuguese (Brazil)	pt-BR	Russian (Russia)	ru-RU
Dutch (Netherlands)	nl-NL	Polish (Poland)	pl-PL
Thai (Thailand)	th-TH	Turkish (Turkey)	tr-TR
Vietnamese (Vietnam)	vi-VN	Romanian (Romania)	ro-RO
Ukrainian (Ukraine)	uk-UA	Bengali (Bangladesh)	bn-BD
English (India)	en-IN & hi-IN bundle	Marathi (India)	mr-IN
Tamil (India)	ta-IN	Telugu (India)	te-IN

If you want the model to respond in a non-English language, or always in one specific language, you can influence its responses with system instructions like the following examples:

Reinforce to the model that a non-English language may be appropriate

Listen to the speaker carefully. If you detect a non-English language, respond
in the language you hear from the speaker. You must respond unmistakably in the
speaker's language.

Tell the model to always respond in a specific language
```
RESPOND IN LANGUAGE. YOU MUST RESPOND UNMISTAKABLY IN LANGUAGE.
```
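
As a rough illustration, the `LANGUAGE` placeholder in the template above can be filled in programmatically. The sketch below assumes a hypothetical helper, `languageInstruction`, and a hypothetical lookup map, `LANGUAGE_NAMES`, that covers only a subset of the supported-languages table. Neither is part of the Firebase SDK. The text is uppercased only to match the style of the template.

```javascript
// Hypothetical lookup: a subset of the supported-languages table on this
// page, keyed by BCP-47 code. Extend it with the remaining rows as needed.
const LANGUAGE_NAMES = new Map([
  ["ar-EG", "Arabic"],
  ["de-DE", "German"],
  ["en-US", "English"],
  ["es-US", "Spanish"],
  ["fr-FR", "French"],
  ["hi-IN", "Hindi"],
  ["ja-JP", "Japanese"],
  ["ko-KR", "Korean"],
  ["pt-BR", "Portuguese"],
]);

// Hypothetical helper (an assumption, not an SDK function): fills the
// LANGUAGE placeholder of the "always respond in a language" template.
function languageInstruction(bcp47Code) {
  const language = LANGUAGE_NAMES.get(bcp47Code);
  if (language === undefined) {
    throw new RangeError(`Language code not in lookup map: ${bcp47Code}`);
  }
  const name = language.toUpperCase();
  return `RESPOND IN ${name}. YOU MUST RESPOND UNMISTAKABLY IN ${name}.`;
}
```

The returned string is meant to be used as the system instruction text. Because system instructions only influence the model, the response language is still not guaranteed.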

Transcriptions for audio input and output

You can receive transcripts of the audio input and of the model's audio response as part of the model's response. You set this up as part of the model configuration.

To transcribe the audio input, add inputAudioTranscription.
To transcribe the model's audio response, add outputAudioTranscription.

Note the following:

You can configure the model to return transcripts of both input and output (as shown in the following example), or of only one of them.
Transcripts are streamed along with the audio, so it's best to collect them the same way you collect the text parts within each turn.
The transcription language is inferred from the audio input and from the model's audio response.

Swift


// ...

let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to return transcriptions of the audio input and output
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio],
    inputAudioTranscription: AudioTranscriptionConfig(),
    outputAudioTranscription: AudioTranscriptionConfig()
  )
)

var inputTranscript: String = ""
var outputTranscript: String = ""

do {
  let session = try await liveModel.connect()
  for try await response in session.responses {
    if case let .content(content) = response.payload {
      if let inputText = content.inputAudioTranscription?.text {
        // Handle transcription text of the audio input
        inputTranscript += inputText
      }

      if let outputText = content.outputAudioTranscription?.text {
        // Handle transcription text of the audio output
        outputTranscript += outputText
      }

      if content.isTurnComplete {
        // Log the transcripts after the current turn is complete
        print("Input audio: \(inputTranscript)")
        print("Output audio: \(outputTranscript)")

        // Reset the transcripts for the next turn
        inputTranscript = ""
        outputTranscript = ""
      }
    }
  }


} catch {
  // Handle error
}

// ...

Kotlin


// ...

val liveModel = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to return transcriptions of the audio input and output
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        inputAudioTranscription = AudioTranscriptionConfig()
        outputAudioTranscription = AudioTranscriptionConfig()
    }
)

val liveSession = liveModel.connect()

fun handleTranscription(input: Transcription?, output: Transcription?) {
    input?.text?.let { text ->
        // Handle transcription text of the audio input
        println("Input Transcription: $text")
    }
    output?.text?.let { text ->
        // Handle transcription text of the audio output
        println("Output Transcription: $text")
    }
}

liveSession.startAudioConversation(null, ::handleTranscription)

// ...

Java


// ...

ExecutorService executor = Executors.newFixedThreadPool(1);

LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
    "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to return transcriptions of the audio input and output
    new LiveGenerationConfig.Builder()
            .setResponseModality(ResponseModality.AUDIO)
            .setInputAudioTranscription(new AudioTranscriptionConfig())
            .setOutputAudioTranscription(new AudioTranscriptionConfig())
            .build()
    );

LiveModelFutures liveModel = LiveModelFutures.from(lm);
ListenableFuture<LiveSessionFutures> sessionFuture = liveModel.connect();

Futures.addCallback(sessionFuture, new FutureCallback<LiveSessionFutures>() {
    @Override
    public void onSuccess(LiveSessionFutures ses) {
        LiveSessionFutures session = ses;
        session.startAudioConversation((Transcription input, Transcription output) -> {
            if (input != null) {
                // Handle transcription text of the audio input
                System.out.println("Input Transcription: " + input.getText());
            }
            if (output != null) {
                // Handle transcription text of the audio output
                System.out.println("Output Transcription: " + output.getText());
            }
            return null;
        });
    }

    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
        t.printStackTrace();
    }
}, executor);

// ...

Web


// ...

const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

const liveModel = getLiveGenerativeModel(ai, {
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  // Configure the model to return transcriptions of the audio input and output
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
    inputAudioTranscription: {},
    outputAudioTranscription: {},
  },
});

const liveSession = await liveModel.connect();

liveSession.sendAudioRealtime({ data, mimeType: "audio/pcm" });

const messages = liveSession.receive();
for await (const message of messages) {
  switch (message.type) {
    case 'serverContent':
      if (message.inputTranscription) {
        // Handle transcription text of the audio input
        console.log(`Input transcription: ${message.inputTranscription.text}`);
      }
      if (message.outputTranscription) {
        // Handle transcription text of the audio output
        console.log(`Output transcription: ${message.outputTranscription.text}`);
      } else {
        // Handle other server content (modelTurn, turnComplete, interruption)
      }
      break; // Prevent fall-through into the default branch
    default:
      // Handle other message types (toolCall, toolCallCancellation)
      break;
  }
}

// ...

Dart


// ...

final _liveModel = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  // Configure the model to return transcriptions of the audio input and output
  liveGenerationConfig: LiveGenerationConfig(
    responseModalities: [ResponseModalities.audio],
    inputAudioTranscription: AudioTranscriptionConfig(),
    outputAudioTranscription: AudioTranscriptionConfig(),
  ),
);

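// connect() is asynchronous, so await the session before using it
final LiveSession _session = await _liveModel.connect();

await for (final response in _session.receive()) {
  final message = response.message;
  // Transcriptions arrive on server content messages; skip other types
  if (message is! LiveServerContent) continue;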
  if (message.inputTranscription?.text case final inputText?) {
    // Handle transcription text of the audio input
    print('Input: $inputText');
  }

  if (message.outputTranscription?.text case final outputText?) {
    // Handle transcription text of the audio output
    print('Output: $outputText');
  }
}

// ...

Unity


// ...

var liveModel = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to return transcriptions of the audio input and output
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Audio },
        inputAudioTranscription: new AudioTranscriptionConfig(),
        outputAudioTranscription: new AudioTranscriptionConfig()
    )
);

try
{
    var session = await liveModel.ConnectAsync();
    var stream = session.ReceiveAsync();
    await foreach (var response in stream) {
        if (response.Message is LiveSessionContent sessionContent) {
            if (!string.IsNullOrEmpty(sessionContent.InputTranscription?.Text)) {
              // handle transcription text of input audio
            }

            if (!string.IsNullOrEmpty(sessionContent.OutputTranscription?.Text)) {
              // handle transcription text of output audio
            }
        }
    }
}
catch (Exception e)
{
    // Handle error
}

// ...
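
Since transcripts arrive in chunks alongside the audio, the advice above about collecting them per turn can be sketched as a small accumulator. This is a minimal illustration rather than SDK code. `createTranscriptCollector` is a hypothetical name, and the sketch assumes that server content messages have the fields used in the Web sample (`type`, `inputTranscription`, `outputTranscription`) plus a boolean `turnComplete` flag that marks the end of a turn.

```javascript
// Hypothetical helper (an assumption, not an SDK function): accumulates
// streamed transcription chunks and reports them once per completed turn.
function createTranscriptCollector(onTurnComplete) {
  let input = "";
  let output = "";

  return function handleMessage(message) {
    // Only server content messages carry transcriptions.
    if (message.type !== "serverContent") return;

    // Append whichever chunks are present on this message.
    input += message.inputTranscription?.text ?? "";
    output += message.outputTranscription?.text ?? "";

    // Assumed flag: emit the collected text and reset for the next turn.
    if (message.turnComplete) {
      onTurnComplete({ input, output });
      input = "";
      output = "";
    }
  };
}
```

In the Web sample, the function returned by `createTranscriptCollector` could be called with each `message` inside the `for await` loop, so that logging happens once per turn instead of once per chunk.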

Voice activity detection (VAD)

The model automatically performs voice activity detection (VAD) on a continuous audio input stream. VAD is enabled by default.

Session management

See the following session-related topics:

Advanced capabilities, including:
- Updating system instructions mid-session
- Adding incremental content updates
Session-related limits, including limits on session and connection length, limits on the session context window, and rate limits
Options for handling session limits, including:
- Context window compression
- Session resumption
