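Manage sessions for the Live API

The Gemini Live API processes continuous streams of audio or text called sessions. You can manage the session lifecycle, from the initial handshake to graceful termination.

Limits for sessions

For the Live API, a session refers to a persistent connection where input and output are streamed continuously over the same connection.

If the session exceeds any of the following limits, the connection is terminated. Note, however, that the Live API provides options (listed below) for handling these session-related limits.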

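The session context window is limited to 128k tokens.

Because of this context window limit, the estimated maximum session lengths by input modality are:
- Audio-only input sessions are limited to 15 minutes.
- Video + audio input sessions are limited to 2 minutes.

The connection length is limited to around 10 minutes.

You receive a going away notification about 60 seconds before the connection ends.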

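Here are the options for handling session-related limits:

- Compress the session context window so that the server automatically keeps the context size within the limit.
- Resume a session to avoid losing conversation context during a brief network disconnection or after receiving a going away notification.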

Start a session

For a complete snippet showing how to start a session, see the getting started guide for the Live API.

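Update mid-session

The Live API models support the following advanced capabilities for mid-session updates:

- Add incremental content updates
- Update system instructions (Vertex AI Gemini API only)

Add incremental content updates

You can add incremental updates during an active session. Use this capability to send text input, establish session context, or restore session context.

- For longer contexts, we recommend providing a single message summary to free up the context window for subsequent interactions.
- For short contexts, you can send turn-by-turn interactions to represent the exact sequence of events, as in the following snippets.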

Swift

// Define initial turns (history/context).
let turns: [ModelContent] = [
  ModelContent(role: "user", parts: [TextPart("What is the capital of France?")]),
  ModelContent(role: "model", parts: [TextPart("Paris")]),
]

// Send history, keeping the conversational turn OPEN (false).
await session.sendContent(turns, turnComplete: false)

// Define the new user query.
let newTurn: [ModelContent] = [
  ModelContent(role: "user", parts: [TextPart("What is the capital of Germany?")]),
]

// Send the final query, CLOSING the turn (true) to trigger the model response.
await session.sendContent(newTurn, turnComplete: true)

Kotlin

Not yet supported for Android apps - check back soon!

Java

Not yet supported for Android apps - check back soon!

Web

const turns = [{ text: "Hello from the user!" }];

await session.send(
  turns,
  false // turnComplete: false
);

console.log("Sent history. Waiting for next input...");

// Define the new user query.
const newTurn = [{ text: "And what is the capital of Germany?" }];

// Send the final query, CLOSING the turn (true) to trigger the model response.
await session.send(
    newTurn,
    true // turnComplete: true
);
console.log("Sent final query. Model response expected now.");

Dart

// Define initial turns (history/context).
final List<Content> turns = [
  Content(
    "user",
    [Part.text("What is the capital of France?")],
  ),
  Content(
    "model",
    [Part.text("Paris")],
  ),
];

// Send history, keeping the conversational turn OPEN (false).
await session.send(
  input: turns,
  turnComplete: false,
);

// Define the new user query.
final List<Content> newTurn = [
  Content(
    "user",
    [Part.text("What is the capital of Germany?")],
  ),
];

// Send the final query, CLOSING the turn (true) to trigger the model response.
await session.send(
  input: newTurn,
  turnComplete: true,
);

Unity

// Define initial turns (history/context).
List<ModelContent> turns = new List<ModelContent> {
    new ModelContent("user", new ModelContent.TextPart("What is the capital of France?") ),
    new ModelContent("model", new ModelContent.TextPart("Paris") ),
};

// Send history, keeping the conversational turn OPEN (false).
foreach (ModelContent turn in turns)
{
    await session.SendAsync(
        content: turn,
        turnComplete: false
    );
}

// Define the new user query.
ModelContent newTurn = ModelContent.Text("What is the capital of Germany?");

// Send the final query, CLOSING the turn (true) to trigger the model response.
await session.SendAsync(
    content: newTurn,
    turnComplete: true
);

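Update system instructions mid-session

Only available when using the Vertex AI Gemini API as your API provider.

You can update the system instructions during an active session. Use this capability to adapt the model's responses, for example to change the response language or modify the tone.

To update the system instructions mid-session, send text content with the system role. The updated system instructions remain in effect for the rest of the session.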

Swift

await session.sendContent(
  [ModelContent(
    role: "system",
    parts: [TextPart("new system instruction")]
  )],
  turnComplete: false
)

Kotlin

Not yet supported for Android apps - check back soon!

Java

Not yet supported for Android apps - check back soon!

Web

Not yet supported for Web apps - check back soon!

Dart

try {
  await _session.send(
    input: Content(
      'system',
      [Part.text('new system instruction')],
    ),
    turnComplete: false,
  );
} catch (e) {
  print('Failed to update system instructions: $e');
}

Unity

try
{
    await session.SendAsync(
        content: new ModelContent(
            "system",
            new ModelContent.TextPart("new system instruction")
        ),
        turnComplete: false
    );
}
catch (Exception e)
{
    Debug.LogError($"Failed to update system instructions: {e.Message}");
}

Compress the context window

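The Live API session context window stores real-time streamed data (25 tokens per second (TPS) for audio and 258 tokens per second for video) as well as other content, including text inputs and model outputs. All Live API models have a session context window limit of 128k tokens.

By default, because of this context window limit, the estimated maximum session lengths by input modality are:

- Audio-only input sessions are limited to 15 minutes.
- Video + audio input sessions are limited to 2 minutes.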

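In long-running sessions, the history of audio and/or video tokens accumulates as the conversation progresses. If this history exceeds the model's limit, the model may hallucinate or slow down, or the session may be forcibly terminated.

To enable longer sessions, you can turn on context window compression by setting the contextWindowCompression field of LiveGenerationConfig. When enabled, the server uses a sliding-window mechanism to automatically discard or summarize the oldest turns, keeping the context size within the default or specified limit. System instructions are never discarded and always stay at the beginning of the context window.

From the user's perspective, the session length is theoretically unlimited because the "memory" is continuously managed.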
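
To make the sliding-window behavior more concrete, here is a minimal client-side sketch of the idea. It is only an illustration: the real compression runs on the server, the function name compressContext and the per-turn token counts are hypothetical, and this sketch only discards old turns (it does not model summarization).

```javascript
// Hypothetical model of sliding-window compression (illustration only).
// Each turn is { role, tokens }; the token counts are made-up sample values.
function compressContext(turns, triggerTokens, targetTokens) {
  const total = (list) => list.reduce((sum, turn) => sum + turn.tokens, 0);

  // Nothing happens until the context grows past the trigger threshold.
  if (total(turns) <= triggerTokens) {
    return turns;
  }

  // System instructions are pinned at the start and never discarded.
  const pinned = turns.filter((turn) => turn.role === "system");
  const rest = turns.filter((turn) => turn.role !== "system");

  // Drop the oldest non-system turns until the context fits the target.
  while (rest.length > 0 && total(pinned) + total(rest) > targetTokens) {
    rest.shift();
  }
  return [...pinned, ...rest];
}

const history = [
  { role: "system", tokens: 200 },
  { role: "user", tokens: 4000 },
  { role: "model", tokens: 3000 },
  { role: "user", tokens: 2500 },
  { role: "model", tokens: 1500 },
];

// 11,200 tokens exceeds the trigger of 10,000, so the two oldest turns are dropped.
const compressed = compressContext(history, 10000, 5000);
console.log(compressed.map((turn) => `${turn.role}:${turn.tokens}`).join(" "));
// prints "system:200 user:2500 model:1500"
```

In this sample the system turn survives while the oldest user and model turns are removed, which is why the model may "forget" early parts of the conversation.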

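You can configure the sliding-window mechanism and, optionally, the number of tokens that triggers compression (see the available settings and values below). Here are some high-level considerations for using these settings:

- Setting targetTokens to a very low value frees up more context space for the continuous stream, but the model quickly "forgets" older turns of the conversation.
- Setting targetTokens closer to triggerTokens retains more memory but triggers the compression routine more frequently.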

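Setting	Default for the sliding window if not set in the config	Minimum	Maximum
`triggerTokens`: context length before compression is triggered	80% of the model's context window limit	5,000	128,000
`targetTokens`: target number of tokens to keep	50% of the `triggerTokens` value. If `triggerTokens` isn't explicitly set, `targetTokens` defaults to 50% of the default `triggerTokens` value. The `targetTokens` value must be less than the `triggerTokens` value.	0	128,000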
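
The defaults and ranges in this table can be sketched as a small helper that fills in missing values and rejects invalid combinations. This is an assumption-laden illustration: resolveCompressionSettings is a hypothetical function, not part of any SDK, and the server remains the authority on validation.

```javascript
// Limits taken from the table above.
const CONTEXT_WINDOW_LIMIT = 128000;
const MIN_TRIGGER_TOKENS = 5000;

// Hypothetical helper: applies the documented defaults and range checks.
function resolveCompressionSettings({ triggerTokens, targetTokens } = {}) {
  // Default trigger: 80% of the model's context window limit.
  const trigger = triggerTokens ?? (CONTEXT_WINDOW_LIMIT * 80) / 100;
  // Default target: 50% of the explicit or default trigger value.
  const target = targetTokens ?? Math.floor(trigger / 2);

  if (trigger < MIN_TRIGGER_TOKENS || trigger > CONTEXT_WINDOW_LIMIT) {
    throw new RangeError(
      `triggerTokens must be between ${MIN_TRIGGER_TOKENS} and ${CONTEXT_WINDOW_LIMIT}`
    );
  }
  if (target < 0 || target > CONTEXT_WINDOW_LIMIT) {
    throw new RangeError(`targetTokens must be between 0 and ${CONTEXT_WINDOW_LIMIT}`);
  }
  if (target >= trigger) {
    throw new RangeError("targetTokens must be less than triggerTokens");
  }
  return { triggerTokens: trigger, targetTokens: target };
}

// No explicit values: falls back to 102,400 and 51,200 tokens.
console.log(resolveCompressionSettings());
// Explicit values matching the platform snippets in this section.
console.log(resolveCompressionSettings({ triggerTokens: 10000, targetTokens: 2000 }));
```

Running a check like this before connecting can surface a misconfiguration (for example, targetTokens equal to triggerTokens) early in development.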

Swift


// Initialize the Gemini Developer API backend service
let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Enable context window compression.
  // (Optional) Configure the number of tokens in the context window that triggers the compression.
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio],
    contextWindowCompression: ContextWindowCompressionConfig(
      triggerTokens: 10000,
      slidingWindow: SlidingWindow(
        targetTokens: 2000
      )
    )
  )
)

Kotlin


// Initialize the Gemini Developer API backend service
val liveModel = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.5-flash-native-audio-preview-12-2025",
    // Enable context window compression.
    // (Optional) Configure the number of tokens in the context window that triggers the compression.
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        contextWindowCompression = ContextWindowCompressionConfig(
            triggerTokens = 10000,
            slidingWindow = SlidingWindow(targetTokens = 2000)
        )
    }
)

Java


// Initialize the Gemini Developer API backend service
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
        "gemini-2.5-flash-native-audio-preview-12-2025",
        // Enable context window compression.
        // (Optional) Configure the number of tokens in the context window that triggers the compression.
        new LiveGenerationConfig.Builder()
                .setResponseModality(ResponseModality.AUDIO)
                .setContextWindowCompression(
                        new ContextWindowCompressionConfig(10000, new SlidingWindow(2000))
                )
                .build()
);

Web


const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

const liveModel = getLiveGenerativeModel(ai, {
  model: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Enable context window compression.
  // (Optional) Configure the number of tokens in the context window that triggers the compression.
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
    contextWindowCompression: {
      triggerTokens: 10000,
      slidingWindow: {
        targetTokens: 2000,
      },
    },
  },
});

Dart


final _liveModel = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  // Enable context window compression.
  // (Optional) Configure the number of tokens in the context window that triggers the compression.
  liveGenerationConfig: LiveGenerationConfig(
    responseModalities: [ResponseModalities.audio],
    contextWindowCompression: ContextWindowCompressionConfig(
      triggerTokens: 10000,
      slidingWindow: SlidingWindow(targetTokens: 2000),
    ),
  ),
);

Unity


var liveModel = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
    // Enable context window compression.
    // (Optional) Configure the number of tokens in the context window that triggers the compression.
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Audio },
        contextWindowCompression: new ContextWindowCompressionConfig(
            triggerTokens: 10000,
            slidingWindow: new SlidingWindow(targetTokens: 2000)
        )
    )
);

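Detect when a session is about to end

A single continuous WebSocket connection is limited to around 10 minutes. A going away notification is sent to the client 60 seconds before the connection ends, which lets you take further action (for example, resume the session).

The following examples show how to detect an impending connection termination by listening for the going away notification: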

Swift

for try await response in session.responses {
  switch response.payload {
  case .goingAwayNotice(let goingAwayNotice):
    // Prepare for the session to close soon
    if let timeLeft = goingAwayNotice.timeLeft {
        print("Server going away in \(timeLeft) seconds")
    }
  default:
    // Handle other response types
    break
  }
}

Kotlin

for (response in session.responses) {
    when (val message = response.payload) {
        is LiveServerGoAway -> {
            // Prepare for the session to close soon
            val remaining = message.timeLeft
            logger.info("Server going away in $remaining")
        }
    }
}

Java

session.getResponses().forEach(response -> {
    if (response.getPayload() instanceof LiveServerResponse.GoingAwayNotice) {
        LiveServerResponse.GoingAwayNotice notice = (LiveServerResponse.GoingAwayNotice) response.getPayload();
        // Prepare for the session to close soon
        Duration timeLeft = notice.getTimeLeft();
    }
});

Web

for await (const message of session.receive()) {
  switch (message.type) {

  // ... handle other message types ...
  case "goingAwayNotice":
    console.log("Server going away. Time left:", message.timeLeft);
    break;
  }
}

Dart

Future<void> _handleLiveServerMessage(LiveServerResponse response) async {
  final message = response.message;
  if (message is GoingAwayNotice) {
     // Prepare for the session to close soon
     developer.log('Server going away. Time left: ${message.timeLeft}');
  }
}

Unity

foreach (var response in session.Responses) {
    if (response.Payload is LiveSessionGoingAway notice) {
        // Prepare for the session to close soon
        TimeSpan timeLeft = notice.TimeLeft;
        Debug.Log($"Server going away notice received. Remaining: {timeLeft}");
    }
}

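Resume a session

The Live API supports session resumption to prevent losing conversation context. Each session has a handle, which you can use in the following ways:

Maintain a session before reaching the connection length limit

A single continuous WebSocket connection is limited to around 10 minutes. You can detect when the connection is about to end by listening for the going away notification, and then extend the session by establishing a new connection with the session handle.

Resume a session immediately after a dropped connection

If the connection terminates or drops before reaching the connection length limit (for example, when switching from Wi-Fi to 5G), the server retains the session state for about 10 minutes. During that time, you can resume the session by establishing a new connection with the session handle.

Resume a session after a longer period

After the connection ends, the server retains the session state for several hours. During that time, you can resume the session by establishing a new connection with the session handle. Note that this period differs between the two Gemini API providers: 2 hours for the Gemini Developer API and 24 hours for the Vertex AI Gemini API.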

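Session resumption is disabled by default. To enable it, pass an empty resumption config when establishing a new connection. Once enabled, the server periodically sends updates that contain a session resumption handle. If the session disconnects, you can reconnect and pass this handle to resume the session with its context preserved.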
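
The handle bookkeeping described above can be sketched independently of any SDK. The following is a rough illustration under stated assumptions: ResumptionTracker, the provider keys googleAI and vertexAI, and the method names are all hypothetical, and only the longer retention periods (2 hours and 24 hours) are modeled.

```javascript
// Retention periods from the text above, in milliseconds.
// The provider keys are hypothetical labels, not SDK identifiers.
const RETENTION_MS = {
  googleAI: 2 * 60 * 60 * 1000, // Gemini Developer API: 2 hours
  vertexAI: 24 * 60 * 60 * 1000, // Vertex AI Gemini API: 24 hours
};

// Hypothetical helper that remembers the most recent resumable handle.
class ResumptionTracker {
  constructor(provider) {
    if (!(provider in RETENTION_MS)) {
      throw new Error(`Unknown provider: ${provider}`);
    }
    this.retentionMs = RETENTION_MS[provider];
    this.handle = null;
    this.disconnectedAt = null;
  }

  // Call for every session resumption update received from the server.
  // Updates that are not resumable keep the previously saved handle.
  onUpdate({ newHandle, resumable }) {
    if (resumable && newHandle) {
      this.handle = newHandle;
    }
  }

  // Call when the connection ends or drops.
  onDisconnect(nowMs) {
    this.disconnectedAt = nowMs;
  }

  // Returns the handle to reconnect with, or null if resuming is not possible.
  handleForResume(nowMs) {
    if (this.handle === null) return null;
    const expired =
      this.disconnectedAt !== null && nowMs - this.disconnectedAt > this.retentionMs;
    return expired ? null : this.handle;
  }
}

const tracker = new ResumptionTracker("googleAI");
tracker.onUpdate({ newHandle: "handle-1", resumable: true });
tracker.onUpdate({ newHandle: null, resumable: false }); // ignored
tracker.onDisconnect(0);
console.log(tracker.handleForResume(30 * 60 * 1000)); // prints "handle-1"
console.log(tracker.handleForResume(3 * 60 * 60 * 1000)); // prints null
```

Keeping this logic separate from the message loop makes it easier to decide, at reconnect time, whether to pass a saved handle or start a fresh session.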

The following examples show two options for resuming a session:

Swift

// Local variable to save the active session handle
var activeSessionHandle: String?

// Initialize the session. Passing an empty config requests the server to send SessionResumptionUpdate
var session = try await liveModel.connect(
  sessionResumption: SessionResumptionConfig()
)

// Start receiving responses
for try await message in session.responses {
  // Check for new session handles inside your message handling loop
  switch message.payload {
  case let .sessionResumptionUpdate(updateMessage):
    guard let newHandle = updateMessage.newHandle, updateMessage.resumable else {
      continue
    }
    activeSessionHandle = newHandle
    print("SessionResumptionUpdate: handle \(newHandle)")
  // ... handle other LiveServerMessage types ...
  default:
    break
  }
}

// The following are alternative options to resume a session. Choose only one.

// Option 1: Create and connect a session to resume with the saved handle
if let handle = activeSessionHandle {
  session = try await liveModel.connect(
    sessionResumption: SessionResumptionConfig(handle: handle)
  )
}

// Option 2: Resume the session directly on an existing session object
if let handle = activeSessionHandle {
  try await session.resumeSession(
    sessionResumption: SessionResumptionConfig(handle: handle)
  )
}

Kotlin

// Local variable to save the active session handle
var activeSessionHandle: String? = null

// Initialize the session. Passing an empty config requests the server to send SessionResumptionUpdate
var session = liveModel.connect(
    sessionResumption = SessionResumptionConfig()
)

// Start receiving responses
session.receive().collect { message ->
    // Process other received response types...

    // Check for new session handles inside your message handling loop
    if (message is LiveSessionResumptionUpdate) {
        if (message.resumable == true && message.newHandle != null) {
            activeSessionHandle = message.newHandle
            Log.d("TAG", "SessionResumptionUpdate: handle ${message.newHandle}")
        }
    }
}

// The following are alternative options to resume a session. Choose only one.

// Option 1: Create and connect a session to resume with the saved handle
activeSessionHandle?.let { handle ->
    session = liveModel.connect(
        sessionResumption = SessionResumptionConfig(handle = handle)
    )
}

// Option 2: Resume the session directly on an existing session object
activeSessionHandle?.let { handle ->
    session.resumeSession(
        sessionResumption = SessionResumptionConfig(handle = handle)
    )
}

Java

For Java, session resumption is not yet supported. Check back soon!

Web

// Local variable to save the active session handle
let activeSessionHandle = null;

// Initialize the session. Passing an empty object requests the server to send SessionResumptionUpdate
let session = await liveModel.connect({});

// Start receiving responses
for await (const message of session.receive()) {
  // Process other received response types...

  // Check for new session handles inside your message handling loop
  if (message.type === 'sessionResumptionUpdate') {
    if (message.resumable && message.newHandle) {
      activeSessionHandle = message.newHandle;
      console.log(`SessionResumptionUpdate: handle ${activeSessionHandle}`);
    }
  }
}

// The following are alternative options to resume a session. Choose only one.

// Option 1: Create and connect a session to resume with the saved handle
if (activeSessionHandle) {
  session = await liveModel.connect({
    handle: activeSessionHandle
  });
}

// Option 2: Resume the session directly on an existing session object
if (activeSessionHandle) {
  await session.resumeSession({
    handle: activeSessionHandle
  });
}

Dart

// Local variable to save the active session handle
String? _activeSessionHandle;

// Initialize the session. Passing an empty config requests the server to send SessionResumptionUpdate
var _session = await _liveModel.connect(
  sessionResumption: SessionResumptionConfig(),
);

// Start receiving responses
await for (final message in _session.receive()) {
  // Process other received response types...

  // Check for new session handles inside your message handling loop
  if (message is SessionResumptionUpdate &&
      message.resumable != null &&
      message.resumable!) {
    _activeSessionHandle = message.newHandle;
    log('SessionResumptionUpdate: handle ${message.newHandle}');
  }
}

// The following are alternative options to resume a session. Choose only one.

// Option 1: Create and connect a session to resume with the saved handle
if (_activeSessionHandle != null) {
  _session = await _liveModel.connect(
    sessionResumption: SessionResumptionConfig.resume(_activeSessionHandle!),
  );
}

// Option 2: Alternatively, resume the session directly on an existing session object
if (_activeSessionHandle != null) {
  await _session.resumeSession(
    sessionResumption: SessionResumptionConfig.resume(_activeSessionHandle!),
  );
}

Unity

// Local variable to save the active session handle
string activeSessionHandle = null;

// Initialize the session. Passing an empty config requests the server to send SessionResumptionUpdate
var session = await liveModel.ConnectAsync(
    sessionResumption: new SessionResumptionConfig()
);

// Start receiving responses
await foreach (var response in session.ReceiveAsync())
{
  // Process other received response types...

  // Check for new session handles inside your message handling loop
  if (response.Message is LiveSessionResumptionUpdate updateMessage)
  {
    if (updateMessage.Resumable == true && !string.IsNullOrEmpty(updateMessage.NewHandle))
    {
      activeSessionHandle = updateMessage.NewHandle;
      Debug.Log($"SessionResumptionUpdate: handle {activeSessionHandle}");
    }
  }
}

// The following are alternative options to resume a session. Choose only one.

// Option 1: Create and connect a session to resume with the saved handle
if (!string.IsNullOrEmpty(activeSessionHandle)) {
  session = await liveModel.ConnectAsync(
      sessionResumption: new SessionResumptionConfig(activeSessionHandle)
  );
}

// Option 2: Resume the session directly on an existing session object
if (!string.IsNullOrEmpty(activeSessionHandle)) {
  await session.ResumeSessionAsync(
      sessionResumption: new SessionResumptionConfig(activeSessionHandle)
  );
}
