管理 Live API 的会话

Gemini Live API 会处理称为 会话的连续音频或文本流。您可以管理会话生命周期,从初始握手到正常终止。

会话限制

对于 Live API会话 是指通过连接持续流式传输输入 和输出的持久连接。

如果会话超出以下 任何 限制,连接将会终止。不过请注意,Live API 提供了一些选项(见下文)来 处理这些与会话相关的限制。

  • 会话上下文窗口 限制为 12.8 万个 token。

    由于此上下文窗口限制,以下是根据输入模态估算出的会话时长上限:

    • 仅音频输入会话的时长上限为 15 分钟
    • 视频 + 音频输入会话的时长上限为 2 分钟
  • 连接时长 限制为大约 10 分钟

    您会在连接结束前大约 60 秒收到 即将结束通知

以下是一些用于处理与会话相关的限制的选项:

  • 压缩会话上下文窗口 以便服务器自动将上下文大小保持在限制范围内。

  • 恢复会话 以防止在网络短暂断开连接期间或 收到 即将结束 通知后丢失对话上下文。

启动会话

如需查看展示如何启动会话的完整代码段,请参阅 快速入门指南Live API

在会话期间更新

Live API 模型支持以下高级功能,用于 在会话期间更新

添加增量内容更新

您可以在活跃会话期间添加增量更新。使用此功能可发送文本输入、建立会话上下文或恢复会话上下文。

  • 对于较长的上下文,建议提供单个消息摘要,以释放上下文窗口,以便进行后续互动。

  • 对于简短的上下文,您可以发送精细(导航)互动来表示确切的事件序列,如下面的代码段所示。

Swift

// Define initial turns (history/context).
let turns: [ModelContent] = [
  ModelContent(role: "user", parts: [TextPart("What is the capital of France?")]),
  ModelContent(role: "model", parts: [TextPart("Paris")]),
]

// Send history, keeping the conversational turn OPEN (false).
await session.sendContent(turns, turnComplete: false)

// Define the new user query.
let newTurn: [ModelContent] = [
  ModelContent(role: "user", parts: [TextPart("What is the capital of Germany?")]),
]

// Send the final query, CLOSING the turn (true) to trigger the model response.
await session.sendContent(newTurn, turnComplete: true)

Kotlin

Not yet supported for Android apps - check back soon!

Java

Not yet supported for Android apps - check back soon!

Web

const turns = [{ text: "Hello from the user!" }];

await session.send(
  turns,
  false // turnComplete: false
);

console.log("Sent history. Waiting for next input...");

// Define the new user query.
const newTurn [{ text: "And what is the capital of Germany?" }];

// Send the final query, CLOSING the turn (true) to trigger the model response.
await session.send(
    newTurn,
    true // turnComplete: true
);
console.log("Sent final query. Model response expected now.");

Dart

// Define initial turns (history/context).
final List turns = [
  Content(
    "user",
    [Part.text("What is the capital of France?")],
  ),
  Content(
    "model",
    [Part.text("Paris")],
  ),
];

// Send history, keeping the conversational turn OPEN (false).
await session.send(
  input: turns,
  turnComplete: false,
);

// Define the new user query.
final List newTurn = [
  Content(
    "user",
    [Part.text("What is the capital of Germany?")],
  ),
];

// Send the final query, CLOSING the turn (true) to trigger the model response.
await session.send(
  input: newTurn,
  turnComplete: true,
);

Unity

// Define initial turns (history/context).
List turns = new List {
    new ModelContent("user", new ModelContent.TextPart("What is the capital of France?") ),
    new ModelContent("model", new ModelContent.TextPart("Paris") ),
};

// Send history, keeping the conversational turn OPEN (false).
foreach (ModelContent turn in turns)
{
    await session.SendAsync(
        content: turn,
        turnComplete: false
    );
}

// Define the new user query.
ModelContent newTurn = ModelContent.Text("What is the capital of Germany?");

// Send the final query, CLOSING the turn (true) to trigger the model response.
await session.SendAsync(
    content: newTurn,
    turnComplete: true
);

在会话期间更新系统指令

仅在将 Vertex AI Gemini API 用作 API 提供商时可用。

您可以在活跃会话期间更新系统指令。使用此功能可调整模型的回答,例如更改回答语言或修改语气。

如需在会话期间更新系统指令,您可以使用 system 角色发送文本内容。更新后的系统指令将在剩余会话期间保持有效。

Swift

await session.sendContent(
  [ModelContent(
    role: "system",
    parts: [TextPart("new system instruction")]
  )],
  turnComplete: false
)

Kotlin

Not yet supported for Android apps - check back soon!

Java

Not yet supported for Android apps - check back soon!

Web

Not yet supported for Web apps - check back soon!

Dart

try {
  await _session.send(
    input: Content(
      'system',
      [Part.text('new system instruction')],
    ),
    turnComplete: false,
  );
} catch (e) {
  print('Failed to update system instructions: $e');
}

Unity

try
{
    await session.SendAsync(
        content: new ModelContent(
            "system",
            new ModelContent.TextPart("new system instruction")
        ),
        turnComplete: false
    );
}
catch (Exception e)
{
    Debug.LogError($"Failed to update system instructions: {e.Message}");
}

压缩上下文窗口

点击您的 Gemini API 提供商,以查看此页面上特定于提供商的内容 和代码。

Live API 会话上下文窗口用于存储实时流式数据 (对于音频为每秒 25 个 token [TPS],对于视频为每秒 258 个 token)以及其他 内容,包括文本输入和模型输出。所有 Live API 模型的 会话上下文窗口限制均为 12.8 万个 token。

默认情况下,由于此上下文窗口限制,以下是根据输入模态估算出的会话时长上限:

  • 仅音频输入会话的时长上限为 15 分钟
  • 视频 + 音频输入会话的时长上限为 2 分钟

在长时间运行的会话中,随着对话的进行,音频和/或视频 token 的历史记录会不断累积。如果此历史记录超出模型的限制,模型可能会产生幻觉、运行速度变慢,或者会话可能会被强制终止。

如需启用更长的会话,您可以启用 上下文窗口压缩 ,方法是设置 LiveGenerationConfigcontextWindowCompression 字段。启用后,服务器会使用 滑动窗口 机制自动舍弃最旧的轮次或对其进行总结,以将上下文大小保持在默认或指定的限制范围内。系统指令不会被舍弃,并且始终位于上下文窗口的开头。

从用户的角度来看,由于“内存”会不断得到管理,因此理论上会话时长是无限的。

您可以配置滑动窗口机制,还可以 选择性地 配置触发压缩的 token 数(请参阅下面的可用设置和值)。以下是有关使用这些设置的一些高级注意事项:

  • targetTokens 设置为非常低的值会为连续流释放更多上下文空间,但模型会迅速“忘记”对话中较旧的轮次。

  • targetTokens 设置为更接近 triggerTokens 的值会保留更多内存,但会更频繁地触发压缩例程。

设置 如果未在配置中设置,则滑动窗口的默认值 最小值 最大值
triggerTokens
触发压缩之前的上下文长度
模型上下文窗口限制的 80% 5,000 128,000
targetTokens
要保留的目标 token 数
triggerTokens 值的 50%
  • 如果 triggerTokens明确 设置,则 targetTokens 默认为 默认 triggerTokens 值的 50%。
  • targetTokens 值必须小于 triggerTokens 值。
0 128,000

Swift


// Initialize the Gemini Developer API backend service
let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Enable context window compression.
  // (Optional) Configure the number of tokens in the context window that triggers the compression.
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio],
    contextWindowCompression: ContextWindowCompressionConfig(
      triggerTokens: 10000,
      slidingWindow: SlidingWindow(
        targetTokens: 2000,
      )
    )
  )
)

Kotlin


// Initialize the Gemini Developer API backend service
val liveModel = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.5-flash-native-audio-preview-12-2025",
    // Enable context window compression.
    // (Optional) Configure the number of tokens in the context window that triggers the compression.
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO,
        contextWindowCompression = ContextWindowCompressionConfig(
            triggerTokens = 10000,
            slidingWindow = SlidingWindow(targetTokens = 2000)
        )
    }
)

Java


// Initialize the Gemini Developer API backend service
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
        "gemini-2.5-flash-native-audio-preview-12-2025",
        // Enable context window compression.
        // (Optional) Configure the number of tokens in the context window that triggers the compression.
        new LiveGenerationConfig.Builder()
                .setResponseModality(ResponseModality.AUDIO)
                .setContextWindowCompression(
                        new ContextWindowCompressionConfig(10000, new SlidingWindow(2000))
                )
                .build()
);

Web


const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

const liveModel = getLiveGenerativeModel(ai, {
  model: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Enable context window compression.
  // (Optional) Configure the number of tokens in the context window that triggers the compression.
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
    contextWindowCompression: {
      triggerTokens: 10000,
      slidingWindow: {
        targetTokens: 2000,
      },
    },
  },
});

Dart


final _liveModel = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  // Enable context window compression.
  // (Optional) Configure the number of tokens in the context window that triggers the compression.
  liveGenerationConfig: LiveGenerationConfig(
    responseModalities: [ResponseModalities.audio],
    contextWindowCompression: ContextWindowCompressionConfig(
      triggerTokens: 10000,
      slidingWindow: SlidingWindow(targetTokens: 2000),
    ),
  ),
);

Unity


var liveModel = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
    // Enable context window compression.
    // (Optional) Configure the number of tokens in the context window that triggers the compression.
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Audio },
        contextWindowCompression: new ContextWindowCompressionConfig(
            triggerTokens: 10000,
            slidingWindow: new SlidingWindow(targetTokens: 2000)
        )
    )
);

检测会话何时即将结束

单个连续 WebSocket 连接的时长上限约为 10 分钟系统会在连接结束前60 秒向客户端发送即将结束通知,这有助于您采取 进一步的操作(例如, 恢复会话)。

以下示例展示了如何通过监听 即将结束 通知来检测连接即将终止:

Swift

for try await response in session.responses {
  switch response.payload {

  case .goingAwayNotice(let goingAwayNotice):
    // Prepare for the session to close soon
    if let timeLeft = goingAwayNotice.timeLeft {
        print("Server going away in \(timeLeft) seconds")
    }
  }
}

Kotlin

for (response in session.responses) {
    when (val message = response.payload) {
        is LiveServerGoAway -> {
            // Prepare for the session to close soon
            val remaining = message.timeLeft
            logger.info("Server going away in $remaining")
        }
    }
}

Java

session.getResponses().forEach(response -> {
    if (response.getPayload() instanceof LiveServerResponse.GoingAwayNotice) {
        LiveServerResponse.GoingAwayNotice notice = (LiveServerResponse.GoingAwayNotice) response.getPayload();
        // Prepare for the session to close soon
        Duration timeLeft = notice.getTimeLeft();
    }
});

Web

for await (const message of session.receive()) {
  switch (message.type) {

  ...
  case "goingAwayNotice":
    console.log("Server going away. Time left:", message.timeLeft);
    break;
  }
}

Dart

Future _handleLiveServerMessage(LiveServerResponse response) async {
  final message = response.message;
  if (message is GoingAwayNotice) {
     // Prepare for the session to close soon
     developer.log('Server going away. Time left: ${message.timeLeft}');
  }
}

Unity

foreach (var response in session.Responses) {
    if (response.Payload is LiveSessionGoingAway notice) {
        // Prepare for the session to close soon
        TimeSpan timeLeft = notice.TimeLeft;
        Debug.Log($"Server going away notice received. Remaining: {timeLeft}");
    }
}

恢复会话

Live API 支持会话恢复,以防止丢失对话 上下文。每个会话都有一个句柄,您可以通过以下方式使用它:

  • 在达到连接时长限制之前维护会话

    单个连续 WebSocket 连接的时长上限约为 10 分钟。您可以通过 监听 即将结束 通知来检测连接何时即将结束,然后 使用会话 句柄建立新连接来延长会话。

  • 在连接断开后立即恢复会话

    如果连接在达到连接时长上限之前终止或断开 (例如,从 WLAN 切换到 5G),服务器会将会话状态保留 大约 10 分钟。在此期间,您可以使用会话句柄建立新连接来恢复会话。

  • 在较长时间后恢复会话

    连接结束后,服务器会将会话状态保留几个小时。 在此期间,您可以使用会话句柄建立新连接来恢复会话。请注意,对于 两个 Gemini API 提供商,此时间段是不同的: Gemini Developer API2 小时Vertex AI Gemini API24 小时

默认情况下,会话恢复功能处于停用状态。如需启用会话恢复功能,请在建立新连接时传递空恢复配置。启用后,服务器会定期发送包含会话恢复句柄的更新。如果会话断开连接,您可以重新连接并传递此句柄,以恢复会话并保留其上下文。

以下示例展示了恢复会话的两种选项:

Swift

// Local variable to save the active session handle
var activeSessionHandle: String?

// Initialize the session. Passing an empty config requests the server to send SessionResumptionUpdate
var session = try await liveModel.connect(
  sessionResumption: SessionResumptionConfig()
)

// Start receiving responses
for try await message in session.responses {
  // Check for new session handles inside your message handling loop
  switch message.payload {
  case let .sessionResumptionUpdate(updateMessage):
    guard let newHandle = updateMessage.newHandle, updateMessage.resumable else {
      continue
    }
    activeSessionHandle = newHandle
    print("SessionResumptionUpdate: handle \(newHandle)")
  // ... handle other LiveServerMessage types ...
  default:
    break
  }
}

// The following are alternative options to resume a session. Choose only one.

// Option 1: Create and connect a session to resume with the saved handle
if let handle = activeSessionHandle {
  session = try await liveModel.connect(
    sessionResumption: SessionResumptionConfig(handle: handle)
  )
}

// Option 2: Resume the session directly on an existing session object
if let handle = activeSessionHandle {
  try await session.resumeSession(
    sessionResumption: SessionResumptionConfig(handle: handle)
  )
}

Kotlin

// Local variable to save the active session handle
var activeSessionHandle: String? = null

// Initialize the session. Passing an empty config requests the server to send SessionResumptionUpdate
var session = liveModel.connect(
    sessionResumption = SessionResumptionConfig()
)

// Start receiving responses
session.receive().collect { message ->
    // Process other received response types...

    // Check for new session handles inside your message handling loop
    if (message is LiveSessionResumptionUpdate) {
        if (message.resumable == true && message.newHandle != null) {
            activeSessionHandle = message.newHandle
            Log.d("TAG", "SessionResumptionUpdate: handle ${message.newHandle}")
        }
    }
}

// The following are alternative options to resume a session. Choose only one.

// Option 1: Create and connect a session to resume with the saved handle
activeSessionHandle?.let { handle ->
    session = liveModel.connect(
        sessionResumption = SessionResumptionConfig(handle = handle)
    )
}

// Option 2: Resume the session directly on an existing session object
activeSessionHandle?.let { handle ->
    session.resumeSession(
        sessionResumption = SessionResumptionConfig(handle = handle)
    )
}

Java

For Java, session resumption is not yet supported. Check back soon!

Web

// Local variable to save the active session handle
let activeSessionHandle = null;

// Initialize the session. Passing an empty object requests the server to send SessionResumptionUpdate
let session = await liveModel.connect({});

// Start receiving responses
for await (const message of session.receive()) {
  // Process other received response types...

  // Check for new session handles inside your message handling loop
  if (message.type === 'sessionResumptionUpdate') {
    if (message.resumable && message.newHandle) {
      activeSessionHandle = message.newHandle;
      console.log(`SessionResumptionUpdate: handle ${activeSessionHandle}`);
    }
  }
}

// The following are alternative options to resume a session. Choose only one.

// Option 1: Create and connect a session to resume with the saved handle
if (activeSessionHandle) {
  session = await liveModel.connect({
    handle: activeSessionHandle
  });
}

// Option 2: Resume the session directly on an existing session object
if (activeSessionHandle) {
  await session.resumeSession({
    handle: activeSessionHandle
  });
}

Dart

// Local variable to save the active session handle
String? _activeSessionHandle;

// Initialize the session. Passing an empty config requests the server to send SessionResumptionUpdate
var _session = await _liveModel.connect(
  sessionResumption: SessionResumptionConfig(),
);

// Start receiving responses
await for (final message in _session.receive()) {
  // Process other received response types...

  // Check for new session handles inside your message handling loop
  if (message is SessionResumptionUpdate &&
      message.resumable != null &&
      message.resumable!) {
    _activeSessionHandle = message.newHandle;
    log('SessionResumptionUpdate: handle ${message.newHandle}');
  }
}

// The following are alternative options to resume a session. Choose only one.

// Option 1: Create and connect a session to resume with the saved handle
if (_activeSessionHandle != null) {
  _session = await _liveModel.connect(
    sessionResumption: SessionResumptionConfig.resume(_activeSessionHandle!),
  );
}

// Option 2: Alternatively, resume the session directly on an existing session object
if (_activeSessionHandle != null) {
  await _session.resumeSession(
    sessionResumption: SessionResumptionConfig.resume(_activeSessionHandle!),
  );
}

Unity

// Local variable to save the active session handle
string activeSessionHandle = null;

// Initialize the session. Passing an empty config requests the server to send SessionResumptionUpdate
var session = await liveModel.ConnectAsync(
    sessionResumption: new SessionResumptionConfig()
);

// Start receiving responses
await foreach (var response in session.ReceiveAsync())
{
  // Process other received response types...

  // Check for new session handles inside your message handling loop
  if (response.Message is LiveSessionResumptionUpdate updateMessage)
  {
    if (updateMessage.Resumable == true && !string.IsNullOrEmpty(updateMessage.NewHandle))
    {
      activeSessionHandle = updateMessage.NewHandle;
      Debug.Log($"SessionResumptionUpdate: handle {activeSessionHandle}");
    }
  }
}

// The following are alternative options to resume a session. Choose only one.

// Option 1: Create and connect a session to resume with the saved handle
if (!string.IsNullOrEmpty(activeSessionHandle)) {
  session = await liveModel.ConnectAsync(
      sessionResumption: new SessionResumptionConfig(activeSessionHandle)
  );
}

// Option 2: Resume the session directly on an existing session object
if (!string.IsNullOrEmpty(activeSessionHandle)) {
  await session.ResumeSessionAsync(
      sessionResumption: new SessionResumptionConfig(activeSessionHandle)
  );
}