
Gemini 2.0 Flash and Flash-Lite models will shut down on June 1, 2026. To avoid service disruption, update to a newer model like gemini-3.1-flash-lite.

All Imagen models will shut down on June 24, 2026. Learn about migrating your apps to use Nano Banana.


Context caching in Firebase AI Logic

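For AI features, you might pass the same input tokens (content) to a model over and over. For these use cases, you can cache that content instead: you pass the content to the model only once, store it, and then refer to it in subsequent requests.

For repetitive tasks that involve a large amount of content (such as lots of text, audio files, or video files), context caching can significantly reduce latency and cost. Common use cases for cached content include detailed personas, codebases, and manuals.

Gemini models offer two different caching mechanisms:

Implicit caching: automatically enabled on most models; cost savings are not guaranteed
Explicit caching: can be manually enabled on most models as an opt-in; usually results in cost savings

Explicit caching is useful when you want a stronger guarantee of cost savings and can take on some additional developer work.

For both implicit and explicit caching, the cachedContentTokenCount field in the response's usage metadata indicates the number of tokens in the cached part of the input. For explicit caching, be sure to review the pricing information at the bottom of this page.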

Supported models

Caching is supported when you use the following models:

gemini-3.1-pro-preview
gemini-3-flash-preview
gemini-3.1-flash-lite
gemini-2.5-pro
gemini-2.5-flash
gemini-2.5-flash-lite

Media-generation models (for example, Nano Banana models such as gemini-3.1-flash-image-preview) don't support context caching.

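Size limits for cached content

Each model has a minimum token count for cached content. The maximum is set by the model's context window.

Gemini Pro models: minimum of 4096 tokens
Gemini Flash models: minimum of 1024 tokens

In addition, content that you cache using a blob or text can be at most 10 MB.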
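As a pre-flight check, you could compare a token count you already have against the minimums above before you create a cache. This is a minimal sketch. It assumes that any model ID containing `pro` is a Gemini Pro model and that any ID containing `flash` (including Flash-Lite IDs) is a Gemini Flash model. `min_cache_tokens` and `meets_cache_minimum` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: minimum cacheable token counts per model family.
# Assumption: "pro" in the model ID means a Gemini Pro model, and "flash"
# (which also covers Flash-Lite IDs) means a Gemini Flash model.

min_cache_tokens() {
  local model_id="$1"
  case "$model_id" in
    *pro*)   echo 4096 ;;
    *flash*) echo 1024 ;;
    *)       echo "unknown model family: $model_id" >&2; return 1 ;;
  esac
}

# Succeeds (exit status 0) if the token count meets the model's minimum.
meets_cache_minimum() {
  local model_id="$1" token_count="$2" minimum
  minimum=$(min_cache_tokens "$model_id") || return 1
  [ "$token_count" -ge "$minimum" ]
}

# Usage:
#   meets_cache_minimum gemini-2.5-pro 3000 || echo "too small to cache on a Pro model"
```

The maximum is not checked here, because it depends on each model's context window.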

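Implicit caching

Implicit caching is enabled by default and is available for most Gemini models.

If your request hits cached content, Google automatically passes the cost savings on to you. Here are some ways to increase the chance that your requests use implicit caching:

Try putting large, common content at the beginning of your prompt.
Try sending requests with similar prefixes within a short period of time.

The cachedContentTokenCount field in the response's usage metadata gives the number of tokens in the cached part of the input.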
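To check whether a request actually hit a cache, you can compare cachedContentTokenCount with the total input token count. The sketch below assumes a raw REST response body whose `usageMetadata` object contains `promptTokenCount`. It also assumes `cachedContentTokenCount` is present only when part of the input was cached. It extracts the numbers with `grep` rather than a JSON parser, so treat it as a rough diagnostic. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: how much of a request's input came from a cache.
# Assumes a raw JSON response body with a usageMetadata object; parses it
# with grep for brevity, so only the first occurrence of each field is read.

json_int_field() {
  # $1 = field name; reads JSON on stdin; prints the integer, or 0 if absent
  local value
  value=$(grep -o "\"$1\": *[0-9]*" | head -n 1 | grep -o '[0-9]*$' || true)
  echo "${value:-0}"
}

cache_hit_report() {
  # $1 = JSON response body
  local json="$1" prompt cached
  prompt=$(printf '%s' "$json" | json_int_field promptTokenCount)
  cached=$(printf '%s' "$json" | json_int_field cachedContentTokenCount)
  if [ "$prompt" -eq 0 ]; then
    echo "no input tokens reported"
  else
    echo "cached $cached of $prompt input tokens ($(( cached * 100 / prompt ))%)"
  fi
}

# Usage: cache_hit_report "$RESPONSE_JSON"
```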

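Explicit caching

Explicit caching is not enabled by default; it's an opt-in capability of Gemini models.

You set up and use explicit context caching as follows:

Create an explicit cache, and then use it
Manage explicit caches, including listing all caches, getting metadata about a cache, updating a cache's TTL or expiration time, and deleting a cache

Note that explicit context caching interacts with implicit caching, which can result in additional caching beyond the content you explicitly cache. To prevent cached data from being retained, you can disable implicit caching and avoid creating explicit caches. For details, see Enable and disable caching.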

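Create and use an explicit cache

Creating and using an explicit context cache involves the following steps:

Create the explicit cache.
Reference the cache in a server prompt template.
Reference the server prompt template in your app's prompt request.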

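Important information about creating and using explicit caches

Your cache must be consistent with your app's prompt requests and your server prompt template:

Caches are specific to a Gemini API provider. Your app's prompt requests must use the same provider.
For Firebase AI Logic, we strongly recommend using explicit context caching only with the Vertex AI Gemini API. All information and examples on this page are specific to that Gemini API provider.
Caches are specific to a Gemini model. Your app's prompt requests must use the same model.
When you use the Vertex AI Gemini API, caches are specific to a location.
The location of the explicit cache must match the location of the server prompt template and the location where you access the model in your app's prompt requests.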
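A mismatch in model or location is easy to miss, so you might check a cache's metadata against the values your template and app will use. This is a minimal sketch. It assumes you already have the cache's `name` and `model` strings, as returned by the creation request or a metadata request. `check_cache_matches` is a hypothetical helper. It cannot check the Gemini API provider, because the provider isn't encoded in these strings.

```shell
#!/usr/bin/env bash
# Hypothetical consistency check between a cache and the values your server
# prompt template and app request will use. It compares the model ID and the
# location only; the Gemini API provider isn't encoded in these strings.

check_cache_matches() {
  # $1 = cache "name", $2 = cache "model" (both as returned by the REST API)
  # $3 = model ID the app uses, $4 = location of the template and app request
  local cache_name="$1" cache_model="$2" app_model="$3" app_location="$4"
  local cache_model_id="${cache_model##*/}" cache_location
  cache_location=$(printf '%s\n' "$cache_name" \
    | sed -n 's|^projects/[^/]*/locations/\([^/]*\)/cachedContents/.*$|\1|p')

  if [ "$cache_model_id" != "$app_model" ]; then
    echo "model mismatch: cache uses $cache_model_id, app uses $app_model"
    return 1
  fi
  if [ "$cache_location" != "$app_location" ]; then
    echo "location mismatch: cache is in ${cache_location:-unknown}, app uses $app_location"
    return 1
  fi
  echo "ok"
}
```

For example, you could pass it the name and model values from the example response in Step 1, together with the model ID and location your app is configured with.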

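Also note the following limitations and requirements for explicit caches:

After you create an explicit cache, you can't change any of its content; you can only change its TTL or expiration time.
You can cache any supported input file MIME type, or even cache only text provided in the cache creation request.
If you want to include a file in the cache, you must provide it as a Cloud Storage URI. It can't be a browser URL or a YouTube URL.

Also, access restrictions on the file are checked at cache creation time and are not checked again at user request time. So make sure that any data included in an explicit cache is appropriate for every user who makes a request that includes that cache.
If you want to use system instructions or tools (such as code execution, URL context, or Grounding with Google Search), the cache itself must include their configuration. They can't be configured in the server prompt template or in the app's prompt request. Note that server prompt templates don't yet support function calling (or chat). For details about configuring system instructions and tools in a cache, see the REST API for the Vertex AI Gemini API.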
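System instructions and tools have to live in the cache itself, so the cache creation body is where they go. The sketch below builds such a body and also enforces the Cloud Storage URI rule. It assumes the `systemInstruction` and `tools` fields follow the Vertex AI `CachedContent` REST resource. The instruction text and the `googleSearch` tool are placeholders. `cache_body` is a hypothetical helper, and it does not escape quotes in the instruction text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a cachedContents creation body that carries its
# own system instruction and tool configuration. Field names are assumed to
# follow the Vertex AI CachedContent REST resource; quotes in the instruction
# text are NOT escaped, so keep it simple.

cache_body() {
  local model="$1" mime_type="$2" file_uri="$3" system_text="$4"
  # Files in a cache must be Cloud Storage URIs, not browser or YouTube URLs.
  case "$file_uri" in
    gs://*) ;;
    *) echo "error: $file_uri is not a Cloud Storage (gs://) URI" >&2; return 1 ;;
  esac
  cat <<EOF
{
  "model": "${model}",
  "systemInstruction": {
    "parts": [{ "text": "${system_text}" }]
  },
  "tools": [{ "googleSearch": {} }],
  "contents": [
    {
      "role": "user",
      "parts": [
        { "fileData": { "mimeType": "${mime_type}", "fileUri": "${file_uri}" } }
      ]
    }
  ]
}
EOF
}
```

Because the create request in Step 1 reads its body with `-d @-`, you could pipe this output into that same curl command in place of its heredoc.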

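Step 1: Create the cache

Create the cache directly using the REST API for the Vertex AI Gemini API.

The following example creates an explicit cache with a PDF file as its content.

Syntax: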

PROJECT_ID="PROJECT_ID"
MODEL_ID="GEMINI_MODEL"  # for example, gemini-3-flash-preview
LOCATION="LOCATION"  # location for both the cache and the model
MIME_TYPE="MIME_TYPE"
CACHED_CONTENT_URI="CLOUD_STORAGE_FILE_URI"  # must be a Cloud Storage URI
CACHE_DISPLAY_NAME="CACHE_DISPLAY_NAME"  # optional
TTL="CACHE_TIME_TO_LIVE"  # optional (if not specified, defaults to 3600s)

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/cachedContents \
-d @- <<EOF
{
  "model":"projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}",
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "fileData": {
            "mimeType": "${MIME_TYPE}",
            "fileUri": "${CACHED_CONTENT_URI}"
          }
        }
      ]
    }
  ],
  "displayName": "${CACHE_DISPLAY_NAME}",
  "ttl": "${TTL}"
}
EOF

Example request:

PROJECT_ID="my-amazing-app"
MODEL_ID="gemini-3-flash-preview"
LOCATION="global"
MIME_TYPE="application/pdf"
CACHED_CONTENT_URI="gs://cloud-samples-data/generative-ai/pdf/2312.11805v3.pdf"
CACHE_DISPLAY_NAME="Gemini - A Family of Highly Capable Multimodal Model (PDF)"
TTL="7200s"

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/cachedContents \
-d @- <<EOF
{
  "model":"projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}",
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "fileData": {
            "mimeType": "${MIME_TYPE}",
            "fileUri": "${CACHED_CONTENT_URI}"
          }
        }
      ]
    }
  ],
  "displayName": "${CACHE_DISPLAY_NAME}",
  "ttl": "${TTL}"
}
EOF

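Example response:

The response includes a fully qualified resource name that is globally unique for the cache (note that the last segment is the cache ID). You'll use the entire name value in the next step of the workflow.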

{
  "name": "projects/861083271981/locations/global/cachedContents/4545031458888089601",
  "model": "projects/my-amazing-app/locations/global/publishers/google/models/gemini-3-flash-preview",
  "createTime": "2024-06-04T01:11:50.808236Z",
  "updateTime": "2024-06-04T01:11:50.808236Z",
  "expireTime": "2024-06-04T02:11:50.794542Z"
}
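If you script these steps, you can capture the name from the creation response. You can then derive the CACHE_ID, its last path segment, which the management requests later on this page need. This is a sketch that assumes the response body is held in a shell variable. It uses `grep` and `sed` instead of a JSON parser and takes the first `name` field it finds. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for scripting the workflow on this page.

cache_name_from_response() {
  # $1 = JSON body returned by the cachedContents create request
  printf '%s' "$1" | grep -o '"name": *"[^"]*"' | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/' || true
}

cache_id_from_name() {
  # The CACHE_ID is the segment after the last "/" in the resource name.
  printf '%s\n' "${1##*/}"
}

# Usage (assuming RESPONSE holds the JSON printed by the create request):
#   CACHE_NAME=$(cache_name_from_response "$RESPONSE")
#   CACHE_ID=$(cache_id_from_name "$CACHE_NAME")
```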

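Step 2: Reference the cache in a server prompt template

After you create the cache, reference it by name in the cachedContent property of your server prompt template.

When you create the server prompt template, make sure to follow these requirements:

Use the fully qualified resource name from the response you received when you created the cache. This is not the optional display name that you specified in the request.
The location of the server prompt template must match the location of the cache.
To use system instructions or tools, you must configure them as part of the cache, not as part of the server prompt template.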
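A common slip is pasting the display name into the template instead of the resource name. The following sketch tells the two apart by checking for the projects/PROJECT/locations/LOCATION/cachedContents/CACHE_ID shape seen in the example response. `is_cache_resource_name` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical check: is this a fully qualified cache resource name
# (projects/PROJECT/locations/LOCATION/cachedContents/CACHE_ID)
# rather than a display name or a bare cache ID?

is_cache_resource_name() {
  [[ "$1" =~ ^projects/[^/]+/locations/[^/]+/cachedContents/[^/]+$ ]]
}

# Usage:
#   is_cache_resource_name "$CACHE_NAME" || echo "use the full resource name, not the display name"
```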

Syntax:

{{cachedContent name="YOUR_CACHE_RESOURCE_NAME"}}

{{role "user"}}
{{userPrompt}}

Example:

{{cachedContent name="projects/861083271981/locations/global/cachedContents/4545031458888089601"}}

{{role "user"}}
{{userPrompt}}

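Alternatively, the value of the name parameter in the server prompt template can be a dynamic input variable. For example, {{cachedContent name=someVariable}} lets you pass the cache's name as an input in your app's request.

Step 3: Reference the server prompt template in your app's request

When you write the request, keep the following in mind:

Use the Vertex AI Gemini API, because the cache was created using that Gemini API provider.
The location where you access the model in your app's prompt request must match the location of the server prompt template and the cache.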

Swift

// ...

// Initialize the Vertex AI Gemini API backend service
// Create a `TemplateGenerativeModel` instance
// Make sure to specify the same location as the server prompt template and the cache
let model = FirebaseAI.firebaseAI(backend: .vertexAI(location: "LOCATION"))
                                  .templateGenerativeModel()

do {
    let response = try await model.generateContent(
        // Specify your template ID
        templateID: "TEMPLATE_ID"
    )
    if let text = response.text {
        print("Response Text: \(text)")
    }
} catch {
    print("An error occurred: \(error)")
}
print("\n")

Kotlin

// ...

// Initialize the Vertex AI Gemini API backend service
// Create a `TemplateGenerativeModel` instance
// Make sure to specify the same location as the server prompt template and the cache
val model = Firebase.ai(backend = GenerativeBackend.vertexAI(location = "LOCATION"))
                        .templateGenerativeModel()

val response = model.generateContent(
    // Specify your template ID
    "TEMPLATE_ID",
)

val text = response.text
println(text)

Java

// ...

// Initialize the Vertex AI Gemini API backend service
// Create a `TemplateGenerativeModel` instance
// Make sure to specify the same location as the server prompt template and the cache
TemplateGenerativeModel generativeModel =
    FirebaseAI.getInstance(GenerativeBackend.vertexAI("LOCATION")).templateGenerativeModel();

TemplateGenerativeModelFutures model = TemplateGenerativeModelFutures.from(generativeModel);

ListenableFuture<GenerateContentResponse> response = model.generateContent(
    // Specify your template ID
    "TEMPLATE_ID"
);
// `executor` is an Executor that you provide (setup elided above)
Futures.addCallback(response,
      new FutureCallback<GenerateContentResponse>() {
          @Override
          public void onSuccess(GenerateContentResponse result) {
            System.out.println(result.getText());
          }
          @Override
          public void onFailure(Throwable t) {
            t.printStackTrace();
          }
      },
      executor);

Web

// ...

// Initialize the Vertex AI Gemini API backend service
// Make sure to specify the same location as the server prompt template and the cache
const ai = getAI(app, { backend: new VertexAIBackend('LOCATION') });

// Create a `TemplateGenerativeModel` instance
const model = getTemplateGenerativeModel(ai);

const result = await model.generateContent(
  // Specify your template ID
  'TEMPLATE_ID'
);

const response = result.response;
const text = response.text();

Dart

// ...

// Initialize the Vertex AI Gemini API backend service
// Create a `TemplateGenerativeModel` instance
// Make sure to specify the same location as the server prompt template and the cache
var _model = FirebaseAI.vertexAI(location: 'LOCATION').templateGenerativeModel();

var response = await _model.generateContent(
        // Specify your template ID
        'TEMPLATE_ID',
      );

var text = response?.text;
print(text);

Unity

// ...

// Initialize the Vertex AI Gemini API backend service
// Make sure to specify the same location as the server prompt template and the cache
var firebaseAI = FirebaseAI.GetInstance(FirebaseAI.Backend.VertexAI(location: "LOCATION"));

// Create a `TemplateGenerativeModel` instance
var model = firebaseAI.GetTemplateGenerativeModel();

try
{
  var response = await model.GenerateContentAsync(
      // Specify your template ID
      "TEMPLATE_ID"
  );
  Debug.Log($"Response Text: {response.Text}");
}
catch (Exception e) {
  Debug.LogError($"An error occurred: {e.Message}");
}

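Manage explicit caches

This section describes how to manage explicit context caches, including how to list all caches, get metadata about a cache, update a cache's TTL or expiration time, and delete a cache.

You manage explicit caches using the REST API for the Vertex AI Gemini API.

After you create an explicit context cache, you can't change any of its content; you can only change its TTL or expiration time.

List all caches

You can list all the explicit caches available in your project. This command returns only the caches in the specified location.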

PROJECT_ID="PROJECT_ID"
LOCATION="LOCATION"

curl \
-X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/cachedContents
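The list response contains a `cachedContents` array of cache objects. The sketch below prints just their resource names, so you can pick a CACHE_ID for the requests that follow. It assumes the response body is in a shell variable and uses `grep` rather than a JSON parser. It does not handle pagination. `list_cache_names` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the resource name of each cache in a
# cachedContents list response, one per line. Pagination is not handled.

list_cache_names() {
  # $1 = JSON body returned by the list request
  printf '%s' "$1" \
    | grep -o '"name": *"projects/[^"]*/cachedContents/[^"]*"' \
    | sed 's/.*"\([^"]*\)"$/\1/' || true
}

# Usage (assuming LIST_RESPONSE holds the JSON from the curl command above):
#   list_cache_names "$LIST_RESPONSE"
```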

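Get metadata about a cache

You can't retrieve or view the actual cached content. However, you can retrieve metadata about an explicit cache, including name, model, display_name, usage_metadata, create_time, update_time, and expire_time.

You need to provide the CACHE_ID, which is the last segment of the cache's fully qualified resource name.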

PROJECT_ID="PROJECT_ID"
LOCATION="LOCATION"
CACHE_ID="CACHE_ID"  # the final segment in the `name` of the cache

curl \
-X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/cachedContents/${CACHE_ID}

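Update a cache's TTL or expiration time

When you create an explicit cache, you can optionally set either ttl or expire_time.

ttl: the cache's time to live (TTL), that is, how long the cache lives after it's created or after its ttl is updated, before it expires (specified in seconds and nanoseconds). When you set ttl, the cache's expireTime is updated automatically.
expire_time: a Timestamp (for example, 2024-06-30T09:00:00.000000Z) that specifies the absolute date and time when the cache expires.

If you don't set either of these values, the default TTL is 1 hour. There are no minimum or maximum bounds on the TTL.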
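The two expiration fields take different formats. ttl is a duration string, such as the 7200s used in the creation example. expire_time is an absolute UTC timestamp. The minimal sketch below produces values for the TTL and EXPIRE_TIME variables in the update requests on this page. It assumes GNU date, as found on most Linux systems, and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the two expiration formats.
# Assumes GNU date (Linux); BSD/macOS date uses different flags.

# ttl: a duration in seconds with an "s" suffix, for example "7200s" for 2 hours.
ttl_for_hours() {
  echo "$(( $1 * 3600 ))s"
}

# expire_time: an absolute UTC timestamp for a given Unix epoch second.
expire_time_at_epoch() {
  date -u -d "@$1" +%Y-%m-%dT%H:%M:%SZ
}

# expire_time for N hours from now.
expire_time_in_hours() {
  expire_time_at_epoch "$(( $(date +%s) + $1 * 3600 ))"
}

# Usage:
#   TTL=$(ttl_for_hours 2)
#   EXPIRE_TIME=$(expire_time_in_hours 24)
```

A relative ttl suits caches that should live a fixed time after each update. An absolute expire_time suits caches tied to a known deadline.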

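For an existing explicit cache, you can add or update the ttl or expire_time. You need to provide the CACHE_ID, which is the last segment of the cache's fully qualified resource name.

Update ttl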

PROJECT_ID="PROJECT_ID"
LOCATION="LOCATION"
CACHE_ID="CACHE_ID"  # the final segment in the `name` of the cache
TTL="CACHE_TIME_TO_LIVE"

curl \
-X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/cachedContents/${CACHE_ID} -d \
'{
  "ttl": "'$TTL'"
}'

Update expire_time

PROJECT_ID="PROJECT_ID"
LOCATION="LOCATION"
CACHE_ID="CACHE_ID"  # the final segment in the `name` of the cache
EXPIRE_TIME="ABSOLUTE_TIME_CACHE_EXPIRES"

curl \
-X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/cachedContents/${CACHE_ID} -d \
'{
  "expire_time": "'$EXPIRE_TIME'"
}'

Delete a cache

When you no longer need an explicit cache, you can delete it.
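Deleting follows the same pattern as the metadata and update requests: an HTTP DELETE on the cache's resource URL. The sketch below mirrors the host and v1beta1 path used elsewhere on this page. `cache_resource_url` and `delete_cache` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: delete an explicit cache through the Vertex AI REST API,
# using the same host and v1beta1 path as the other requests on this page.

cache_resource_url() {
  local project_id="$1" location="$2" cache_id="$3"
  echo "https://${location}-aiplatform.googleapis.com/v1beta1/projects/${project_id}/locations/${location}/cachedContents/${cache_id}"
}

delete_cache() {
  # $1 = PROJECT_ID, $2 = LOCATION, $3 = CACHE_ID (last segment of the cache name)
  curl \
    -X DELETE \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "$(cache_resource_url "$1" "$2" "$3")"
}

# Usage:
#   delete_cache "my-amazing-app" "global" "4545031458888089601"
```

Because cache content can't be edited, deleting a cache and creating a new one is the way to replace what it holds.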