

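Customize images based on a control using Imagen

This page describes how to use the customization capability of Imagen, through the Firebase AI Logic SDKs, to edit or generate images based on a specified control.

How it works: You provide a text prompt and at least one control reference image (for example, a scribble or a Canny edge image). The model uses these inputs to generate a new image based on the control images.

For example, you can give the model a simple drawing of a rocket and the moon, along with a text prompt, and have it create a watercolor painting based on the drawing.

Important: Review the list of intended and unintended use cases to get better results from customization.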


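Types of control reference images

The reference image for controlled customization can be a scribble, a Canny edge image, or a face mesh.

What is a scribble?

A scribble is a rough, hand-drawn sketch or outline that gives the model a basic structure and spatial layout to follow. The text prompt supplies the details, colors, and textures of the generated image.

Example: You provide a drawing of a house, a tree, and the sun, along with a text prompt such as "A whimsical watercolor painting of a small cottage next to a giant oak tree at sunrise." The model then generates an image that matches the described scene while following the general layout of your sketch.

What is a Canny edge image?

A Canny edge image is the result of applying an algorithm (specifically, the Canny edge detector) to a source image to map the edges of the objects in it. These edges help the model preserve the precise structure of the objects while changing the style, color, or other attributes specified in the text prompt.

Example: You have a photo of a dog sitting on a couch. You run the Canny edge detector on the photo to get an image that contains only the outlines of the dog and the couch. You then use this edge map as the control image, with a text prompt such as "A photo of a golden retriever puppy sitting on a leather couch." The model generates a new photo with the same pose and couch composition as the original, but with the original subjects replaced by a golden retriever puppy and a leather couch.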
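As a rough illustration of how an edge-style control image might be derived from a source photo, the sketch below computes a Sobel gradient magnitude and thresholds it. This is a simplified stand-in, not the full Canny detector (which also applies Gaussian smoothing, non-maximum suppression, and hysteresis thresholding); in practice you would more likely use an image library's Canny implementation. The class name `EdgeMapSketch`, the method `edgeMap`, and the threshold value are hypothetical and are not part of any Firebase SDK.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper: a simplified Sobel edge map, NOT the full Canny detector.
final class EdgeMapSketch {

    // Returns a black image with white pixels wherever the gradient magnitude
    // of the grayscale source exceeds `threshold`.
    static BufferedImage edgeMap(BufferedImage src, int threshold) {
        int w = src.getWidth();
        int h = src.getHeight();
        int[][] gray = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                gray[y][x] = (r * 299 + g * 587 + b * 114) / 1000; // luma approximation
            }
        }
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        // Border pixels stay black because the 3x3 kernel needs all eight neighbors.
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                int gx = -gray[y - 1][x - 1] + gray[y - 1][x + 1]
                        - 2 * gray[y][x - 1] + 2 * gray[y][x + 1]
                        - gray[y + 1][x - 1] + gray[y + 1][x + 1];
                int gy = -gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1]
                        + gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1];
                double magnitude = Math.sqrt((double) gx * gx + (double) gy * gy);
                out.setRGB(x, y, magnitude > threshold ? 0xFFFFFF : 0x000000);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Synthetic 8x8 test image: left half black, right half white.
        BufferedImage src = new BufferedImage(8, 8, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 8; y++) {
            for (int x = 4; x < 8; x++) {
                src.setRGB(x, y, 0xFFFFFF);
            }
        }
        BufferedImage edges = edgeMap(src, 128);
        // Print the edge map as text: '#' marks an edge pixel.
        for (int y = 0; y < 8; y++) {
            StringBuilder row = new StringBuilder();
            for (int x = 0; x < 8; x++) {
                row.append((edges.getRGB(x, y) & 0xFF) > 0 ? '#' : '.');
            }
            System.out.println(row);
        }
    }
}
```

The output image keeps only outlines, which is the property that makes an edge map useful as a control image: structure is preserved while color and texture are left to the prompt.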

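What is a face mesh?

A face mesh is a type of image that helps the model understand and replicate a specific face. It is a 3D digital representation of a human face, typically a network of interconnected points (vertices) and triangles that define the shape and contours of the face. This gives the model key landmarks (such as the eyes, nose, and mouth) and texture.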

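Before you begin

Only available when using the Vertex AI Gemini API as your API provider.

If you haven't already, complete the getting started guide. It describes how to set up your Firebase project, connect your app to Firebase, add the SDK, initialize the backend service for your chosen API provider, and create an ImagenModel instance.

Always use the latest version of the Firebase AI Logic SDKs. If you are still using the "Vertex AI in Firebase" SDKs, see the migration guide.

Models that support this capability

Imagen offers image editing through its capability model:

imagen-3.0-capability-001

Note that the global location is not supported for Imagen models.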

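Send a controlled customization request

The following sample shows a controlled customization request that asks the model to generate a new image based on the provided reference image (in this example, a drawing of space, such as a rocket and the moon). Because the reference image is a rough, hand-drawn sketch or outline, it uses the control type CONTROL_TYPE_SCRIBBLE.

If your reference image is a Canny edge image or a face mesh, you can also use this sample, with the following changes:

If your reference image is a Canny edge image, use the control type CONTROL_TYPE_CANNY.

If your reference image is a face mesh, use the control type CONTROL_TYPE_FACE_MESH. This control can only be used with person subject customization.

See the prompt templates later on this page to learn how to write prompts and how to refer to reference images in them.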
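The choice of control type described above can be summarized in a small lookup. The following is only an illustrative sketch: the enum `ControlImageKind` and its methods are hypothetical names introduced here, and only the three control type strings come from this page.

```java
// Hypothetical helper that maps the kind of reference image to the
// control type names used on this page. Not part of the Firebase SDK.
enum ControlImageKind {
    SCRIBBLE("CONTROL_TYPE_SCRIBBLE"),
    CANNY_EDGE("CONTROL_TYPE_CANNY"),
    FACE_MESH("CONTROL_TYPE_FACE_MESH");

    private final String controlType;

    ControlImageKind(String controlType) {
        this.controlType = controlType;
    }

    // The control type name to use for this kind of reference image.
    String controlType() {
        return controlType;
    }

    // Per this page, the face mesh control is only for person subject customization.
    boolean requiresPersonSubject() {
        return this == FACE_MESH;
    }

    public static void main(String[] args) {
        for (ControlImageKind kind : values()) {
            System.out.println(kind + " -> " + kind.controlType()
                    + (kind.requiresPersonSubject() ? " (person subjects only)" : ""));
        }
    }
}
```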

Swift
Image editing with Imagen models is not supported for Swift. Check back later this year!

Kotlin
// Using this SDK to access Imagen models is a Preview release and requires opt-in
@OptIn(PublicPreviewAPI::class)
suspend fun customizeImage() {
    // Initialize the Vertex AI Gemini API backend service
    // Optionally specify the location to access the model (for example, `us-central1`)
    val ai = Firebase.ai(backend = GenerativeBackend.vertexAI(location = "us-central1"))

    // Create an `ImagenModel` instance with an Imagen "capability" model
    val model = ai.imagenModel("imagen-3.0-capability-001")

    // This example assumes 'referenceImage' is a pre-loaded Bitmap.
    // In a real app, this might come from the user's device or a URL.
    val referenceImage: Bitmap = TODO("Load your reference image Bitmap here")

    // Define the control reference using the reference image.
    val controlReference = ImagenControlReference(
        image = referenceImage,
        referenceID = 1,
        controlType = CONTROL_TYPE_SCRIBBLE
    )

    // Provide a prompt that describes the final image.
    // The "[1]" links the prompt to the control reference with ID 1.
    val prompt = "A cat flying through outer space arranged like the space scribble[1]"

    // Use the editImage API to perform the controlled customization.
    // Pass the list of references, the prompt, and an editing configuration.
    val editedImage = model.editImage(
        referenceImages = listOf(controlReference),
        prompt = prompt,
        config = ImagenEditingConfig(
            editSteps = 50 // Number of editing steps, a higher value can improve quality
        )
    )

    // Process the result
}

Java
// Initialize the Vertex AI Gemini API backend service
// Optionally specify the location to access the model (for example, `us-central1`)
// Create an `ImagenModel` instance with an Imagen "capability" model
ImagenModel imagenModel = FirebaseAI.getInstance(GenerativeBackend.vertexAI("us-central1"))
        .imagenModel(
                /* modelName */ "imagen-3.0-capability-001");

ImagenModelFutures model = ImagenModelFutures.from(imagenModel);

// This example assumes 'referenceImage' is a pre-loaded Bitmap.
// In a real app, this might come from the user's device or a URL.
Bitmap referenceImage = null; // TODO("Load your image Bitmap here");

// Define the control reference using the reference image.
ImagenControlReference controlReference = new ImagenControlReference.Builder()
        .setImage(referenceImage)
        .setReferenceID(1)
        .setControlType(CONTROL_TYPE_SCRIBBLE)
        .build();

// Provide a prompt that describes the final image.
// The "[1]" links the prompt to the control reference with ID 1.
String prompt = "A cat flying through outer space arranged like the space scribble[1]";

// Define the editing configuration.
ImagenEditingConfig imagenEditingConfig = new ImagenEditingConfig.Builder()
        .setEditSteps(50) // Number of editing steps, a higher value can improve quality
        .build();

// Use the editImage API to perform the controlled customization.
// Pass the list of references, the prompt, and an editing configuration.
Futures.addCallback(model.editImage(Collections.singletonList(controlReference), prompt, imagenEditingConfig), new FutureCallback<ImagenGenerationResponse>() {
    @Override
    public void onSuccess(ImagenGenerationResponse result) {
        if (result.getImages().isEmpty()) {
            Log.d("TAG", "No images generated");
            return; // Nothing to display; avoid reading from an empty list
        }
        Bitmap bitmap = ((ImagenInlineImage) result.getImages().get(0)).asBitmap();
        // Use the bitmap to display the image in your UI
    }

    @Override
    public void onFailure(Throwable t) {
        // ...
    }
}, Executors.newSingleThreadExecutor());

Web
Image editing with Imagen models is not supported for Web apps. Check back later this year!

Dart
import 'dart:typed_data';
import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

// Initialize FirebaseApp
await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Vertex AI Gemini API backend service
// Optionally specify a location to access the model (for example, `us-central1`)
final ai = FirebaseAI.vertexAI(location: 'us-central1');

// Create an `ImagenModel` instance with an Imagen "capability" model
final model = ai.imagenModel(model: 'imagen-3.0-capability-001');

// This example assumes 'referenceImage' is a pre-loaded Uint8List.
// In a real app, this might come from the user's device or a URL.
final Uint8List referenceImage = Uint8List(0); // TODO: Load your reference image data here

// Define the control reference using the reference image.
final controlReference = ImagenControlReference(
  image: referenceImage,
  referenceId: 1,
  controlType: ImagenControlType.scribble,
);

// Provide a prompt that describes the final image.
// The "[1]" links the prompt to the control reference with ID 1.
final prompt = "A cat flying through outer space arranged like the space scribble[1]";

try {
  // Use the editImage API to perform the controlled customization.
  // Pass the list of references, the prompt, and an editing configuration.
  final response = await model.editImage(
    [controlReference],
    prompt,
    config: ImagenEditingConfig(
      editSteps: 50, // Number of editing steps, a higher value can improve quality
    ),
  );

  // Process the result.
  if (response.images.isNotEmpty) {
    final editedImage = response.images.first.bytes;
    // Use the editedImage (a Uint8List) to display the image, save it, etc.
    print('Image successfully generated!');
  } else {
    // Handle the case where no images were generated.
    print('Error: No images were generated.');
  }
} catch (e) {
  // Handle any potential errors during the API call.
  print('An error occurred: $e');
}

Unity
Image editing with Imagen models is not supported for Unity. Check back later this year!

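Prompt templates

In the request, you provide reference images (up to 4) by defining an ImagenControlReference, in which you specify a reference ID for the image. Note that multiple images can share the same reference ID (for example, several scribbles of the same idea).

Then, when you write the prompt, you refer to these IDs. For example, you use [1] in the prompt to refer to the images with reference ID 1.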
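Because the prompt and the reference images are tied together only by these numeric IDs, it can help to check them for consistency before sending a request. The sketch below is a hypothetical client-side check, not something the Firebase AI Logic SDK is known to provide: the class `PromptReferenceCheck`, its `check` method, and the marker pattern are assumptions made for illustration. It applies the two rules stated above (at most 4 reference images, and shared IDs are allowed) and flags any `[n]` marker in the prompt that no reference image carries.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical client-side check; not part of the Firebase AI Logic SDK.
final class PromptReferenceCheck {
    static final int MAX_REFERENCE_IMAGES = 4; // limit stated on this page

    // Matches markers such as "[1]" (assumed form, based on the examples on this page).
    private static final Pattern MARKER = Pattern.compile("\\[(\\d{1,9})\\]");

    // Returns a list of problems; an empty list means the prompt and IDs are consistent.
    // `referenceIds` holds one entry per reference image, so duplicates are allowed.
    static List<String> check(String prompt, List<Integer> referenceIds) {
        List<String> problems = new ArrayList<>();
        if (referenceIds.isEmpty()) {
            problems.add("at least one control reference image is required");
        }
        if (referenceIds.size() > MAX_REFERENCE_IMAGES) {
            problems.add("too many reference images: " + referenceIds.size());
        }
        Set<Integer> known = new HashSet<>(referenceIds);
        Set<Integer> reported = new HashSet<>();
        Matcher marker = MARKER.matcher(prompt);
        while (marker.find()) {
            int id = Integer.parseInt(marker.group(1));
            // Report each unknown ID only once, even if the prompt repeats it.
            if (!known.contains(id) && reported.add(id)) {
                problems.add("prompt uses [" + id + "] but no reference image has that ID");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String prompt = "A cat flying through outer space arranged like the space scribble[1]";
        System.out.println(check(prompt, List.of(1))); // IDs match the prompt
        System.out.println(check(prompt, List.of(2))); // [1] has no matching image
    }
}
```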

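Important: Review the list of intended and unintended use cases to get better results from customization.

The following table provides prompt templates that you can use as a starting point for writing customization prompts based on a control.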

Use case	Reference images	Prompt template	Example
Controlled customization	Scribble map (1)	Generate an image that aligns with the scribble map [1] to match the description: ${STYLE_PROMPT} ${PROMPT}.	Generate an image that aligns with the scribble map [1] to match the description: The image should be in the style of an impressionistic oil painting with relaxed brushstrokes. It possesses a naturally-lit ambience and noticeable brushstrokes. A side-view of a car. The car is parked on a wet, reflective road surface, with city lights reflecting in the puddles.
Controlled customization	Canny control image (1)	Generate an image aligning with the edge map [1] to match the description: ${STYLE_PROMPT} ${PROMPT}	Generate an image aligning with the edge map [1] to match the description: The image should be in the style of an impressionistic oil painting, with relaxed brushstrokes. It possesses a naturally-lit ambience and noticeable brushstrokes. A side-view of a car. The car is parked on a wet, reflective road surface, with city lights reflecting in the puddles.
Person image stylization with FaceMesh input	Subject image (1-3), FaceMesh control image (1)	Create an image about SUBJECT_DESCRIPTION [1] in the pose of the CONTROL_IMAGE [2] to match the description: a portrait of SUBJECT_DESCRIPTION [1] ${PROMPT}	Create an image about a woman with short hair [1] in the pose of the control image [2] to match the description: a portrait of a woman with short hair [1] in 3D-cartoon style with a blurred background. A cute and lovely character, with a smiling face, looking at the camera, pastel color tone ...
Person image stylization with FaceMesh input	Subject image (1-3), FaceMesh control image (1)	Create a ${STYLE_PROMPT} image about SUBJECT_DESCRIPTION [1] in the pose of the CONTROL_IMAGE [2] to match the description: a portrait of SUBJECT_DESCRIPTION [1] ${PROMPT}	Create a 3D-cartoon style image about a woman with short hair [1] in the pose of the control image [2] to match the description: a portrait of a woman with short hair [1] in 3D-cartoon style with a blurred background. A cute and lovely character, with a smiling face, looking at the camera, pastel color tone ...
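One way to work with these templates in code is plain placeholder substitution. The sketch below fills the scribble template from the table with a style prompt and a content prompt. The class `PromptTemplates` and the method `fill` are hypothetical names used for illustration, and the sample values are made up; only the template text and the placeholder names `STYLE_PROMPT` and `PROMPT` come from this page.

```java
import java.util.Map;

// Hypothetical helper for filling the prompt templates on this page.
final class PromptTemplates {
    // Scribble template copied from the table on this page.
    static final String SCRIBBLE_TEMPLATE =
            "Generate an image that aligns with the scribble map [1] "
            + "to match the description: ${STYLE_PROMPT} ${PROMPT}.";

    // Replaces each ${NAME} placeholder with its value and fails loudly
    // if any placeholder is left unfilled.
    static String fill(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace("${" + entry.getKey() + "}", entry.getValue());
        }
        if (result.contains("${")) {
            throw new IllegalArgumentException("unfilled placeholder in: " + result);
        }
        return result;
    }

    public static void main(String[] args) {
        String prompt = fill(SCRIBBLE_TEMPLATE, Map.of(
                "STYLE_PROMPT", "The image should be a watercolor painting.",
                "PROMPT", "A rocket flying past the moon"));
        System.out.println(prompt);
    }
}
```

The reference marker `[1]` is left untouched by the substitution, so it still points at the control image with reference ID 1 when the filled prompt is sent.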

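Best practices and limitations

Use cases

The customization capability offers free-style prompting, which can give the impression that the model can do more than it was trained to do. The following sections describe the intended use cases for customization, along with a non-exhaustive list of unintended use cases.

We recommend using this capability for the intended use cases, because the model has been trained on them and good results can be expected. Conversely, if you push the model to do things outside the intended use cases, you should expect poor results.

Intended use cases

The following are intended use cases for customization based on a control:

Generate an image that follows the prompt and a Canny edge control image.

Generate an image that follows the prompt and a scribble image.

Stylize a photo of a person while preserving the facial expression.

Examples of unintended use cases

The following is a non-exhaustive list of unintended use cases for customization based on a control. The model is not trained for these use cases and is likely to produce poor results.

Generate an image using a style specified in the prompt.

Generate an image from text that follows a specific style provided by a reference image, with some level of control over the image composition using a control image.

Generate an image from text that follows a specific style provided by a reference image, with some level of control over the image composition using a control scribble.

Generate an image from text that follows a specific style provided by a reference image, with some level of control over the image composition using a control image. The person in the image has a specific facial expression.

Stylize a photo of two or more people while preserving their facial expressions.

Stylize a photo of a pet and turn it into a drawing. Preserve or specify the style of the drawing (for example, watercolor).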

