All Gemini 1.0 and Gemini 1.5 models are now retired.
To avoid service disruption, update to a newer model (for example, gemini-2.5-flash-lite).

This page was translated by the Cloud Translation API.

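Customize images based on a control using Imagen

This page describes how to use the customization capability of Imagen to edit or generate images based on a specified control using the Firebase AI Logic SDKs.

How it works: You provide a text prompt and at least one control reference image (such as a drawing or a Canny edge image). The model uses these inputs to generate a new image based on the control images.

For example, you can provide the model with a drawing of a rocket and the moon along with a text prompt to create a watercolor painting based on the drawing.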


Types of control reference images

The reference image for a controlled customization can be a scribble, a Canny edge image, or a face mesh.

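What is a scribble?

A scribble is a rough, hand-drawn sketch or outline that gives the model the basic structure, spatial arrangement, and layout of the image. The text prompt supplies the details, colors, and textures of the generated image.

Example: You provide a drawing of a house, a tree, and a sun along with a text prompt like "A close-up of a wooden boat floating on a lake at sunrise, surrounded by trees." The model generates an image that matches the described scene while following the general layout of the drawing.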

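What is a Canny edge image?

A Canny edge image is produced by applying an algorithm, specifically the Canny edge detector, to a source image so that the edges of the objects in the image are mapped. These edges help the model preserve the precise structure of the objects while it changes the style, colors, and other attributes specified in the text prompt.

Example: Suppose you have a photo of a dog sitting on a sofa. You run a Canny edge detector on the photo to get an image that contains only the outlines of the dog and the sofa. You then use this edge map as the control image together with a text prompt like "A photo of a golden retriever puppy sitting on a leather sofa." The model generates a new photo that closely matches the original dog's pose and the sofa's composition, but with a golden retriever puppy and a leather sofa in place of the original subjects.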
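You produce the edge map yourself before calling the model. To illustrate what a Canny edge detector does, here is a minimal Java sketch; it is not part of the Firebase AI Logic SDK. It assumes a grayscale image stored as `int[row][col]` with values from 0 to 255 and thresholds with `0 < lowThreshold <= highThreshold`. The class name `CannyEdgeSketch` and the 3x3 Gaussian kernel are illustrative choices. It follows the classic stages: smoothing, Sobel gradients, non-maximum suppression, and double thresholding with hysteresis.

```java
import java.util.ArrayDeque;

/**
 * Minimal Canny edge detector sketch (illustrative, not part of Firebase AI Logic).
 * Input: grayscale pixels in [0, 255], indexed [row][col].
 * Output: 255 for edge pixels, 0 everywhere else.
 */
final class CannyEdgeSketch {

    static int[][] detect(int[][] gray, double lowThreshold, double highThreshold) {
        int h = gray.length, w = gray[0].length;

        // 1. Smooth with a 3x3 Gaussian kernel (weights sum to 16) to suppress noise.
        int[][] gk = {{1, 2, 1}, {2, 4, 2}, {1, 2, 1}};
        double[][] smooth = new double[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                double sum = 0;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        // Clamp to the image bounds so border pixels are replicated.
                        int yy = Math.min(h - 1, Math.max(0, y + dy));
                        int xx = Math.min(w - 1, Math.max(0, x + dx));
                        sum += gk[dy + 1][dx + 1] * gray[yy][xx];
                    }
                }
                smooth[y][x] = sum / 16.0;
            }
        }

        // 2. Sobel operator: gradient magnitude and direction (border pixels stay 0).
        double[][] mag = new double[h][w];
        double[][] angle = new double[h][w];
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                double gx = (smooth[y - 1][x + 1] + 2 * smooth[y][x + 1] + smooth[y + 1][x + 1])
                        - (smooth[y - 1][x - 1] + 2 * smooth[y][x - 1] + smooth[y + 1][x - 1]);
                double gy = (smooth[y + 1][x - 1] + 2 * smooth[y + 1][x] + smooth[y + 1][x + 1])
                        - (smooth[y - 1][x - 1] + 2 * smooth[y - 1][x] + smooth[y - 1][x + 1]);
                mag[y][x] = Math.hypot(gx, gy);
                angle[y][x] = Math.toDegrees(Math.atan2(gy, gx));
            }
        }

        // 3. Non-maximum suppression: keep a pixel only if it is a local maximum
        //    along its gradient direction (quantized to 0, 45, 90, or 135 degrees).
        double[][] thin = new double[h][w];
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                double a = (angle[y][x] + 180.0) % 180.0;
                int dx, dy;
                if (a < 22.5 || a >= 157.5) { dx = 1; dy = 0; }
                else if (a < 67.5) { dx = 1; dy = 1; }
                else if (a < 112.5) { dx = 0; dy = 1; }
                else { dx = -1; dy = 1; }
                double m = mag[y][x];
                if (m >= mag[y + dy][x + dx] && m >= mag[y - dy][x - dx]) {
                    thin[y][x] = m;
                }
            }
        }

        // 4. Double threshold and 5. hysteresis: strong pixels start edges, and
        //    weak pixels are kept only if they connect to a strong pixel.
        int[][] edges = new int[h][w];
        ArrayDeque<int[]> stack = new ArrayDeque<>();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (thin[y][x] >= highThreshold) {
                    edges[y][x] = 255;
                    stack.push(new int[] {y, x});
                }
            }
        }
        while (!stack.isEmpty()) {
            int[] p = stack.pop();
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    int ny = p[0] + dy, nx = p[1] + dx;
                    if (ny < 0 || nx < 0 || ny >= h || nx >= w) continue;
                    if (edges[ny][nx] == 0 && thin[ny][nx] >= lowThreshold) {
                        edges[ny][nx] = 255;
                        stack.push(new int[] {ny, nx});
                    }
                }
            }
        }
        return edges;
    }
}
```

For example, `CannyEdgeSketch.detect(gray, 50, 100)` marks edge pixels with 255. In an Android app you would more likely use an established implementation such as OpenCV's `Canny` function and convert the result to a `Bitmap` before passing it as the control reference.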

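What is a face mesh?

A face mesh is an image that helps the model understand and replicate a specific face. It is a digital representation of a 3D human face, typically a network of interconnected points (vertices) and triangles that define the shape and contours of the face. This gives the model key landmarks (such as the eyes, nose, and mouth) as well as texture.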
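The control reference you pass to the model is a face mesh image, not raw geometry. Purely to illustrate the data structure described above, here is a minimal Java sketch of a mesh as a list of 3D vertices plus triangles that reference those vertices by index. All names (`FaceMeshSketch`, `Vertex`, `Triangle`, `edgeCount`) are hypothetical and not part of any Firebase API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of a face mesh: 3D vertices connected into triangles. */
final class FaceMeshSketch {

    /** One landmark point of the face in 3D space. */
    static final class Vertex {
        final double x, y, z;
        Vertex(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
    }

    /** A triangle that refers to three vertices by their index in the vertex list. */
    static final class Triangle {
        final int a, b, c;
        Triangle(int a, int b, int c) { this.a = a; this.b = b; this.c = c; }
    }

    final List<Vertex> vertices;
    final List<Triangle> triangles;

    FaceMeshSketch(List<Vertex> vertices, List<Triangle> triangles) {
        // Every triangle corner must point at an existing vertex.
        for (Triangle t : triangles) {
            for (int i : new int[] {t.a, t.b, t.c}) {
                if (i < 0 || i >= vertices.size()) {
                    throw new IllegalArgumentException("Triangle references missing vertex " + i);
                }
            }
        }
        this.vertices = List.copyOf(vertices);
        this.triangles = List.copyOf(triangles);
    }

    /** Counts distinct vertex-to-vertex connections; a side shared by two triangles counts once. */
    int edgeCount() {
        Set<Long> edges = new HashSet<>();
        for (Triangle t : triangles) {
            addEdge(edges, t.a, t.b);
            addEdge(edges, t.b, t.c);
            addEdge(edges, t.c, t.a);
        }
        return edges.size();
    }

    // Encodes an undirected edge as one long key with the smaller index first.
    private static void addEdge(Set<Long> edges, int u, int v) {
        int lo = Math.min(u, v), hi = Math.max(u, v);
        edges.add(((long) lo << 32) | hi);
    }
}
```

Two triangles that share a side, such as (0, 1, 2) and (0, 2, 3), form five distinct connections, which is how a mesh stays a single connected surface rather than a set of loose triangles.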

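Before you begin

This feature is available only when you use the Vertex AI Gemini API as your API provider.

If you haven't already, complete the getting started guide, which describes how to set up your Firebase project, connect your app to Firebase, add the SDK, initialize the backend service for your chosen API provider, and create an ImagenModel instance.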

Models that support this capability

Imagen offers image editing through its capability model:

imagen-3.0-capability-001

Note that the global location is not supported for Imagen models.

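Send a controlled customization request

The following sample shows a controlled customization request that asks the model to generate a new image based on the provided reference image (in this example, a drawing of space, such as a rocket and the moon). Because the reference image is a rough, hand-drawn sketch or outline, the sample uses the control type CONTROL_TYPE_SCRIBBLE.

If your reference image is a Canny edge image or a face mesh, you can also use this example with the following changes:

If your reference image is a Canny edge image, use the control type CONTROL_TYPE_CANNY.
If your reference image is a face mesh, use the control type CONTROL_TYPE_FACE_MESH. This control can only be used with person subject customization.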
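The mapping in the list above can be captured in a small lookup. The following Java sketch is hypothetical: the `ControlTypeChooser` class, `ReferenceKind` enum, and `controlTypeFor` method are illustrative names, and the method returns the control-type names from this page as plain strings rather than the SDK constants, whose exact form differs by platform (the Dart sample, for instance, uses `ImagenControlType.scribble`).

```java
/** Hypothetical helper: picks the control type name described on this page for a reference image. */
final class ControlTypeChooser {

    /** The kinds of control reference images this page describes. */
    enum ReferenceKind { SCRIBBLE, CANNY_EDGE, FACE_MESH }

    static String controlTypeFor(ReferenceKind kind, boolean personSubjectCustomization) {
        switch (kind) {
            case SCRIBBLE:
                return "CONTROL_TYPE_SCRIBBLE";
            case CANNY_EDGE:
                return "CONTROL_TYPE_CANNY";
            case FACE_MESH:
                // Face mesh control is only usable together with person subject customization.
                if (!personSubjectCustomization) {
                    throw new IllegalArgumentException(
                            "CONTROL_TYPE_FACE_MESH requires person subject customization");
                }
                return "CONTROL_TYPE_FACE_MESH";
            default:
                throw new IllegalArgumentException("Unknown reference kind: " + kind);
        }
    }
}
```

Failing early on a face mesh outside person subject customization surfaces the restriction in your own code instead of as a rejected request.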

Review the prompt templates later on this page to learn how to write prompts and how to use reference images within them.

Swift

Image editing with Imagen models isn't supported for Swift yet. It's planned for release later this year.

Kotlin

// Using this SDK to access Imagen models is a Preview release and requires opt-in
@OptIn(PublicPreviewAPI::class)
suspend fun customizeImage() {
    // Initialize the Vertex AI Gemini API backend service
    // Optionally specify the location to access the model (for example, `us-central1`)
    val ai = Firebase.ai(backend = GenerativeBackend.vertexAI(location = "us-central1"))

    // Create an `ImagenModel` instance with an Imagen "capability" model
    val model = ai.imagenModel("imagen-3.0-capability-001")

    // This example assumes 'referenceImage' is a pre-loaded Bitmap.
    // In a real app, this might come from the user's device or a URL.
    val referenceImage: Bitmap = TODO("Load your reference image Bitmap here")

    // Define the control reference using the reference image.
    val controlReference = ImagenControlReference(
        image = referenceImage,
        referenceID = 1,
        controlType = CONTROL_TYPE_SCRIBBLE
    )

    // Provide a prompt that describes the final image.
    // The "[1]" links the prompt to the control reference with ID 1.
    val prompt = "A cat flying through outer space arranged like the space scribble[1]"

    // Use the editImage API to perform the controlled customization.
    // Pass the list of references, the prompt, and an editing configuration.
    val editedImage = model.editImage(
        referenceImages = listOf(controlReference),
        prompt = prompt,
        config = ImagenEditingConfig(
            editSteps = 50 // Number of editing steps, a higher value can improve quality
        )
    )

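    // Process the result: take the first generated image, if any, as a Bitmap
    val bitmap = editedImage.images.firstOrNull()?.asBitmap()
    // Use the bitmap to display the image in your UI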
}

Java

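import android.graphics.Bitmap;
import android.util.Log;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import java.util.Collections;
import java.util.concurrent.Executors;
// Also import the Firebase AI Logic classes used below (FirebaseAI, GenerativeBackend, ImagenModel, and so on).

// Initialize the Vertex AI Gemini API backend service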
// Optionally specify the location to access the model (for example, `us-central1`)
// Create an `ImagenModel` instance with an Imagen "capability" model
ImagenModel imagenModel = FirebaseAI.getInstance(GenerativeBackend.vertexAI("us-central1"))
        .imagenModel(
                /* modelName */ "imagen-3.0-capability-001");

ImagenModelFutures model = ImagenModelFutures.from(imagenModel);

// This example assumes 'referenceImage' is a pre-loaded Bitmap.
// In a real app, this might come from the user's device or a URL.
Bitmap referenceImage = null; // TODO("Load your image Bitmap here");

// Define the control reference using the reference image.
ImagenControlReference controlReference = new ImagenControlReference.Builder()
        .setImage(referenceImage)
        .setReferenceID(1)
        .setControlType(CONTROL_TYPE_SCRIBBLE)
        .build();

// Provide a prompt that describes the final image.
// The "[1]" links the prompt to the control reference with ID 1.
String prompt = "A cat flying through outer space arranged like the space scribble[1]";

// Define the editing configuration.
ImagenEditingConfig imagenEditingConfig = new ImagenEditingConfig.Builder()
        .setEditSteps(50) // Number of editing steps, a higher value can improve quality
        .build();

// Use the editImage API to perform the controlled customization.
// Pass the list of references, the prompt, and an editing configuration.
Futures.addCallback(
        model.editImage(Collections.singletonList(controlReference), prompt, imagenEditingConfig),
        new FutureCallback<ImagenGenerationResponse>() {
            @Override
            public void onSuccess(ImagenGenerationResponse result) {
                if (result.getImages().isEmpty()) {
                    Log.d("TAG", "No images generated");
                    return; // Nothing to display, so don't call get(0) on an empty list
                }
                Bitmap bitmap = ((ImagenInlineImage) result.getImages().get(0)).asBitmap();
                // Use the bitmap to display the image in your UI
            }

            @Override
            public void onFailure(Throwable t) {
                // Handle the error (for example, log it or show a message to the user)
            }
        },
        Executors.newSingleThreadExecutor());

Web

Image editing with Imagen models isn't supported for web apps yet. It's planned for release later this year.

Dart

import 'dart:typed_data';
import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

// Initialize FirebaseApp
await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Vertex AI Gemini API backend service
// Optionally specify a location to access the model (for example, `us-central1`)
final ai = FirebaseAI.vertexAI(location: 'us-central1');

// Create an `ImagenModel` instance with an Imagen "capability" model
final model = ai.imagenModel(model: 'imagen-3.0-capability-001');

// This example assumes 'referenceImage' is a pre-loaded Uint8List.
// In a real app, this might come from the user's device or a URL.
final Uint8List referenceImage = Uint8List(0); // TODO: Load your reference image data here

// Define the control reference using the reference image.
final controlReference = ImagenControlReference(
  image: referenceImage,
  referenceId: 1,
  controlType: ImagenControlType.scribble,
);

// Provide a prompt that describes the final image.
// The "[1]" links the prompt to the control reference with ID 1.
final prompt = "A cat flying through outer space arranged like the space scribble[1]";

try {
  // Use the editImage API to perform the controlled customization.
  // Pass the list of references, the prompt, and an editing configuration.
  final response = await model.editImage(
    [controlReference],
    prompt,
    config: ImagenEditingConfig(
      editSteps: 50, // Number of editing steps, a higher value can improve quality
    ),
  );

  // Process the result.
  if (response.images.isNotEmpty) {
    final editedImage = response.images.first.bytes;
    // Use the editedImage (a Uint8List) to display the image, save it, etc.
    print('Image successfully generated!');
  } else {
    // Handle the case where no images were generated.
    print('Error: No images were generated.');
  }
} catch (e) {
  // Handle any potential errors during the API call.
  print('An error occurred: $e');
}

Unity

Image editing with Imagen models isn't supported for Unity yet. It's planned for release later this year.

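Prompt templates

In the request, you provide reference images (up to 4) by defining an ImagenControlReference for each one, in which you specify a reference ID for the image. Multiple images can share the same reference ID (for example, several scribbles of the same idea).

When writing your prompt, you refer to these IDs. For example, use [1] in the prompt to refer to the images with reference ID 1.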
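Because mismatched IDs are easy to miss, you might check a request before sending it. This Java sketch is a hypothetical helper, not an SDK feature: it takes one reference ID per image plus the prompt, enforces the limit of 4 reference images stated above, and reports any ID that is declared but never mentioned as `[n]` in the prompt, or mentioned but not declared. Treating every bracketed number in the prompt as a reference is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-flight check for reference IDs in a customization prompt. */
final class ReferencePromptCheck {
    static final int MAX_REFERENCE_IMAGES = 4; // limit stated on this page
    private static final Pattern ID_TAG = Pattern.compile("\\[(\\d{1,9})\\]");

    /** Returns human-readable problems; an empty list means the request looks consistent. */
    static List<String> check(List<Integer> referenceIdPerImage, String prompt) {
        List<String> problems = new ArrayList<>();
        if (referenceIdPerImage.size() > MAX_REFERENCE_IMAGES) {
            problems.add("Too many reference images: " + referenceIdPerImage.size());
        }
        // Several images may share one ID, so compare distinct IDs only.
        Set<Integer> declared = new TreeSet<>(referenceIdPerImage);
        Set<Integer> mentioned = new TreeSet<>();
        Matcher m = ID_TAG.matcher(prompt);
        while (m.find()) {
            mentioned.add(Integer.parseInt(m.group(1)));
        }
        for (int id : declared) {
            if (!mentioned.contains(id)) {
                problems.add("Prompt never mentions [" + id + "]");
            }
        }
        for (int id : mentioned) {
            if (!declared.contains(id)) {
                problems.add("Prompt mentions [" + id + "] but no image has that ID");
            }
        }
        return problems;
    }
}
```

With the sample request on this page (one image with ID 1 and a prompt ending in `scribble[1]`), the check returns an empty list.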

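The following table provides prompt templates that you can use as a starting point when writing prompts for customization based on a control.

Use case	Reference images	Prompt template	Example
Controlled customization	Scribble map (1)	Generate an image that aligns with the `scribble map [1]` to match the description: ${STYLE_PROMPT} ${PROMPT}	Generate an image that aligns with the `scribble map [1]` to match the description: The image should be in the style of an impressionistic oil painting with relaxed brushstrokes. It has a naturally lit ambience and noticeable brushstrokes. A side view of a car. The car is parked on a wet, reflective road surface, with city lights reflecting in the puddles.
Controlled customization	Canny control image (1)	Generate an image that aligns with the `edge map [1]` to match the description: ${STYLE_PROMPT} ${PROMPT}	Generate an image that aligns with the `edge map [1]` to match the description: The image should be in the style of an impressionistic oil painting with relaxed brushstrokes. It has a naturally lit ambience and noticeable brushstrokes. A side view of a car. The car is parked on a wet, reflective road surface, with city lights reflecting in the puddles.
Person image stylization with FaceMesh input	Subject images (1 to 3), FaceMesh control image (1)	Create an image about `SUBJECT_DESCRIPTION [1]` in the pose of the `CONTROL_IMAGE [2]` to match the description: a portrait of `SUBJECT_DESCRIPTION [1]` ${PROMPT}	Create an image about `a woman with short hair [1]` in the pose of the `control image [2]` to match the description: a portrait of `a woman with short hair [1]` in 3D anime style with a blurred background. A cute and adorable character, with a smiling face, looking at the camera, pastel color tone ...
Person image stylization with FaceMesh input	Subject images (1 to 3), FaceMesh control image (1)	Create a ${STYLE_PROMPT} image about `SUBJECT_DESCRIPTION [1]` in the pose of the `CONTROL_IMAGE [2]` to match the description: a portrait of `SUBJECT_DESCRIPTION [1]` ${PROMPT}	Create a 3D anime style image about `a woman with short hair [1]` in the pose of the `control image [2]` to match the description: a portrait of `a woman with short hair [1]` in 3D anime style with a blurred background. A cute and adorable character, with a smiling face, looking at the camera, pastel color tone ...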
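To reuse the templates, you can substitute the `${STYLE_PROMPT}` and `${PROMPT}` placeholders programmatically. The Java sketch below is a hypothetical helper holding English versions of the first two templates in the table; it assumes the backticks there are only formatting and leaves them out of the prompt text, and it fails loudly if any `${...}` placeholder is left unfilled.

```java
import java.util.Map;

/** Hypothetical helper that fills the ${...} placeholders of a controlled-customization prompt template. */
final class PromptTemplates {
    static final String SCRIBBLE =
            "Generate an image that aligns with the scribble map [1] to match the description: "
                    + "${STYLE_PROMPT} ${PROMPT}";
    static final String CANNY =
            "Generate an image that aligns with the edge map [1] to match the description: "
                    + "${STYLE_PROMPT} ${PROMPT}";

    /** Replaces each ${NAME} with its value; throws if any placeholder remains unfilled. */
    static String fill(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> e : values.entrySet()) {
            result = result.replace("${" + e.getKey() + "}", e.getValue());
        }
        if (result.contains("${")) {
            throw new IllegalArgumentException("Unfilled placeholder in: " + result);
        }
        return result;
    }
}
```

For example, `PromptTemplates.fill(PromptTemplates.SCRIBBLE, Map.of("STYLE_PROMPT", "Watercolor style.", "PROMPT", "A rocket near the moon."))` yields a scribble prompt that you could pass as the `prompt` argument of `editImage`, keeping `[1]` aligned with the reference ID of the control image.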

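Best practices and limitations

Use cases

The customization capability offers free-style prompting, which can give the impression that the model can do more than it's trained to do. The following sections describe intended use cases for customization and non-exhaustive examples of unintended use cases.

We recommend using this capability for the intended use cases, because we've trained the model on those use cases and expect good results for them. Conversely, if you push the model to do things outside of the intended use cases, expect poor results.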

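Intended use cases

The following are intended use cases for customization based on a control:

Generate an image that follows the prompt and a Canny edge control image.
Generate an image that follows the prompt and a scribble image.
Stylize a photo of a person while preserving the facial expression.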

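Examples of unintended use cases

The following is a non-exhaustive list of unintended use cases for customization based on a control. The model isn't trained for these use cases and is likely to produce poor results.

Generate an image using a style specified in the prompt.
Generate an image from text that follows a specific style provided by a reference image, with some control over the image composition using a control image.
Generate an image from text that follows a specific style provided by a reference image, with some control over the image composition using a control sketch.
Generate an image from text that follows a specific style provided by a reference image, with some control over the image composition using a control image, where the person in the image has a specific facial expression.
Stylize a photo of two or more people while preserving their facial expressions.
Stylize a photo of a pet and turn it into a drawing, preserving or specifying the composition of the image (for example, watercolor).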