Analyze video files using the Gemini API

You can ask a Gemini model to analyze video files that you provide either inline (base64-encoded) or via URL. When you use Firebase AI Logic, you can make this request directly from your app.

With this capability, you can do things like:

Caption videos and answer questions about them
Analyze specific segments of a video using timestamps
Transcribe video content by processing both the audio track and the visual frames
Describe, segment, and extract information from videos, including both the audio track and the visual frames
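
To make the timestamp item above concrete, here is a minimal sketch of building a text prompt that targets one segment of a video. It assumes the model accepts timestamps written as MM:SS in the prompt text; `toTimestamp` and `segmentPrompt` are hypothetical helper names, not part of any SDK.

```javascript
// Hypothetical helper: format a number of seconds as MM:SS for use in a prompt.
function toTimestamp(totalSeconds) {
  const whole = Math.max(0, Math.floor(totalSeconds));
  const minutes = String(Math.floor(whole / 60)).padStart(2, "0");
  const seconds = String(whole % 60).padStart(2, "0");
  return `${minutes}:${seconds}`;
}

// Hypothetical helper: build a text prompt scoped to one segment of the video.
function segmentPrompt(startSeconds, endSeconds, question) {
  return `Between ${toTimestamp(startSeconds)} and ${toTimestamp(endSeconds)} in the video: ${question}`;
}

// prints "Between 01:15 and 02:10 in the video: What animals appear?"
console.log(segmentPrompt(75, 130, "What animals appear?"));
```

The resulting string can be passed as the text prompt alongside the video part in any of the samples on this page.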


See other guides for more options for working with video: Generate structured output, Multi-turn chat

Before you begin

Click your Gemini API provider to view provider-specific content and code on this page.

If you haven't already, complete the getting started guide. It describes how to set up your Firebase project, connect your app to Firebase, add the SDK, initialize the backend service for your chosen Gemini API provider, and create a GenerativeModel instance.

For testing and iterating on your prompts, we recommend using Google AI Studio.

Need a sample video file?

You can use this publicly available file with a MIME type of video/mp4 (view or download the file). https://storage.googleapis.com/cloud-samples-data/video/animals.mp4

Generate text from video files (base64-encoded)

Before trying this sample, complete the Before you begin section of this guide to set up your project and app.
In that section, you'll also click a button for your chosen Gemini API provider so that you see provider-specific content on this page.

You can ask a Gemini model to generate text by prompting with text and video. For each input file, provide its mimeType and the file itself. Find requirements and recommendations for input files later on this page.

Note that this example shows providing the file inline, but the SDKs also support providing a YouTube URL.
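
As a rough illustration of what "inline" means here, the sketch below wraps raw bytes as a base64-encoded part carrying a mimeType. It runs under Node.js and uses `Buffer` for the encoding; the `{ inlineData: { data, mimeType } }` shape mirrors the Web sample on this page. `bytesToInlinePart` and `base64Length` are hypothetical helper names introduced only for this example.

```javascript
// Hypothetical helper: wrap raw video bytes as an inline-data part.
// The { inlineData: { data, mimeType } } shape mirrors the Web sample on this page.
function bytesToInlinePart(bytes, mimeType) {
  const data = Buffer.from(bytes).toString("base64");
  return { inlineData: { data, mimeType } };
}

// Base64 emits 4 characters for every 3 input bytes (rounded up),
// so the encoded payload is about one third larger than the file itself.
function base64Length(byteLength) {
  return 4 * Math.ceil(byteLength / 3);
}

const part = bytesToInlinePart(new Uint8Array([0, 1, 2, 3]), "video/mp4");
console.log(part.inlineData.mimeType); // prints "video/mp4"
console.log(part.inlineData.data.length === base64Length(4)); // prints true
```

Because the encoded payload is larger than the original file, it can be worth estimating the encoded size before sending a large video inline. In a browser, where `Buffer` is not available, the FileReader approach in the Web sample does the same job.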

Swift

You can call generateContent() to generate text from multimodal input of text and video files.


import FirebaseAILogic

// Initialize the Gemini Developer API backend service.
let ai = FirebaseAI.firebaseAI(backend: .googleAI())

// Create a `GenerativeModel` instance with a model that supports your use case.
let model = ai.generativeModel(modelName: "gemini-3.5-flash")


// Provide the video as `Data` with the appropriate MIME type.
let video = InlineDataPart(data: try Data(contentsOf: videoURL), mimeType: "video/mp4")

// Provide a text prompt to include with the video
let prompt = "What is in the video?"

// To generate text output, call generateContent with the text and video
let response = try await model.generateContent(video, prompt)
print(response.text ?? "No text in response.")

Kotlin

You can call generateContent() to generate text from multimodal input of text and video files.

For Kotlin, the methods in this SDK are suspend functions and need to be called from a Coroutine scope.


// Initialize the Gemini Developer API backend service.
// Create a `GenerativeModel` instance with a model that supports your use case.
val model = Firebase.ai(backend = GenerativeBackend.googleAI())
                        .generativeModel("gemini-3.5-flash")


val contentResolver = applicationContext.contentResolver
contentResolver.openInputStream(videoUri).use { stream ->
  stream?.let {
    val bytes = stream.readBytes()

    // Provide a prompt that includes the video specified above and text
    val prompt = content {
        inlineData(bytes, "video/mp4")
        text("What is in the video?")
    }

    // To generate text output, call generateContent with the prompt
    val response = model.generateContent(prompt)
    Log.d(TAG, response.text ?: "")
  }
}

Java

You can call generateContent() to generate text from multimodal input of text and video files.

For Java, the methods in this SDK return a ListenableFuture.


// Initialize the Gemini Developer API backend service.
// Create a `GenerativeModel` instance with a model that supports your use case.
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI())
        .generativeModel("gemini-3.5-flash");

// Use the GenerativeModelFutures Java compatibility layer which offers
// support for ListenableFuture and Publisher APIs
GenerativeModelFutures model = GenerativeModelFutures.from(ai);


ContentResolver resolver = getApplicationContext().getContentResolver();
try (InputStream stream = resolver.openInputStream(videoUri)) {
    if (stream != null) {
        // Read the whole stream into memory. A single read() call may return
        // fewer bytes than requested, and content:// URIs can't be opened as a File.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int bytesRead;
        while ((bytesRead = stream.read(chunk)) != -1) {
            buffer.write(chunk, 0, bytesRead);
        }
        byte[] videoBytes = buffer.toByteArray();

        // Provide a prompt that includes the video specified above and text
        Content prompt = new Content.Builder()
                .addInlineData(videoBytes, "video/mp4")
                .addText("What is in the video?")
                .build();

        // To generate text output, call generateContent with the prompt
        ListenableFuture<GenerateContentResponse> response = model.generateContent(prompt);
        Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
            @Override
            public void onSuccess(GenerateContentResponse result) {
                String resultText = result.getText();
                System.out.println(resultText);
            }

            @Override
            public void onFailure(Throwable t) {
                t.printStackTrace();
            }
        }, executor);
    }
} catch (IOException e) {
    // The try-with-resources statement closes the stream automatically.
    e.printStackTrace();
}

Web

You can call generateContent() to generate text from multimodal input of text and video files.


import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service.
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case.
const model = getGenerativeModel(ai, { model: "gemini-3.5-flash" });


// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the video
  const prompt = "What do you see?";

  const fileInputEl = document.querySelector("input[type=file]");
  const videoPart = await fileToGenerativePart(fileInputEl.files[0]);

  // To generate text output, call generateContent with the text and video
  const result = await model.generateContent([prompt, videoPart]);

  const response = result.response;
  const text = response.text();
  console.log(text);
}

run();

Dart

You can call generateContent() to generate text from multimodal input of text and video files.


import 'dart:io';

import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

// Initialize FirebaseApp
await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service.
// Create a `GenerativeModel` instance with a model that supports your use case.
final model =
      FirebaseAI.googleAI().generativeModel(model: 'gemini-3.5-flash');


// Provide a text prompt to include with the video
final prompt = TextPart("What's in the video?");

// Prepare video for input
final video = await File('video0.mp4').readAsBytes();

// Provide the video as bytes with the appropriate MIME type
final videoPart = InlineDataPart('video/mp4', video);

// To generate text output, call generateContent with the text and video
final response = await model.generateContent([
  Content.multi([prompt, videoPart])
]);
print(response.text);

Unity

You can call GenerateContentAsync() to generate text from multimodal input of text and video files.


using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service.
var ai = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI());

// Create a `GenerativeModel` instance with a model that supports your use case.
var model = ai.GetGenerativeModel(modelName: "gemini-3.5-flash");


// Provide the video as `data` with the appropriate MIME type.
var video = ModelContent.InlineData("video/mp4",
      System.IO.File.ReadAllBytes(System.IO.Path.Combine(
          UnityEngine.Application.streamingAssetsPath, "yourVideo.mp4")));

// Provide a text prompt to include with the video
var prompt = ModelContent.Text("What is in the video?");

// To generate text output, call GenerateContentAsync with the text and video
var response = await model.GenerateContentAsync(new [] { video, prompt });
UnityEngine.Debug.Log(response.Text ?? "No text in response.");

Learn how to choose a model appropriate for your use case and app.

Stream the response

You can achieve faster interactions by not waiting for the entire result from the model generation and instead using streaming to handle partial results. To stream the response, call generateContentStream.
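
A minimal sketch of handling partial results on the Web follows. It assumes `model` is a GenerativeModel created as in the Web sample above, and that the object returned by generateContentStream exposes an async-iterable `stream` whose chunks have a `text()` method. `streamText` and `onChunk` are hypothetical names used only for this example.

```javascript
// Hypothetical helper: stream a response, hand each partial result to a
// callback as it arrives, and return the concatenated text at the end.
async function streamText(model, parts, onChunk) {
  const result = await model.generateContentStream(parts);
  let fullText = "";
  for await (const chunk of result.stream) {
    const chunkText = chunk.text();
    fullText += chunkText;
    if (onChunk) onChunk(chunkText);
  }
  return fullText;
}

// Example usage, with `prompt` and `videoPart` built as in the Web sample:
//   const text = await streamText(model, [prompt, videoPart], (t) => console.log(t));
```

Passing a callback lets the UI render text as it arrives, while the return value still gives the complete answer once the stream ends.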