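Detect and track objects with ML Kit on Android

You can use ML Kit to detect and track objects across frames of video.

When you pass images to ML Kit, it returns, for each image, a list of up to five detected objects and their positions in the image. When detecting objects in video streams, every object has an ID that you can use to track the object across images. You can also optionally enable coarse object classification, which labels objects with broad category descriptions.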

Before you begin

  1. If you haven't already, add Firebase to your Android project.
  2. Add the dependencies for the ML Kit Android libraries to your module (app-level) Gradle file (usually app/build.gradle):
    apply plugin: 'com.android.application'
    apply plugin: 'com.google.gms.google-services'
    
    dependencies {
      // ...
    
      implementation 'com.google.firebase:firebase-ml-vision:24.0.3'
      implementation 'com.google.firebase:firebase-ml-vision-object-detection-model:19.0.6'
    }
    

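1. Configure the object detector

To start detecting and tracking objects, first create an instance of FirebaseVisionObjectDetector, optionally specifying any detector settings you want to change from the default.

  1. Configure the object detector for your use case with a FirebaseVisionObjectDetectorOptions object. You can change the following settings: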

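    Object detector settings

    Detection mode: STREAM_MODE (default) | SINGLE_IMAGE_MODE

    In STREAM_MODE (default), the object detector runs with low latency, but might produce incomplete results (such as unspecified bounding boxes or category labels) on the first few invocations of the detector. Also, in STREAM_MODE, the detector assigns tracking IDs to objects, which you can use to track objects across frames. Use this mode when you want to track objects, or when low latency is important, such as when processing video streams in real time.

    In SINGLE_IMAGE_MODE, the object detector waits until a detected object's bounding box and (if you enabled classification) category label are available before returning a result. As a consequence, detection latency is potentially higher. Also, in SINGLE_IMAGE_MODE, tracking IDs are not assigned. Use this mode if latency isn't critical and you don't want to deal with partial results.

    Detect and track multiple objects: false (default) | true

    Whether to detect and track up to five objects or only the most prominent object (default).

    Classify objects: false (default) | true

    Whether to classify detected objects into coarse categories. When enabled, the object detector classifies objects into the following categories: fashion goods, food, home goods, places, plants, and unknown.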

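    The object detection and tracking API is optimized for these two core use cases:

    • Live detection and tracking of the most prominent object in the camera viewfinder
    • Detection of multiple objects from a static image

    To configure the API for these use cases: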

    Java

    // Live detection and tracking
    FirebaseVisionObjectDetectorOptions options =
            new FirebaseVisionObjectDetectorOptions.Builder()
                    .setDetectorMode(FirebaseVisionObjectDetectorOptions.STREAM_MODE)
                    .enableClassification()  // Optional
                    .build();
    
    // Multiple object detection in static images
    FirebaseVisionObjectDetectorOptions options =
            new FirebaseVisionObjectDetectorOptions.Builder()
                    .setDetectorMode(FirebaseVisionObjectDetectorOptions.SINGLE_IMAGE_MODE)
                    .enableMultipleObjects()
                    .enableClassification()  // Optional
                    .build();
    

    Kotlin+KTX

    // Live detection and tracking
    val options = FirebaseVisionObjectDetectorOptions.Builder()
            .setDetectorMode(FirebaseVisionObjectDetectorOptions.STREAM_MODE)
            .enableClassification()  // Optional
            .build()
    
    // Multiple object detection in static images
    val options = FirebaseVisionObjectDetectorOptions.Builder()
            .setDetectorMode(FirebaseVisionObjectDetectorOptions.SINGLE_IMAGE_MODE)
            .enableMultipleObjects()
            .enableClassification()  // Optional
            .build()
    
  2. Get an instance of FirebaseVisionObjectDetector:

    Java

    FirebaseVisionObjectDetector objectDetector =
            FirebaseVision.getInstance().getOnDeviceObjectDetector();
    
    // Or, to change the default settings:
    FirebaseVisionObjectDetector objectDetector =
            FirebaseVision.getInstance().getOnDeviceObjectDetector(options);
    

    Kotlin+KTX

    val objectDetector = FirebaseVision.getInstance().getOnDeviceObjectDetector()
    
    // Or, to change the default settings:
    val objectDetector = FirebaseVision.getInstance().getOnDeviceObjectDetector(options)
    

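2. Run the object detector

To detect and track objects, pass images to the FirebaseVisionObjectDetector instance's processImage() method.

For each frame of video or image in a sequence, do the following:

  1. Create a FirebaseVisionImage object from your image.

    • To create a FirebaseVisionImage object from a media.Image object, such as when capturing an image from a device's camera, pass the media.Image object and the image's rotation to FirebaseVisionImage.fromMediaImage().

      If you use the CameraX library, the OnImageCapturedListener and ImageAnalysis.Analyzer classes calculate the rotation value for you, so you just need to convert the rotation to one of ML Kit's ROTATION_ constants before calling FirebaseVisionImage.fromMediaImage():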

      Java

      private class YourAnalyzer implements ImageAnalysis.Analyzer {
      
          private int degreesToFirebaseRotation(int degrees) {
              switch (degrees) {
                  case 0:
                      return FirebaseVisionImageMetadata.ROTATION_0;
                  case 90:
                      return FirebaseVisionImageMetadata.ROTATION_90;
                  case 180:
                      return FirebaseVisionImageMetadata.ROTATION_180;
                  case 270:
                      return FirebaseVisionImageMetadata.ROTATION_270;
                  default:
                      throw new IllegalArgumentException(
                              "Rotation must be 0, 90, 180, or 270.");
              }
          }
      
          @Override
          public void analyze(ImageProxy imageProxy, int degrees) {
              if (imageProxy == null || imageProxy.getImage() == null) {
                  return;
              }
              Image mediaImage = imageProxy.getImage();
              int rotation = degreesToFirebaseRotation(degrees);
              FirebaseVisionImage image =
                      FirebaseVisionImage.fromMediaImage(mediaImage, rotation);
              // Pass image to an ML Kit Vision API
              // ...
          }
      }
      

      Kotlin+KTX

      private class YourImageAnalyzer : ImageAnalysis.Analyzer {
          private fun degreesToFirebaseRotation(degrees: Int): Int = when(degrees) {
              0 -> FirebaseVisionImageMetadata.ROTATION_0
              90 -> FirebaseVisionImageMetadata.ROTATION_90
              180 -> FirebaseVisionImageMetadata.ROTATION_180
              270 -> FirebaseVisionImageMetadata.ROTATION_270
              else -> throw IllegalArgumentException("Rotation must be 0, 90, 180, or 270.")
          }
      
          override fun analyze(imageProxy: ImageProxy?, degrees: Int) {
              val mediaImage = imageProxy?.image
              val imageRotation = degreesToFirebaseRotation(degrees)
              if (mediaImage != null) {
                  val image = FirebaseVisionImage.fromMediaImage(mediaImage, imageRotation)
                  // Pass image to an ML Kit Vision API
                  // ...
              }
          }
      }
      

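      If you don't use a camera library that gives you the image's rotation, you can calculate it from the device's rotation and the orientation of the camera sensor in the device: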

      Java

      private static final SparseIntArray ORIENTATIONS = new SparseIntArray();
      static {
          ORIENTATIONS.append(Surface.ROTATION_0, 90);
          ORIENTATIONS.append(Surface.ROTATION_90, 0);
          ORIENTATIONS.append(Surface.ROTATION_180, 270);
          ORIENTATIONS.append(Surface.ROTATION_270, 180);
      }
      
      /**
       * Get the angle by which an image must be rotated given the device's current
       * orientation.
       */
      @RequiresApi(api = Build.VERSION_CODES.LOLLIPOP)
      private int getRotationCompensation(String cameraId, Activity activity, Context context)
              throws CameraAccessException {
          // Get the device's current rotation relative to its "native" orientation.
          // Then, from the ORIENTATIONS table, look up the angle the image must be
          // rotated to compensate for the device's rotation.
          int deviceRotation = activity.getWindowManager().getDefaultDisplay().getRotation();
          int rotationCompensation = ORIENTATIONS.get(deviceRotation);
      
          // On most devices, the sensor orientation is 90 degrees, but for some
          // devices it is 270 degrees. For devices with a sensor orientation of
          // 270, rotate the image an additional 180 ((270 + 270) % 360) degrees.
          CameraManager cameraManager = (CameraManager) context.getSystemService(CAMERA_SERVICE);
          int sensorOrientation = cameraManager
                  .getCameraCharacteristics(cameraId)
                  .get(CameraCharacteristics.SENSOR_ORIENTATION);
          rotationCompensation = (rotationCompensation + sensorOrientation + 270) % 360;
      
          // Return the corresponding FirebaseVisionImageMetadata rotation value.
          int result;
          switch (rotationCompensation) {
              case 0:
                  result = FirebaseVisionImageMetadata.ROTATION_0;
                  break;
              case 90:
                  result = FirebaseVisionImageMetadata.ROTATION_90;
                  break;
              case 180:
                  result = FirebaseVisionImageMetadata.ROTATION_180;
                  break;
              case 270:
                  result = FirebaseVisionImageMetadata.ROTATION_270;
                  break;
              default:
                  result = FirebaseVisionImageMetadata.ROTATION_0;
                  Log.e(TAG, "Bad rotation value: " + rotationCompensation);
          }
          return result;
      }

      Kotlin+KTX

      private val ORIENTATIONS = SparseIntArray()
      
      init {
          ORIENTATIONS.append(Surface.ROTATION_0, 90)
          ORIENTATIONS.append(Surface.ROTATION_90, 0)
          ORIENTATIONS.append(Surface.ROTATION_180, 270)
          ORIENTATIONS.append(Surface.ROTATION_270, 180)
      }
      /**
       * Get the angle by which an image must be rotated given the device's current
       * orientation.
       */
      @RequiresApi(api = Build.VERSION_CODES.LOLLIPOP)
      @Throws(CameraAccessException::class)
      private fun getRotationCompensation(cameraId: String, activity: Activity, context: Context): Int {
          // Get the device's current rotation relative to its "native" orientation.
          // Then, from the ORIENTATIONS table, look up the angle the image must be
          // rotated to compensate for the device's rotation.
          val deviceRotation = activity.windowManager.defaultDisplay.rotation
          var rotationCompensation = ORIENTATIONS.get(deviceRotation)
      
          // On most devices, the sensor orientation is 90 degrees, but for some
          // devices it is 270 degrees. For devices with a sensor orientation of
          // 270, rotate the image an additional 180 ((270 + 270) % 360) degrees.
          val cameraManager = context.getSystemService(CAMERA_SERVICE) as CameraManager
          val sensorOrientation = cameraManager
                  .getCameraCharacteristics(cameraId)
                  .get(CameraCharacteristics.SENSOR_ORIENTATION)!!
          rotationCompensation = (rotationCompensation + sensorOrientation + 270) % 360
      
          // Return the corresponding FirebaseVisionImageMetadata rotation value.
          val result: Int
          when (rotationCompensation) {
              0 -> result = FirebaseVisionImageMetadata.ROTATION_0
              90 -> result = FirebaseVisionImageMetadata.ROTATION_90
              180 -> result = FirebaseVisionImageMetadata.ROTATION_180
              270 -> result = FirebaseVisionImageMetadata.ROTATION_270
              else -> {
                  result = FirebaseVisionImageMetadata.ROTATION_0
                  Log.e(TAG, "Bad rotation value: $rotationCompensation")
              }
          }
          return result
      }
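
The arithmetic in getRotationCompensation() can be checked without a device. The following is a minimal plain-Java sketch, not part of the ML Kit API: the class name RotationCompensationDemo is hypothetical, and the table is indexed on the assumption that Surface.ROTATION_0 through Surface.ROTATION_270 have the integer values 0 through 3. It prints the compensation angle for each display rotation on devices whose sensor orientation is 90 or 270 degrees.

```java
// Plain-Java sketch of the rotation arithmetic; no Android classes are needed.
final class RotationCompensationDemo {

    // Degrees looked up for each display rotation, indexed by the assumed
    // values (0..3) of Surface.ROTATION_0, _90, _180, and _270.
    private static final int[] ORIENTATIONS = {90, 0, 270, 180};

    /** Mirrors (ORIENTATIONS.get(deviceRotation) + sensorOrientation + 270) % 360. */
    static int rotationCompensation(int surfaceRotation, int sensorOrientation) {
        return (ORIENTATIONS[surfaceRotation] + sensorOrientation + 270) % 360;
    }

    public static void main(String[] args) {
        int[] sensorOrientations = {90, 270};
        for (int sensor : sensorOrientations) {
            for (int surfaceRotation = 0; surfaceRotation < 4; surfaceRotation++) {
                System.out.printf("sensor=%d, display rotation index=%d: %d degrees%n",
                        sensor, surfaceRotation,
                        rotationCompensation(surfaceRotation, sensor));
            }
        }
    }
}
```

For example, with the display at its natural rotation and a sensor orientation of 90, the result is (90 + 90 + 270) % 360 = 90, which maps to FirebaseVisionImageMetadata.ROTATION_90.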

      Then, pass the media.Image object and the rotation value to FirebaseVisionImage.fromMediaImage():

      Java

      FirebaseVisionImage image = FirebaseVisionImage.fromMediaImage(mediaImage, rotation);

      Kotlin+KTX

      val image = FirebaseVisionImage.fromMediaImage(mediaImage, rotation)
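    • To create a FirebaseVisionImage object from a file URI, pass the app context and file URI to FirebaseVisionImage.fromFilePath(). This is useful when you use an ACTION_GET_CONTENT intent to prompt the user to select an image from their gallery app.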

      Java

      FirebaseVisionImage image;
      try {
          image = FirebaseVisionImage.fromFilePath(context, uri);
      } catch (IOException e) {
          e.printStackTrace();
      }

      Kotlin+KTX

      val image: FirebaseVisionImage
      try {
          image = FirebaseVisionImage.fromFilePath(context, uri)
      } catch (e: IOException) {
          e.printStackTrace()
      }
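    • To create a FirebaseVisionImage object from a ByteBuffer or a byte array, first calculate the image rotation as described above for media.Image input.

      Then, create a FirebaseVisionImageMetadata object that contains the image's height, width, color encoding format, and rotation: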

      Java

      FirebaseVisionImageMetadata metadata = new FirebaseVisionImageMetadata.Builder()
              .setWidth(480)   // 480x360 is typically sufficient for
              .setHeight(360)  // image recognition
              .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
              .setRotation(rotation)
              .build();

      Kotlin+KTX

      val metadata = FirebaseVisionImageMetadata.Builder()
              .setWidth(480) // 480x360 is typically sufficient for
              .setHeight(360) // image recognition
              .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
              .setRotation(rotation)
              .build()

      Use the buffer or array, and the metadata object, to create a FirebaseVisionImage object:

      Java

      FirebaseVisionImage image = FirebaseVisionImage.fromByteBuffer(buffer, metadata);
      // Or: FirebaseVisionImage image = FirebaseVisionImage.fromByteArray(byteArray, metadata);

      Kotlin+KTX

      val image = FirebaseVisionImage.fromByteBuffer(buffer, metadata)
      // Or: val image = FirebaseVisionImage.fromByteArray(byteArray, metadata)
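
The metadata above declares an NV21 image of a given width and height, so the buffer should hold a frame of exactly that size. The following plain-Java sketch shows one way to sanity-check this before building the image. The class Nv21BufferCheck and its methods are hypothetical helpers, not part of ML Kit, and the size formula assumes the standard NV21 layout: a full-resolution Y plane followed by interleaved V and U samples at half resolution in each dimension.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of ML Kit: checks that a buffer is the right
// size for the width and height declared in the image metadata.
final class Nv21BufferCheck {

    /** Expected byte count of one NV21 frame (width * height * 3 / 2 for even sizes). */
    static int expectedNv21Size(int width, int height) {
        int lumaBytes = width * height;
        int chromaBytes = 2 * ((width + 1) / 2) * ((height + 1) / 2);
        return lumaBytes + chromaBytes;
    }

    /** Returns true if the buffer's remaining bytes match the declared dimensions. */
    static boolean matchesMetadata(ByteBuffer buffer, int width, int height) {
        return buffer.remaining() == expectedNv21Size(width, height);
    }

    public static void main(String[] args) {
        ByteBuffer frame = ByteBuffer.allocate(expectedNv21Size(480, 360));
        System.out.println("bytes per 480x360 frame: " + expectedNv21Size(480, 360));
        System.out.println("buffer matches metadata: " + matchesMetadata(frame, 480, 360));
    }
}
```

For the 480x360 example used above, the expected size is 172,800 luma bytes plus 86,400 chroma bytes, or 259,200 bytes in total.
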
    • To create a FirebaseVisionImage object from a Bitmap object:

      Java

      FirebaseVisionImage image = FirebaseVisionImage.fromBitmap(bitmap);

      Kotlin+KTX

      val image = FirebaseVisionImage.fromBitmap(bitmap)
      The image represented by the Bitmap object must be upright, with no additional rotation required.
  2. Pass the image to the processImage() method:

    Java

    objectDetector.processImage(image)
            .addOnSuccessListener(
                    new OnSuccessListener<List<FirebaseVisionObject>>() {
                        @Override
                        public void onSuccess(List<FirebaseVisionObject> detectedObjects) {
                            // Task completed successfully
                            // ...
                        }
                    })
            .addOnFailureListener(
                    new OnFailureListener() {
                        @Override
                        public void onFailure(@NonNull Exception e) {
                            // Task failed with an exception
                            // ...
                        }
                    });
    

    Kotlin+KTX

    objectDetector.processImage(image)
            .addOnSuccessListener { detectedObjects ->
                // Task completed successfully
                // ...
            }
            .addOnFailureListener { e ->
                // Task failed with an exception
                // ...
            }
    
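  3. If the call to processImage() succeeds, a list of FirebaseVisionObjects is passed to the success listener.

    Each FirebaseVisionObject contains the following properties:

    Bounding box: A Rect indicating the position of the object in the image.
    Tracking ID: An integer that identifies the object across images. Null in SINGLE_IMAGE_MODE.
    Category: The coarse category of the object. If the object detector doesn't have classification enabled, this is always FirebaseVisionObject.CATEGORY_UNKNOWN.
    Confidence: The confidence value of the object classification. If the object detector doesn't have classification enabled, or the object is classified as unknown, this is null.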

    Java

    // The list of detected objects contains one item if multiple object detection wasn't enabled.
    for (FirebaseVisionObject obj : detectedObjects) {
        Integer id = obj.getTrackingId();
        Rect bounds = obj.getBoundingBox();
    
        // If classification was enabled:
        int category = obj.getClassificationCategory();
        Float confidence = obj.getClassificationConfidence();
    }
    

    Kotlin+KTX

    // The list of detected objects contains one item if multiple object detection wasn't enabled.
    for (obj in detectedObjects) {
        val id = obj.trackingId       // A number that identifies the object across images
        val bounds = obj.boundingBox  // The object's position in the image
    
        // If classification was enabled:
        val category = obj.classificationCategory
        val confidence = obj.classificationConfidence
    }
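
Because the tracking ID stays the same for an object across images in STREAM_MODE, an app can use it as a key for per-object state. The following is a minimal plain-Java sketch of that idea, under stated assumptions: TrackedObjectRegistry and Detection are hypothetical names, Detection stands in for the tracking ID read from a FirebaseVisionObject, and forgetting an object as soon as it is missing from a frame is just one possible policy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch: Detection is a hypothetical stand-in for FirebaseVisionObject.
final class TrackedObjectRegistry {

    /** Holds only the tracking ID; a real app would also keep the bounding box. */
    static final class Detection {
        final Integer trackingId; // null when the detector assigns no ID

        Detection(Integer trackingId) {
            this.trackingId = trackingId;
        }
    }

    // Tracking ID -> number of consecutive frames in which the object was seen.
    private Map<Integer, Integer> consecutiveFrames = new HashMap<>();

    /** Records one frame and returns the IDs that were not in the previous frame. */
    List<Integer> update(List<Detection> detections) {
        List<Integer> newIds = new ArrayList<>();
        Map<Integer, Integer> next = new HashMap<>();
        for (Detection detection : detections) {
            if (detection.trackingId == null) {
                continue; // nothing to correlate across frames
            }
            Integer previous = consecutiveFrames.get(detection.trackingId);
            if (previous == null) {
                newIds.add(detection.trackingId);
            }
            next.put(detection.trackingId, previous == null ? 1 : previous + 1);
        }
        consecutiveFrames = next; // objects missing from this frame are forgotten
        return newIds;
    }

    /** Returns how many consecutive frames the object has been seen in, or 0. */
    int framesSeen(int trackingId) {
        Integer count = consecutiveFrames.get(trackingId);
        return count == null ? 0 : count;
    }
}
```

In a success listener, an app could build one Detection per FirebaseVisionObject from getTrackingId() and call update() once per frame, for example to animate a marker only when an object first appears.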
    

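Improving usability and performance

For the best user experience, follow these guidelines in your app:

  • Successful object detection depends on the object's visual complexity. Objects with a small number of visual features might need to take up a larger part of the image to be detected. You should provide users with guidance on capturing input that works well with the kind of objects you want to detect.
  • When using classification, if you want to detect objects that don't fall cleanly into the supported categories, implement special handling for unknown objects.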
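
One way to handle unknown objects is to fall back to a generic label whenever the category is unknown or the confidence is missing or low. The following plain-Java sketch illustrates this. Everything in it is an assumption made for illustration: the CategoryLabeler class, the Category enum (a stand-in for the SDK's category constants, not the constants themselves), the label strings, and the 0.5 threshold.

```java
// Plain-Java sketch: Category is a hypothetical stand-in for the SDK's
// category constants, and the threshold is an arbitrary example value.
final class CategoryLabeler {

    enum Category { HOME_GOOD, FASHION_GOOD, FOOD, PLACE, PLANT, UNKNOWN }

    static final float MIN_CONFIDENCE = 0.5f; // assumed value; tune for your app
    static final String GENERIC_LABEL = "Object";

    /** Returns a display label, falling back to a generic one for unknown objects. */
    static String labelFor(Category category, Float confidence) {
        // The confidence is null when classification is off or the object is unknown.
        if (category == Category.UNKNOWN || confidence == null
                || confidence < MIN_CONFIDENCE) {
            return GENERIC_LABEL;
        }
        switch (category) {
            case HOME_GOOD:
                return "Home good";
            case FASHION_GOOD:
                return "Fashion item";
            case FOOD:
                return "Food";
            case PLACE:
                return "Place";
            case PLANT:
                return "Plant";
            default:
                return GENERIC_LABEL;
        }
    }

    public static void main(String[] args) {
        System.out.println(labelFor(Category.FOOD, 0.9f));
        System.out.println(labelFor(Category.UNKNOWN, null));
    }
}
```

With this approach the UI can still highlight a detected object and offer a follow-up action, such as a search, even when no supported category applies.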

Also, check out the ML Kit Material Design showcase app and the Material Design Patterns for machine learning-powered features collection.

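When using streaming mode in a real-time application, follow these guidelines to achieve the best frame rates:

  • Don't use multiple object detection in streaming mode, as most devices won't be able to produce adequate frame rates.
  • Disable classification if you don't need it.
  • Throttle calls to the detector. If a new video frame becomes available while the detector is running, drop the frame.
  • If you are using the output of the detector to overlay graphics on the input image, first get the result from ML Kit, then render the image and overlay in a single step. By doing so, you render to the display surface only once for each input frame.
  • If you use the Camera2 API, capture images in ImageFormat.YUV_420_888 format.

    If you use the older Camera API, capture images in ImageFormat.NV21 format.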
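
The throttling guideline above can be implemented with a single atomic flag. The following plain-Java sketch shows one possible approach. FrameThrottle is a hypothetical helper, not part of ML Kit: a frame is submitted only if no detection is in flight, and the flag is cleared when the detector finishes.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of ML Kit: lets one frame through at a time.
final class FrameThrottle {

    private final AtomicBoolean busy = new AtomicBoolean(false);
    private final AtomicInteger dropped = new AtomicInteger();

    /** Returns true if the caller may submit this frame; false means drop it. */
    boolean tryAcquire() {
        if (busy.compareAndSet(false, true)) {
            return true;
        }
        dropped.incrementAndGet();
        return false;
    }

    /** Call when the detector finishes, whether it succeeded or failed. */
    void release() {
        busy.set(false);
    }

    /** Number of frames rejected so far because the detector was busy. */
    int droppedFrames() {
        return dropped.get();
    }

    public static void main(String[] args) {
        FrameThrottle throttle = new FrameThrottle();
        System.out.println("frame 1 accepted: " + throttle.tryAcquire());
        System.out.println("frame 2 accepted: " + throttle.tryAcquire());
        throttle.release(); // the detector finished frame 1
        System.out.println("frame 3 accepted: " + throttle.tryAcquire());
    }
}
```

In an analyzer, this would mean returning early when tryAcquire() returns false, and calling release() from both the success and failure listeners of processImage() so that a failed detection does not block later frames.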