Detect and track objects with ML Kit on Android

You can use ML Kit to detect and track objects across frames of video.

When you pass ML Kit images, ML Kit returns, for each image, a list of up to five detected objects and their positions in the image. When detecting objects in video streams, each object has an ID that you can use to track the object across images. You can also optionally enable coarse object classification, which labels objects with broad category descriptions.

Before you begin

  1. If you haven't already, add Firebase to your Android project.
  2. Add the dependencies for the ML Kit Android libraries to your module (app-level) Gradle file (usually app/build.gradle):
    apply plugin: 'com.android.application'
    apply plugin: 'com.google.gms.google-services'
    
    dependencies {
      // ...
    
      implementation 'com.google.firebase:firebase-ml-vision:24.0.3'
      implementation 'com.google.firebase:firebase-ml-vision-object-detection-model:19.0.6'
    }
    

1. Configure the object detector

To start detecting and tracking objects, first create an instance of FirebaseVisionObjectDetector, optionally specifying any detector settings you want to change from the default.

  1. Configure the object detector for your use case with a FirebaseVisionObjectDetectorOptions object. You can change the following settings:

    Object detector settings
    Detection mode: STREAM_MODE (default) | SINGLE_IMAGE_MODE

    In STREAM_MODE (default), the object detector runs with low latency, but might produce incomplete results (such as unspecified bounding boxes or category labels) on the first few invocations of the detector. Also, in STREAM_MODE, the detector assigns tracking IDs to objects, which you can use to track objects across frames. Use this mode when you want to track objects, or when low latency is important, such as when processing video streams in real time.

    In SINGLE_IMAGE_MODE, the object detector waits until a detected object's bounding box and (if you enabled classification) category label are available before returning a result. As a consequence, detection latency is potentially higher. Also, in SINGLE_IMAGE_MODE, tracking IDs are not assigned. Use this mode if latency isn't critical and you don't want to deal with partial results.

    Detect and track multiple objects: false (default) | true

    Whether to detect and track up to five objects, or only the most prominent object (default).

    Classify objects: false (default) | true

    Whether or not to classify detected objects into coarse categories. When enabled, the object detector classifies objects into the following categories: fashion goods, food, home goods, places, plants, and unknown.

    The object detection and tracking API is optimized for these two core use cases:

    • Live detection and tracking of the most prominent object in the camera viewfinder
    • Detection of multiple objects from a static image

    To configure the API for these use cases:

    Java

    // Live detection and tracking
    FirebaseVisionObjectDetectorOptions options =
            new FirebaseVisionObjectDetectorOptions.Builder()
                    .setDetectorMode(FirebaseVisionObjectDetectorOptions.STREAM_MODE)
                    .enableClassification()  // Optional
                    .build();
    
    // Multiple object detection in static images
    FirebaseVisionObjectDetectorOptions options =
            new FirebaseVisionObjectDetectorOptions.Builder()
                    .setDetectorMode(FirebaseVisionObjectDetectorOptions.SINGLE_IMAGE_MODE)
                    .enableMultipleObjects()
                    .enableClassification()  // Optional
                    .build();
    

    Kotlin+KTX

    // Live detection and tracking
    val options = FirebaseVisionObjectDetectorOptions.Builder()
            .setDetectorMode(FirebaseVisionObjectDetectorOptions.STREAM_MODE)
            .enableClassification()  // Optional
            .build()
    
    // Multiple object detection in static images
    val options = FirebaseVisionObjectDetectorOptions.Builder()
            .setDetectorMode(FirebaseVisionObjectDetectorOptions.SINGLE_IMAGE_MODE)
            .enableMultipleObjects()
            .enableClassification()  // Optional
            .build()
    
  2. Get an instance of FirebaseVisionObjectDetector:

    Java

    FirebaseVisionObjectDetector objectDetector =
            FirebaseVision.getInstance().getOnDeviceObjectDetector();
    
    // Or, to change the default settings:
    FirebaseVisionObjectDetector objectDetector =
            FirebaseVision.getInstance().getOnDeviceObjectDetector(options);
    

    Kotlin+KTX

    val objectDetector = FirebaseVision.getInstance().getOnDeviceObjectDetector()
    
    // Or, to change the default settings:
    val objectDetector = FirebaseVision.getInstance().getOnDeviceObjectDetector(options)
    

2. Run the object detector

To detect and track objects, pass images to the FirebaseVisionObjectDetector instance's processImage() method.

For each frame of video or image in a sequence, do the following:

  1. Create a FirebaseVisionImage object from your image.

    • To create a FirebaseVisionImage object from a media.Image object, such as when capturing an image from a device's camera, pass the media.Image object and the image's rotation to FirebaseVisionImage.fromMediaImage().

      If you use the CameraX library, the OnImageCapturedListener and ImageAnalysis.Analyzer classes calculate the rotation value for you, so you just need to convert the rotation to one of ML Kit's ROTATION_ constants before calling FirebaseVisionImage.fromMediaImage():

      Java

      private class YourAnalyzer implements ImageAnalysis.Analyzer {
      
          private int degreesToFirebaseRotation(int degrees) {
              switch (degrees) {
                  case 0:
                      return FirebaseVisionImageMetadata.ROTATION_0;
                  case 90:
                      return FirebaseVisionImageMetadata.ROTATION_90;
                  case 180:
                      return FirebaseVisionImageMetadata.ROTATION_180;
                  case 270:
                      return FirebaseVisionImageMetadata.ROTATION_270;
                  default:
                      throw new IllegalArgumentException(
                              "Rotation must be 0, 90, 180, or 270.");
              }
          }
      
          @Override
          public void analyze(ImageProxy imageProxy, int degrees) {
              if (imageProxy == null || imageProxy.getImage() == null) {
                  return;
              }
              Image mediaImage = imageProxy.getImage();
              int rotation = degreesToFirebaseRotation(degrees);
              FirebaseVisionImage image =
                      FirebaseVisionImage.fromMediaImage(mediaImage, rotation);
              // Pass image to an ML Kit Vision API
              // ...
          }
      }
      

      Kotlin+KTX

      private class YourImageAnalyzer : ImageAnalysis.Analyzer {
          private fun degreesToFirebaseRotation(degrees: Int): Int = when(degrees) {
              0 -> FirebaseVisionImageMetadata.ROTATION_0
              90 -> FirebaseVisionImageMetadata.ROTATION_90
              180 -> FirebaseVisionImageMetadata.ROTATION_180
              270 -> FirebaseVisionImageMetadata.ROTATION_270
              else -> throw Exception("Rotation must be 0, 90, 180, or 270.")
          }
      
          override fun analyze(imageProxy: ImageProxy?, degrees: Int) {
              val mediaImage = imageProxy?.image
              val imageRotation = degreesToFirebaseRotation(degrees)
              if (mediaImage != null) {
                  val image = FirebaseVisionImage.fromMediaImage(mediaImage, imageRotation)
                  // Pass image to an ML Kit Vision API
                  // ...
              }
          }
      }
      

      If you don't use a camera library that gives you the image's rotation, you can calculate it from the device's rotation and the orientation of the camera sensor in the device:

      Java

      private static final SparseIntArray ORIENTATIONS = new SparseIntArray();
      static {
          ORIENTATIONS.append(Surface.ROTATION_0, 90);
          ORIENTATIONS.append(Surface.ROTATION_90, 0);
          ORIENTATIONS.append(Surface.ROTATION_180, 270);
          ORIENTATIONS.append(Surface.ROTATION_270, 180);
      }
      
      /**
       * Get the angle by which an image must be rotated given the device's current
       * orientation.
       */
      @RequiresApi(api = Build.VERSION_CODES.LOLLIPOP)
      private int getRotationCompensation(String cameraId, Activity activity, Context context)
              throws CameraAccessException {
          // Get the device's current rotation relative to its "native" orientation.
          // Then, from the ORIENTATIONS table, look up the angle the image must be
          // rotated to compensate for the device's rotation.
          int deviceRotation = activity.getWindowManager().getDefaultDisplay().getRotation();
          int rotationCompensation = ORIENTATIONS.get(deviceRotation);
      
          // On most devices, the sensor orientation is 90 degrees, but for some
          // devices it is 270 degrees. For devices with a sensor orientation of
          // 270, rotate the image an additional 180 ((270 + 270) % 360) degrees.
          CameraManager cameraManager = (CameraManager) context.getSystemService(CAMERA_SERVICE);
          int sensorOrientation = cameraManager
                  .getCameraCharacteristics(cameraId)
                  .get(CameraCharacteristics.SENSOR_ORIENTATION);
          rotationCompensation = (rotationCompensation + sensorOrientation + 270) % 360;
      
          // Return the corresponding FirebaseVisionImageMetadata rotation value.
          int result;
          switch (rotationCompensation) {
              case 0:
                  result = FirebaseVisionImageMetadata.ROTATION_0;
                  break;
              case 90:
                  result = FirebaseVisionImageMetadata.ROTATION_90;
                  break;
              case 180:
                  result = FirebaseVisionImageMetadata.ROTATION_180;
                  break;
              case 270:
                  result = FirebaseVisionImageMetadata.ROTATION_270;
                  break;
              default:
                  result = FirebaseVisionImageMetadata.ROTATION_0;
                  Log.e(TAG, "Bad rotation value: " + rotationCompensation);
          }
          return result;
      }

      Kotlin+KTX

      private val ORIENTATIONS = SparseIntArray()
      
      init {
          ORIENTATIONS.append(Surface.ROTATION_0, 90)
          ORIENTATIONS.append(Surface.ROTATION_90, 0)
          ORIENTATIONS.append(Surface.ROTATION_180, 270)
          ORIENTATIONS.append(Surface.ROTATION_270, 180)
      }
      /**
       * Get the angle by which an image must be rotated given the device's current
       * orientation.
       */
      @RequiresApi(api = Build.VERSION_CODES.LOLLIPOP)
      @Throws(CameraAccessException::class)
      private fun getRotationCompensation(cameraId: String, activity: Activity, context: Context): Int {
          // Get the device's current rotation relative to its "native" orientation.
          // Then, from the ORIENTATIONS table, look up the angle the image must be
          // rotated to compensate for the device's rotation.
          val deviceRotation = activity.windowManager.defaultDisplay.rotation
          var rotationCompensation = ORIENTATIONS.get(deviceRotation)
      
          // On most devices, the sensor orientation is 90 degrees, but for some
          // devices it is 270 degrees. For devices with a sensor orientation of
          // 270, rotate the image an additional 180 ((270 + 270) % 360) degrees.
          val cameraManager = context.getSystemService(CAMERA_SERVICE) as CameraManager
          val sensorOrientation = cameraManager
                  .getCameraCharacteristics(cameraId)
                  .get(CameraCharacteristics.SENSOR_ORIENTATION)!!
          rotationCompensation = (rotationCompensation + sensorOrientation + 270) % 360
      
          // Return the corresponding FirebaseVisionImageMetadata rotation value.
          val result: Int
          when (rotationCompensation) {
              0 -> result = FirebaseVisionImageMetadata.ROTATION_0
              90 -> result = FirebaseVisionImageMetadata.ROTATION_90
              180 -> result = FirebaseVisionImageMetadata.ROTATION_180
              270 -> result = FirebaseVisionImageMetadata.ROTATION_270
              else -> {
                  result = FirebaseVisionImageMetadata.ROTATION_0
                  Log.e(TAG, "Bad rotation value: $rotationCompensation")
              }
          }
          return result
      }

      Then, pass the media.Image object and the rotation value to FirebaseVisionImage.fromMediaImage():

      Java

      FirebaseVisionImage image = FirebaseVisionImage.fromMediaImage(mediaImage, rotation);

      Kotlin+KTX

      val image = FirebaseVisionImage.fromMediaImage(mediaImage, rotation)
    • To create a FirebaseVisionImage object from a file URI, pass the app context and file URI to FirebaseVisionImage.fromFilePath(). This is useful when you use an ACTION_GET_CONTENT intent to prompt the user to select an image from their gallery app.

      Java

      FirebaseVisionImage image;
      try {
          image = FirebaseVisionImage.fromFilePath(context, uri);
      } catch (IOException e) {
          e.printStackTrace();
      }

      Kotlin+KTX

      val image: FirebaseVisionImage
      try {
          image = FirebaseVisionImage.fromFilePath(context, uri)
      } catch (e: IOException) {
          e.printStackTrace()
      }
    • To create a FirebaseVisionImage object from a ByteBuffer or a byte array, first calculate the image rotation as described above for media.Image input.

      Then, create a FirebaseVisionImageMetadata object that contains the image's height, width, color encoding format, and rotation:

      Java

      FirebaseVisionImageMetadata metadata = new FirebaseVisionImageMetadata.Builder()
              .setWidth(480)   // 480x360 is typically sufficient for
              .setHeight(360)  // image recognition
              .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
              .setRotation(rotation)
              .build();

      Kotlin+KTX

      val metadata = FirebaseVisionImageMetadata.Builder()
              .setWidth(480) // 480x360 is typically sufficient for
              .setHeight(360) // image recognition
              .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
              .setRotation(rotation)
              .build()

      Use the buffer or array, and the metadata object, to create a FirebaseVisionImage object:

      Java

      FirebaseVisionImage image = FirebaseVisionImage.fromByteBuffer(buffer, metadata);
      // Or: FirebaseVisionImage image = FirebaseVisionImage.fromByteArray(byteArray, metadata);

      Kotlin+KTX

      val image = FirebaseVisionImage.fromByteBuffer(buffer, metadata)
      // Or: val image = FirebaseVisionImage.fromByteArray(byteArray, metadata)
    • To create a FirebaseVisionImage object from a Bitmap object:

      Java

      FirebaseVisionImage image = FirebaseVisionImage.fromBitmap(bitmap);

      Kotlin+KTX

      val image = FirebaseVisionImage.fromBitmap(bitmap)
      The image represented by the Bitmap object must be upright, with no additional rotation required.
  2. Pass the image to the processImage() method:

    Java

    objectDetector.processImage(image)
            .addOnSuccessListener(
                    new OnSuccessListener<List<FirebaseVisionObject>>() {
                        @Override
                        public void onSuccess(List<FirebaseVisionObject> detectedObjects) {
                            // Task completed successfully
                            // ...
                        }
                    })
            .addOnFailureListener(
                    new OnFailureListener() {
                        @Override
                        public void onFailure(@NonNull Exception e) {
                            // Task failed with an exception
                            // ...
                        }
                    });
    

    Kotlin+KTX

    objectDetector.processImage(image)
            .addOnSuccessListener { detectedObjects ->
                // Task completed successfully
                // ...
            }
            .addOnFailureListener { e ->
                // Task failed with an exception
                // ...
            }
    
  3. If the call to processImage() succeeds, a list of FirebaseVisionObjects is passed to the success listener.

    Each FirebaseVisionObject contains the following properties:

    Bounding box: A Rect indicating the position of the object in the image.
    Tracking ID: An integer that identifies the object across images. Null in SINGLE_IMAGE_MODE.
    Category: The object's coarse category. If the object detector doesn't have classification enabled, this is always FirebaseVisionObject.CATEGORY_UNKNOWN.
    Confidence: The confidence value of the object classification. If the object detector doesn't have classification enabled, or the object is classified as unknown, this is null.

    Java

    // The list of detected objects contains one item if multiple object detection wasn't enabled.
    for (FirebaseVisionObject obj : detectedObjects) {
        Integer id = obj.getTrackingId();
        Rect bounds = obj.getBoundingBox();
    
        // If classification was enabled:
        int category = obj.getClassificationCategory();
        Float confidence = obj.getClassificationConfidence();
    }
    

    Kotlin+KTX

    // The list of detected objects contains one item if multiple object detection wasn't enabled.
    for (obj in detectedObjects) {
        val id = obj.trackingId       // A number that identifies the object across images
        val bounds = obj.boundingBox  // The object's position in the image
    
        // If classification was enabled:
        val category = obj.classificationCategory
        val confidence = obj.classificationConfidence
    }
    

Improving usability and performance

For the best user experience, follow these guidelines in your app:

  • Successful object detection depends on the object's visual complexity. Objects with a small number of visual features might need to take up a larger part of the image to be detected. You should provide users with guidance on capturing input that works well with the kind of objects you want to detect.
  • When you use classification, if you want to detect objects that don't fall cleanly into the supported categories, implement special handling for unknown objects.
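The special handling for unknown objects can be sketched as a small filtering helper. This is a minimal, Firebase-free sketch: the CATEGORY_UNKNOWN constant value and the CONFIDENCE_THRESHOLD are illustrative assumptions, not values taken from the ML Kit API; in a real app you would compare against FirebaseVisionObject.CATEGORY_UNKNOWN and pick a threshold suited to your use case.

```java
class DetectionFilter {
    // Hypothetical stand-in for the ML Kit category constant; in a real app,
    // compare against FirebaseVisionObject.CATEGORY_UNKNOWN instead.
    static final int CATEGORY_UNKNOWN = 0;

    // Assumed threshold below which a classification is treated as unreliable.
    static final float CONFIDENCE_THRESHOLD = 0.5f;

    /**
     * Decide whether a detection should be shown with its category label,
     * or handled as "unknown" (for example, shown with a generic label).
     */
    static boolean isReliablyClassified(int category, Float confidence) {
        if (category == CATEGORY_UNKNOWN) {
            return false; // the detector could not classify the object
        }
        // Confidence is null when classification is disabled or unknown.
        return confidence != null && confidence >= CONFIDENCE_THRESHOLD;
    }
}
```

In the success listener, you might call isReliablyClassified(obj.getClassificationCategory(), obj.getClassificationConfidence()) and fall back to a generic "object" label when it returns false.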

Also, check out the [ML Kit Material Design showcase app][showcase-link]{:.external} and the Material Design Patterns for machine learning-powered features collection.

When you use streaming mode in a real-time application, follow these guidelines to achieve the best frame rates:

  • Don't use multiple object detection in streaming mode, as most devices won't be able to produce adequate frame rates.

  • Disable classification if you don't need it.

  • Throttle calls to the detector. If a new video frame becomes available while the detector is running, drop the frame.
  • If you use the output of the detector to overlay graphics on the input image, first get the result from ML Kit, then render the image and overlay in a single step. By doing so, you render to the display surface only once for each input frame.
  • If you use the Camera2 API, capture images in ImageFormat.YUV_420_888 format.

    If you use the older Camera API, capture images in ImageFormat.NV21 format.
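The frame-throttling guideline above (drop new frames while the detector is still busy) can be sketched with a simple busy flag. This is a minimal, Android-free sketch: ThrottledDetector and its Consumer-based detector callback are hypothetical placeholders for your own wrapper around processImage(), not ML Kit APIs. Call frameDone() from the detector's success or failure listener.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

class ThrottledDetector {
    private final AtomicBoolean isProcessing = new AtomicBoolean(false);
    private final Consumer<byte[]> detector; // placeholder for a processImage() call

    ThrottledDetector(Consumer<byte[]> detector) {
        this.detector = detector;
    }

    /** Returns true if the frame was submitted, false if it was dropped. */
    boolean onFrame(byte[] frame) {
        // Only one frame is in flight at a time; frames that arrive
        // while the detector is busy are dropped.
        if (!isProcessing.compareAndSet(false, true)) {
            return false;
        }
        detector.accept(frame);
        return true;
    }

    /** Call from the detector's completion (success or failure) listener. */
    void frameDone() {
        isProcessing.set(false);
    }
}
```

With this pattern, the camera callback stays cheap: a dropped frame costs one atomic check, and the detector always works on the most recent frame it was able to accept.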