Label Images with ML Kit on iOS

You can use ML Kit to label objects recognized in an image, using either an on-device model or a cloud model. See the overview to learn about the benefits of each approach.

See the ML Kit quickstart sample on GitHub for an example of this API in use.

Before you begin

  1. If you have not already added Firebase to your app, do so by following the steps in the getting started guide.
  2. Include the ML Kit libraries in your Podfile:
    pod 'Firebase/Core'
    pod 'Firebase/MLVision'

    # If using the on-device API:
    pod 'Firebase/MLVisionLabelModel'

    After you install or update your project's Pods, be sure to open your Xcode project using its .xcworkspace.
  3. In your app, import Firebase:

    Swift

    import Firebase

    Objective-C

    @import Firebase;
  4. If you want to use the Cloud-based model, and you have not already enabled the Cloud-based APIs for your project, do so now:

    1. Open the ML Kit APIs page of the Firebase console.
    2. If you have not already upgraded your project to a Blaze plan, click Upgrade to do so. (You will be prompted to upgrade only if your project isn't on the Blaze plan.)

      Only Blaze-level projects can use Cloud-based APIs.

    3. If Cloud-based APIs aren't already enabled, click Enable Cloud-based APIs.

    If you want to use only the on-device model, you can skip this step.

Now you are ready to label images using either an on-device model or a cloud-based model.


On-device image labeling

To use the on-device image labeling model, configure and run the image labeler as described below.

1. Configure the image labeler

By default, the on-device image labeler returns only labels that have a confidence score of 0.5 or greater. If you want to change this setting, create a VisionLabelDetectorOptions object as in the following example:

Swift

let options = VisionLabelDetectorOptions(
  confidenceThreshold: 0.6
)

Objective-C

FIRVisionLabelDetectorOptions *options =
    [[FIRVisionLabelDetectorOptions alloc] initWithConfidenceThreshold:0.6f];
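
To make the effect of the threshold concrete, the following sketch filters a list of candidate labels the way a confidence threshold would. The CandidateLabel type, the filterLabels function, and the sample scores are hypothetical stand-ins for illustration only. They are not part of the ML Kit API, and the real filtering happens inside the detector.

```swift
// Hypothetical stand-in for a detected label; not an ML Kit type.
struct CandidateLabel {
    let text: String
    let confidence: Float
}

// Keeps only candidates whose confidence is at or above the threshold.
func filterLabels(_ candidates: [CandidateLabel], threshold: Float) -> [CandidateLabel] {
    return candidates.filter { $0.confidence >= threshold }
}

// Made-up scores, chosen only to show the effect of the setting.
let candidates = [
    CandidateLabel(text: "Dog", confidence: 0.92),
    CandidateLabel(text: "Grass", confidence: 0.55),
    CandidateLabel(text: "Toy", confidence: 0.41),
]

let withDefault = filterLabels(candidates, threshold: 0.5).map { $0.text }
let withStricter = filterLabels(candidates, threshold: 0.6).map { $0.text }
print(withDefault)   // ["Dog", "Grass"]
print(withStricter)  // ["Dog"]
```

Raising the threshold from the default 0.5 to 0.6 drops the borderline "Grass" candidate in this made-up data set.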

2. Run the image labeler

To recognize and label entities in an image, pass the image as a UIImage or a CMSampleBufferRef to the VisionLabelDetector's detect(in:) method:

  1. Get an instance of VisionLabelDetector:

    Swift

    let vision = Vision.vision()
    let labelDetector = vision.labelDetector()
    // Or, to change the default settings:
    // let labelDetector = vision.labelDetector(options: options)
    

    Objective-C

    FIRVision *vision = [FIRVision vision];
    FIRVisionLabelDetector *labelDetector = [vision labelDetector];
    // Or, to change the default settings:
    // FIRVisionLabelDetector *labelDetector =
    //     [vision labelDetectorWithOptions:options];
    
  2. Create a VisionImage object using a UIImage or a CMSampleBufferRef.

    To use a UIImage:

    1. If necessary, rotate the image so that its imageOrientation property is .up.
    2. Create a VisionImage object using the correctly-rotated UIImage. Do not specify any rotation metadata—the default value, .topLeft, must be used.

      Swift

      let image = VisionImage(image: uiImage)

      Objective-C

      FIRVisionImage *image = [[FIRVisionImage alloc] initWithImage:uiImage];
      

    To use a CMSampleBufferRef:

    1. Create a VisionImageMetadata object that specifies the orientation of the image data contained in the CMSampleBufferRef buffer.

      For example, if you are using image data captured from the device's back-facing camera:

      Swift

      let metadata = VisionImageMetadata()
      
      // Using back-facing camera
      let devicePosition: AVCaptureDevice.Position = .back
      
      let deviceOrientation = UIDevice.current.orientation
      switch deviceOrientation {
      case .portrait:
          metadata.orientation = devicePosition == .front ? .leftMirrored : .right
      case .landscapeLeft:
          metadata.orientation = devicePosition == .front ? .downMirrored : .up
      case .portraitUpsideDown:
          metadata.orientation = devicePosition == .front ? .rightMirrored : .left
      case .landscapeRight:
          metadata.orientation = devicePosition == .front ? .upMirrored : .down
      case .faceDown, .faceUp, .unknown:
          metadata.orientation = .up
      }
      

      Objective-C

      // Calculate the image orientation
      FIRVisionDetectorImageOrientation orientation;
      
      // Using back-facing camera
      AVCaptureDevicePosition devicePosition = AVCaptureDevicePositionBack;
      
      UIDeviceOrientation deviceOrientation = UIDevice.currentDevice.orientation;
      switch (deviceOrientation) {
          case UIDeviceOrientationPortrait:
              if (devicePosition == AVCaptureDevicePositionFront) {
                  orientation = FIRVisionDetectorImageOrientationLeftTop;
              } else {
                  orientation = FIRVisionDetectorImageOrientationRightTop;
              }
              break;
          case UIDeviceOrientationLandscapeLeft:
              if (devicePosition == AVCaptureDevicePositionFront) {
                  orientation = FIRVisionDetectorImageOrientationBottomLeft;
              } else {
                  orientation = FIRVisionDetectorImageOrientationTopLeft;
              }
              break;
          case UIDeviceOrientationPortraitUpsideDown:
              if (devicePosition == AVCaptureDevicePositionFront) {
                  orientation = FIRVisionDetectorImageOrientationRightBottom;
              } else {
                  orientation = FIRVisionDetectorImageOrientationLeftBottom;
              }
              break;
          case UIDeviceOrientationLandscapeRight:
              if (devicePosition == AVCaptureDevicePositionFront) {
                  orientation = FIRVisionDetectorImageOrientationTopRight;
              } else {
                  orientation = FIRVisionDetectorImageOrientationBottomRight;
              }
              break;
          default:
              orientation = FIRVisionDetectorImageOrientationTopLeft;
              break;
      }
      
      FIRVisionImageMetadata *metadata = [[FIRVisionImageMetadata alloc] init];
      metadata.orientation = orientation;
      
    2. Create a VisionImage object using the CMSampleBufferRef object and the rotation metadata:

      Swift

      let image = VisionImage(buffer: bufferRef)
      image.metadata = metadata
      

      Objective-C

      FIRVisionImage *image = [[FIRVisionImage alloc] initWithBuffer:buffer];
      image.metadata = metadata;
      
  3. Then, pass the image to the detect(in:) method:

    Swift

    labelDetector.detect(in: image) { labels, error in
      guard error == nil, let labels = labels, !labels.isEmpty else {
        // ...
        return
      }

      // Got labels. Access label info via VisionLabel.
      // ...
    }

    Objective-C

    [labelDetector detectInImage:image
                      completion:^(NSArray<FIRVisionLabel *> *labels,
                                   NSError *error) {
      if (error != nil || labels.count == 0) {
        return;
      }
      // Got labels. Access label info via FIRVisionLabel.
    }];
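
Step 2 above asks you to rotate a UIImage so that its imageOrientation property is .up, but does not show how. One common approach, sketched here, is to redraw the image into a new bitmap with UIKit. The function name normalizedUp is hypothetical and not part of ML Kit, and the sketch assumes iOS 11 or later for UIGraphicsImageRendererFormat.preferred().

```swift
import UIKit

// Hypothetical helper, not part of ML Kit.
// Returns a copy of `image` redrawn so that its imageOrientation is .up.
func normalizedUp(_ image: UIImage) -> UIImage {
    // Nothing to do if the image is already upright.
    guard image.imageOrientation != .up else { return image }

    // Keep the source image's scale so the pixel dimensions are preserved.
    let format = UIGraphicsImageRendererFormat.preferred()
    format.scale = image.scale

    let renderer = UIGraphicsImageRenderer(size: image.size, format: format)
    return renderer.image { _ in
        // draw(in:) applies the orientation while drawing, so the
        // rendered bitmap comes out upright.
        image.draw(in: CGRect(origin: .zero, size: image.size))
    }
}
```

You could then pass the result to the VisionImage initializer, for example VisionImage(image: normalizedUp(uiImage)). Redrawing allocates a new bitmap, so for large photos it may be worth doing this off the main thread.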
    

3. Get information about labeled objects

If image labeling succeeds, an array of VisionLabel objects will be passed to the completion handler. From each object, you can get information about a feature recognized in the image.

For example:

Swift

for label in labels {
  let labelText = label.label
  let entityId = label.entityID
  let confidence = label.confidence
}

Objective-C

for (FIRVisionLabel *label in labels) {
  NSString *labelText = label.label;
  NSString *entityId = label.entityID;
  float confidence = label.confidence;
}
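
As a sketch of what you might do with these three values, the following example sorts labels by confidence and builds one display line per label. The LabelInfo struct is a hypothetical mirror of the fields read above, not an ML Kit type, and the entity IDs and scores are made up.

```swift
// Hypothetical mirror of the three fields read above; not an ML Kit type.
struct LabelInfo {
    let label: String
    let entityID: String
    let confidence: Float
}

// Builds one display line per label, highest confidence first.
func summaryLines(for labels: [LabelInfo]) -> [String] {
    return labels
        .sorted { $0.confidence > $1.confidence }
        .map { info in
            // Show the confidence as a whole-number percentage.
            let percent = Int((info.confidence * 100).rounded())
            return "\(info.label) (\(info.entityID)): \(percent)%"
        }
}

// Made-up sample values for illustration.
let sample = [
    LabelInfo(label: "Grass", entityID: "id-grass", confidence: 0.55),
    LabelInfo(label: "Dog", entityID: "id-dog", confidence: 0.92),
]
let lines = summaryLines(for: sample)
print(lines.joined(separator: "\n"))
// Dog (id-dog): 92%
// Grass (id-grass): 55%
```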

Tips to improve real-time performance

If you want to label images in a real-time application, follow these guidelines to achieve the best framerates:

  • Throttle calls to the image labeler. If a new video frame becomes available while the image labeler is running, drop the frame.
  • Capture images at a lower resolution. An image captured using AVCaptureSessionPresetMedium is typically sufficient.
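
The first guideline can be sketched as a small gate that admits one frame at a time and tells the caller to drop any frame that arrives while the labeler is still busy. The FrameGate class and its method names are hypothetical and not part of ML Kit. This is one possible approach, not the only one.

```swift
import Foundation

// Hypothetical helper, not part of ML Kit: admits one frame at a time.
final class FrameGate {
    private let lock = NSLock()
    private var busy = false

    // Returns true if the caller may process a frame; false means drop it.
    func tryEnter() -> Bool {
        lock.lock()
        defer { lock.unlock() }
        if busy { return false }
        busy = true
        return true
    }

    // Call when the labeler's completion handler has finished.
    func leave() {
        lock.lock()
        busy = false
        lock.unlock()
    }
}

let gate = FrameGate()
let first = gate.tryEnter()   // true: labeler is idle, process this frame
let second = gate.tryEnter()  // false: labeler still running, drop this frame
gate.leave()                  // completion handler finished
let third = gate.tryEnter()   // true: the next frame is accepted again
print(first, second, third)   // true false true
```

In a capture callback, you would call tryEnter() before detect(in:) and return early when it yields false, then call leave() at the end of the completion handler.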

Cloud image labeling

To use the Cloud-based image labeling model, configure and run the image labeler as described below.

1. Configure the image labeler

By default, the Cloud detector uses the stable version of the model and returns up to 10 results. If you want to change either of these settings, specify them with a VisionCloudDetectorOptions object as in the following example:

Swift

let options = VisionCloudDetectorOptions()
options.modelType = .latest
options.maxResults = 20

Objective-C

FIRVisionCloudDetectorOptions *options =
    [[FIRVisionCloudDetectorOptions alloc] init];
options.modelType = FIRVisionCloudModelTypeLatest;
options.maxResults = 20;

In the next step, pass the VisionCloudDetectorOptions object when you create the Cloud detector object.

2. Run the image labeler

To recognize and label entities in an image, pass the image as a UIImage or a CMSampleBufferRef to the VisionCloudLabelDetector's detect(in:) method:

  1. Get an instance of VisionCloudLabelDetector:

    Swift

    let vision = Vision.vision()
    let labelDetector = vision.cloudLabelDetector()
    // Or, to change the default settings:
    // let labelDetector = vision.cloudLabelDetector(options: options)

    Objective-C

    FIRVision *vision = [FIRVision vision];
    FIRVisionCloudLabelDetector *labelDetector = [vision cloudLabelDetector];
    // Or, to change the default settings:
    // FIRVisionCloudLabelDetector *labelDetector =
    //     [vision cloudLabelDetectorWithOptions:options];
    
  2. Create a VisionImage object using a UIImage or a CMSampleBufferRef.

    To use a UIImage:

    1. If necessary, rotate the image so that its imageOrientation property is .up.
    2. Create a VisionImage object using the correctly-rotated UIImage. Do not specify any rotation metadata—the default value, .topLeft, must be used.

      Swift

      let image = VisionImage(image: uiImage)

      Objective-C

      FIRVisionImage *image = [[FIRVisionImage alloc] initWithImage:uiImage];
      

    To use a CMSampleBufferRef:

    1. Create a VisionImageMetadata object that specifies the orientation of the image data contained in the CMSampleBufferRef buffer.

      For example, if you are using image data captured from the device's back-facing camera:

      Swift

      let metadata = VisionImageMetadata()
      
      // Using back-facing camera
      let devicePosition: AVCaptureDevice.Position = .back
      
      let deviceOrientation = UIDevice.current.orientation
      switch deviceOrientation {
      case .portrait:
          metadata.orientation = devicePosition == .front ? .leftMirrored : .right
      case .landscapeLeft:
          metadata.orientation = devicePosition == .front ? .downMirrored : .up
      case .portraitUpsideDown:
          metadata.orientation = devicePosition == .front ? .rightMirrored : .left
      case .landscapeRight:
          metadata.orientation = devicePosition == .front ? .upMirrored : .down
      case .faceDown, .faceUp, .unknown:
          metadata.orientation = .up
      }
      

      Objective-C

      // Calculate the image orientation
      FIRVisionDetectorImageOrientation orientation;
      
      // Using back-facing camera
      AVCaptureDevicePosition devicePosition = AVCaptureDevicePositionBack;
      
      UIDeviceOrientation deviceOrientation = UIDevice.currentDevice.orientation;
      switch (deviceOrientation) {
          case UIDeviceOrientationPortrait:
              if (devicePosition == AVCaptureDevicePositionFront) {
                  orientation = FIRVisionDetectorImageOrientationLeftTop;
              } else {
                  orientation = FIRVisionDetectorImageOrientationRightTop;
              }
              break;
          case UIDeviceOrientationLandscapeLeft:
              if (devicePosition == AVCaptureDevicePositionFront) {
                  orientation = FIRVisionDetectorImageOrientationBottomLeft;
              } else {
                  orientation = FIRVisionDetectorImageOrientationTopLeft;
              }
              break;
          case UIDeviceOrientationPortraitUpsideDown:
              if (devicePosition == AVCaptureDevicePositionFront) {
                  orientation = FIRVisionDetectorImageOrientationRightBottom;
              } else {
                  orientation = FIRVisionDetectorImageOrientationLeftBottom;
              }
              break;
          case UIDeviceOrientationLandscapeRight:
              if (devicePosition == AVCaptureDevicePositionFront) {
                  orientation = FIRVisionDetectorImageOrientationTopRight;
              } else {
                  orientation = FIRVisionDetectorImageOrientationBottomRight;
              }
              break;
          default:
              orientation = FIRVisionDetectorImageOrientationTopLeft;
              break;
      }
      
      FIRVisionImageMetadata *metadata = [[FIRVisionImageMetadata alloc] init];
      metadata.orientation = orientation;
      
    2. Create a VisionImage object using the CMSampleBufferRef object and the rotation metadata:

      Swift

      let image = VisionImage(buffer: bufferRef)
      image.metadata = metadata
      

      Objective-C

      FIRVisionImage *image = [[FIRVisionImage alloc] initWithBuffer:buffer];
      image.metadata = metadata;
      
  3. Then, pass the image to the detect(in:) method:

    Swift

    labelDetector.detect(in: image) { labels, error in
      guard error == nil, let labels = labels, !labels.isEmpty else {
        // ...
        return
      }

      // Got labels. Access label info via VisionCloudLabel.
      // ...
    }

    Objective-C

    [labelDetector detectInImage:image
                      completion:^(NSArray<FIRVisionCloudLabel *> *labels,
                                   NSError *error) {
      if (error != nil || labels.count == 0) {
        return;
      }
      // Got labels. Access label info via FIRVisionCloudLabel.
    }];
    

3. Get information about labeled objects

If image labeling succeeds, an array of VisionCloudLabel objects will be passed to the completion handler. From each object, you can get information about an entity recognized in the image. This information doesn't contain frame or other location data.

For example:

Swift

for label in labels {
  let labelText = label.label
  let entityId = label.entityId
  let confidence = label.confidence
}

Objective-C

for (FIRVisionCloudLabel *label in labels) {
  NSString *labelText = label.label;
  NSString *entityId = label.entityId;
  float confidence = [label.confidence floatValue];
}
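
The Objective-C example reads confidence through floatValue, which suggests it is an NSNumber. The sketch below shows one way to unwrap such values defensively in Swift, assuming that all three properties can be nil. The CloudLabelInfo struct and the describe function are hypothetical stand-ins, not ML Kit types, and the sample values are made up.

```swift
import Foundation

// Hypothetical mirror of the cloud label fields, all modeled as optional
// here (an assumption); not an ML Kit type.
struct CloudLabelInfo {
    let label: String?
    let entityId: String?
    let confidence: NSNumber?
}

// Describes a label, substituting defaults for any missing field.
func describe(_ info: CloudLabelInfo) -> String {
    let text = info.label ?? "(no label)"
    let id = info.entityId ?? "(no entity ID)"
    let confidence = info.confidence?.floatValue ?? 0
    return "Label: \(text), Confidence: \(confidence), EntityID: \(id)"
}

// Made-up sample values for illustration.
let complete = CloudLabelInfo(label: "Dog", entityId: "id-dog",
                              confidence: NSNumber(value: 0.5))
let partial = CloudLabelInfo(label: nil, entityId: nil, confidence: nil)
print(describe(complete))  // Label: Dog, Confidence: 0.5, EntityID: id-dog
print(describe(partial))   // Label: (no label), Confidence: 0.0, EntityID: (no entity ID)
```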

Next steps

Before you deploy to production an app that uses a Cloud API, you should take some additional steps to prevent and mitigate the effect of unauthorized API access.
