Join us in person and online for Firebase Summit on October 18, 2022. Learn how Firebase can help you accelerate app development, release your app with confidence, and scale with ease. Register now

在 iOS 上使用 Firebase ML 識別圖像中的文本

透過集合功能整理內容 你可以依據偏好儲存及分類內容。

您可以使用 Firebase ML 識別圖像中的文本。 Firebase ML 既有一個適用於識別圖像中文本(例如路牌文本)的通用 API,又有一個針對識別文檔文本進行了優化的 API。

在你開始之前

    如果您尚未將 Firebase 添加到您的應用,請按照入門指南中的步驟進行操作。

    使用 Swift Package Manager 安裝和管理 Firebase 依賴項。

    1. 在 Xcode 中,打開您的應用項目,導航到File > Add Packages
    2. 出現提示時,添加 Firebase Apple 平台 SDK 存儲庫:
    3.   https://github.com/firebase/firebase-ios-sdk
    4. 選擇 Firebase ML 庫。
    5. 完成後,Xcode 將在後台自動開始解析和下載您的依賴項。

    接下來,執行一些應用內設置:

    1. 在您的應用中,導入 Firebase:

      迅速

      import FirebaseMLModelDownloader

      Objective-C

      @import FirebaseMLModelDownloader;
  1. 如果您尚未為您的項目啟用基於雲的 API,請立即執行此操作:

    1. 打開 Firebase 控制台的Firebase ML API 頁面
    2. 如果您尚未將項目升級到 Blaze 定價計劃,請單擊升級以執行此操作。 (僅當您的項目不在 Blaze 計劃中時,系統才會提示您升級。)

      只有 Blaze 級項目可以使用基於雲的 API。

    3. 如果尚未啟用基於雲的 API,請單擊啟用基於雲的 API

現在您已準備好開始識別圖像中的文本。

輸入圖像指南

  • 為了讓 Firebase ML 準確識別文本,輸入圖像必須包含由足夠像素數據表示的文本。理想情況下,對於拉丁文本,每個字符至少應為 16x16 像素。對於中文、日文和韓文文本,每個字符應為 24x24 像素。對於所有語言,大於 24x24 像素的字符通常沒有準確性優勢。

    因此,例如,一張 640x480 的圖像可能適用於掃描佔據整個圖像寬度的名片。要掃描在信紙大小的紙張上打印的文檔,可能需要 720x1280 像素的圖像。

  • 圖像聚焦不佳會損害文本識別的準確性。如果您沒有得到可接受的結果,請嘗試要求用戶重新捕獲圖像。


識別圖像中的文本

要識別圖像中的文本,請按如下所述運行文本識別器。

1.運行文本識別器

將圖像作為UIImageCMSampleBufferRef傳遞給VisionTextRecognizerprocess(_:completion:)方法:

  1. 通過調用cloudTextRecognizer獲取VisionTextRecognizer的實例:

    迅速

    let vision = Vision.vision()
    let textRecognizer = vision.cloudTextRecognizer()
    
    // Or, to provide language hints to assist with language detection:
    // See https://cloud.google.com/vision/docs/languages for supported languages
    let options = VisionCloudTextRecognizerOptions()
    options.languageHints = ["en", "hi"]
    let textRecognizer = vision.cloudTextRecognizer(options: options)
    

    Objective-C

    FIRVision *vision = [FIRVision vision];
    FIRVisionTextRecognizer *textRecognizer = [vision cloudTextRecognizer];
    
    // Or, to provide language hints to assist with language detection:
    // See https://cloud.google.com/vision/docs/languages for supported languages
    FIRVisionCloudTextRecognizerOptions *options =
            [[FIRVisionCloudTextRecognizerOptions alloc] init];
    options.languageHints = @[@"en", @"hi"];
    FIRVisionTextRecognizer *textRecognizer = [vision cloudTextRecognizerWithOptions:options];
    
  2. 為了調用 Cloud Vision,圖像必須格式化為 base64 編碼的字符串。處理UIImage

    迅速

    guard let imageData = uiImage.jpegData(compressionQuality: 1.0f) else { return }
    let base64encodedImage = imageData.base64EncodedString()

    Objective-C

    NSData *imageData = UIImageJPEGRepresentation(uiImage, 1.0f);
    NSString *base64encodedImage =
      [imageData base64EncodedStringWithOptions:NSDataBase64Encoding76CharacterLineLength];
  3. 然後,將圖像傳遞給process(_:completion:)方法:

    迅速

    textRecognizer.process(visionImage) { result, error in
      guard error == nil, let result = result else {
        // ...
        return
      }
    
      // Recognized text
    }
    

    Objective-C

    [textRecognizer processImage:image
                      completion:^(FIRVisionText *_Nullable result,
                                   NSError *_Nullable error) {
      if (error != nil || result == nil) {
        // ...
        return;
      }
    
      // Recognized text
    }];
    

2.從識別文本塊中提取文本

如果文本識別操作成功,它將返回一個VisionText對象。 VisionText對象包含圖像中識別的全文和零個或多個VisionTextBlock對象。

每個VisionTextBlock代表一個矩形文本塊,其中包含零個或多個VisionTextLine對象。每個VisionTextLine對象包含零個或多個VisionTextElement對象,它們表示單詞和類似單詞的實體(日期、數字等)。

對於每個VisionTextBlockVisionTextLineVisionTextElement對象,您可以獲得區域中識別的文本和區域的邊界坐標。

例如:

迅速

let resultText = result.text
for block in result.blocks {
    let blockText = block.text
    let blockConfidence = block.confidence
    let blockLanguages = block.recognizedLanguages
    let blockCornerPoints = block.cornerPoints
    let blockFrame = block.frame
    for line in block.lines {
        let lineText = line.text
        let lineConfidence = line.confidence
        let lineLanguages = line.recognizedLanguages
        let lineCornerPoints = line.cornerPoints
        let lineFrame = line.frame
        for element in line.elements {
            let elementText = element.text
            let elementConfidence = element.confidence
            let elementLanguages = element.recognizedLanguages
            let elementCornerPoints = element.cornerPoints
            let elementFrame = element.frame
        }
    }
}

Objective-C

NSString *resultText = result.text;
for (FIRVisionTextBlock *block in result.blocks) {
  NSString *blockText = block.text;
  NSNumber *blockConfidence = block.confidence;
  NSArray<FIRVisionTextRecognizedLanguage *> *blockLanguages = block.recognizedLanguages;
  NSArray<NSValue *> *blockCornerPoints = block.cornerPoints;
  CGRect blockFrame = block.frame;
  for (FIRVisionTextLine *line in block.lines) {
    NSString *lineText = line.text;
    NSNumber *lineConfidence = line.confidence;
    NSArray<FIRVisionTextRecognizedLanguage *> *lineLanguages = line.recognizedLanguages;
    NSArray<NSValue *> *lineCornerPoints = line.cornerPoints;
    CGRect lineFrame = line.frame;
    for (FIRVisionTextElement *element in line.elements) {
      NSString *elementText = element.text;
      NSNumber *elementConfidence = element.confidence;
      NSArray<FIRVisionTextRecognizedLanguage *> *elementLanguages = element.recognizedLanguages;
      NSArray<NSValue *> *elementCornerPoints = element.cornerPoints;
      CGRect elementFrame = element.frame;
    }
  }
}

下一步


識別文檔圖像中的文本

要識別文檔的文本,請按如下所述配置並運行文檔文本識別器。

下面描述的文檔文本識別 API 提供了一個接口,旨在更方便地處理文檔圖像。但是,如果您更喜歡稀疏文本 API 提供的接口,則可以通過將雲文本識別器配置為使用密集文本模型來代替它來掃描文檔。

要使用文檔文本識別 API:

1.運行文本識別器

將圖像作為UIImageCMSampleBufferRef傳遞給VisionDocumentTextRecognizerprocess(_:completion:)方法:

  1. 通過調用cloudDocumentTextRecognizer獲取VisionDocumentTextRecognizer的實例:

    迅速

    let vision = Vision.vision()
    let textRecognizer = vision.cloudDocumentTextRecognizer()
    
    // Or, to provide language hints to assist with language detection:
    // See https://cloud.google.com/vision/docs/languages for supported languages
    let options = VisionCloudDocumentTextRecognizerOptions()
    options.languageHints = ["en", "hi"]
    let textRecognizer = vision.cloudDocumentTextRecognizer(options: options)
    

    Objective-C

    FIRVision *vision = [FIRVision vision];
    FIRVisionDocumentTextRecognizer *textRecognizer = [vision cloudDocumentTextRecognizer];
    
    // Or, to provide language hints to assist with language detection:
    // See https://cloud.google.com/vision/docs/languages for supported languages
    FIRVisionCloudDocumentTextRecognizerOptions *options =
            [[FIRVisionCloudDocumentTextRecognizerOptions alloc] init];
    options.languageHints = @[@"en", @"hi"];
    FIRVisionDocumentTextRecognizer *textRecognizer = [vision cloudDocumentTextRecognizerWithOptions:options];
    
  2. 為了調用 Cloud Vision,圖像必須格式化為 base64 編碼的字符串。處理UIImage

    迅速

    guard let imageData = uiImage.jpegData(compressionQuality: 1.0f) else { return }
    let base64encodedImage = imageData.base64EncodedString()

    Objective-C

    NSData *imageData = UIImageJPEGRepresentation(uiImage, 1.0f);
    NSString *base64encodedImage =
      [imageData base64EncodedStringWithOptions:NSDataBase64Encoding76CharacterLineLength];
  3. 然後,將圖像傳遞給process(_:completion:)方法:

    迅速

    textRecognizer.process(visionImage) { result, error in
      guard error == nil, let result = result else {
        // ...
        return
      }
    
      // Recognized text
    }
    

    Objective-C

    [textRecognizer processImage:image
                      completion:^(FIRVisionDocumentText *_Nullable result,
                                   NSError *_Nullable error) {
      if (error != nil || result == nil) {
        // ...
        return;
      }
    
        // Recognized text
    }];
    

2.從識別文本塊中提取文本

如果文本識別操作成功,它將返回一個VisionDocumentText對象。 VisionDocumentText對象包含圖像中識別的全文和反映識別文檔結構的對象層次結構:

對於每個VisionDocumentTextBlockVisionDocumentTextParagraphVisionDocumentTextWordVisionDocumentTextSymbol對象,您可以獲得區域中識別的文本和區域的邊界坐標。

例如:

迅速

let resultText = result.text
for block in result.blocks {
    let blockText = block.text
    let blockConfidence = block.confidence
    let blockRecognizedLanguages = block.recognizedLanguages
    let blockBreak = block.recognizedBreak
    let blockCornerPoints = block.cornerPoints
    let blockFrame = block.frame
    for paragraph in block.paragraphs {
        let paragraphText = paragraph.text
        let paragraphConfidence = paragraph.confidence
        let paragraphRecognizedLanguages = paragraph.recognizedLanguages
        let paragraphBreak = paragraph.recognizedBreak
        let paragraphCornerPoints = paragraph.cornerPoints
        let paragraphFrame = paragraph.frame
        for word in paragraph.words {
            let wordText = word.text
            let wordConfidence = word.confidence
            let wordRecognizedLanguages = word.recognizedLanguages
            let wordBreak = word.recognizedBreak
            let wordCornerPoints = word.cornerPoints
            let wordFrame = word.frame
            for symbol in word.symbols {
                let symbolText = symbol.text
                let symbolConfidence = symbol.confidence
                let symbolRecognizedLanguages = symbol.recognizedLanguages
                let symbolBreak = symbol.recognizedBreak
                let symbolCornerPoints = symbol.cornerPoints
                let symbolFrame = symbol.frame
            }
        }
    }
}

Objective-C

NSString *resultText = result.text;
for (FIRVisionDocumentTextBlock *block in result.blocks) {
  NSString *blockText = block.text;
  NSNumber *blockConfidence = block.confidence;
  NSArray<FIRVisionTextRecognizedLanguage *> *blockRecognizedLanguages = block.recognizedLanguages;
  FIRVisionTextRecognizedBreak *blockBreak = block.recognizedBreak;
  CGRect blockFrame = block.frame;
  for (FIRVisionDocumentTextParagraph *paragraph in block.paragraphs) {
    NSString *paragraphText = paragraph.text;
    NSNumber *paragraphConfidence = paragraph.confidence;
    NSArray<FIRVisionTextRecognizedLanguage *> *paragraphRecognizedLanguages = paragraph.recognizedLanguages;
    FIRVisionTextRecognizedBreak *paragraphBreak = paragraph.recognizedBreak;
    CGRect paragraphFrame = paragraph.frame;
    for (FIRVisionDocumentTextWord *word in paragraph.words) {
      NSString *wordText = word.text;
      NSNumber *wordConfidence = word.confidence;
      NSArray<FIRVisionTextRecognizedLanguage *> *wordRecognizedLanguages = word.recognizedLanguages;
      FIRVisionTextRecognizedBreak *wordBreak = word.recognizedBreak;
      CGRect wordFrame = word.frame;
      for (FIRVisionDocumentTextSymbol *symbol in word.symbols) {
        NSString *symbolText = symbol.text;
        NSNumber *symbolConfidence = symbol.confidence;
        NSArray<FIRVisionTextRecognizedLanguage *> *symbolRecognizedLanguages = symbol.recognizedLanguages;
        FIRVisionTextRecognizedBreak *symbolBreak = symbol.recognizedBreak;
        CGRect symbolFrame = symbol.frame;
      }
    }
  }
}

下一步