จดจำข้อความในรูปภาพด้วย ML Kit บน Android

หน้านี้อธิบาย Text Recognition APIเวอร์ชันเก่า ซึ่งเป็นส่วนหนึ่งของ ML Kit สำหรับ Firebase เราได้แยกฟังก์ชันการทำงานของ API นี้ออกเป็น API ใหม่ 2 รายการ (ดูข้อมูลเพิ่มเติม)

บนอุปกรณ์ text recognition เป็นส่วนหนึ่งของ ML Kit SDK ใหม่แบบสแตนด์อโลน ซึ่งคุณใช้ได้ทั้งแบบมีและไม่มี Firebase
Cloud text recognition เป็นส่วนหนึ่งของ Firebase ML ซึ่งรวมถึงฟีเจอร์ ML บนระบบคลาวด์ทั้งหมดของ Firebase

คุณใช้ ML Kit เพื่อจดจำข้อความในรูปภาพได้ ML Kit มีทั้ง API อเนกประสงค์ที่เหมาะสำหรับการจดจำข้อความในรูปภาพ เช่น ข้อความของป้ายถนน และ API ที่เพิ่มประสิทธิภาพสำหรับการจดจำข้อความของ เอกสาร API แบบอเนกประสงค์มีทั้งโมเดลบนอุปกรณ์และโมเดลในระบบคลาวด์ การจดจำข้อความในเอกสารใช้ได้เฉพาะในรูปแบบโมเดลบนระบบคลาวด์เท่านั้น ดูการเปรียบเทียบโมเดลในระบบคลาวด์และในอุปกรณ์ได้ที่ภาพรวม

ก่อนเริ่มต้น

เพิ่ม Firebase ลงในโปรเจ็กต์ Android หากยังไม่ได้เพิ่ม
เพิ่มทรัพยากร Dependency สำหรับคลัง Android ของ ML Kit ลงในไฟล์ Gradle ของโมดูล (ระดับแอป) (โดยปกติคือ app/build.gradle)
```
apply plugin: 'com.android.application'
apply plugin: 'com.google.gms.google-services'

dependencies {
  // ...

  implementation 'com.google.firebase:firebase-ml-vision:24.0.3'
}
```
ไม่บังคับแต่แนะนํา: หากใช้ API ในอุปกรณ์ ให้กําหนดค่าแอปเพื่อดาวน์โหลดโมเดล ML ไปยังอุปกรณ์โดยอัตโนมัติหลังจากติดตั้งแอปจาก Play Store
โดยเพิ่มการประกาศต่อไปนี้ลงในไฟล์ AndroidManifest.xml ของแอป
```
<application ...>
  ...
  <meta-data
      android:name="com.google.firebase.ml.vision.DEPENDENCIES"
      android:value="ocr" />
  
</application>
```
หากไม่ได้เปิดใช้การดาวน์โหลดโมเดลในเวลาที่ติดตั้ง ระบบจะดาวน์โหลดโมเดล เมื่อคุณเรียกใช้เครื่องตรวจหาในอุปกรณ์เป็นครั้งแรก คำขอที่คุณส่ง ก่อนที่การดาวน์โหลดจะเสร็จสมบูรณ์จะไม่แสดงผลลัพธ์
หากต้องการใช้โมเดลที่อิงตามระบบคลาวด์และยังไม่ได้เปิดใช้ API ที่อิงตามระบบคลาวด์สำหรับโปรเจ็กต์ ให้ทำดังนี้
1. เปิดหน้า API ของ ML KitในFirebaseคอนโซล
2. หากยังไม่ได้อัปเกรดโปรเจ็กต์เป็นแพ็กเกจราคา Blaze ให้คลิกอัปเกรดเพื่อดำเนินการ (ระบบจะแจ้งให้คุณอัปเกรดก็ต่อเมื่อโปรเจ็กต์ไม่ได้ใช้แพ็กเกจ Blaze)
  
  เฉพาะโปรเจ็กต์ระดับ Blaze เท่านั้นที่ใช้ API บนระบบคลาวด์ได้
3. หากยังไม่ได้เปิดใช้ API บนระบบคลาวด์ ให้คลิกเปิดใช้ API บนระบบคลาวด์
ก่อนที่จะติดตั้งใช้งานแอปที่ใช้ Cloud API ในสภาพแวดล้อมการผลิต คุณควรทำตาม ขั้นตอนเพิ่มเติมเพื่อป้องกันและลดผลกระทบจากการเข้าถึง API ที่ไม่ได้รับอนุญาต

หากต้องการใช้เฉพาะโมเดลในอุปกรณ์ ให้ข้ามขั้นตอนนี้

ตอนนี้คุณพร้อมที่จะเริ่มจดจำข้อความในรูปภาพแล้ว

หลักเกณฑ์เกี่ยวกับรูปภาพที่ป้อน

รูปภาพอินพุตต้องมีข้อความที่แสดงโดยข้อมูลพิกเซลที่เพียงพอเพื่อให้ ML Kit จดจำข้อความได้อย่างถูกต้อง ข้อความละติน แต่ละอักขระควรมีขนาดอย่างน้อย 16x16 พิกเซล สำหรับข้อความภาษาจีน ญี่ปุ่น และเกาหลี (เฉพาะ API บนระบบคลาวด์เท่านั้นที่รองรับ) อักขระแต่ละตัว ควรมีขนาด 24x24 พิกเซล โดยทั่วไปแล้ว สำหรับทุกภาษา การมีอักขระขนาดใหญ่กว่า 24x24 พิกเซล ไม่ได้ช่วยเพิ่มความแม่นยำ

เช่น รูปภาพขนาด 640x480 อาจเหมาะกับการสแกนนามบัตร ที่กินพื้นที่ความกว้างทั้งหมดของรูปภาพ หากต้องการสแกนเอกสารที่พิมพ์บน กระดาษขนาด Letter คุณอาจต้องใช้รูปภาพขนาด 720x1280 พิกเซล
โฟกัสของรูปภาพไม่ดีอาจส่งผลต่อความถูกต้องของการจดจำข้อความ หากคุณไม่ได้รับผลลัพธ์ที่ยอมรับได้ ให้ลองขอให้ผู้ใช้ถ่ายภาพใหม่
หากคุณกำลังจดจำข้อความในแอปพลิเคชันแบบเรียลไทม์ คุณอาจต้องพิจารณาขนาดโดยรวมของรูปภาพอินพุตด้วย รูปภาพขนาดเล็กจะประมวลผลได้เร็วกว่า ดังนั้นเพื่อลดเวลาในการตอบสนอง ให้ถ่ายภาพที่ความละเอียดต่ำกว่า (โดยคำนึงถึงข้อกำหนดด้านความแม่นยำข้างต้น) และตรวจสอบว่าข้อความครอบคลุมพื้นที่ในรูปภาพมากที่สุด ดูเพิ่มเติมได้ที่ เคล็ดลับในการปรับปรุงประสิทธิภาพแบบเรียลไทม์

การรู้จำข้อความในรูปภาพ

หากต้องการจดจำข้อความในรูปภาพโดยใช้โมเดลในอุปกรณ์หรือโมเดลบนคลาวด์ ให้เรียกใช้ตัวจดจำข้อความตามที่อธิบายไว้ด้านล่าง

1. เรียกใช้โปรแกรมจดจำข้อความ

หากต้องการจดจำข้อความในรูปภาพ ให้สร้างFirebaseVisionImageออบเจ็กต์ จากBitmap, media.Image, ByteBuffer, อาร์เรย์ไบต์ หรือไฟล์ใน อุปกรณ์ จากนั้นส่งออบเจ็กต์ FirebaseVisionImage ไปยังเมธอด processImage ของ FirebaseVisionTextRecognizer

สร้างออบเจ็กต์ FirebaseVisionImage จากรูปภาพ

หากต้องการสร้างออบเจ็กต์ FirebaseVisionImage จากออบเจ็กต์ media.Image เช่น เมื่อถ่ายภาพจากกล้องของอุปกรณ์ ให้ส่งออบเจ็กต์ media.Image และการหมุนของรูปภาพไปยัง FirebaseVisionImage.fromMediaImage()

หากคุณใช้ไลบรารี CameraX คลาส OnImageCapturedListener และ ImageAnalysis.Analyzer จะคำนวณค่าการหมุน ให้คุณ ดังนั้นคุณเพียงแค่ต้องแปลงการหมุนเป็นค่าคงที่ ROTATION_ ค่าใดค่าหนึ่งของ ML Kit ก่อนเรียกใช้ FirebaseVisionImage.fromMediaImage()

Java

private class YourAnalyzer implements ImageAnalysis.Analyzer {

    private int degreesToFirebaseRotation(int degrees) {
        switch (degrees) {
            case 0:
                return FirebaseVisionImageMetadata.ROTATION_0;
            case 90:
                return FirebaseVisionImageMetadata.ROTATION_90;
            case 180:
                return FirebaseVisionImageMetadata.ROTATION_180;
            case 270:
                return FirebaseVisionImageMetadata.ROTATION_270;
            default:
                throw new IllegalArgumentException(
                        "Rotation must be 0, 90, 180, or 270.");
        }
    }

    @Override
    public void analyze(ImageProxy imageProxy, int degrees) {
        if (imageProxy == null || imageProxy.getImage() == null) {
            return;
        }
        Image mediaImage = imageProxy.getImage();
        int rotation = degreesToFirebaseRotation(degrees);
        FirebaseVisionImage image =
                FirebaseVisionImage.fromMediaImage(mediaImage, rotation);
        // Pass image to an ML Kit Vision API
        // ...
    }
}

Kotlin

private class YourImageAnalyzer : ImageAnalysis.Analyzer {
    private fun degreesToFirebaseRotation(degrees: Int): Int = when(degrees) {
        0 -> FirebaseVisionImageMetadata.ROTATION_0
        90 -> FirebaseVisionImageMetadata.ROTATION_90
        180 -> FirebaseVisionImageMetadata.ROTATION_180
        270 -> FirebaseVisionImageMetadata.ROTATION_270
        else -> throw Exception("Rotation must be 0, 90, 180, or 270.")
    }

    override fun analyze(imageProxy: ImageProxy?, degrees: Int) {
        val mediaImage = imageProxy?.image
        val imageRotation = degreesToFirebaseRotation(degrees)
        if (mediaImage != null) {
            val image = FirebaseVisionImage.fromMediaImage(mediaImage, imageRotation)
            // Pass image to an ML Kit Vision API
            // ...
        }
    }
}

หากไม่ได้ใช้คลังกล้องที่ให้การหมุนของรูปภาพ คุณ สามารถคำนวณจากการหมุนของอุปกรณ์และแนวของเซ็นเซอร์กล้อง ในอุปกรณ์ได้

Java

private static final SparseIntArray ORIENTATIONS = new SparseIntArray();
static {
    ORIENTATIONS.append(Surface.ROTATION_0, 90);
    ORIENTATIONS.append(Surface.ROTATION_90, 0);
    ORIENTATIONS.append(Surface.ROTATION_180, 270);
    ORIENTATIONS.append(Surface.ROTATION_270, 180);
}

/**
 * Get the angle by which an image must be rotated given the device's current
 * orientation.
 */
@RequiresApi(api = Build.VERSION_CODES.LOLLIPOP)
private int getRotationCompensation(String cameraId, Activity activity, Context context)
        throws CameraAccessException {
    // Get the device's current rotation relative to its "native" orientation.
    // Then, from the ORIENTATIONS table, look up the angle the image must be
    // rotated to compensate for the device's rotation.
    int deviceRotation = activity.getWindowManager().getDefaultDisplay().getRotation();
    int rotationCompensation = ORIENTATIONS.get(deviceRotation);

    // On most devices, the sensor orientation is 90 degrees, but for some
    // devices it is 270 degrees. For devices with a sensor orientation of
    // 270, rotate the image an additional 180 ((270 + 270) % 360) degrees.
    CameraManager cameraManager = (CameraManager) context.getSystemService(CAMERA_SERVICE);
    int sensorOrientation = cameraManager
            .getCameraCharacteristics(cameraId)
            .get(CameraCharacteristics.SENSOR_ORIENTATION);
    rotationCompensation = (rotationCompensation + sensorOrientation + 270) % 360;

    // Return the corresponding FirebaseVisionImageMetadata rotation value.
    int result;
    switch (rotationCompensation) {
        case 0:
            result = FirebaseVisionImageMetadata.ROTATION_0;
            break;
        case 90:
            result = FirebaseVisionImageMetadata.ROTATION_90;
            break;
        case 180:
            result = FirebaseVisionImageMetadata.ROTATION_180;
            break;
        case 270:
            result = FirebaseVisionImageMetadata.ROTATION_270;
            break;
        default:
            result = FirebaseVisionImageMetadata.ROTATION_0;
            Log.e(TAG, "Bad rotation value: " + rotationCompensation);
    }
    return result;
}VisionImage.java

Kotlin

private val ORIENTATIONS = SparseIntArray()

init {
    ORIENTATIONS.append(Surface.ROTATION_0, 90)
    ORIENTATIONS.append(Surface.ROTATION_90, 0)
    ORIENTATIONS.append(Surface.ROTATION_180, 270)
    ORIENTATIONS.append(Surface.ROTATION_270, 180)
}
/**
 * Get the angle by which an image must be rotated given the device's current
 * orientation.
 */
@RequiresApi(api = Build.VERSION_CODES.LOLLIPOP)
@Throws(CameraAccessException::class)
private fun getRotationCompensation(cameraId: String, activity: Activity, context: Context): Int {
    // Get the device's current rotation relative to its "native" orientation.
    // Then, from the ORIENTATIONS table, look up the angle the image must be
    // rotated to compensate for the device's rotation.
    val deviceRotation = activity.windowManager.defaultDisplay.rotation
    var rotationCompensation = ORIENTATIONS.get(deviceRotation)

    // On most devices, the sensor orientation is 90 degrees, but for some
    // devices it is 270 degrees. For devices with a sensor orientation of
    // 270, rotate the image an additional 180 ((270 + 270) % 360) degrees.
    val cameraManager = context.getSystemService(CAMERA_SERVICE) as CameraManager
    val sensorOrientation = cameraManager
            .getCameraCharacteristics(cameraId)
            .get(CameraCharacteristics.SENSOR_ORIENTATION)!!
    rotationCompensation = (rotationCompensation + sensorOrientation + 270) % 360

    // Return the corresponding FirebaseVisionImageMetadata rotation value.
    val result: Int
    when (rotationCompensation) {
        0 -> result = FirebaseVisionImageMetadata.ROTATION_0
        90 -> result = FirebaseVisionImageMetadata.ROTATION_90
        180 -> result = FirebaseVisionImageMetadata.ROTATION_180
        270 -> result = FirebaseVisionImageMetadata.ROTATION_270
        else -> {
            result = FirebaseVisionImageMetadata.ROTATION_0
            Log.e(TAG, "Bad rotation value: $rotationCompensation")
        }
    }
    return result
}VisionImage.kt

จากนั้นส่งออบเจ็กต์ media.Image และค่าการหมุนไปยัง FirebaseVisionImage.fromMediaImage() ดังนี้

Java

FirebaseVisionImage image = FirebaseVisionImage.fromMediaImage(mediaImage, rotation);VisionImage.java

Kotlin

val image = FirebaseVisionImage.fromMediaImage(mediaImage, rotation)VisionImage.kt

หากต้องการสร้างออบเจ็กต์ FirebaseVisionImage จาก URI ของไฟล์ ให้ส่งบริบทของแอปและ URI ของไฟล์ไปยัง FirebaseVisionImage.fromFilePath() ซึ่งจะมีประโยชน์เมื่อคุณ ใช้ACTION_GET_CONTENT Intent เพื่อแจ้งให้ผู้ใช้เลือก รูปภาพจากแอปแกลเลอรี
Java
```
FirebaseVisionImage image;
try {
    image = FirebaseVisionImage.fromFilePath(context, uri);
} catch (IOException e) {
    e.printStackTrace();
}VisionImage.java
```
Kotlin
```
val image: FirebaseVisionImage
try {
    image = FirebaseVisionImage.fromFilePath(context, uri)
} catch (e: IOException) {
    e.printStackTrace()
}VisionImage.kt
```

หากต้องการสร้างFirebaseVisionImageออบเจ็กต์จาก ByteBufferหรืออาร์เรย์ไบต์ ให้คำนวณการหมุนของรูปภาพก่อน ตามที่อธิบายไว้ข้างต้นสำหรับอินพุต media.Image

จากนั้นสร้างFirebaseVisionImageMetadataออบเจ็กต์ ที่มีความสูง ความกว้าง รูปแบบการเข้ารหัสสี และการหมุนของรูปภาพ

Java

FirebaseVisionImageMetadata metadata = new FirebaseVisionImageMetadata.Builder()
        .setWidth(480)   // 480x360 is typically sufficient for
        .setHeight(360)  // image recognition
        .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
        .setRotation(rotation)
        .build();VisionImage.java

Kotlin

val metadata = FirebaseVisionImageMetadata.Builder()
        .setWidth(480) // 480x360 is typically sufficient for
        .setHeight(360) // image recognition
        .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
        .setRotation(rotation)
        .build()VisionImage.kt

ใช้บัฟเฟอร์หรืออาร์เรย์ และออบเจ็กต์ข้อมูลเมตาเพื่อสร้างออบเจ็กต์ FirebaseVisionImage ดังนี้

Java

FirebaseVisionImage image = FirebaseVisionImage.fromByteBuffer(buffer, metadata);
// Or: FirebaseVisionImage image = FirebaseVisionImage.fromByteArray(byteArray, metadata);VisionImage.java

Kotlin

val image = FirebaseVisionImage.fromByteBuffer(buffer, metadata)
// Or: val image = FirebaseVisionImage.fromByteArray(byteArray, metadata)VisionImage.kt

วิธีสร้างออบเจ็กต์ FirebaseVisionImage จากออบเจ็กต์ Bitmap
Java
```
FirebaseVisionImage image = FirebaseVisionImage.fromBitmap(bitmap);VisionImage.java
```
Kotlin
```
val image = FirebaseVisionImage.fromBitmap(bitmap)VisionImage.kt
```
รูปภาพที่แสดงโดยออบเจ็กต์ Bitmap ต้อง ตั้งตรงโดยไม่ต้องหมุนเพิ่มเติม

รับอินสแตนซ์ของ FirebaseVisionTextRecognizer

วิธีใช้โมเดลในอุปกรณ์

Java

FirebaseVisionTextRecognizer detector = FirebaseVision.getInstance()
        .getOnDeviceTextRecognizer();

Kotlin

val detector = FirebaseVision.getInstance()
        .onDeviceTextRecognizer

วิธีใช้โมเดลบนระบบคลาวด์

Java

FirebaseVisionTextRecognizer detector = FirebaseVision.getInstance()
        .getCloudTextRecognizer();
// Or, to change the default settings:
//   FirebaseVisionTextRecognizer detector = FirebaseVision.getInstance()
//          .getCloudTextRecognizer(options);

// Or, to provide language hints to assist with language detection:
// See https://cloud.google.com/vision/docs/languages for supported languages
FirebaseVisionCloudTextRecognizerOptions options = new FirebaseVisionCloudTextRecognizerOptions.Builder()
        .setLanguageHints(Arrays.asList("en", "hi"))
        .build();

Kotlin

val detector = FirebaseVision.getInstance().cloudTextRecognizer
// Or, to change the default settings:
// val detector = FirebaseVision.getInstance().getCloudTextRecognizer(options)

// Or, to provide language hints to assist with language detection:
// See https://cloud.google.com/vision/docs/languages for supported languages
val options = FirebaseVisionCloudTextRecognizerOptions.Builder()
        .setLanguageHints(listOf("en", "hi"))
        .build()

สุดท้าย ให้ส่งรูปภาพไปยังเมธอด processImage ดังนี้

Java

Task<FirebaseVisionText> result =
        detector.processImage(image)
                .addOnSuccessListener(new OnSuccessListener<FirebaseVisionText>() {
                    @Override
                    public void onSuccess(FirebaseVisionText firebaseVisionText) {
                        // Task completed successfully
                        // ...
                    }
                })
                .addOnFailureListener(
                        new OnFailureListener() {
                            @Override
                            public void onFailure(@NonNull Exception e) {
                                // Task failed with an exception
                                // ...
                            }
                        });

Kotlin

val result = detector.processImage(image)
        .addOnSuccessListener { firebaseVisionText ->
            // Task completed successfully
            // ...
        }
        .addOnFailureListener { e ->
            // Task failed with an exception
            // ...
        }

2. แยกข้อความจากบล็อกข้อความที่ระบบจดจำ

หากการดำเนินการจดจำข้อความสำเร็จ ระบบจะส่งออบเจ็กต์ FirebaseVisionText ไปยังเครื่องมือฟังที่สำเร็จ ออบเจ็กต์ FirebaseVisionText มีข้อความทั้งหมดที่ระบบจดจำได้ใน รูปภาพ และมีออบเจ็กต์ TextBlock ตั้งแต่ 0 รายการขึ้นไป

แต่ละ TextBlock จะแสดงบล็อกข้อความสี่เหลี่ยมผืนผ้า ซึ่งมีออบเจ็กต์ Line ตั้งแต่ 0 รายการขึ้นไป ออบเจ็กต์ Line แต่ละรายการมีออบเจ็กต์ Element 0 รายการขึ้นไป ซึ่งแสดงถึงคำและเอนทิตีที่คล้ายคำ (วันที่ ตัวเลข และอื่นๆ)

สำหรับออบเจ็กต์ TextBlock, Line และ Element แต่ละรายการ คุณจะได้รับข้อความ ที่ระบบจดจำในภูมิภาคและพิกัดขอบเขตของภูมิภาค

เช่น

Java

String resultText = result.getText();
for (FirebaseVisionText.TextBlock block: result.getTextBlocks()) {
    String blockText = block.getText();
    Float blockConfidence = block.getConfidence();
    List<RecognizedLanguage> blockLanguages = block.getRecognizedLanguages();
    Point[] blockCornerPoints = block.getCornerPoints();
    Rect blockFrame = block.getBoundingBox();
    for (FirebaseVisionText.Line line: block.getLines()) {
        String lineText = line.getText();
        Float lineConfidence = line.getConfidence();
        List<RecognizedLanguage> lineLanguages = line.getRecognizedLanguages();
        Point[] lineCornerPoints = line.getCornerPoints();
        Rect lineFrame = line.getBoundingBox();
        for (FirebaseVisionText.Element element: line.getElements()) {
            String elementText = element.getText();
            Float elementConfidence = element.getConfidence();
            List<RecognizedLanguage> elementLanguages = element.getRecognizedLanguages();
            Point[] elementCornerPoints = element.getCornerPoints();
            Rect elementFrame = element.getBoundingBox();
        }
    }
}

Kotlin

val resultText = result.text
for (block in result.textBlocks) {
    val blockText = block.text
    val blockConfidence = block.confidence
    val blockLanguages = block.recognizedLanguages
    val blockCornerPoints = block.cornerPoints
    val blockFrame = block.boundingBox
    for (line in block.lines) {
        val lineText = line.text
        val lineConfidence = line.confidence
        val lineLanguages = line.recognizedLanguages
        val lineCornerPoints = line.cornerPoints
        val lineFrame = line.boundingBox
        for (element in line.elements) {
            val elementText = element.text
            val elementConfidence = element.confidence
            val elementLanguages = element.recognizedLanguages
            val elementCornerPoints = element.cornerPoints
            val elementFrame = element.boundingBox
        }
    }
}

เคล็ดลับในการปรับปรุงประสิทธิภาพแบบเรียลไทม์

หากต้องการใช้โมเดลในอุปกรณ์เพื่อจดจำข้อความในแอปพลิเคชันแบบเรียลไทม์ ให้ทำตามหลักเกณฑ์เหล่านี้เพื่อให้ได้อัตราเฟรมที่ดีที่สุด

จำกัดการเรียกไปยังเครื่องมือจดจำข้อความ หากมีวิดีโอเฟรมใหม่พร้อมใช้งานขณะที่ตัวจดจำข้อความทำงาน ให้ทิ้งเฟรม
หากคุณใช้เอาต์พุตของตัวจดจำข้อความเพื่อซ้อนทับกราฟิกบน รูปภาพอินพุต ให้รับผลลัพธ์จาก ML Kit ก่อน จากนั้นจึงแสดงรูปภาพ และซ้อนทับในขั้นตอนเดียว การทำเช่นนี้จะทำให้คุณแสดงผลไปยังพื้นผิวการแสดงผล เพียงครั้งเดียวสำหรับแต่ละเฟรมอินพุต
หากใช้ API ของ Camera2 ให้ถ่ายภาพในรูปแบบ ImageFormat.YUV_420_888

หากใช้ Camera API เวอร์ชันเก่า ให้ถ่ายภาพในรูปแบบ ImageFormat.NV21
ลองถ่ายภาพที่ความละเอียดต่ำลง อย่างไรก็ตาม โปรดคำนึงถึง ข้อกำหนดด้านขนาดรูปภาพของ API นี้ด้วย

ขั้นตอนถัดไป

ก่อนที่จะนำแอปที่ใช้ Cloud API ไปใช้งานจริง คุณควรดำเนินการเพิ่มเติมเพื่อป้องกันและลดผลกระทบจากการเข้าถึง API ที่ไม่ได้รับอนุญาต

จดจำข้อความในรูปภาพของเอกสาร

หากต้องการจดจำข้อความในเอกสาร ให้กำหนดค่าและเรียกใช้เครื่องมือจดจำข้อความในเอกสารบนระบบคลาวด์ตามที่อธิบายไว้ด้านล่าง

API การจดจำข้อความในเอกสารที่อธิบายไว้ด้านล่างมีอินเทอร์เฟซที่ ออกแบบมาเพื่อให้ทำงานกับรูปภาพของเอกสารได้สะดวกยิ่งขึ้น อย่างไรก็ตาม หากต้องการใช้อินเทอร์เฟซที่ FirebaseVisionTextRecognizer API มีให้ คุณสามารถใช้ API นี้แทนเพื่อสแกนเอกสารได้โดยการกำหนดค่าเครื่องมือจดจำข้อความในระบบคลาวด์ให้ใช้โมเดลข้อความแบบหนาแน่น

วิธีใช้ API การจดจำข้อความในเอกสาร

1. เรียกใช้โปรแกรมจดจำข้อความ

หากต้องการจดจำข้อความในรูปภาพ ให้สร้างออบเจ็กต์ FirebaseVisionImage จากBitmap, media.Image, ByteBuffer, อาร์เรย์ไบต์ หรือไฟล์ในอุปกรณ์ จากนั้นส่งออบเจ็กต์ FirebaseVisionImage ไปยังเมธอด processImage ของ FirebaseVisionDocumentTextRecognizer

สร้างออบเจ็กต์ FirebaseVisionImage จากรูปภาพ

Java

private class YourAnalyzer implements ImageAnalysis.Analyzer {

    private int degreesToFirebaseRotation(int degrees) {
        switch (degrees) {
            case 0:
                return FirebaseVisionImageMetadata.ROTATION_0;
            case 90:
                return FirebaseVisionImageMetadata.ROTATION_90;
            case 180:
                return FirebaseVisionImageMetadata.ROTATION_180;
            case 270:
                return FirebaseVisionImageMetadata.ROTATION_270;
            default:
                throw new IllegalArgumentException(
                        "Rotation must be 0, 90, 180, or 270.");
        }
    }

    @Override
    public void analyze(ImageProxy imageProxy, int degrees) {
        if (imageProxy == null || imageProxy.getImage() == null) {
            return;
        }
        Image mediaImage = imageProxy.getImage();
        int rotation = degreesToFirebaseRotation(degrees);
        FirebaseVisionImage image =
                FirebaseVisionImage.fromMediaImage(mediaImage, rotation);
        // Pass image to an ML Kit Vision API
        // ...
    }
}

Kotlin

private class YourImageAnalyzer : ImageAnalysis.Analyzer {
    private fun degreesToFirebaseRotation(degrees: Int): Int = when(degrees) {
        0 -> FirebaseVisionImageMetadata.ROTATION_0
        90 -> FirebaseVisionImageMetadata.ROTATION_90
        180 -> FirebaseVisionImageMetadata.ROTATION_180
        270 -> FirebaseVisionImageMetadata.ROTATION_270
        else -> throw Exception("Rotation must be 0, 90, 180, or 270.")
    }

    override fun analyze(imageProxy: ImageProxy?, degrees: Int) {
        val mediaImage = imageProxy?.image
        val imageRotation = degreesToFirebaseRotation(degrees)
        if (mediaImage != null) {
            val image = FirebaseVisionImage.fromMediaImage(mediaImage, imageRotation)
            // Pass image to an ML Kit Vision API
            // ...
        }
    }
}

Java

private static final SparseIntArray ORIENTATIONS = new SparseIntArray();
static {
    ORIENTATIONS.append(Surface.ROTATION_0, 90);
    ORIENTATIONS.append(Surface.ROTATION_90, 0);
    ORIENTATIONS.append(Surface.ROTATION_180, 270);
    ORIENTATIONS.append(Surface.ROTATION_270, 180);
}

/**
 * Get the angle by which an image must be rotated given the device's current
 * orientation.
 */
@RequiresApi(api = Build.VERSION_CODES.LOLLIPOP)
private int getRotationCompensation(String cameraId, Activity activity, Context context)
        throws CameraAccessException {
    // Get the device's current rotation relative to its "native" orientation.
    // Then, from the ORIENTATIONS table, look up the angle the image must be
    // rotated to compensate for the device's rotation.
    int deviceRotation = activity.getWindowManager().getDefaultDisplay().getRotation();
    int rotationCompensation = ORIENTATIONS.get(deviceRotation);

    // On most devices, the sensor orientation is 90 degrees, but for some
    // devices it is 270 degrees. For devices with a sensor orientation of
    // 270, rotate the image an additional 180 ((270 + 270) % 360) degrees.
    CameraManager cameraManager = (CameraManager) context.getSystemService(CAMERA_SERVICE);
    int sensorOrientation = cameraManager
            .getCameraCharacteristics(cameraId)
            .get(CameraCharacteristics.SENSOR_ORIENTATION);
    rotationCompensation = (rotationCompensation + sensorOrientation + 270) % 360;

    // Return the corresponding FirebaseVisionImageMetadata rotation value.
    int result;
    switch (rotationCompensation) {
        case 0:
            result = FirebaseVisionImageMetadata.ROTATION_0;
            break;
        case 90:
            result = FirebaseVisionImageMetadata.ROTATION_90;
            break;
        case 180:
            result = FirebaseVisionImageMetadata.ROTATION_180;
            break;
        case 270:
            result = FirebaseVisionImageMetadata.ROTATION_270;
            break;
        default:
            result = FirebaseVisionImageMetadata.ROTATION_0;
            Log.e(TAG, "Bad rotation value: " + rotationCompensation);
    }
    return result;
}VisionImage.java

Kotlin

private val ORIENTATIONS = SparseIntArray()

init {
    ORIENTATIONS.append(Surface.ROTATION_0, 90)
    ORIENTATIONS.append(Surface.ROTATION_90, 0)
    ORIENTATIONS.append(Surface.ROTATION_180, 270)
    ORIENTATIONS.append(Surface.ROTATION_270, 180)
}
/**
 * Get the angle by which an image must be rotated given the device's current
 * orientation.
 */
@RequiresApi(api = Build.VERSION_CODES.LOLLIPOP)
@Throws(CameraAccessException::class)
private fun getRotationCompensation(cameraId: String, activity: Activity, context: Context): Int {
    // Get the device's current rotation relative to its "native" orientation.
    // Then, from the ORIENTATIONS table, look up the angle the image must be
    // rotated to compensate for the device's rotation.
    val deviceRotation = activity.windowManager.defaultDisplay.rotation
    var rotationCompensation = ORIENTATIONS.get(deviceRotation)

    // On most devices, the sensor orientation is 90 degrees, but for some
    // devices it is 270 degrees. For devices with a sensor orientation of
    // 270, rotate the image an additional 180 ((270 + 270) % 360) degrees.
    val cameraManager = context.getSystemService(CAMERA_SERVICE) as CameraManager
    val sensorOrientation = cameraManager
            .getCameraCharacteristics(cameraId)
            .get(CameraCharacteristics.SENSOR_ORIENTATION)!!
    rotationCompensation = (rotationCompensation + sensorOrientation + 270) % 360

    // Return the corresponding FirebaseVisionImageMetadata rotation value.
    val result: Int
    when (rotationCompensation) {
        0 -> result = FirebaseVisionImageMetadata.ROTATION_0
        90 -> result = FirebaseVisionImageMetadata.ROTATION_90
        180 -> result = FirebaseVisionImageMetadata.ROTATION_180
        270 -> result = FirebaseVisionImageMetadata.ROTATION_270
        else -> {
            result = FirebaseVisionImageMetadata.ROTATION_0
            Log.e(TAG, "Bad rotation value: $rotationCompensation")
        }
    }
    return result
}VisionImage.kt

จากนั้นส่งออบเจ็กต์ media.Image และค่าการหมุนไปยัง FirebaseVisionImage.fromMediaImage() ดังนี้

Java

FirebaseVisionImage image = FirebaseVisionImage.fromMediaImage(mediaImage, rotation);VisionImage.java

Kotlin

val image = FirebaseVisionImage.fromMediaImage(mediaImage, rotation)VisionImage.kt

หากต้องการสร้างออบเจ็กต์ FirebaseVisionImage จาก URI ของไฟล์ ให้ส่งบริบทของแอปและ URI ของไฟล์ไปยัง FirebaseVisionImage.fromFilePath() ซึ่งจะมีประโยชน์เมื่อคุณ ใช้ACTION_GET_CONTENT Intent เพื่อแจ้งให้ผู้ใช้เลือก รูปภาพจากแอปแกลเลอรี
Java
```
FirebaseVisionImage image;
try {
    image = FirebaseVisionImage.fromFilePath(context, uri);
} catch (IOException e) {
    e.printStackTrace();
}VisionImage.java
```
Kotlin
```
val image: FirebaseVisionImage
try {
    image = FirebaseVisionImage.fromFilePath(context, uri)
} catch (e: IOException) {
    e.printStackTrace()
}VisionImage.kt
```

Java

FirebaseVisionImageMetadata metadata = new FirebaseVisionImageMetadata.Builder()
        .setWidth(480)   // 480x360 is typically sufficient for
        .setHeight(360)  // image recognition
        .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
        .setRotation(rotation)
        .build();VisionImage.java

Kotlin

val metadata = FirebaseVisionImageMetadata.Builder()
        .setWidth(480) // 480x360 is typically sufficient for
        .setHeight(360) // image recognition
        .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
        .setRotation(rotation)
        .build()VisionImage.kt

Java

FirebaseVisionImage image = FirebaseVisionImage.fromByteBuffer(buffer, metadata);
// Or: FirebaseVisionImage image = FirebaseVisionImage.fromByteArray(byteArray, metadata);VisionImage.java

Kotlin

val image = FirebaseVisionImage.fromByteBuffer(buffer, metadata)
// Or: val image = FirebaseVisionImage.fromByteArray(byteArray, metadata)VisionImage.kt

วิธีสร้างออบเจ็กต์ FirebaseVisionImage จากออบเจ็กต์ Bitmap
Java
```
FirebaseVisionImage image = FirebaseVisionImage.fromBitmap(bitmap);VisionImage.java
```
Kotlin
```
val image = FirebaseVisionImage.fromBitmap(bitmap)VisionImage.kt
```
รูปภาพที่แสดงโดยออบเจ็กต์ Bitmap ต้อง ตั้งตรงโดยไม่ต้องหมุนเพิ่มเติม

รับอินสแตนซ์ของ FirebaseVisionDocumentTextRecognizer

Java

FirebaseVisionDocumentTextRecognizer detector = FirebaseVision.getInstance()
        .getCloudDocumentTextRecognizer();

// Or, to provide language hints to assist with language detection:
// See https://cloud.google.com/vision/docs/languages for supported languages
FirebaseVisionCloudDocumentRecognizerOptions options =
        new FirebaseVisionCloudDocumentRecognizerOptions.Builder()
                .setLanguageHints(Arrays.asList("en", "hi"))
                .build();
FirebaseVisionDocumentTextRecognizer detector = FirebaseVision.getInstance()
        .getCloudDocumentTextRecognizer(options);

Kotlin

val detector = FirebaseVision.getInstance()
        .cloudDocumentTextRecognizer

// Or, to provide language hints to assist with language detection:
// See https://cloud.google.com/vision/docs/languages for supported languages
val options = FirebaseVisionCloudDocumentRecognizerOptions.Builder()
        .setLanguageHints(listOf("en", "hi"))
        .build()
val detector = FirebaseVision.getInstance()
        .getCloudDocumentTextRecognizer(options)

สุดท้าย ให้ส่งรูปภาพไปยังเมธอด processImage ดังนี้

Java

detector.processImage(myImage)
        .addOnSuccessListener(new OnSuccessListener<FirebaseVisionDocumentText>() {
            @Override
            public void onSuccess(FirebaseVisionDocumentText result) {
                // Task completed successfully
                // ...
            }
        })
        .addOnFailureListener(new OnFailureListener() {
            @Override
            public void onFailure(@NonNull Exception e) {
                // Task failed with an exception
                // ...
            }
        });

Kotlin

detector.processImage(myImage)
        .addOnSuccessListener { firebaseVisionDocumentText ->
            // Task completed successfully
            // ...
        }
        .addOnFailureListener { e ->
            // Task failed with an exception
            // ...
        }

2. แยกข้อความจากบล็อกข้อความที่ระบบจดจำ

หากการดำเนินการจดจำข้อความสำเร็จ ระบบจะแสดงออบเจ็กต์ FirebaseVisionDocumentText ออบเจ็กต์ FirebaseVisionDocumentTextมีข้อความทั้งหมดที่ระบบจดจำใน รูปภาพและลำดับชั้นของออบเจ็กต์ที่แสดงโครงสร้างของเอกสารที่ระบบจดจำ

สำหรับออบเจ็กต์ Block, Paragraph, Word และ Symbol แต่ละรายการ คุณจะได้รับ ข้อความที่ระบบจดจำในภูมิภาคและพิกัดขอบเขตของภูมิภาค

เช่น

Java

String resultText = result.getText();
for (FirebaseVisionDocumentText.Block block: result.getBlocks()) {
    String blockText = block.getText();
    Float blockConfidence = block.getConfidence();
    List<RecognizedLanguage> blockRecognizedLanguages = block.getRecognizedLanguages();
    Rect blockFrame = block.getBoundingBox();
    for (FirebaseVisionDocumentText.Paragraph paragraph: block.getParagraphs()) {
        String paragraphText = paragraph.getText();
        Float paragraphConfidence = paragraph.getConfidence();
        List<RecognizedLanguage> paragraphRecognizedLanguages = paragraph.getRecognizedLanguages();
        Rect paragraphFrame = paragraph.getBoundingBox();
        for (FirebaseVisionDocumentText.Word word: paragraph.getWords()) {
            String wordText = word.getText();
            Float wordConfidence = word.getConfidence();
            List<RecognizedLanguage> wordRecognizedLanguages = word.getRecognizedLanguages();
            Rect wordFrame = word.getBoundingBox();
            for (FirebaseVisionDocumentText.Symbol symbol: word.getSymbols()) {
                String symbolText = symbol.getText();
                Float symbolConfidence = symbol.getConfidence();
                List<RecognizedLanguage> symbolRecognizedLanguages = symbol.getRecognizedLanguages();
                Rect symbolFrame = symbol.getBoundingBox();
            }
        }
    }
}

Kotlin

val resultText = result.text
for (block in result.blocks) {
    val blockText = block.text
    val blockConfidence = block.confidence
    val blockRecognizedLanguages = block.recognizedLanguages
    val blockFrame = block.boundingBox
    for (paragraph in block.paragraphs) {
        val paragraphText = paragraph.text
        val paragraphConfidence = paragraph.confidence
        val paragraphRecognizedLanguages = paragraph.recognizedLanguages
        val paragraphFrame = paragraph.boundingBox
        for (word in paragraph.words) {
            val wordText = word.text
            val wordConfidence = word.confidence
            val wordRecognizedLanguages = word.recognizedLanguages
            val wordFrame = word.boundingBox
            for (symbol in word.symbols) {
                val symbolText = symbol.text
                val symbolConfidence = symbol.confidence
                val symbolRecognizedLanguages = symbol.recognizedLanguages
                val symbolFrame = symbol.boundingBox
            }
        }
    }
}

ขั้นตอนถัดไป

ก่อนที่จะนำแอปที่ใช้ Cloud API ไปใช้งานจริง คุณควรดำเนินการเพิ่มเติมเพื่อป้องกันและลดผลกระทบจากการเข้าถึง API ที่ไม่ได้รับอนุญาต