What is Microsoft Form Recognizer? And what is it not?
Bottom line: While easy to configure, the Form Recognizer web service ultimately requires software programming to make use of.
What can it do?
Form Recognizer has four levels of capability:
- Raw data – If you send any kind of document as an image, Form Recognizer will perform OCR on it. It then returns the OCR data as structured items such as words and lines. It also finds tables and checkboxes. You can use this raw data for further analysis in your own systems (full-text search, for example).
- Pre-built document types – Out of the box, Form Recognizer can recognize common business documents such as invoices, ID documents, receipts, and business cards. It returns a predefined, known set of fields for each of these document types. For invoices, it looks for invoice numbers, customer data, line items, totals, etc. It doesn’t matter how the vendor designs the invoice or what the vendor calls the data. Whether the label is “Invoice Number”, “InvNo”, or “InvoiceNo”, Form Recognizer identifies it as “Invoice Number”.
- Generic document types – Form Recognizer can also pull data from unknown document types, like specific forms. It can find such things as names, dates, and amounts. However, since it doesn’t know the meaning of these elements, it labels them with whatever is next to them. If you were to use this approach on an invoice, it would not know that “Invoice No” or “Involve Nurnbar” (a typical mistake OCR makes) are actually the keywords for the Invoice Number field. It would just call the field whatever it thinks is printed next to it. This requires post-processing to make sense of the arbitrarily labeled data. The technology is also limited to data that has a clear keyword either to the left of it or above it. However, the configuration is simple: There is none. You just send it a document and it returns the “data”.
- Custom document types – Custom document types are a better approach than generic document types, but they are not configuration-free. You have to train Form Recognizer to pull known data from your custom document type. Training requires 5 samples of the document type, and on each of them, you need to label the fields you want it to extract. That’s still easy and quick to do and does not require a programmer or data scientist. The advantage is that it can then extract an “Invoice Number”, or whatever field you teach it to extract, regardless of the field’s name or whether the OCR reads the label correctly. Of course, the more variable the document type is in terms of layout and text, the more samples it will need to learn.
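To give a feel for the post-processing that the generic document type calls for, here is a minimal sketch in plain Python. It does not call Form Recognizer at all. It assumes you already have the returned label/value pairs in a dictionary, and the alias table, the canonical field names, and the `map_labels` helper are all made up for illustration.

```python
import re
from typing import Dict, Tuple

# Hypothetical alias table: canonical field name -> label variants seen on documents.
FIELD_ALIASES = {
    "invoice_number": ["Invoice Number", "InvNo", "InvoiceNo", "Invoice No"],
    "invoice_date": ["Invoice Date"],
    "total": ["Total", "Amount Due", "Total Due"],
}


def _normalize(label: str) -> str:
    """Lowercase the label and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", label.lower())


# Reverse lookup: normalized alias -> canonical field name.
_LOOKUP = {
    _normalize(alias): field
    for field, aliases in FIELD_ALIASES.items()
    for alias in aliases
}


def map_labels(pairs: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split label/value pairs into known fields and leftovers for review."""
    mapped: Dict[str, str] = {}
    unmapped: Dict[str, str] = {}
    for label, value in pairs.items():
        field = _LOOKUP.get(_normalize(label))
        if field is None:
            unmapped[label] = value  # unknown or misread label
        else:
            mapped[field] = value
    return mapped, unmapped


mapped, unmapped = map_labels(
    {"Inv. No.:": "4711", "TOTAL": "99.00", "Involve Nurnbar": "4712"}
)
print(mapped)
print(unmapped)
```

A label like “Inv. No.:” is caught because punctuation and case are ignored, but a misread such as “Involve Nurnbar” ends up in the leftovers. An exact alias table cannot anticipate every OCR mistake, which is one reason a trained custom document type is the better approach.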
Form Recognizer Studio
What are its limits?
Several aspects limit Form Recognizer:
- OCR supports many languages: 7 for handwritten text and what looks like 100+ for machine-printed text. Microsoft publishes the full list.
- Receipts and business cards are only supported for English-speaking countries such as the USA, Canada, and the UK, among others.
- Form Recognizer only supports US Invoices.
- For ID documents, only US driver's licenses and the biographical page of international passports are supported.
- Generic document types support only the English language.
As input, Form Recognizer supports JPEG, PNG, BMP, TIFF, and PDF. Multi-page document formats such as PDF and TIFF can have a maximum of 2000 pages.
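If you are building a pipeline around the service, it can help to reject unsuitable files before uploading them. The sketch below only encodes the limits stated above. The mapping of formats to file extensions and the `check_input` helper are assumptions for illustration, and the page count has to come from your own tooling.

```python
from pathlib import Path
from typing import List

# Assumed file extensions for the supported formats (JPEG, PNG, BMP, TIFF, PDF).
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff", ".pdf"}
MAX_PAGES = 2000  # limit for multi-page PDF and TIFF files


def check_input(filename: str, page_count: int = 1) -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems: List[str] = []
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(no extension)'}")
    if page_count > MAX_PAGES:
        problems.append(f"{page_count} pages exceeds the limit of {MAX_PAGES}")
    return problems


print(check_input("scan.PDF", page_count=10))
print(check_input("contract.docx"))
print(check_input("archive.tiff", page_count=2001))
```

This checks the file name only, not the actual file content, so treat it as a cheap first filter and not as a guarantee that the service will accept the document.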
For custom documents and invoices, Microsoft charges 50 USD per 1000 pages.
Generic documents, business cards, IDs, and receipts cost you 10 USD per 1000 pages.
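To turn these prices into a rough budget, a small calculation like the following may help. It is only a sketch that assumes billing is strictly proportional to the page count, with no free quota or volume tiers. The model keys and the `estimate_cost_usd` function are names invented here, not part of any Microsoft API.

```python
# Prices in USD per 1000 pages, taken from the two rates quoted above.
PRICE_PER_1000_PAGES_USD = {
    "custom": 50,
    "invoice": 50,
    "generic": 10,
    "business_card": 10,
    "id": 10,
    "receipt": 10,
}


def estimate_cost_usd(model: str, pages: int) -> float:
    """Estimate the cost of processing `pages` pages, assuming pro-rata billing."""
    if pages < 0:
        raise ValueError("pages must not be negative")
    return PRICE_PER_1000_PAGES_USD[model] * pages / 1000


print(estimate_cost_usd("invoice", 2500))  # 2500 invoice pages at 50 USD per 1000
print(estimate_cost_usd("receipt", 2500))  # 2500 receipt pages at 10 USD per 1000
```

Under that assumption, 2500 invoice pages come to 125 USD, while the same number of receipt pages comes to 25 USD.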