Working with Azure Form Recognizer Part 2
In part 1, we took a look at how the new Azure Form Recognizer (Preview) service can be used to extract key-value pairs from forms in image and PDF formats using custom-trained machine learning models.
In part 2, we take a look at how the service works without custom models. Custom models are extremely powerful for extracting form data, but they require that you know in advance what kind of form you are ingesting, or which model a user's submission should be analyzed against. This may be perfectly fine in some workflows, but in other processes you may want to determine automatically which model to use. For that, you need to use Form Recognizer without a custom model to extract the raw data from the form, then perform some pre-processing to determine how to route the document.
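As a rough illustration of that pre-processing step, here is a minimal sketch of keyword-based routing. It assumes you have already pulled the text lines out of the layout result as plain strings. The model IDs, the keyword table, and the `choose_model` helper are all hypothetical names made up for this example, not part of the Form Recognizer API.

```python
from typing import Dict, Iterable, Optional, Sequence

# Hypothetical table: custom model ID -> phrases that suggest that form type.
MODEL_KEYWORDS: Dict[str, Sequence[str]] = {
    "invoice-model": ("invoice", "amount due"),
    "w2-model": ("wage and tax statement",),
}


def choose_model(
    lines: Iterable[str],
    model_keywords: Dict[str, Sequence[str]] = MODEL_KEYWORDS,
) -> Optional[str]:
    """Return the model ID whose keywords match the text most often.

    Returns None when no keyword matches, so the caller can fall back
    to manual review or a default route.
    """
    text = " ".join(lines).lower()
    best_model: Optional[str] = None
    best_hits = 0
    for model_id, keywords in model_keywords.items():
        # Count how many of this model's keywords appear in the text.
        hits = sum(1 for keyword in keywords if keyword in text)
        if hits > best_hits:
            best_model, best_hits = model_id, hits
    return best_model


if __name__ == "__main__":
    sample_lines = ["INVOICE #1001", "Amount Due: $40.00"]
    print(choose_model(sample_lines))
```

A real router would likely weigh more signals than plain substring matches, but the shape is the same: extract text without a custom model first, then pick the custom model to run.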
Three suggestions for the Azure Form Recognizer team:
- Allow one-time use keys or shared access signature keys for direct submission. The service currently only accepts a secret APIM key, which means that you have to build a server-side handler merely to inject this key. That handler becomes more or less a proxy, since injecting the key is all it really does. Instead, allow issuance of one-time use keys or shared access signature keys, as Azure Storage does.
- Add support for forwarding the Analyze Layout results to other endpoints. Requiring polling to retrieve the results of the Analyze Layout service call seems so un-Azure-like. The service should allow configuration of forwarding targets and let the caller specify a forwarding endpoint when making the service call. This would provide far more fluid data flows and the ability to push the results into Functions or Logic Apps instead of polling for them.
- Include more default geometry information in the result. The Analyze Layout service call result currently includes the bounding box information as an array of integer values which represent the x and y coordinates of the bounding box. However, this means that all clients need to perform redundant calculations to make use of even basic information like the area, width, and height. Area and height, in particular, are key pieces of metadata for determining the size of the text, which, intuitively, is a signal for the significance of the text. Instead of forcing all consuming clients to calculate this information, it should just be included as part of the result.
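To make the third suggestion concrete, here is a minimal sketch of the calculation every client ends up writing. It assumes the bounding box arrives as a flat list of eight numbers holding four (x, y) corner points listed in order around the box. The `box_geometry` helper and the `BoxGeometry` type are hypothetical names for this example, not something the service provides.

```python
from typing import NamedTuple, Sequence


class BoxGeometry(NamedTuple):
    width: float
    height: float
    area: float


def box_geometry(bounding_box: Sequence[float]) -> BoxGeometry:
    """Derive width, height, and area from [x1, y1, x2, y2, x3, y3, x4, y4].

    Assumes the four corners are listed in order around the box.
    """
    if len(bounding_box) != 8:
        raise ValueError("expected 8 values: four (x, y) corner points")
    xs = bounding_box[0::2]
    ys = bounding_box[1::2]
    # Width and height are the axis-aligned extent of the four corners.
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    # Shoelace formula: gives the quadrilateral's area even when the
    # box is slightly rotated, as scanned text often is.
    twice_area = 0.0
    for i in range(4):
        j = (i + 1) % 4
        twice_area += xs[i] * ys[j] - xs[j] * ys[i]
    return BoxGeometry(width, height, abs(twice_area) / 2)


if __name__ == "__main__":
    print(box_geometry([10, 20, 110, 20, 110, 50, 10, 50]))
```

With something like this in hand, lines can be sorted by height to surface likely titles and headings, which is exactly the kind of work that would be unnecessary if the service returned these values itself.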