Troubleshooting tips
The OPOCR component provides detailed logging that can be used in troubleshooting. "Jobxml" is the data sent to the OCR engine when a file is submitted to be read. It contains the configuration settings that tell the engine how to process the file. This information is useful for recreating and troubleshooting issues, and for getting fixes. The Jobxml is now written to the logs and can be saved to disk.
- OCR_Last and OCR_Failed output folders
- The OPOCR component creates two folders in the AutoStore program files folder. They are OCR_Failed and OCR_Last. Each is disabled by default, and can be enabled by renaming the subfolder contained within each one from "Disabled" to "Enabled". Note that this feature may be removed or changed in a later update.
- OCR_Last captures the last job run through OCR, regardless of whether it failed or succeeded. It will only ever contain the last job. It will contain the Jobxml and the actual file that was processed.
- OCR_Failed captures every failed job run through OCR. It can become cluttered with thousands of files, so enable it only when necessary. For every failed job, it contains the Jobxml and the actual file that was processed. Each failed job is saved to a unique subfolder.
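The renaming step described above can also be scripted. The following Python sketch is illustrative only and is not part of AutoStore: the function name `set_capture_enabled` is hypothetical, and you must supply the path to the OCR_Last or OCR_Failed folder yourself, because it depends on where AutoStore is installed. It assumes that each capture folder contains a single subfolder named either "Disabled" or "Enabled".

```python
from pathlib import Path


def set_capture_enabled(capture_dir: Path, enabled: bool) -> Path:
    """Rename the "Disabled"/"Enabled" subfolder inside an OCR capture folder.

    capture_dir is the OCR_Last or OCR_Failed folder (location is
    installation-specific and must be supplied by the caller).
    Returns the path of the subfolder after the change.
    """
    current = capture_dir / ("Disabled" if enabled else "Enabled")
    target = capture_dir / ("Enabled" if enabled else "Disabled")

    # Nothing to do if the folder is already in the requested state.
    if target.is_dir():
        return target

    # Refuse to guess if the expected subfolder is missing.
    if not current.is_dir():
        raise FileNotFoundError(
            f'Expected a "{current.name}" subfolder in {capture_dir}'
        )

    current.rename(target)
    return target
```

For example, calling `set_capture_enabled(ocr_failed_dir, True)` before reproducing an issue and `set_capture_enabled(ocr_failed_dir, False)` afterward limits how many files accumulate in OCR_Failed. Here `ocr_failed_dir` is a placeholder for the `Path` of your own OCR_Failed folder.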
Known issues
Problem description | Solution |
---|---|
AutoStore jobs fail or hang | The RESULTS WAIT TIMEOUT option for the OPOCR component in the configuration (.cfg) file of an AutoStore process controls how long the AutoStore workflow service waits before restarting the OCR engine when it is not responding. By default, this value is set to 60 minutes. You can edit the configuration file and change this interval to a different value. If the interval is too short, a lengthy OPOCR task may be terminated prematurely and result in a failed job. You may want to set the RESULTS WAIT TIMEOUT option to a shorter value if most of your documents are short and you do not want to hold up processing for the occasional long document. For example, if nearly all of your documents require less than 15 minutes, you can set this value to RESULTS WAIT TIMEOUT = 15, and then manually handle any document that takes longer than 15 minutes after OPOCR times out and fails the job. You can activate the OCR_Failed folder to collect OCR jobs that time out. |
Poor-quality OCR results | Inaccuracies in the OCR process can have many causes. It is recommended that you perform an analysis of types of paper, scanners, and resolution levels to optimize your OCR results before setting up OCR processes. The following are some common tips for increasing OCR accuracy. |
When you export a file to HTML format, the images are not displayed in the output file. | This problem may appear when you use a rename schema. When HTML is used as the output file format, the output consists of an HTML file and the images that the HTML file references. If the images are renamed, the internal links are broken. Therefore, do not use a rename schema when exporting to HTML format. |
A setting of the output document has a value different from the one specified. | Make sure that the setting was specified correctly. If a setting was defined incorrectly, or uses an RRT that was replaced with an incorrect value, the component replaces the incorrect value with the default value at run time, if a default value exists. |
When using the Zoned OCR Matches wildcard validation setting on a zip code with 5 numbers, the validation might fail. | Use [#][#][#][#][#] to validate the zip code. |
When using the Zoned OCR Matches regular expression validation setting on a zip code with 5 numbers, the validation fails if you use multipliers {...}. | Use the following to validate the zip code: (0|1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9) |
The OCR engine may return error 0x8004C60A if the document has empty or bad pages. | Set the Replace corrupted page with blank option to process the rest of the document. |
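
The regular expression workaround in the table spells out each digit as an alternation, so that no {...} multipliers are needed. The following Python sketch is illustrative only: it uses Python's `re` module, not the component's own regular expression engine, and the helper name `is_five_digit_zip` is hypothetical. It assumes that the whole zone text must match the pattern, which is why it uses `re.fullmatch`.

```python
import re

# One digit written as an explicit alternation, avoiding {...} multipliers.
DIGIT = "(0|1|2|3|4|5|6|7|8|9)"

# Repeating the group five times reproduces the pattern given in the table.
ZIP_PATTERN = DIGIT * 5


def is_five_digit_zip(text: str) -> bool:
    """Return True if text consists of exactly five digits 0-9."""
    return re.fullmatch(ZIP_PATTERN, text) is not None


print(is_five_digit_zip("12345"))   # True
print(is_five_digit_zip("1234"))    # False: too short
print(is_five_digit_zip("123456"))  # False: too long
print(is_five_digit_zip("12a45"))   # False: contains a letter
```

Testing a pattern this way before entering it in the validation setting can catch typing mistakes in the long alternation, although the component's own matching rules may differ from Python's.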