Settings

Documents to OCR: Select which documents to process with OCR in the Index module.

  • All Documents

  • Unclassified Documents Only

  • Only Documents with no Triggered Zone Profile

Pages to OCR: Select which pages to process with OCR in the Index module.

  • First Page of Document Only

  • All Pages of Document (Default)

  • Custom Page List of Document

    When selected, an area to enter the custom pages appears.

    Enter pages and page ranges, separated by commas (for example: 1, 3, 5-7, 9-END).
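As an illustration of how a custom page list such as the example above expands into individual pages, here is a minimal Python sketch. It is not the product's parser. The function name parse_page_list, the reading of END as the last page of the document, and the choice to silently skip pages beyond the end of the document are all assumptions made for this example.

```python
def parse_page_list(spec: str, page_count: int) -> list[int]:
    """Expand a spec such as '1, 3, 5-7, 9-END' into sorted 1-based page numbers.

    Hypothetical helper: END is assumed to mean the last page, and pages
    outside 1..page_count are assumed to be ignored.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate a stray or trailing comma
        start_text, sep, end_text = part.partition("-")
        start = int(start_text)
        if sep:
            # A range: the upper bound is a page number or the keyword END.
            end = page_count if end_text.strip().upper() == "END" else int(end_text)
        else:
            # A single page.
            end = start
        pages.update(p for p in range(start, end + 1) if 1 <= p <= page_count)
    return sorted(pages)


print(parse_page_list("1, 3, 5-7, 9-END", page_count=12))
```

For a 12-page document, the example list selects pages 1, 3, 5 through 7, and 9 through 12.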

Zone to OCR: Perform OCR on the entire page or on a user-defined zone. You can define a zone in the Zone Configuration dialog box by clicking the Define Zones button.

Under OCR Indexing Field Options, the existing index fields that were configured in Indexing are listed in the Index Fields pane, and another pane is available with four tabs:

  • Auto Highlighting

  • Character Filtering

  • Validation

  • Customize

Auto Highlighting

OCR text is highlighted when the corresponding index field is in focus. Regular expressions determine which text strings are highlighted. Click Add or Edit to open the Regular Expression Editor.

Type a new expression in the space provided. Alternatively, type a sample phrase or series of numbers and click Generate to create the expression automatically. A message displayed in red indicates an invalid expression; a message displayed in green indicates a valid one. Click Select to view commonly used expressions in the Regular Expressions Manager.
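To make the valid/invalid check and the idea of expression-driven highlighting concrete, here is a rough Python sketch. It uses Python's re module, whose syntax may differ from the dialect accepted by the Regular Expression Editor. The function names and the sample OCR text are hypothetical.

```python
import re


def is_valid_expression(pattern: str) -> bool:
    """Return True if the pattern compiles (the 'green' case), else False ('red')."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def find_highlights(pattern: str, ocr_text: str) -> list[str]:
    """Return every substring of the OCR text that the expression matches."""
    return [m.group(0) for m in re.finditer(pattern, ocr_text)]


# Hypothetical expression for dates written as MM/DD/YYYY.
date_expr = r"\b\d{2}/\d{2}/\d{4}\b"
sample = "Invoice 10442 dated 03/15/2021 due 04/14/2021"

print(is_valid_expression(date_expr))
print(is_valid_expression("("))
print(find_highlights(date_expr, sample))
```

In this sketch, only the two date strings would be candidates for highlighting; the invoice number is ignored because it does not match the expression.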

Other options available in the auto highlighting area include:

  • Highlight words in Zone: Highlights the OCR result words for user review without populating the index field. You can select the entire page as a zone or define a specific zone to highlight.

  • Highlight Expression Processing: Highlights an entire word, only matching words, or only matching words with a custom format. Custom format is used for word matching. Click the Help button next to the Custom format field to view all options.

Character Filtering

OCR results can be filtered to facilitate indexing. The options available for character filtering are listed below.

The Extended Characters field is only available for the "All Characters" and "Custom" filters.

  • All Characters

  • Alpha Only (a-z, A-Z)

  • Numeric Only (0-9)

  • Numeric Extended (0-9, $%#+-.,)

  • Date (0-9,/-)

  • Extended Characters Only

  • Standard Printable Characters

  • Custom: You can create custom filters using regular expressions.

    A custom filter expression is an inverse expression: it specifies the characters to remove rather than the characters to keep.
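The inverse behavior can be sketched in Python as follows. This is an illustration only: the regular expressions are assumptions that approximate three of the listed filters, not the product's actual definitions, and the function name apply_filter is hypothetical. Each expression describes what to strip out, so a filter that keeps digits is written as "anything that is not a digit".

```python
import re

# Hypothetical removal patterns: each one matches the characters to DELETE.
REMOVAL_PATTERNS = {
    "Alpha Only": r"[^a-zA-Z]",
    "Numeric Only": r"[^0-9]",
    "Numeric Extended": r"[^0-9$%#+\-.,]",
}


def apply_filter(ocr_text: str, removal_pattern: str) -> str:
    """Delete every character that the inverse expression matches."""
    return re.sub(removal_pattern, "", ocr_text)


print(apply_filter("Total: $1,234.50", REMOVAL_PATTERNS["Numeric Extended"]))
print(apply_filter("AB-1234", REMOVAL_PATTERNS["Numeric Only"]))
print(apply_filter("AB-1234", REMOVAL_PATTERNS["Alpha Only"]))
```

Note that these sketch patterns also remove spaces; whether the built-in filters do the same is not stated here.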

Validation

Validation determines how invalid characters in the OCR results are handled. The following options are available:

  • Do Not Correct: Leaves the invalid character unchanged, as it appears in the raw OCR results.

  • Remove: Removes the invalid character.

  • Auto Correct: Attempts correction based on the character set selected in the Character Filtering tab. For example, if the OCR engine returns a capital O and the Character Filtering and Extended Characters settings allow only the characters 0-9 and a, b, or c, the O is replaced with a zero (0) in the field.

  • Replace with an invalid character marker: Replaces each invalid character with a marker character that you specify.

    If the marker character is itself invalid for the data type of the field, either no data is returned or errors occur.
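The four validation options can be compared side by side with a small Python sketch. It is a simplified model, not the product's implementation. The mode names, the validate function, and the LOOKALIKES table of visually similar characters are all hypothetical; in particular, how Auto Correct actually chooses a substitute is not documented here, so the sketch assumes a fixed look-alike table and leaves a character unchanged when no allowed substitute exists.

```python
# Hypothetical table of characters that OCR engines commonly confuse.
LOOKALIKES = {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}

MODES = {"do_not_correct", "remove", "auto_correct", "marker"}


def validate(text: str, allowed: set[str], mode: str, marker: str = "?") -> str:
    """Apply one validation mode to every character outside the allowed set."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    out = []
    for ch in text:
        if ch in allowed or mode == "do_not_correct":
            out.append(ch)  # valid, or left as returned by OCR
        elif mode == "remove":
            continue  # drop the invalid character
        elif mode == "auto_correct":
            fixed = LOOKALIKES.get(ch)
            # Substitute only if the look-alike is itself allowed (assumption).
            out.append(fixed if fixed is not None and fixed in allowed else ch)
        else:  # mode == "marker"
            out.append(marker)
    return "".join(out)


allowed = set("0123456789abc")  # digits plus the extended characters a, b, c
for mode in ("do_not_correct", "remove", "auto_correct", "marker"):
    print(mode, validate("1O4a", allowed, mode))
```

With the allowed set from the Auto Correct example (0-9 plus a, b, c), the capital O in "1O4a" is kept, removed, turned into a zero, or replaced by the marker, depending on the mode.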

Customize

In the Customize tab, you can customize the highlighting colors for Auto Highlighting.