Accusoft.FormAssist v6.0 Application - Updated
Glossary

A

Accuracy rate
The percentage of the text that was recognized correctly.

Alternate results
In many situations, the recognition algorithms return more than one possible result, each with an associated confidence value. The first result is called the 'result'; any additional results are called 'alternate results' and have a lower confidence value than the first result.

Auto Classification
A type of field that uses FormFix’s FieldTypeClassificationProcessor to automatically classify a field on an input form image as containing either ICR content or OCR content. Typically, the result of this classification will determine whether the SmartZone ICR or OCR recognition engine will be used to extract text from the field.

B

Binarize
The process of creating a bitonal image from a color or grayscale image.
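As a rough illustration of binarization, the sketch below applies a fixed global threshold to 8-bit grayscale pixels. The function name `binarize`, the default threshold of 128, and the convention that 1 means black are assumptions made for this example; it is not the method used by FormFix, and production systems often use adaptive thresholds instead.

```python
def binarize(gray_rows, threshold=128):
    """Map each 8-bit grayscale pixel to 1 (black) or 0 (white).

    Pixels darker than the threshold become black. The threshold value
    and the 1 = black convention are assumptions for this sketch.
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray_rows]


# A tiny 2 x 3 grayscale image (0 = darkest, 255 = lightest).
gray = [
    [250, 240, 30],
    [20, 128, 200],
]
bitonal = binarize(gray)
print(bitonal)  # [[0, 0, 1], [1, 0, 0]]
```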

Bitonal
A 1-bit-per-pixel, black-and-white image.

BPP
Bits Per Pixel

C

Clip
An image of a single field that FormFix processing clips from a filled form, optionally with the form dropped out.

Combs
Combs are the vertical bars that delineate individual characters in a text entry field on a form.

Confidence (Auto Classifier)
When an Auto Classifier field is processed, the FormFix FieldTypeClassificationProcessor engine analyzes the image of the field and produces three values, each of which can range from 0 to 100. Each value represents the engine’s confidence that the field contains a certain type of content: ICR, OCR, or None. For example, ICR and OCR confidences of 0 and a None confidence of 100 would mean that the engine is certain the field contains neither ICR nor OCR content. An ICR confidence of 100 would mean the engine is certain that the field contains ICR content. The highest confidence value determines the field type that the field is classified as.
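The "highest confidence wins" rule described above can be sketched as follows. The function `classify_field` and its parameter names are hypothetical and are not part of the FormFix API; the tie-breaking order (ICR, then OCR, then None) is also an assumption, since the glossary does not say how ties are resolved.

```python
def classify_field(icr, ocr, none):
    """Return the content type with the highest confidence (0 to 100).

    Hypothetical helper, not a FormFix call. Ties go to the first
    entry in the order ICR, OCR, None (an assumption of this sketch).
    """
    scores = {"ICR": icr, "OCR": ocr, "None": none}
    for name, value in scores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} confidence must be in 0..100, got {value}")
    # max() returns the first key with the largest value.
    return max(scores, key=scores.get)


print(classify_field(0, 0, 100))   # None
print(classify_field(100, 0, 0))   # ICR
print(classify_field(20, 70, 10))  # OCR
```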

D

Deskew
A process to adjust the angle of an image so that it is straight (no longer skewed).

Despeckle
A process that removes specks from a scanned image.

Drop-out
A process whereby the pre-printed content on an image is removed, leaving only the data that was added to a form. When filled and template data overlap, the filled data will be reconstructed as accurately as possible.

E

Edit Distance
The minimum number of insertions, deletions, and replacements of characters needed to change the recognized text to the correct text.
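The definition above corresponds to what is commonly called the Levenshtein distance. The sketch below computes it with the standard dynamic-programming approach; the function name `edit_distance` is chosen for this example only and does not refer to any FormAssist or FormFix API.

```python
def edit_distance(recognized, correct):
    """Minimum insertions, deletions, and replacements needed to turn
    the recognized text into the correct text."""
    # previous[j] = distance between the processed prefix of
    # `recognized` and the first j characters of `correct`.
    previous = list(range(len(correct) + 1))
    for i, r_char in enumerate(recognized, start=1):
        current = [i]
        for j, c_char in enumerate(correct, start=1):
            cost = 0 if r_char == c_char else 1
            current.append(min(
                previous[j] + 1,         # delete r_char
                current[j - 1] + 1,      # insert c_char
                previous[j - 1] + cost,  # replace, or keep on a match
            ))
        previous = current
    return previous[-1]


print(edit_distance("F0RM", "FORM"))       # 1
print(edit_distance("kitten", "sitting"))  # 3
```

A recognition result that reads a zero where the letter O was written, as in the first call, is one replacement away from the correct text.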

Electronic Document
A document that has been scanned, or was originally created on a computer.

F

False negative
When a filled-in image is not recognized as the form that it should match.

False positive
When a filled-in image is recognized as a form when in reality it does not match that form or when it more closely matches a different form.

Field
A single rectangular region on a form, along with a type and various other attributes. Frequently, a field will wholly contain a single user-filled item, such as a last name or phone number. The FormFix component has two functions related to fields.

FieldTypeClassificationProcessor (Auto Classifier)
The FormFix FieldTypeClassificationProcessor powers the Auto Classification feature within FormAssist. It takes as input an image of a form and a rectangular area representing the field to be analyzed. It outputs its confidence that the field is ICR, its confidence that the field is OCR, and its confidence that the field is neither ICR nor OCR.

Filled data
Data, either hand-written or machine-printed, that is added to a form. Reading filled data is usually the primary goal of a forms-processing system.

Filled image
An image containing both a form template and filled data. Filled images are the principal input to a forms-processing system. When the distinction is important, a filled image will usually be called an 'unknown image' prior to form identification and a 'filled image' after form identification.

Form
As used by the Accusoft FormFix SDK, a single template image, along with various attributes and properties of the form. Forms contain zero or more fields and are part of a single form set. A source of confusion is the possibility that a client will define a form as a booklet or both sides of a single sheet. Within this help file, 'form' will always refer to a single side of a single sheet. Although not the intended purpose, it is possible for a client to place multiple logical form pages into a single form. For example, a 4-page, 8.5 x 11 form could be unfolded and scanned at 11 x 17 as two forms.

Form definition file
As used by the Accusoft FormFix SDK, a file, read and written exclusively by the file component, which defines a single form and all of its fields. Form definition files will be wholly contained, including a template image, with no references to external files.

Form document
A customer-defined, logical document comprised of one or more related sheets, with information on one or both sides of each sheet. This term is defined in this specification to conveniently describe common customer needs. Some examples would be a one-sheet, front and back credit application, a 6-page front-only mortgage application, a 32-page student test booklet, or a one-page HCFA 1500 medical form. The FormFix component will have no explicit support for form documents.

Form set
A collection of zero or more forms. The FormFix component used by FormAssist will have explicit support for form sets, in order to support identification of a form within the set.

Form set file
A file, read and written exclusively by the file component, which defines a set of forms. A form set file includes various attributes related to the form set, but is primarily a set of references to form definition files, which are separate from the form set file.

Forms Processing
An imaging application for handling printed forms. Forms processing systems often use OCR engines and data validation routines to extract hand-written or printed information from forms and store it in a database.

I

ICR (Intelligent Character Recognition)
Reading handwritten text from paper and translating the images into a form that the computer can manipulate.

N

Noise
Irrelevant or meaningless data.

O

OCR (Optical Character Recognition)
Reading text from paper and translating the images into a form that the computer can manipulate.

OMR (Optical Mark Recognition)
Reading marks out of a series of OMR bubbles.

OMR bubble
A circle, oval, square, or rectangle used on business forms to delineate areas that are to be hand filled or marked to indicate a choice.

OMR bubble value
A string associated with a bubble. If the bubble is determined to be marked, this string is then the result value of the segment that bubble resides in.

OMR mark-box
A specific type of field that contains a single bubble. The concept of orientation is meaningless for a mark-box.

OMR marked threshold
A user-settable adjustment value for mark recognition that specifies the minimum normalized pixel density at which a bubble is determined to be marked.

OMR multi-mark
By default, OMR field processing expects at most one bubble per segment to be marked. A multi-mark field expects that more than one bubble may be marked per segment and processes the field accordingly, at the cost of reduced recognition accuracy.

OMR multi-segment field
A field that contains a grouping of multiple related segments in order to facilitate recognition processing and the retrieval of result values in a grouped manner. An example of a multi-segment field on an OMR form would be a Social Security Number field, with a segment comprising each digit. Recognition result values can be retrieved concatenated on a per field basis or one by one on a per segment basis.
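To make the per-segment and per-field retrieval concrete, the sketch below models a three-digit multi-segment field in plain Python. The data layout (a list of `(bubble value, is marked)` pairs per segment), the function names, and the unmarked value `"?"` are all hypothetical and do not mirror the FormFix API. Raising an error when several bubbles are marked is a simplification of this sketch for fields that are not multi-mark.

```python
def segment_result(bubbles, unmarked_value="?"):
    """Result value of one single-mark segment.

    `bubbles` is a list of (bubble_value, is_marked) pairs; this layout
    is an assumption made for illustration only.
    """
    marked = [value for value, is_marked in bubbles if is_marked]
    if not marked:
        return unmarked_value  # no bubble marked in this segment
    if len(marked) > 1:
        raise ValueError("more than one bubble marked in a single-mark segment")
    return marked[0]


def field_result(segments, unmarked_value="?"):
    """Concatenate the segment results into one per-field value."""
    return "".join(segment_result(s, unmarked_value) for s in segments)


# Three segments, each offering the digits 0 to 2; the middle one is blank.
segments = [
    [("0", False), ("1", True), ("2", False)],
    [("0", False), ("1", False), ("2", False)],
    [("0", False), ("1", False), ("2", True)],
]
print([segment_result(s) for s in segments])  # ['1', '?', '2']
print(field_result(segments))                 # 1?2
```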

OMR Optical mark recognition
The process of capturing data by contrasting pixel densities at pre-determined positions on a form. In the context of Accusoft's forms processing, it specifically refers to the process of discriminating between marked and unmarked circular, oval, square, or rectangular bubbles, typically arranged in a row-and-column grid on a form.

OMR Orientation
The relationship of a segment or segments in a bubble grid to the top edge of a form image. If the bubbles of a segment run parallel to the top edge, the orientation is Horizontal. If the bubbles of a segment run perpendicular to the top edge, the orientation is Vertical. For multi-segment fields, orientation also defines a result order. By default, the result order for a horizontally oriented field is left to right; however, orientation can be specified as Horizontal with a result order of right to left. By default, the result order for a vertically oriented field is top to bottom; however, orientation can be specified as Vertical with a result order of bottom to top.

OMR Segment
A set of one or more bubbles that, after recognition, provides a discrete result value. It is the base unit of OMR recognition in FormFix. An example of segments on an OMR form would be the individual digit selection bubble groups of a Social Security number field. A segment's bubbles in a non-multi-mark field are expected to be marked in a mutually exclusive manner. Initial FormFix releases may limit a segment to spanning either a single bubble row or column.

OMR Unmarked threshold
A user-configurable adjustment value for mark recognition that specifies the maximum normalized pixel density at which a bubble is determined to be unmarked.
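The marked and unmarked thresholds defined in this glossary can be pictured with the sketch below. It assumes that density is normalized to the range 0.0 to 1.0 and that a bubble whose density falls between the two thresholds is reported as "indeterminate"; the function names, the default threshold values, and the in-between label are assumptions made for this example, not documented FormFix behavior.

```python
def pixel_density(bitonal_rows):
    """Fraction of black pixels (1s) in a bitonal bubble image."""
    total = sum(len(row) for row in bitonal_rows)
    if total == 0:
        raise ValueError("empty bubble image")
    return sum(sum(row) for row in bitonal_rows) / total


def bubble_state(density, marked_threshold=0.5, unmarked_threshold=0.2):
    """Classify a bubble from its normalized pixel density.

    Threshold defaults are hypothetical. Densities strictly between
    the two thresholds are labeled "indeterminate" in this sketch.
    """
    if unmarked_threshold > marked_threshold:
        raise ValueError("unmarked threshold must not exceed marked threshold")
    if density >= marked_threshold:
        return "marked"
    if density <= unmarked_threshold:
        return "unmarked"
    return "indeterminate"


filled = [[1, 1, 1], [1, 1, 0]]  # 5 of 6 pixels are black
empty = [[0, 0, 0], [0, 1, 0]]   # 1 of 6 pixels is black
print(bubble_state(pixel_density(filled)))  # marked
print(bubble_state(pixel_density(empty)))   # unmarked
print(bubble_state(0.35))                   # indeterminate
```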

OMR unmarked value
A string associated with a field that specifies the result value for any segments in the field determined to have no bubbles marked.

P

Persistent data
Data that is created by the FormFix component and stored for use by a future process. Persistent data should also be made available to other machines, if they are also performing the same operation. Persistent data is always tied to a specific form or form set.

S

Semi-structured form
A form whose layout changes from instance to instance or whose layout is not precisely known in advance. Examples are a set of invoices from unknown vendors and a set of phone bills from a single vendor (with each having a different number of pages).

Structured form
A form whose layout does not change from instance to instance and whose layout is known in advance. Structured forms are usually developed by the same organization that must process them or are printed to meet a formally defined standard. Examples are credit card applications, 2000 census forms, tax forms, and medical forms.

Successful registration
Production of transform parameters which result in registration of all parts of an image to a specific tolerance.

T

Template
A full image of the original form, without any additional data added to it. Also called a "blank form", a template comprises all the content that is common to all images of a given form, and only the content that is common to all images of a given form.

U

UI (User Interface)
The controls or API provided for user interaction with an application or component.

Unicode
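A character set that can support a wide range of international characters. Unicode text is commonly stored in encodings such as UTF-8 or UTF-16, which use one or more 8-bit or 16-bit code units per character, unlike ASCII, a 7-bit character set that is usually stored one byte per character and covers only basic Latin letters, digits, and punctuation.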

Unknown image
An image containing a form of unknown identification. During processing, all images of forms are referred to as unknown images, until after the image has been recognized as a form.

Unstructured form
A form whose layout changes dramatically from page to page or whose layout is unknown in advance. Examples would be a set of corporate annual reports and a set of magazine advertisements.
