FormFix v6.0 for .NET - Updated
Glossary

A

Accuracy Rate
The percentage of text that was recognized correctly. It is expressed as a percentage and is never greater than 100%.
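The glossary does not say how FormFix computes this figure. As a minimal sketch, assume accuracy is the share of the correct (ground-truth) characters that appear, in order, in the recognized text. Python's standard `difflib.SequenceMatcher` can count those matching characters. The function name `accuracy_rate` is hypothetical and is not part of the FormFix API.

```python
from difflib import SequenceMatcher


def accuracy_rate(recognized: str, correct: str) -> float:
    """Percentage of the correct text's characters matched in the recognized text.

    The result is always between 0 and 100, because the number of matched
    characters can never exceed len(correct).
    """
    if not correct:
        # Nothing to recognize: perfect only if nothing was produced.
        return 100.0 if not recognized else 0.0
    # autojunk=False disables difflib's popularity heuristic for long inputs.
    matcher = SequenceMatcher(None, correct, recognized, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * matched / len(correct)


print(round(accuracy_rate("He1lo World", "Hello World"), 1))  # 90.9
```

This measure does not count extra characters in the recognized text against it, so it can overstate quality. A measure based on edit distance (see Edit Distance) also penalizes insertions.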

Alternate Results
When the recognition algorithms can return more than one possible result, each result has an associated confidence value. The first result is called the result, and any additional character results are called the alternate results. Alternate results generally have lower confidence values than the initial result.

Auto classification
The process of automatically determining, using computer vision algorithms, whether a field's input should be processed with OCR or ICR.

B

Blob
An isolated connected group of pixels.
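The glossary does not describe FormFix's own blob-detection rules. As a sketch, assume a binary image stored as nested lists and 8-connectivity, where pixels touching at an edge or a corner belong together. Under those assumptions, a breadth-first flood fill can find the blobs. Whether FormFix uses 4- or 8-connectivity is an assumption, and `find_blobs` is a hypothetical helper.

```python
from collections import deque


def find_blobs(image):
    """Group foreground pixels into blobs (8-connected components).

    image: list of equal-length rows; 1 = ink pixel, 0 = background.
    Returns a list of blobs, each a list of (row, col) coordinates.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited ink pixel.
            blob = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            blobs.append(blob)
    return blobs


image = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
print([len(b) for b in find_blobs(image)])  # [3, 2, 1]
```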

C

Comb
Combs are the vertical bars which delineate individual characters in a text entry field on a form.

D

Drop-Out
A process in which the pre-printed content of an image is removed, leaving only the data that was added to the form. Where filled data and template data overlap, the filled data is reconstructed as accurately as possible.

Drop-Out Bulb
A hardware, scanner-based technique for dropping out a form. For example, suppose your paper is white, your form is printed in red, and data has been entered on the form in some other color (often black or blue). In that case, you can use a red bulb (instead of a white one) in the scanner to physically drop out the form, leaving only the filled data. The FormFix Virtual Bulb feature attempts to do the same thing by starting with a color image and simulating the effect of scanning with a red bulb.
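The physical effect of a red bulb can be approximated in a few lines. This is a simplified sketch, not the FormFix Virtual Bulb algorithm. It assumes an RGB image stored as nested lists of `(r, g, b)` tuples and a fixed, hypothetical brightness threshold of 128. The function name `simulate_red_bulb` is illustrative only.

```python
def simulate_red_bulb(rgb_rows, threshold=128):
    """Approximate a red-bulb scan of an RGB image and binarize the result.

    rgb_rows: rows of (r, g, b) tuples with 0-255 channel values.
    Under red light, white paper and red ink both reflect strongly, so the
    red channel alone stands in for the scanner's brightness reading.
    Returns rows of 1 (dark: kept as filled data) or 0 (light: dropped out).
    """
    return [[1 if r < threshold else 0 for (r, _g, _b) in row] for row in rgb_rows]


WHITE_PAPER = (255, 255, 255)
RED_FORM_INK = (220, 30, 40)
BLACK_PEN = (20, 20, 25)
BLUE_PEN = (30, 40, 200)
print(simulate_red_bulb([[WHITE_PAPER, RED_FORM_INK, BLACK_PEN, BLUE_PEN]]))
# [[0, 0, 1, 1]]
```

The red form ink drops out along with the paper, while black and blue pen strokes remain.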

E

Edit Distance
The minimum number of insertions, deletions, and replacements of characters needed to change the recognized text to the correct text.
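This definition matches the standard Levenshtein distance. The following sketch computes it with the usual dynamic-programming recurrence, keeping only one previous row in memory. The function name `edit_distance` is hypothetical and is not a FormFix API.

```python
def edit_distance(recognized: str, correct: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and replacements."""
    # At the start of each pass, prev[j] is the distance between the
    # recognized characters processed so far and correct[:j].
    prev = list(range(len(correct) + 1))
    for i, r in enumerate(recognized, start=1):
        curr = [i]
        for j, c in enumerate(correct, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r from the recognized text
                curr[j - 1] + 1,         # insert c
                prev[j - 1] + (r != c),  # replace r with c (free if equal)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

Dividing the distance by the length of the correct text gives an error rate. Unlike a simple count of matched characters, this rate penalizes extra recognized characters too.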

Electronic Document
A document that has been scanned or that was originally created on a computer.

Extraction
A synonym for form drop-out. FormFix v2 used this term instead of drop-out. Starting with FormFix v3, the term extraction was discontinued, and the process is referred to as drop-out.

F

Field
A single rectangular region on a form, defined in pixels, along with a type and various other attributes. Frequently, a field wholly contains a single user-filled item, such as a last name or phone number.

Field Clip
An image of a field.

Filled Data
Data, either handwritten or machine-printed, that is added to a form. Reading filled data is usually the primary goal of a forms-processing system.

Form or Form Definition
A single template image, along with various attributes and properties of the form. Forms contain zero or more fields and belong to a single form set. A form generally refers to a single side of a single sheet, although a single form can contain multiple logical form pages. For example, a 4-page, 8.5 x 11 inch form could be unfolded and scanned at 11 x 17 inches as two forms.

Form Document
A customer-defined, logical document composed of one or more related sheets, with information on one or both sides of each sheet. Examples include a one-sheet, front-and-back credit application; a six-page, front-only mortgage application; and a 32-page student test booklet. The FormFix component has no explicit support for form documents.

Form Definition File
A file, read and written by this component, that defines a single form and all of its fields. Form definition files are self-contained: they include the template image and contain no references to external files. All files of this type use the .frd extension.

Form Identification
The process of choosing among a set of possible form templates (unfilled form images) to determine which one a filled form matches, that is, what the filled form would look like if it were not filled.
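The glossary does not say how FormFix scores candidate templates. As a deliberately simple illustration, assume the filled image and every template are already registered, have the same size, and are binary. Under those assumptions, each template can be scored by the Jaccard index (intersection over union) of its dark pixels with the filled image's dark pixels, and the best score wins. The Jaccard scoring is a swapped-in technique. `jaccard` and `identify_form` are hypothetical helpers.

```python
def jaccard(filled, template):
    """Intersection-over-union of the dark pixels in two same-size binary images."""
    inter = union = 0
    for frow, trow in zip(filled, template):
        for f, t in zip(frow, trow):
            if f and t:
                inter += 1
            if f or t:
                union += 1
    return inter / union if union else 0.0


def identify_form(filled, templates):
    """Return the name of the template whose dark pixels best overlap the filled image."""
    return max(templates, key=lambda name: jaccard(filled, templates[name]))


HLINE = [[1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
VLINE = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
# The HLINE form with two pixels of filled data added in column 2.
filled = [[1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0]]
print(identify_form(filled, {"hline": HLINE, "vline": VLINE}))  # hline
```

Filled data lowers every template's score, but the matching template keeps the largest overlap. A production system would need to tolerate skew, noise, and scale differences, which this sketch ignores.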

Form Set
A collection of zero or more forms. The FormFix component provides explicit support for form sets so that it can identify a form within the set.

Form Set File
A file that defines a collection of form sets and/or form definitions, including all of the data that defines them. All files of this type use the .frs extension.

Forms Processing System
An imaging application for handling printed forms. Forms processing systems often use OCR engines and data validation routines to extract handwritten or printed information from forms and store it in a database.

I

ICR (Intelligent Character Recognition)
Machine recognition of hand-printed characters as well as machine printing that is difficult to recognize.

M

MICR (Magnetic Ink Character Recognition)
A character set of 14 characters: the ten digits and four control characters, designated A, B, C, and D. MICR is used mostly in the banking industry. It is the line of numbers printed on checks that encodes the routing and account information.

N

Noise
Irrelevant or meaningless data.

O

OCR (Optical Character Recognition)
Reading text from paper and translating the images into a form that the computer can manipulate.

OMR (Optical Mark Recognition)
The process of capturing data by contrasting pixel densities at predetermined positions on a form. In the context of Accusoft document processing, it specifically refers to discriminating between marked and unmarked circular, oval, square, or rectangular bubbles, typically arranged in a row-and-column grid on a form. More generally, OMR is the technology of electronically extracting intended data from marked fields, such as check boxes and fill-in fields, on printed forms.

OMR Bubble
A circle, oval, square, or rectangle used on business forms to delineate areas that are to be hand-filled or marked to indicate a choice.

OMR Bubble Value
A string associated with a bubble. If the bubble is determined to be marked, this string becomes the result value of the segment in which the bubble resides.

OMR Mark-box
A specific type of field that contains a single bubble. Because a Mark-box has only one bubble, the concept of orientation does not apply to it.

OMR Marked Threshold
A user-settable mark recognition adjustment value. It specifies the minimum image pixel density used in determining the confidence value(s) of a bubble (or bubbles) recognized as marked.

OMR Multi-Mark
A multi-mark field allows more than one bubble per segment to be marked and processes each segment accordingly, at the cost of reduced recognition accuracy. By default, OMR field processing expects at most one bubble per segment to be marked.

OMR Multi-Segment Field
A field that groups multiple related segments to facilitate recognition processing and the retrieval of result values as a group. An example of a multi-segment field on an OMR form would be a Social Security number field, with one segment for each digit. Recognition result values can be retrieved concatenated on a per-field basis, or individually on a per-segment basis.

OMR Orientation
The relationship of a segment or segments in a bubble grid to the top edge of a form image. If the bubbles of a segment run parallel to the top edge, the orientation is Horizontal; if they run perpendicular to the top edge, the orientation is Vertical. For multi-segment fields, orientation also determines the result order. By default, a horizontally oriented field reports results left to right, but Horizontal orientation can also be specified with a right-to-left result order. Likewise, a vertically oriented field reports results top to bottom by default, but Vertical orientation can be specified with a bottom-to-top result order.

OMR Segment
A set of one or more bubbles that, after recognition, provides a discrete result value. The segment is the base unit of OMR recognition in FormFix. An example of segments on an OMR form would be the individual digit-selection bubble groups of a Social Security number field. The bubbles of a segment in a non-multi-mark field are expected to be marked in a mutually exclusive manner. Initial FormFix releases may limit a segment to spanning a single bubble row or column.

OMR Unmarked Threshold
A user-settable mark recognition adjustment value. It specifies the maximum pixel density difference used to determine that a bubble (or bubbles) may be unmarked.

OMR Unmarked Value
A string associated with a field that specifies the result value for any segments in the field determined to have no bubbles marked.
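The OMR terms above fit together when a field is evaluated segment by segment. The following sketch is hypothetical and is not the FormFix API. It reduces each bubble to a single fill density from 0.0 to 1.0 and applies one marked threshold. It does not model the Unmarked Threshold's density difference or confidence values. It also assumes each segment's bubble values are already listed in the field's result order. `segment_result` and `field_result` are illustrative names.

```python
def segment_result(densities, values, marked_threshold=0.4,
                   unmarked_value="", multi_mark=False):
    """Hypothetical OMR segment evaluation (not the FormFix API).

    densities: fill density (0.0-1.0) of each bubble in the segment.
    values:    OMR Bubble Value string for each bubble.
    """
    marked = [i for i, d in enumerate(densities) if d >= marked_threshold]
    if not marked:
        return unmarked_value                       # no bubble is marked
    if multi_mark:
        return "".join(values[i] for i in marked)   # every mark counts
    # Default single-mark behavior: report only the darkest marked bubble.
    darkest = max(marked, key=lambda i: densities[i])
    return values[darkest]


def field_result(segments, **options):
    """Evaluate each (densities, values) segment; return per-segment and joined results."""
    per_segment = [segment_result(d, v, **options) for d, v in segments]
    return per_segment, "".join(per_segment)


digits = [str(n) for n in range(10)]
# Three digit segments of a numeric field; the middle one was left blank.
segments = [
    ([0.05] * 4 + [0.85] + [0.05] * 5, digits),   # "4" marked
    ([0.05] * 10, digits),                         # nothing marked
    ([0.9] + [0.05] * 9, digits),                  # "0" marked
]
print(field_result(segments, unmarked_value="_"))  # (['4', '_', '0'], '4_0')
```

The per-segment list corresponds to retrieving results one by one, and the joined string corresponds to the concatenated per-field result. In single-mark mode, the sketch simply keeps the darkest bubble when several exceed the threshold. A real component might instead flag such a segment as ambiguous.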

P

Persistent Data
Data that is created by the FormFix component and stored for use by a future process. Persistent data should also be made available to other components. Persistent data is always tied to a specific form or form set.

R

Registration
The process of computing registration parameters. Registration does not necessarily include moving the pixels within an image to new, registered locations.

S

Semi-Structured Form
A form whose layout changes from instance to instance or is not precisely known in advance. Examples are a set of invoices from unknown vendors and a set of phone bills from a single vendor (each with a different number of pages).

Structured Form
A form whose layout does not change from instance to instance and is known in advance. Structured forms are usually developed by the same organization that must process them, or are printed to meet a formally defined standard. Examples are credit card applications, 2000 census forms, tax forms, and medical forms.

Successful Registration
Production of transform parameters that register all parts of an image to a specified tolerance. For testing purposes, successful registration can be measured only on certain images: those that an affine transform can bring into precise registration (within 2 pixels) with a template. Common examples of images that do not meet this definition are images unevenly distorted by scanning or printing, and images that are curled.
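The 2-pixel criterion can be checked directly once an affine transform has been produced. This sketch assumes that matched landmark pairs, such as corner marks found in both the image and the template, are already known. It also stores the six affine parameters in a hypothetical `(a, b, tx, c, d, ty)` layout, which is not necessarily how FormFix represents them. `apply_affine` and `is_successful_registration` are illustrative names.

```python
import math


def apply_affine(params, x, y):
    """Map (x, y) with affine parameters (a, b, tx, c, d, ty)."""
    a, b, tx, c, d, ty = params
    return a * x + b * y + tx, c * x + d * y + ty


def is_successful_registration(params, image_points, template_points, tolerance=2.0):
    """True if every image landmark maps to within `tolerance` pixels of its template match."""
    if len(image_points) != len(template_points):
        raise ValueError("landmark lists must have the same length")
    for (x, y), (u, v) in zip(image_points, template_points):
        mx, my = apply_affine(params, x, y)
        if math.hypot(mx - u, my - v) > tolerance:
            return False
    return True


shift = (1.0, 0.0, 5.0, 0.0, 1.0, -3.0)   # pure translation by (+5, -3)
image_pts = [(100, 100), (500, 120), (300, 700)]
template_pts = [(105, 97), (505, 117), (306, 697)]
print(is_successful_registration(shift, image_pts, template_pts))  # True
```

The third landmark lands 1 pixel from its template position, which is within the 2-pixel tolerance. A curled page would typically leave some landmarks outside the tolerance for every affine transform.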

T

Template or Template Image
A full image of the original form, with no data added to it. Also called a "blank form", a template comprises all the content that is common to all images of a given form, and only the content that is common to all images of a given form.

Thread-Safe
An application using this component can run multiple controls in the same process without the controls interfering with each other. The thread that creates a control owns it and is the only thread that can interact with it.

U

Unknown Image
An image containing a form whose identity is not yet known. During processing, all form images are referred to as unknown images until they have been identified as a particular form.

Unicode
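A character set that can represent a wide range of international characters. Unicode text can be stored in several encodings. In UTF-16, the encoding used by .NET strings, most common characters take 16 bits, and the remaining characters take 32 bits (a surrogate pair). ASCII, by contrast, is a 7-bit code (usually stored in 8-bit bytes). It covers only unaccented Latin letters, digits, punctuation, and control characters.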
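The size difference between encodings is easy to observe. The following sketch uses Python's built-in codecs to report how many bits a character needs in UTF-16, and whether the character can be encoded in ASCII at all. Because .NET strings are sequences of UTF-16 code units, the UTF-16 sizes shown match what a .NET string uses. `encoded_sizes` is a hypothetical helper.

```python
def encoded_sizes(ch):
    """Return (UTF-16 bits, ASCII bytes or None) for a single character."""
    utf16_bits = len(ch.encode("utf-16-le")) * 8   # "-le" avoids a byte-order mark
    try:
        ascii_bytes = len(ch.encode("ascii"))
    except UnicodeEncodeError:
        ascii_bytes = None                          # not representable in ASCII
    return utf16_bits, ascii_bytes


for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:  # A, e-acute, euro sign, emoji
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
# U+0041 (16, 1)
# U+00E9 (16, None)
# U+20AC (16, None)
# U+1F600 (32, None)
```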

V

Virtual Bulb
A FormFix feature named after an older scanner-based technique for dropping out a form (see Drop-Out Bulb). The Virtual Bulb feature attempts to perform drop-out based on the color of the form template. Starting with a color image, it simulates the effect of scanning with a bulb whose color matches the template.
