Accusoft.SmartZoneOCR4.Net
Define and Edit Regular Expressions

SmartZone OCR lets you use regular expressions to describe the expected format of the data in a field. If you know the expected format(s), supplying them makes recognition results more accurate. (See Regular Expressions for information on syntax.)

 

Use SetRegularExpression to specify the regular expression string. Note that FieldType must be set to RegularExpression for your regular expression to affect recognition.

An InvalidRegularExpressionException is thrown if the regular expression is not valid.

C# Example - Set and get regular expression, and use it in recognition
try
{
  …
  //Set FieldType to RegularExpression and specify regular expression string
  mySmartZoneOCR.Reader.FieldType = Accusoft.SmartZoneOCRSdk.FieldType.RegularExpression;

  mySmartZoneOCR.Reader.SetRegularExpression("(\\d{4}){4}");

  //Get the regular expression string previously set
  string myRegex = mySmartZoneOCR.Reader.GetRegularExpression();

  //Perform Recognition
  TextBlockResult textBlockResult = mySmartZoneOCR.Reader.AnalyzeField(imageToRecognize.ToHbitmap(false));
  …
}

catch (Accusoft.SmartZoneOCRSdk.InvalidRegularExpressionException ex)
{
   MessageBox.Show(ex.Message);
}
catch (Accusoft.SmartZoneOCRSdk.BitDepthException ex)
{
   MessageBox.Show(ex.Message);
}
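
The pattern string in the example above doubles each backslash because it is written as a regular C# string literal. The sketch below shows that a verbatim literal yields the same string, and checks the pattern against sample text before it is handed to the SDK. It uses the .NET Regex class as a stand-in: the SmartZone OCR syntax may differ from the .NET dialect, and the PatternCheck class and its anchoring of the whole string are assumptions made for illustration, not part of the SmartZone OCR API.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the SmartZone OCR SDK.
public static class PatternCheck
{
    // Returns true only if the entire candidate string matches the pattern.
    // The \A and \z anchors are added here; they are an assumption of this sketch.
    public static bool IsFullMatch(string pattern, string candidate)
    {
        return Regex.IsMatch(candidate, @"\A(?:" + pattern + @")\z");
    }

    public static void Main()
    {
        string escaped = "(\\d{4}){4}";   // regular literal: each backslash is doubled
        string verbatim = @"(\d{4}){4}";  // verbatim literal: no doubling needed

        Console.WriteLine(escaped == verbatim);                            // True
        Console.WriteLine(IsFullMatch(verbatim, "1234567890123456"));      // True
        Console.WriteLine(IsFullMatch(verbatim, "1234-5678-9012-3456"));   // False
    }
}
```

Either literal form can be passed to SetRegularExpression; the verbatim form is usually easier to read when a pattern contains many backslashes.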

To use a regular expression to supplement the recognition of a specific field type, call the SetFieldRegularExpression method. The field types that support regular expressions are Currency, CurrencyPlus, Date, Email, SocialSecurityNumber, Time, UnitedStatesPhoneNumber, URL, and RegularExpression. For example, if part numbers consist of two letters followed by four digits, such as AB1234, you can use [[alpha]]{2}\d{4}.
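
As a rough check of the part-number format described above, the sketch below tests a few sample strings with the .NET Regex class. It is only an approximation: it assumes the alphabetic class in the guide's pattern means "any letter" and substitutes the ASCII range [A-Za-z], because the .NET dialect is not guaranteed to accept the same class syntax as SmartZone OCR. The PartNumberCheck class is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the SmartZone OCR SDK.
public static class PartNumberCheck
{
    // Assumption: ASCII letters stand in for the guide's alphabetic class.
    private const string Pattern = @"\A[A-Za-z]{2}\d{4}\z";

    // True when the text is exactly two letters followed by four digits.
    public static bool IsPartNumber(string text)
    {
        return Regex.IsMatch(text, Pattern);
    }

    public static void Main()
    {
        Console.WriteLine(IsPartNumber("AB1234")); // True
        Console.WriteLine(IsPartNumber("A12345")); // False: only one letter
        Console.WriteLine(IsPartNumber("AB123"));  // False: only three digits
    }
}
```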

 

Use GetFieldRegularExpression to get the regular expression of the field type.

C# Example - Set and get the regular expression of a certain field type
…
mySmartZoneOCR.Reader.SetFieldRegularExpression(Accusoft.SmartZoneOCRSdk.FieldType.Date, "\\d{2}\\/\\d{2}\\/\\d{4}");

string dateRegularExpression = mySmartZoneOCR.Reader.GetFieldRegularExpression(Accusoft.SmartZoneOCRSdk.FieldType.Date);
…
Setting a regular expression does not by itself cause it to be used in recognition. The FieldType property must be set to the corresponding field type for that regular expression to take effect.
C# Example - Regular expression and recognition
//Set field regular expression for Currency and Date                 
mySmartZoneOCR.Reader.SetFieldRegularExpression(Accusoft.SmartZoneOCRSdk.FieldType.Date, "\\d{2}\\/\\d{2}\\/\\d{4}");
mySmartZoneOCR.Reader.SetFieldRegularExpression(Accusoft.SmartZoneOCRSdk.FieldType.Currency, "\\$\\d{2}");
          
//Set FieldType to Date, so only the regular expression of the Date field type will be used in this recognition.
mySmartZoneOCR.Reader.FieldType = Accusoft.SmartZoneOCRSdk.FieldType.Date;

TextBlockResult textBlockResult = mySmartZoneOCR.Reader.AnalyzeField(imageToRecognize.ToHbitmap(false));

//Set FieldType to Currency, and the regular expression of the Currency field type will be used in the following recognition process.
mySmartZoneOCR.Reader.FieldType = Accusoft.SmartZoneOCRSdk.FieldType.Currency;

//Reuse the variable declared above; declaring it a second time in the same scope would not compile.
textBlockResult = mySmartZoneOCR.Reader.AnalyzeField(imageToRecognize.ToHbitmap(false));
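
The selection rule shown in the example above (several expressions can be stored, but only the one for the current FieldType is applied) can be pictured with a small toy model. The sketch below is not the SmartZone OCR API: ToyReader, ToyFieldType, and ActivePattern are hypothetical names, and the model covers only which user-supplied expression is selected, not how recognition itself works.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for illustration; NOT the SmartZone OCR types.
public enum ToyFieldType { General, Date, Currency }

public class ToyReader
{
    // One stored expression per field type.
    private readonly Dictionary<ToyFieldType, string> patterns =
        new Dictionary<ToyFieldType, string>();

    public ToyFieldType FieldType { get; set; } = ToyFieldType.General;

    // Storing an expression does not change which field type is current.
    public void SetFieldRegularExpression(ToyFieldType type, string pattern)
    {
        patterns[type] = pattern;
    }

    // Returns the stored expression for the current FieldType, or null if none.
    public string ActivePattern()
    {
        string pattern;
        return patterns.TryGetValue(FieldType, out pattern) ? pattern : null;
    }

    public static void Main()
    {
        var reader = new ToyReader();
        reader.SetFieldRegularExpression(ToyFieldType.Date, @"\d{2}\/\d{2}\/\d{4}");
        reader.SetFieldRegularExpression(ToyFieldType.Currency, @"\$\d{2}");

        Console.WriteLine(reader.ActivePattern() == null); // True: neither type is selected yet

        reader.FieldType = ToyFieldType.Date;
        Console.WriteLine(reader.ActivePattern());         // \d{2}\/\d{2}\/\d{4}

        reader.FieldType = ToyFieldType.Currency;
        Console.WriteLine(reader.ActivePattern());         // \$\d{2}
    }
}
```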
Setting the field regular expression for the RegularExpression field type is equivalent to setting the regular expression directly with SetRegularExpression.
C# Example - The two calls below have the same effect
// The 2 lines below have the same effect
…
mySmartZoneOCR.Reader.SetRegularExpression("[A-Z]{3}");
mySmartZoneOCR.Reader.SetFieldRegularExpression(Accusoft.SmartZoneOCRSdk.FieldType.RegularExpression, "[A-Z]{3}");

…

 

Use ClearFieldRegularExpression to clear the regular expression of the field type.

C# Example
//Clear field regular expression for the Time field type               
……        
mySmartZoneOCR.Reader.ClearFieldRegularExpression(Accusoft.SmartZoneOCRSdk.FieldType.Time);
……

 

The RegularExpressionCaseInsensitivity property controls whether regular expressions are matched without regard to case. The setting applies to all regular expressions.

C# Example
……
// Make all regular expressions case insensitive
mySmartZoneOCR.Reader.RegularExpressionCaseInsensitivity = true;
……
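
To illustrate what case-insensitive matching means for a pattern such as [A-Z]{3}, the sketch below uses RegexOptions.IgnoreCase from the .NET Regex class as an analogue. This is an assumption made for illustration: it shows the general behavior of case-insensitive matching, not the internals of RegularExpressionCaseInsensitivity, and the CaseDemo class is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the SmartZone OCR SDK.
public static class CaseDemo
{
    // Matches the whole text against the pattern, optionally ignoring case.
    public static bool Matches(string pattern, string text, bool ignoreCase)
    {
        RegexOptions options = ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;
        return Regex.IsMatch(text, @"\A(?:" + pattern + @")\z", options);
    }

    public static void Main()
    {
        Console.WriteLine(Matches("[A-Z]{3}", "ABC", false)); // True
        Console.WriteLine(Matches("[A-Z]{3}", "abc", false)); // False
        Console.WriteLine(Matches("[A-Z]{3}", "abc", true));  // True
    }
}
```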

 


©2013. Accusoft Corporation. All Rights Reserved.