The following list describes all the selectable output formats of the converters:
- Text - This converter writes the recognized text into a simple text file that can be read by most text editors and word processors.
- Comma Separated Text - This converter writes the recognized text into a tabular, comma-delimited text file that can be read by Excel. The “List Separator” character separates the cells, and NL (the newline character) separates the rows of the table.
- Formatted Text - This converter writes the recognized text into a text file, but tries to retain the layout of the page by inserting extra spaces.
- Text with linebreaks - The same as the Text converter, but inserts a line break at the end of every line instead of only at the end of each paragraph.
- Unicode Text - Same as Text, but using two-byte Unicode characters.
- Unicode Formatted Text - Same as Formatted Text, but using two-byte Unicode characters.
- Unicode Text with linebreaks - Same as Text with linebreaks, but using two-byte Unicode characters.
- HTML 3.2 - HTML 3.2 is a clean, compact, but usable HTML format. Unlike HTML 4.0, it is supported by virtually all HTML interpreters.
- HTML 4.0 - HTML 4.0 output is less clean than HTML 3.2, but it can use Cascading Style Sheets (CSS) for absolutely positioned box-like objects, for styles, and for control of all paragraph and character attributes.
- ePub, ePub Simple, and ePub Poem - ePub e-book converters. Note: formatted output in ePub format is not supported by the 64-bit SDK.
- XML - An XML file format conforming to the Nuance XML schema ssdoc-schema3.xsd, which is distributed in the Bin directory of the ImageGear .NET installation. It contains almost all layout-related information as well as paragraph and character attributes. A general description of this format is given in the page XML output format topic.
- WordPerfect 10 - WordPerfect binary file format for WordPerfect 9 and later.
- Microsoft Reader - Converter for the Microsoft Reader e-book format (.lit files).
- Microsoft InfoPath - A Microsoft InfoPath converter. It can save various recognized form elements, such as checkboxes and input lines.
- WordPad - An RTF-based converter that generates a plain and simple RTF file, which can be interpreted by Microsoft WordPad (and other simple RTF readers).
- RTF Word 6.0/95 - A Rich Text Format converter based on version 1.3 of the RTF Specification. The generated files can be interpreted by almost all RTF readers; the downside is that the output files can be considerably larger than those generated by the later RTF converters.
- RTF Word 97 - This RTF converter uses features that can only be interpreted by Microsoft Word 97 and later (or by readers with compatible capabilities).
- Microsoft PowerPoint 97 - An RTF-based converter that generates a plain and simple RTF file, which can be interpreted by Microsoft PowerPoint.
- Microsoft Excel 97 - Generates Microsoft Excel 97 binary files (.xls).
- Microsoft Word 97 - The same as RTF Word 97.
- Microsoft Publisher 98 - An RTF-based converter that generates a plain and simple RTF file, which can be interpreted by Microsoft Publisher.
- RTF Word 2000 - Similar to the RTF Word 97 converter, but uses features only available in Microsoft Word 2000 and later.
- RTF 2000 Exact Word - This converter is based on RTF Word 2000. It loads the resulting file into Microsoft Word and tries to correct pagination errors by slightly modifying spacing values.
- Microsoft Word 2000, XP - The same as RTF Word 2000.
- Microsoft Word WordML - A converter for the XML-based file format of Microsoft Word 2003. Its features, capabilities, and layout retention quality are practically the same as those of the RTF Word 2000 converter.
- Microsoft Excel 2003, XP - This converter is currently the same as Microsoft Excel 97.
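Output from the Comma Separated Text converter can be consumed outside Excel as well. The sketch below is a minimal Python reader for that layout, assuming that the “List Separator” is the Windows regional list-separator setting (commonly "," but ";" in many locales) and that rows end with a newline, as the format description states. The function name `read_csv_output` is hypothetical and not part of ImageGear.

```python
import csv
import io


def read_csv_output(text: str, list_separator: str = ",") -> list[list[str]]:
    """Split Comma Separated Text output into rows of cells.

    Assumption: cells are separated by the Windows "List Separator"
    character and rows by a newline; pass the separator that was in
    effect when the file was generated.
    """
    # newline="" lets the csv module handle "\n" and "\r\n" row endings itself
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=list_separator)
    return [row for row in reader]
```

For example, `read_csv_output("Name;Qty\nBolt;12\n", ";")` returns `[['Name', 'Qty'], ['Bolt', '12']]`. Using the `csv` module rather than a plain `split` keeps quoted cells that contain the separator intact.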
Office 2007 Support
ImageGear Recognition can generate output in the Office 2007 file types DOCX, XLSX, and PPTX. These files can be opened by Office 2007 applications, or by Office 2003 with an add-in supplied by Microsoft.
A DOCX file consists of a set of separate XML, picture, and font files, all compressed into one ZIP-based package file. The document content itself is stored in a set of XML files, while other XML files define the relationships between the content files and the remaining files in the package. Because the package is compressed, a DOCX file is typically much smaller than the corresponding DOC file.
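Because the package is an ordinary ZIP archive, its parts can be inspected with any ZIP library. The following minimal Python sketch uses only the standard `zipfile` module (no ImageGear API); the helper names are hypothetical, and `read_main_document_xml` assumes the main content sits at the conventional part name `word/document.xml`.

```python
import zipfile
from typing import BinaryIO, Union


def list_package_parts(source: Union[str, BinaryIO]) -> list[str]:
    """Return the names of all parts (files) stored in a DOCX/XLSX/PPTX package."""
    with zipfile.ZipFile(source) as package:
        return package.namelist()


def read_main_document_xml(source: Union[str, BinaryIO]) -> str:
    """Return the XML text of the main document part of a DOCX file.

    Assumption: the main part uses the conventional name "word/document.xml";
    a robust reader would locate it through the package relationships instead.
    """
    with zipfile.ZipFile(source) as package:
        return package.read("word/document.xml").decode("utf-8")
```

Calling `list_package_parts("output.docx")` on a hypothetical output file typically lists `[Content_Types].xml`, relationship parts under `_rels/`, and the content parts under `word/`.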
The DOCX file format specification can be downloaded from: http://www.ecma-international.org/news/TC45_current_work/TC45_available_docs.htm
The DOCX/XLSX/PPTX file types conform to a Microsoft standard called “Open Packaging Conventions (OPC)”, whose specification is available for download at http://go.microsoft.com/fwlink/?linkID=71255
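Under OPC, the package-level relationship part `_rels/.rels` points to the main document part, so a reader does not need to hard-code file names. The Python sketch below follows that relationship with the standard `zipfile` and `xml.etree.ElementTree` modules; the function name `find_main_part` is hypothetical, and matching on a type URI ending in `/officeDocument` is a simplifying assumption that covers both the transitional and strict relationship URIs.

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import BinaryIO, Optional, Union

# XML namespace used by OPC relationship parts (.rels files)
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def find_main_part(source: Union[str, BinaryIO]) -> Optional[str]:
    """Return the part name of the package's main document, or None.

    Reads the package-level relationships in "_rels/.rels", so the same
    code works for DOCX, XLSX and PPTX packages.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("_rels/.rels"))
    for rel in root.iter(REL_NS + "Relationship"):
        # Assumption: the main part's relationship type URI ends in "/officeDocument"
        if rel.get("Type", "").endswith("/officeDocument"):
            # Targets may be written with or without a leading "/"
            return rel.get("Target", "").lstrip("/")
    return None
```

For a DOCX package this usually yields `word/document.xml`, for XLSX `xl/workbook.xml`, and for PPTX `ppt/presentation.xml`.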