OmniPage Pro - The OCR standard
Author: Casey Meaden
Date: December, 1993
Keywords: Caere software program review application OCR optical character recognition scanner scanning
Text: Well, I don't know about you, but I've had too much computer. It's given me tendonitis in my right arm from too much typing and mousing around. So I thought that an OCR (Optical Character Recognition) program might be a big help in minimizing the typing I need to do in putting out our Macintosh User Group newsletter and posting news to our MUG BBS. Thanks to Caere's OmniPage Pro, I'm getting the help I need in eliminating EKS (excessive keystrokes).
Scanning a page of text creates a bitmapped, pixel-by-pixel representation of the original image, and a raggedy bitmap is all you get. What if you want to change a few words or the font? Unless you retype it, you're stuck. To address this need, a few companies have applied the power of computers to intelligently interpret the scanned patterns as text. In the past, text recognition consisted of comparing a bitmap's shape to a library of character shapes. Unfortunately, that method only worked for a limited number of fonts and sizes; text characters vary considerably, and it would take a gigantic computer to correctly match all the possible shapes. In 1988 Caere released OmniPage Pro (OPP), which uses OCR technology that analyzes individual character features (feature analysis) rather than matching shapes (matrix matching).
The speed with which a page is recognized depends on the type and speed of the computer's processor, the quality of the scanned image, any graphical elements, and the variety of fonts on the scanned page. Many scanners can create the bitmapped images for your computer to work with, but flatbed scanners work best. OPP has drivers for most scanners. To get a feel for how intuitive the program is, I scanned an advertisement with my Apple OneScanner after reading only the 16-page Quick Manual that comes with OPP. I had no trouble at all; this is a very good Mac interface.
The ad has a simple graphic and a small amount of text (see page 9 of this newsletter). This kind of work is one of my main intended uses for this program: helping our MUG get special offers out to our members. My first try came off without a hitch. Some of the very fine print at the bottom needed a little human help, but any OCR program will often have a few words it needs help with. OPP has a wonderful training feature to teach it unusual characters. OPP is "omnifont": it recognizes virtually all nonstylized fonts in sizes ranging from 8 to 72 points without training or programming. In addition, OPP accelerates when reading a consistent font, achieving speeds in excess of 100 characters per second (about 25 seconds per typical typed page).
During page recognition, OmniPage displays sample characters from the scanned document in a character window. If OPP can't identify a character, it puts a tilde (~) in the text. (See illustration.) Using the spelling checker, OPP will lead you through the file and locate any questionable characters. Unlike most spelling checkers, it suggests alternative spellings based on the most likely character recognition errors rather than on the usual human keyboard-entry mistakes. Option-double-clicking on any word shows you the original bitmap of that portion of the scan to help you verify the text. This is one of the better features of the program.
Once scanned and recognized, text may be edited within the OmniPage transitional editor before being transferred to an application file. Text may be saved as ASCII (caution: any graphics will be lost), Microsoft Word, Microsoft Word RTF, WordPerfect, MacWrite and MacWrite II, Excel Text, or Caere Meta (which allows further editing from the scanned information). OPP supports Apple's "publish and subscribe" feature.
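For readers curious how a spelling checker can suggest corrections based on recognition errors rather than typing errors, here is a minimal sketch in Python. It is not Caere's algorithm, which the review does not describe. The confusion table, the word list, and the function names are all made up for illustration. The idea is simply to try swapping look-alike shapes (such as "rn" for "m", or the tilde for any letter) and keep the results that turn out to be real words.

```python
# Hypothetical table of look-alike shapes an OCR engine might confuse.
# These pairs are illustrative assumptions, not taken from OmniPage.
CONFUSIONS = {
    "rn": ["m"],
    "m": ["rn"],
    "1": ["l", "I"],
    "l": ["1", "I"],
    "0": ["O", "o"],
    "5": ["S"],
    "cl": ["d"],
    "vv": ["w"],
    # The review says OPP marks an unrecognized character with a tilde,
    # so treat "~" as a stand-in for any lowercase letter.
    "~": list("abcdefghijklmnopqrstuvwxyz"),
}


def candidates(word):
    """Yield every string reachable from `word` by one confusion swap."""
    for i in range(len(word)):
        for wrong, rights in CONFUSIONS.items():
            if word.startswith(wrong, i):
                for right in rights:
                    yield word[:i] + right + word[i + len(wrong):]


def suggest(word, dictionary):
    """Return dictionary words that are one OCR confusion away from `word`.

    A word already in the dictionary is returned unchanged.
    """
    if word in dictionary:
        return [word]
    return sorted({c for c in candidates(word) if c in dictionary})


if __name__ == "__main__":
    words = {"modern", "hello", "test", "text", "scanner"}
    print(suggest("rnodern", words))  # ['modern']
    print(suggest("he1lo", words))    # ['hello']
    print(suggest("te~t", words))     # ['test', 'text']
```

A keyboard-oriented checker would instead look for transposed or neighboring keys; the only real difference in this sketch is which substitutions the table contains.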
One problem I had was that in documents with multiple columns of text, numbers or graphics, the scanned blocks seemed to be placed sequentially after recognition rather than in their original layout. Mr. Mike Lough of Caere said that this is being addressed in the next version of the program with what he called "True Page Processing," which will maintain the layout of the original page after recognition. This new version (5.0) will probably be available in the first quarter of '94. It will also have more automatic features, making it easier to use, and it will increase accuracy with 3-D OCR, which uses grey-scale information from the scanner to help with the recognition of degraded documents.
OmniPage Pro also comes with a complete graphics scanning and editing program called The Graphic Editor, which is more than adequate to the needs of most desktop publishers. I asked Caere's technical support if they had any plans to support 600 dpi image printing, since the newer, low-cost laser printers now have this technology. They said no, but that the Graphic Editor does support 400 dpi printing, though this is not stated in the manual.
I think one of the more interesting new uses for OCR will be scanning received faxes and putting the text and images into an editable file format. This could be extremely useful for company reps away from the office who need to get changing facts and figures into a client presentation quickly. For best recognition, faxes should be sent at the 200 x 200 resolution with text of 10 points or larger. Line art comes through very acceptably at this resolution.
OmniPage is currently available in three versions: a software-only version that runs on the Macintosh II, and two DOS versions that run under Microsoft Windows. The Macintosh version first shipped in September 1988, and the MS-DOS version shortly after. There are two less feature-laden versions of OmniPage Professional: OmniPage 3.0 and OmniPage Direct. If you are on a budget, it would be worth looking into just which features you need.
The program needs:
* Mac II or higher (color, B&W or greyscale)
* Hard disk with at least 4 MB of available space
* Total RAM of at least 5 MB; 8 MB is recommended
* The new version of OPP (v2.21.2) needs System 7 or higher, and virtual memory is required
* B&W or color scanner recognized by OPP. This includes most flatbed scanners (with more drivers on the way).
The program comes on two HD Installer disks, which were very straightforward to use, but you must have a SuperDrive. (Tech support said that OPP was not available on 800K disks. It should be.) The Installer provides for custom installation, allowing users with minimal hard drive space to leave out dictionaries and help files if necessary. Also in the OPP package are the Owner's Manual, a 16-page Quick Manual and the Graphic Editor's Owner's Manual. Eleven foreign language dictionaries are available, as well as medical and legal versions.
Caere has designed an easy-to-use, cost-effective, accurate means of converting text, numbers and pictures into computer-usable form. This OCR program is an attractive alternative to manual data entry, which is slow, tedious and error prone (and makes my arm hurt). No wonder MacUser gave OPP a rating of 4 1/2 mice. Thanks, Caere, I really like this product!
Prices on Caere's products have recently dropped by 30% and scanners are also cheaper, making this technology a good investment for anyone with reams of paper to deal with. OPP even works with automatic document feeders. The price varies, so shop around: OmniPage Professional is $499.95 at MacConnection. There are also two smaller versions: OmniPage Direct for $189 (MacConnection) and OmniPage 3.0 for $349.95 (MacWarehouse), which is compatible with System 6.
OmniPage Professional from: Caere Corporation 100 Cooper Court Los Gatos, CA 95030 (408) 395-7000 FAX (408) 354-2743
Copyright © December, 1993 by Casey Meaden