Friday, August 6, 2010
Intelligent Character Recognition
A primer on the various methods of ICR--their benefits and pitfalls
The term Intelligent Character Recognition (ICR) encompasses various technologies aimed at the analysis and recognition of handwritten characters from electronic images. The problem can be stated as follows: given a digitized image, how should an ICR algorithm analyze its content, recognize the identity of any character(s) contained in the image, and return this information? For these purposes, image content could consist of an alphabetic character (a, b, c...), a numeric character (0, 1, 2...), a special character ($, %, &...) or a "reject", meaning that the algorithm was unable to identify the image as a particular character.
A digitized image is, after all, just a collection of numbers. For binary images, every point or pixel is assigned a value of either 0 or 1; for gray level images pixel values range from 0 to 255, and for color images pixel values usually consist of three numbers, each in the range of 0 to 255. While the ensuing discussion is valid for any type of image, for the sake of simplicity, only binary images will be addressed.
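To make the "collection of numbers" concrete, here is a minimal Python sketch that turns a gray level image into a binary one. The threshold of 128 and the convention that 1 means a black (ink) pixel are assumptions made for illustration; the article does not specify either.

```python
def to_binary(gray, threshold=128):
    """Map gray levels 0..255 to 1 (dark, treated as ink) or 0 (light, background).

    The threshold value and the 1-means-black convention are assumptions.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]


# A tiny hand-made 4x4 gray level image: a dark vertical stroke on a light page.
gray = [
    [255, 250, 30, 245],
    [252, 20, 25, 251],
    [255, 248, 35, 249],
    [254, 247, 28, 250],
]

binary = to_binary(gray)
for row in binary:
    print("".join(str(pixel) for pixel in row))
```

The later sketches in this post use the same list-of-rows layout for binary images.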
Recognition technologies may be classified as statistical, semantic and hybrid. In the following, these methodologies are reviewed and their advantages and weaknesses compared. Only handwritten numeric characters or digits will be considered, such that the character recognition algorithms return only the values 0,1,2...9 or "reject".
The Statistical Approach
Since every electronic image of a digit consists of pixel values that are represented by a spatial configuration of "0"s and "1"s, a statistical approach to image character recognition would suggest that one look for a typical spatial distribution of the pixel values that characterize each digit. In general, one is searching for the statistical characteristics of various digits. These characteristics could be very simple, like the ratio of black pixels to white pixels, or more complex, like higher order statistical parameters such as the third moments of the image.
Typically, an image of the digit "1" will have relatively fewer black pixels than an image of the digit "8". In the following illustration, there are almost twice as many black pixels in the "8" image as in the "1", though both are drawn to the same scale.
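As a rough sketch of this simplest statistic, the snippet below computes the ratio of black to white pixels for two toy glyphs. The 5x3 glyphs are made up for illustration and are not meant to reproduce the proportions of the original figures; 1 is assumed to mean black.

```python
def black_white_ratio(image):
    """Return (number of black pixels) / (number of white pixels)."""
    black = sum(sum(row) for row in image)
    white = sum(len(row) for row in image) - black
    # Guard against an all-black image, which has no white pixels.
    return black / white if white else float("inf")


# Hypothetical 5x3 glyphs, drawn to the same scale.
ONE = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]
EIGHT = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]

print(black_white_ratio(ONE))    # 5 black pixels over 10 white pixels
print(black_white_ratio(EIGHT))  # 13 black pixels over 2 white pixels
```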
Continuing the same approach, a cursory analysis shows that the height-to-width ratio of the digit "0" is smaller than that of the digit "6".
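One way to measure such a ratio, sketched here under the assumption that height and width are taken from the bounding box of the black pixels, is shown below. The helper names and the sample images are hypothetical.

```python
def bounding_box(image):
    """Return (top, bottom, left, right) of the black pixels, or None if blank."""
    rows = [r for r, row in enumerate(image) if any(row)]
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    if not rows:
        return None
    return rows[0], rows[-1], cols[0], cols[-1]


def height_to_width(image):
    """Height-to-width ratio of the bounding box, or None for a blank image."""
    box = bounding_box(image)
    if box is None:
        return None
    top, bottom, left, right = box
    return (bottom - top + 1) / (right - left + 1)


# A round shape with a margin of white pixels around it.
ROUND = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
# A tall, narrow vertical stroke.
STROKE = [[0, 0, 1, 0, 0, 0] for _ in range(6)]

print(height_to_width(ROUND))   # bounding box is 4 rows by 4 columns
print(height_to_width(STROKE))  # bounding box is 6 rows by 1 column
```

Using the bounding box rather than the full image size keeps the measurement independent of how much white margin surrounds the digit.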
More advanced algorithms are usually based on the one-dimensional histograms that can be extracted from digitized images. Such an approach is carried out by producing histograms that show the number of black pixels in each row and in each column. By projecting the black pixel count horizontally and vertically, it is possible to differentiate between many typical cases of digits. Such a projection is demonstrated in the following figure.
In short, by careful analysis of the histograms of various digits, it is possible to differentiate between them.
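A minimal sketch of these projections, assuming the image is stored as a list of rows with 1 for black, might look like this. The "7" glyph is a made-up example.

```python
def projections(image):
    """Return (black pixels per row, black pixels per column)."""
    row_counts = [sum(row) for row in image]
    # zip(*image) transposes the image, so each tuple is one column.
    column_counts = [sum(column) for column in zip(*image)]
    return row_counts, column_counts


# Hypothetical 5x4 glyph resembling the digit "7".
SEVEN = [
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]

rows, columns = projections(SEVEN)
print(rows)     # the top bar dominates the row histogram
print(columns)
```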
Thus, the general flow of statistics-based character recognition algorithms is as follows:
1. Compute the relevant statistics for a digitized image.
2. Compare the statistics to those from a predefined database.
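The two steps above can be sketched as a nearest-template classifier. Everything specific here is an assumption chosen for illustration: the statistics are row and column projections, the comparison is a sum of absolute differences, the reject threshold is 3, and the database holds just two hand-made templates.

```python
def statistics(image):
    """Step 1: row projections followed by column projections."""
    return [sum(row) for row in image] + [sum(col) for col in zip(*image)]


def classify(image, database, max_distance=3):
    """Step 2: return the label of the closest template, or "reject"."""
    target = statistics(image)
    best_label, best_distance = "reject", None
    for label, template in database.items():
        distance = sum(abs(a - b) for a, b in zip(target, statistics(template)))
        if best_distance is None or distance < best_distance:
            best_label, best_distance = label, distance
    if best_distance is None or best_distance > max_distance:
        return "reject"
    return best_label


# Hypothetical predefined database of "ideal" 5x3 digits.
DATABASE = {
    "1": [[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "7": [[1, 1, 1], [0, 0, 1], [0, 1, 0], [0, 1, 0], [0, 1, 0]],
}

NOISY_ONE = [[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 1]]
BLOB = [[1, 1, 1] for _ in range(5)]

print(classify(NOISY_ONE, DATABASE))  # close to the "1" template
print(classify(BLOB, DATABASE))       # far from every template
```

The threshold is what produces the "reject" outcome: an image that is not close enough to any stored template is not forced into a digit class.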
In general, most statistical methods of character recognition work well for digits that do not vary much from an "ideal" or predefined digit. Unfortunately, in reality, handwritten images demonstrate a large variance. Thus, some additional approaches are required to solve the character recognition problem.
The Semantic Approach
Digitized images of handwritten characters indeed consist of pixels. However, a fact that most statistical methods ignore is that the pixels also form lines and contours. This is the essential point of the semantic approaches to character recognition: first recognize the way in which the contours of the digits are reflected in the pixels that represent them and then try to find typical characteristics or relationships for each digit. As is seen in the following examples, this is also the main advantage of semantic methods versus statistical ones. Consider the following case:
The steps of a semantic-based classifier for character recognition are as follows:
1. Find the starting point of a contour.
2. Start tracing the contour.
3. Identify the characteristics of the contour while tracing it: "up", "down", "diagonal up", "arc", "loop", etc.
4. Search the database for a description similar to the one obtained. Technically, this would be executed by representing the descriptions as a logic tree (graph) and then by matching the graph against the graphs contained in the database.
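The first three steps can be illustrated with a deliberately simplified sketch. It assumes the digit has already been thinned to a single open stroke, one pixel wide, with no branches; it only labels straight and diagonal moves, so "arc" and "loop" detection is left out. The labels "right", "left" and "diagonal down" and all function names are additions for this example, not part of the method described above.

```python
# Names for each one-pixel step (row change, column change); rows grow downward.
# Orthogonal steps are listed first so the tracer prefers them at corners.
STEP_NAMES = {
    (-1, 0): "up", (1, 0): "down", (0, 1): "right", (0, -1): "left",
    (-1, 1): "diagonal up", (-1, -1): "diagonal up",
    (1, 1): "diagonal down", (1, -1): "diagonal down",
}


def parse(rows):
    """Turn strings of "0"/"1" into a list-of-rows binary image."""
    return [[int(ch) for ch in row] for row in rows]


def black_neighbors(image, r, c):
    """Black pixels among the 8 neighbors of (r, c), in STEP_NAMES order."""
    found = []
    for dr, dc in STEP_NAMES:
        rr, cc = r + dr, c + dc
        if 0 <= rr < len(image) and 0 <= cc < len(image[0]) and image[rr][cc]:
            found.append((rr, cc))
    return found


def find_start(image):
    """Step 1: first stroke end (black pixel with one black neighbor)."""
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if value and len(black_neighbors(image, r, c)) == 1:
                return (r, c)
    return None


def trace(image, start):
    """Step 2: follow the stroke pixel by pixel until no unvisited neighbor remains."""
    path, visited = [start], {start}
    while True:
        unvisited = [p for p in black_neighbors(image, *path[-1]) if p not in visited]
        if not unvisited:
            return path
        visited.add(unvisited[0])
        path.append(unvisited[0])


def describe(image):
    """Step 3: name each move and collapse repeated moves into one label."""
    start = find_start(image)
    if start is None:
        return []
    path = trace(image, start)
    description = []
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        name = STEP_NAMES[(r1 - r0, c1 - c0)]
        if not description or description[-1] != name:
            description.append(name)
    return description


SEVEN_SMALL = parse(["1111", "0001", "0010", "0100", "0100"])
SEVEN_LARGE = parse(["11111", "00001", "00001", "00010", "00100", "00100"])

print(describe(SEVEN_SMALL))
print(describe(SEVEN_LARGE))
```

The two sample images differ in size and in pixel-by-pixel content, yet they produce the same description, which is the property the semantic approach relies on.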
The following illustration consists of several exemplars of the digit "2". Though the images of the digits exhibit substantial differences on the basis of a pixel-by-pixel comparison, the semantic description of the two leftmost "2"s, for example, is identical.
Since the number of distinct ways to write each digit, when described in the manner demonstrated above, is not excessive, it is possible to prepare a database that includes several hundred descriptions and encompasses the vast majority of possible cases.
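A toy version of such a database is sketched below. It stores several descriptions per digit and uses an exact sequence lookup, which is a much simpler stand-in for the graph matching described earlier. The entries are invented for illustration and are not taken from any real ICR system.

```python
# Hypothetical database: each digit maps to the stroke descriptions that
# are accepted for it. A real database would hold several hundred entries.
DESCRIPTIONS = {
    "1": [
        ("down",),
        ("diagonal up", "down"),
    ],
    "7": [
        ("right", "diagonal down"),
        ("right", "down", "diagonal down", "down"),
    ],
}


def lookup(description, database=DESCRIPTIONS):
    """Return the digit whose stored descriptions include this one, else "reject"."""
    key = tuple(description)
    for digit, variants in database.items():
        if key in variants:
            return digit
    return "reject"


print(lookup(["diagonal up", "down"]))
print(lookup(["right", "diagonal down"]))
print(lookup(["left", "up", "left"]))
```

An exact match is brittle; matching for similarity rather than equality, as the article describes, would tolerate small variations in the traced description.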