Abstract
A method is described for separating graphics and text in digitized documents, with the goal of automatically converting and compressing paper-based information for database storage. First, the text portion of the digitized document image is separated from the graphics by a robust algorithm that classifies text based on the properties of the document's connected components. General characteristics of text are used to identify and remove it from the image by means of image-dependent size filters, Hough-domain grouping, and heuristic knowledge of text attributes. Once the text is removed, the graphics portion of the image is converted to a higher-level representation: a list of segments (straight lines, curves) and their corresponding thicknesses.
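The connected-component size filter mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 8-connectivity choice, the median-based threshold, and the `size_factor` value are all assumptions made for illustration, and the Hough-domain grouping and heuristic text rules are omitted entirely.

```python
from collections import deque


def connected_components(image):
    """Return the 8-connected foreground (value 1) components as pixel lists."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def longest_side(pixels):
    """Length of the longer side of a component's bounding box."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return max(max(ys) - min(ys) + 1, max(xs) - min(xs) + 1)


def split_text_graphics(image, size_factor=3.0):
    """Split components into (text, graphics) with an image-dependent size limit.

    Assumption (hypothetical rule): a component is treated as text when its
    longest bounding-box side is at most size_factor times the median longest
    side over all components in this image.
    """
    components = connected_components(image)
    if not components:
        return [], []
    sides = sorted(longest_side(p) for p in components)
    limit = size_factor * sides[len(sides) // 2]
    text = [p for p in components if longest_side(p) <= limit]
    graphics = [p for p in components if longest_side(p) > limit]
    return text, graphics


# Toy page: three small character-like blobs above one long horizontal line.
page = [
    [1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
]
text, graphics = split_text_graphics(page)
```

On this toy page the three small blobs fall under the size limit and are classed as text, while the long line is kept as graphics. A size filter alone would misclassify short graphic strokes and oversized characters, which is presumably where the Hough-domain grouping and text heuristics named in the abstract come in.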
© 1987 Optical Society of America