Optica Publishing Group

Mixed text/graphics images: automated text separation and graphics representation

Open Access

Abstract

A method is described for separating graphics and text in digitized documents, for the automatic conversion and compression of paper-based information for database storage. First, the text portion of the digitized document image is separated from the graphics by a robust algorithm that classifies text based on the properties of the document's connected components. General characteristics of text are used to categorize it and remove it from the image by means of image-dependent size filters, Hough domain grouping, and heuristic knowledge of text attributes. Once the text is removed, the graphics portion of the image is converted to a higher-level representation: a list of segments (straight lines, curves) and their corresponding thicknesses.
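
The first stage of the abstract (connected components followed by an image-dependent size filter) can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the use of 8-connectivity, the bounding-box size measure, and the `size_factor` threshold relative to the median component size are all assumptions made for illustration. The Hough domain grouping and the text heuristics are omitted.

```python
from collections import deque
from statistics import median


def connected_components(image):
    """Return bounding boxes (top, left, bottom, right) of the 8-connected
    foreground regions of a binary image given as a list of rows of 0/1."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def split_text_graphics(boxes, size_factor=3.0):
    """Hypothetical image-dependent size filter: a component whose larger
    bounding-box side exceeds size_factor times the median side over the
    whole image is treated as graphics, the rest as candidate text."""
    sizes = [max(b - t + 1, r - l + 1) for t, l, b, r in boxes]
    limit = size_factor * median(sizes)
    text = [box for box, s in zip(boxes, sizes) if s <= limit]
    graphics = [box for box, s in zip(boxes, sizes) if s > limit]
    return text, graphics


# Toy image: one long horizontal line (graphics) and three 2x2 blobs (text-like).
image = [[0] * 20 for _ in range(6)]
for x in range(20):
    image[0][x] = 1
for x0 in (1, 5, 9):
    for y in (3, 4):
        for x in (x0, x0 + 1):
            image[y][x] = 1

text_boxes, graphics_boxes = split_text_graphics(connected_components(image))
```

Because the threshold is derived from the median component size rather than a fixed pixel count, the same filter adapts to documents scanned at different resolutions, which is one plausible reading of "image-dependent" in the abstract.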

© 1987 Optical Society of America
