Search CORE

13 research outputs found

A post processing system for global correction of OCR generated errors

Author: Bullard Bryan Edward
Publication venue: Digital Scholarship@UNLV
Publication date: 01/01/1992

This thesis discusses the design and implementation of an OCR post-processing system. The system performs automatic spelling detection and correction on noisy, OCR-generated text. Unlike previous post-processing systems, this system works in conjunction with an inverted file database system. The initial results obtained from post-processing 10,000 pages of OCR'ed text are encouraging. These results indicate that global and local document information extracted from the inverted file system can be used effectively to correct OCR-generated spelling errors.

University of Nevada, Las Vegas Repository

Robust extraction of text from camera images using colour and spatial information simultaneously

Author: Chanda Bhabatosh
Chowdhury Shyama Prosad
Das Amit Kumar
Dhar Soumyadeep
Rafferty Karen
Publication date: 01/01/2009

The importance and use of text extraction from camera-based coloured scene images is rapidly increasing with time. Text within a camera-grabbed image can contain a huge amount of metadata about that scene. Such metadata can be useful for identification, indexing and retrieval purposes. While the segmentation and recognition of text from document images is quite successful, detection of coloured scene text is a new challenge for all camera-based images. Common problems for text extraction from camera-based images are the lack of prior knowledge of any kind of text features such as colour, font, size and orientation, as well as the location of the probable text regions. In this paper, we document the development of a fully automatic and extremely robust text segmentation technique that can be used for any type of camera-grabbed frame, be it a single image or video. A new algorithm is proposed which can overcome the current problems of text segmentation. The algorithm exploits text appearance in terms of colour and spatial distribution. When the new text extraction technique was tested on a variety of camera-based images, it was found to outperform existing techniques. The proposed technique also overcomes any problems that can arise due to an unconstrained complex background. The novelty of the work arises from the fact that this is the first time that colour and spatial information are used simultaneously for the purpose of text extraction.

Queen's University Belfast Research Portal

Ottoman archives explorer: A retrieval system for digital Ottoman archives

Author: Altingovde I.S.
Güdükbay U.
Ulusoy Ö.
Yalniz I.Z.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2009

This article presents Ottoman Archives Explorer, a Content-Based Retrieval (CBR) system based on character recognition for printed and handwritten historical documents. Several methods for character segmentation and recognition stages are investigated. In particular, sliding-window and histogram segmentation methods are coupled with recognition approaches using spatial features, neural networks, and a graph-based model. The prototype system provides CBR of document images using both example-based queries and a virtual keyboard to construct query words. © 2009 ACM

Bilkent University Institutional Repository

Chinese information processing

Author: Liu Yucheng
Publication venue: Digital Scholarship@UNLV
Publication date: 01/01/1995

A survey of the field of Chinese information processing is provided. It covers the following areas: the Chinese writing system, several popular Chinese encoding schemes and code conversions, Chinese keyboard entry methods, Chinese fonts, Chinese operating systems, and basic Chinese computing techniques and applications.

University of Nevada, Las Vegas Repository

Document preprocessing and fuzzy unsupervised character classification

Author: Chen Shy-Shyan
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/1995

This dissertation presents document preprocessing and fuzzy unsupervised character classification for automatically reading daily-received office documents that have complex layout structures, such as multiple columns and mixed-mode contents of texts, graphics and half-tone pictures. First, the block segmentation algorithm is performed based on a simple two-step run-length smoothing to decompose a document into single-mode blocks. Next, the block classification is performed based on the clustering rules to classify each block into one of the types such as text, horizontal or vertical lines, graphics, and pictures. The mean white-to-black transition is shown to be an invariant for textual blocks, and is useful for block discrimination. A fuzzy model for unsupervised character classification is designed to improve the robustness, correctness, and speed of the character recognition system. The classification procedures are divided into two stages. The first stage separates the characters into seven typographical categories based on word structures of a text line. The second stage uses pattern matching to classify the characters in each category into a set of fuzzy prototypes based on a nonlinear weighted similarity function. A fuzzy model of unsupervised character classification, which is more natural in the representation of prototypes for character matching, is defined and the weighted fuzzy similarity measure is explored. The characteristics of the fuzzy model are discussed and used in speeding up the classification process. After classification, the character recognition procedure is simply applied on the limited versions of the fuzzy prototypes. To avoid information loss and extra distortion, a topography-based approach is proposed to apply directly on the fuzzy prototypes to extract the skeletons. First, a convolution by a bell-shaped function is performed to obtain a smooth surface. Second, the ridge points are extracted by rule-based topographic analysis of the structure. Third, a membership function is assigned to ridge points with values indicating the degrees of membership with respect to the skeleton of an object. Finally, the significant ridge points are linked to form strokes of the skeleton, and the clues of eigenvalue variation are used to deal with degradation and preserve connectivity. Experimental results show that our algorithm can reduce the deformation of junction points and correctly extract the whole skeleton even when a character is broken into pieces. For characters that are merged together, the breaking candidates can be easily located by searching for the saddle points. A pruning algorithm is then applied on each breaking position. Lastly, a multiple context confirmation can be applied to increase the reliability of breaking hypotheses.

Digital Commons @ New Jersey Institute of Technology (NJIT)

NBS monograph

Author: Stevens Mary Elizabeth
United States. Bureau of Standards.
Publication venue: United States. Government Printing Office.
Publication date: 01/03/1970

From Introduction: "This report is the first of a series intended to provide a selective overview of research and development efforts and requirements in the somewhat overlapping fields of the computer and information sciences and technologies. The projected series of reports will attempt to outline the probable range of R & D activities in the computer and information sciences and technologies through selective reviews of the literature and to develop a reasonable consensus with respect to the opinions of workers in these and potentially related fields as to areas of continuing R & D concern for research program planning or review in these areas."

UNT Digital Library

Optical image scanners and character recognition devices : a survey and new taxonomy

Author: Amar Gupta ... [et al.]
Publication venue: Alfred P. Sloan School of Management, Massachusetts Institute of Technology
Publication date: 01/01/1989

Includes bibliographical references (p. [54]-[56]). Amar Gupta ... [et al.]

DSpace@MIT

Feature Extraction Methods for Character Recognition

Author: Yampolskiy Roman V
Publication venue: RIT Scholar Works
Publication date: 01/01/2004

Not Include

RIT Scholar Works

A study on creating a custom South Sotho spellchecking and correcting software desktop application

Author: Grobbelaar Leon A.
Publication venue: [Bloemfontein] : Central University of Technology, Free State
Publication date: 01/01/2007

Thesis (B. Tech.) - Central University of Technology, Free State, 200

Central University Of Technology Free State - LibraryCUT, South Africa