On virtual partitioning of large dictionaries for contextual post-processing to improve character recognition
This paper presents a new approach to partitioning large dictionaries by means of virtual views. The basic idea is to use additional knowledge sources from text recognition and text analysis for fast dictionary look-up, pruning the search space through static or dynamic views. The heart of the system is a redundant hashing technique that uses a set of hash functions to handle noisy input efficiently. The system currently consists of two main components: the dictionary generator and the dictionary controller. The dictionary generator initially builds the system from profiles and source dictionaries, while the controller allows different search heuristics to be integrated flexibly. The results show that the system achieves a respectable speed-up in dictionary access time.
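The abstract does not specify the hash functions. As a minimal sketch of the general idea of redundant hashing, assume each word is indexed under several keys, each of which ignores a different part of the word, so that a single misrecognized character still leaves at least one key intact. The key functions `key_first_half` and `key_second_half` and the class `RedundantHashDictionary` are hypothetical illustrations, not the authors' design.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical key functions (illustration only): each keeps the word length
# plus one half of the word, so a single substituted character leaves at
# least one key unchanged.
def key_first_half(word: str) -> str:
    return f"{len(word)}<{word[: len(word) // 2]}"

def key_second_half(word: str) -> str:
    return f"{len(word)}>{word[len(word) // 2 :]}"

KEY_FUNCS: List[Callable[[str], str]] = [key_first_half, key_second_half]

class RedundantHashDictionary:
    """Indexes every word under several keys (redundant hashing)."""

    def __init__(self, words: Iterable[str]) -> None:
        self.buckets: Dict[str, Set[str]] = defaultdict(set)
        for word in words:
            for key in KEY_FUNCS:
                self.buckets[key(word)].add(word)

    def candidates(self, noisy: str) -> Set[str]:
        # Union of every bucket the noisy word hashes to: the pruned search space.
        found: Set[str] = set()
        for key in KEY_FUNCS:
            found |= self.buckets.get(key(noisy), set())
        return found

    def lookup(self, noisy: str) -> List[str]:
        # Rank the surviving candidates by the number of mismatching positions.
        def mismatches(word: str) -> int:
            return sum(a != b for a, b in zip(word, noisy))
        return sorted(self.candidates(noisy), key=lambda w: (mismatches(w), w))

lexicon = RedundantHashDictionary(["character", "charades", "parameter", "dictionary"])
print(lexicon.lookup("charaoter"))  # error in second half -> ['character']
print(lexicon.lookup("chxracter"))  # error in first half  -> ['character']
```

Only the buckets hit by the noisy word are examined, which is where the speed-up over a full dictionary scan would come from. These particular keys tolerate one substitution but not insertions or deletions, since those change the word length; a real system would need additional keys for such errors.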
Text Extraction From Natural Scene: Methodology And Application
With the popularity of the Internet and smart mobile devices, there is increasing demand for techniques and applications of image- and video-based analytics and information retrieval. Most of these applications can benefit from extracting text information from natural scenes. However, scene text extraction is a challenging problem because of the cluttered backgrounds of natural scenes and the many different patterns of scene text itself. To address these problems, this dissertation proposes a framework for scene text extraction.
In our framework, scene text extraction is divided into two components: detection and recognition. Scene text detection locates the regions containing text in camera-captured images and videos. Text layout analysis based on gradient and color analysis is performed to extract text string candidates from the cluttered background of a natural scene. Text structural analysis is then performed to design effective structural features that distinguish text from non-text outliers among those candidates. Scene text recognition transforms the image-based text in detected regions into readable text codes. The most basic and significant step in text recognition is scene text character (STC) prediction, which is multi-class classification over a set of text character categories. We design robust and discriminative feature representations of STC structure by integrating multiple feature descriptors, coding/pooling schemes, and learning models. Experimental results on benchmark datasets demonstrate the effectiveness and robustness of the proposed framework, which obtains better performance than previously published methods.
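The abstract names coding/pooling schemes without detailing them. A minimal sketch of one common combination, hard-assignment coding against a codebook followed by max or average pooling, is shown below. The codebook, the 2-D descriptors, and the function names are toy assumptions for illustration, not the dissertation's actual features.

```python
from typing import List, Sequence

Vector = Sequence[float]

def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def hard_code(descriptor: Vector, codebook: Sequence[Vector]) -> List[float]:
    # One-hot code: 1.0 at the index of the nearest codeword, 0.0 elsewhere.
    nearest = min(range(len(codebook)),
                  key=lambda k: squared_distance(descriptor, codebook[k]))
    return [1.0 if k == nearest else 0.0 for k in range(len(codebook))]

def pool(codes: Sequence[Sequence[float]], mode: str = "max") -> List[float]:
    # Combine per-descriptor codes into one fixed-length vector.
    if not codes:
        raise ValueError("need at least one code")
    dims = range(len(codes[0]))
    if mode == "max":
        return [max(c[k] for c in codes) for k in dims]
    if mode == "avg":
        return [sum(c[k] for c in codes) / len(codes) for k in dims]
    raise ValueError(f"unknown pooling mode: {mode}")

def encode_character_patch(descriptors: Sequence[Vector],
                           codebook: Sequence[Vector],
                           mode: str = "max") -> List[float]:
    # Local descriptors of one character patch -> one feature vector.
    return pool([hard_code(d, codebook) for d in descriptors], mode)

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 0.1), (1.0, -0.1), (0.2, 0.1)]
print(encode_character_patch(descriptors, codebook, "max"))  # [1.0, 1.0, 0.0]
print(encode_character_patch(descriptors, codebook, "avg"))  # [0.5, 0.5, 0.0]
```

Pooling makes the feature length depend only on the codebook size, not on how many local descriptors a character patch produced, so the result can be fed directly to a multi-class classifier over character categories.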
Our proposed scene text extraction framework is applied to four scenarios: (1) reading printed labels on grocery packages for hand-held object recognition; (2) localizing license plates in camera-captured natural scene images in combination with car detection; (3) reading indicative signage for assistive navigation in indoor environments; and (4) extracting scene text from video of natural scenes in combination with object tracking. The prototype systems and their evaluation results show that the framework is able to address the challenges of real applications.
Advances in Character Recognition
This book presents advances in character recognition. Its 12 chapters cover a wide range of topics on different aspects of character recognition. It is hoped that the book will serve as a reference source for academic research, for professionals working in the character recognition field, and for anyone interested in the subject.
An investigation into the use of linguistic context in cursive script recognition by computer
The automatic recognition of handwritten text has been a goal for over thirty-five years. Because cursive writing is highly ambiguous (with high variability not only between different writers but even between different samples from the same writer), systems based only on visual information are prone to errors.
It is suggested that the application of linguistic knowledge to the recognition task may improve recognition accuracy. If a low-level (pattern-recognition-based) recogniser produces a candidate lattice (i.e. a directed graph giving a number of alternatives at each word position in a sentence), then linguistic knowledge can be used to find the 'best' path through the lattice.
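To make the lattice idea concrete, here is a minimal sketch that assumes each word position holds candidate words with recognizer log-scores, and that linguistic knowledge is available as a word-to-word log-score function; Viterbi dynamic programming then finds the highest-scoring path. The word-bigram model and its probabilities in `BIGRAM` are invented toy assumptions; the thesis itself studies collocation rather than a bigram model.

```python
import math
from typing import Callable, Dict, List, Tuple

Lattice = List[Dict[str, float]]  # one {word: recognizer log-score} per position

def best_path(lattice: Lattice, link: Callable[[str, str], float],
              weight: float = 1.0) -> Tuple[List[str], float]:
    # score[w]: best total log-score of any path ending in word w so far.
    score = dict(lattice[0])
    back: List[Dict[str, str]] = []  # back[i][w]: best predecessor of w at position i+1
    for column in lattice[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for word, rec in column.items():
            prev, best = max(((p, s + weight * link(p, word)) for p, s in score.items()),
                             key=lambda item: item[1])
            new_score[word] = best + rec
            pointers[word] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best-scoring final word.
    last = max(score, key=lambda w: score[w])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]

# Toy linguistic knowledge (hypothetical values); unseen pairs get a small floor.
BIGRAM = {("the", "cat"): math.log(0.2), ("cat", "sat"): math.log(0.3)}

def bigram(prev: str, word: str) -> float:
    return BIGRAM.get((prev, word), math.log(0.001))

LATTICE: Lattice = [
    {"the": math.log(0.6), "tho": math.log(0.4)},
    {"cat": math.log(0.4), "cot": math.log(0.6)},  # recognizer prefers "cot"
    {"sat": math.log(0.7), "set": math.log(0.3)},
]
print(best_path(LATTICE, bigram)[0])              # ['the', 'cat', 'sat']
print(best_path(LATTICE, bigram, weight=0.0)[0])  # ['the', 'cot', 'sat']
```

With `weight=0.0` the linguistic scores are ignored and the recognizer's favourite at each position wins; with linguistic knowledge switched on, the lattice path through "cat" overtakes "cot".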
Many forms of linguistic knowledge may be used to this end. This thesis looks specifically at the use of collocation as a source of linguistic knowledge. Collocation describes the statistical tendency of certain words to co-occur in a language within a defined range. It is suggested that this tendency may be exploited to aid automatic text recognition.
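As a minimal sketch of how collocational tendency within a defined range might be measured and applied, the code below counts how often word pairs co-occur within a window of `span` words in a toy corpus, scores a pair with a simple pointwise-mutual-information (PMI) style statistic, and reranks the candidate words for one position by recognizer score plus association with the surrounding context. PMI, the window size, and all names here are assumptions for illustration; the thesis's exact statistic is not given in this abstract.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

def count_collocations(tokens: Sequence[str], span: int) -> Tuple[Counter, Counter]:
    """Count unigrams and unordered word pairs co-occurring within `span` words."""
    unigrams: Counter = Counter(tokens)
    pairs: Counter = Counter()
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + span + 1, len(tokens))):
            a, b = sorted((word, tokens[j]))
            pairs[(a, b)] += 1
    return unigrams, pairs

def association(a: str, b: str, unigrams: Counter, pairs: Counter) -> float:
    """PMI-style score; 0.0 (no evidence) for pairs never seen together."""
    joint = pairs[tuple(sorted((a, b)))]
    if joint == 0:
        return 0.0
    n = sum(unigrams.values())
    return math.log(joint * n / (unigrams[a] * unigrams[b]))

def rerank(candidates: Dict[str, float], context: Sequence[str],
           unigrams: Counter, pairs: Counter) -> List[str]:
    """Order candidates by recognizer log-score plus collocation with context."""
    def total(word: str) -> float:
        return candidates[word] + sum(association(word, c, unigrams, pairs)
                                      for c in context)
    return sorted(candidates, key=total, reverse=True)

corpus = "he drank strong tea . she drank strong tea . they drank weak tea".split()
unigrams, pairs = count_collocations(corpus, span=2)
noisy = {"strong": -1.0, "strung": -0.9}  # recognizer slightly prefers "strung"
print(rerank(noisy, ["drank", "tea"], unigrams, pairs))  # ['strong', 'strung']
```

Treating unseen pairs as neutral (0.0) rather than strongly negative keeps rare but valid words from being penalized simply because the training corpus is small.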
The construction and use of a post-processing system incorporating collocational knowledge is described, as are a number of experiments designed to test the effectiveness of collocation as an aid to text recognition. The results of these experiments suggest that collocational statistics may be a useful form of knowledge for this application and that further research may produce a system of real practical use.
Handwritten character recognition using a gradient based feature extraction
Handwriting recognition is the task of transforming language represented in its spatial form of graphical marks into its symbolic representation. Offline handwriting recognition involves three steps: preprocessing of the image, segmentation of words into characters, and recognition of the characters. In this thesis I implement two methods for character recognition, which is the most important step in offline handwriting recognition. The heart of character recognition is the set of features extracted from the character image, and the accuracy of classifying the character image depends on the quality of those features. The two methods presented in this thesis use two different types of features: one uses connectivity features among the various segments of a character image, and the other uses the gradient at each pixel to construct feature vectors. Both methods are discussed in detail in the following chapters.
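The thesis's exact gradient feature is described in its later chapters, not in this abstract. As a minimal sketch of the general technique, the code below computes Sobel gradients at each interior pixel of a small grayscale image, quantizes the gradient direction into 8 bins, and accumulates gradient magnitude per bin within each cell of a 2x2 grid, giving a 32-dimensional feature vector. The Sobel operator, the 8 direction bins, the 2x2 grid, and the function name `gradient_features` are assumptions, not parameters taken from the thesis.

```python
import math
from typing import List, Sequence

def gradient_features(img: Sequence[Sequence[float]],
                      grid: int = 2, bins: int = 8) -> List[float]:
    """Direction histogram of gradient magnitudes per grid cell (no normalization)."""
    h, w = len(img), len(img[0])
    feats = [0.0] * (grid * grid * bins)
    for y in range(1, h - 1):          # border pixels lack a full 3x3 neighbourhood
        for x in range(1, w - 1):
            # Sobel responses in x (left-to-right) and y (top-to-bottom).
            gx = (img[y-1][x+1] + 2 * img[y][x+1] + img[y+1][x+1]) \
               - (img[y-1][x-1] + 2 * img[y][x-1] + img[y+1][x-1])
            gy = (img[y+1][x-1] + 2 * img[y+1][x] + img[y+1][x+1]) \
               - (img[y-1][x-1] + 2 * img[y-1][x] + img[y-1][x+1])
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)   # direction in [0, 2*pi)
            b = int(angle / (2 * math.pi) * bins) % bins  # quantized direction
            cell = (y * grid // h) * grid + (x * grid // w)
            feats[cell * bins + b] += mag
    return feats

# A 4x4 image with a vertical dark-to-light edge in the middle.
edge = [[0, 0, 1, 1] for _ in range(4)]
features = gradient_features(edge)
print(len(features), sum(features))  # 32 16.0
```

All the energy lands in direction bin 0 (gradient pointing right) of each of the four cells, which is the kind of stroke-direction information a classifier can use to tell characters apart. Practical systems usually also normalize the image size and the resulting vector.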
DH Benelux Journal 4. The Humanities in a Digital World
The fourth volume of the DH Benelux Journal. This volume includes seven full-length, peer-reviewed articles that are based on accepted contributions to the 2021 virtual DH Benelux conference. Contents: 1. Editors' Preface (Wout Dillen, Margherita Fantoli, Marijn Koolen, Marieke van Erp); 2. Introduction: The Humanities in a Digital World (Lorella Viola, Jelena Prokic, Antske Fokkens, Tommaso Caselli); 3. A Game of Persistence, Self-doubt, and Curiosity: Surveying Code Literacy in Digital Humanities (Elli Bleeker, Marijn Koolen, Kaspar Beelen, Liliana Melgar, Joris van Zundert, Sally Chambers); 4. Introducing the DHARPA Project: An Interdisciplinary Lab to Enable Critical DH Practice (Angela R. Cunningham, Helena Jaskov, Sean Takats, Lorella Viola); 5. Examining a Multi Layered Approach for Classification of OCR Quality without Ground Truth (Mirjam Cuper); 6. Modeling Ontologies for Individual Artists: A Case Study of a Dutch Ceramic Glass Sculptor (Victor de Boer, Daan Raven, Erik Esmeijer, Johan Oome); 7. Judging a Book by its Criticism: A Digital Analysis of the Professional and Community Driven Literary Criticism of the Ingeborg-Bachmann-Preis (Lore De Greve, Gunther Martens); 8. When No News is Bad News. News-Based Change Detection during COVID-19 (Kristoffer L. Nielbo, Frida Hæstrup, Kenneth C. Enevoldsen, Peter B. Vahlstrup, Rebekah B. Baglini, Andreas Roepstorff); 9. Combining Tools with Linked Data: a Social History Example (Ivo Zandhuis).