Search CORE

7 research outputs found

Text Segmentation and Recognition in Complex Background Based on Markov Random Field

Author: Bourlard Hervé
Chen Datong
Odobez Jean-Marc
Publication venue: Quebec city, Canada
Publication date: 10/03/2006
Field of study

In this paper we propose a method to segment and recognize text embedded in video and images. We modelize the gray level distribution in the text images as mixture of gaussians, and then assign each pixel to one of the gaussian layer. The assignment is based on prior of the contextual information, which is modeled by a Markov random field (MRF) with online estimated coefficients. Each layer is then processed through a connected component analysis module and forwarded to the OCR system as one segmentation hypothesis. By varying the number of gaussians, multiple hypotheses are provided to an OCR system and the final result is selected from the set of outputs, leading to an improvement of the system's performances

Infoscience - École polytechnique fédérale de Lausanne

Enhancing Energy Minimization Framework for Scene Text Recognition with Top-Down Cues

Author: Alahari Karteek
Jawahar C. V.
Mishra Anand
Publication venue: 'Elsevier BV'
Publication date: 12/01/2016
Field of study

Recognizing scene text is a challenging problem, even more so than the recognition of scanned documents. This problem has gained significant attention from the computer vision community in recent years, and several methods based on energy minimization frameworks and deep learning approaches have been proposed. In this work, we focus on the energy minimization framework and propose a model that exploits both bottom-up and top-down cues for recognizing cropped words extracted from street images. The bottom-up cues are derived from individual character detections from an image. We build a conditional random field model on these detections to jointly model the strength of the detections and the interactions between them. These interactions are top-down cues obtained from a lexicon-based prior, i.e., language statistics. The optimal word represented by the text image is obtained by minimizing the energy function corresponding to the random field model. We evaluate our proposed algorithm extensively on a number of cropped scene text benchmark datasets, namely Street View Text, ICDAR 2003, 2011 and 2013 datasets, and IIIT 5K-word, and show better performance than comparable methods. We perform a rigorous analysis of all the steps in our approach and analyze the results. We also show that state-of-the-art convolutional neural network features can be integrated in our framework to further improve the recognition performance

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Monte Carlo Video Text Segmentation

Author: Chen Datong
Odobez Jean-Marc
Thiran Jean-Philippe
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 10/03/2006
Field of study

This paper presents a probabilistic algorithm for segmenting and recognizing text embedded in video sequences based on adaptive thresholding using a Bayes filtering method. The algorithm approximates the posterior distribution of segmentation thresholds of video text by a set of weighted samples. The set of samples is initialized by applying a classical segmentation algorithm on the first video frame and further refined by random sampling under a temporal Bayesian framework. This framework allows us to evaluate an text image segmentor on the basis of recognition result instead of visual segmentation result, which is directly relevant to our character recognition task. Results on a database of 6944 images demonstrate the validity of the algorithm

Infoscience - École polytechnique fédérale de Lausanne

Video Text Segmentation Using Particle Filters

Author: Chen Datong
Odobez Jean-Marc
Publication venue: IDIAP
Publication date: 10/03/2006
Field of study

Infoscience - École polytechnique fédérale de Lausanne

Text detection and recognition in images and video sequences

Author: Chen Datong
Publication venue
Publication date: 10/03/2006
Field of study

Text characters embedded in images and video sequences represents a rich source of information for content-based indexing and retrieval applications. However, these text characters are difficult to be detected and recognized due to their various sizes, grayscale values and complex backgrounds. This thesis investigates methods for building an efficient application system for detecting and recognizing text of any grayscale values embedded in images and video sequences. Both empirical image processing methods and statistical machine learning and modeling approaches are studied in two sub-problems: text detection and text recognition. Applying machine learning methods for text detection encounters difficulties due to character size, grayscale variations and heavy computation cost. To overcome these problems, we propose a two-step localization/verification approach. The first step aims at quickly localizing candidate text lines, enabling the normalization of characters into a unique size. In the verification step, a trained support vector machine or multi-layer perceptrons is applied on background independent features to remove the false alarms. Text recognition, even from the detected text lines, remains a challenging problem due to the variety of fonts, colors, the presence of complex backgrounds and the short length of the text strings. Two schemes are investigated addressing the text recognition problem: bi-modal enhancement scheme and multi-modal segmentation scheme. In the bi-modal scheme, we propose a set of filters to enhance the contrast of black and white characters and produce a better binarization before recognition. For more general cases, the text recognition is addressed by a text segmentation step followed by a traditional optical character recognition (OCR) algorithm within a multi-hypotheses framework. In the segmentation step, we model the distribution of grayscale values of pixels using a Gaussian mixture model or a Markov Random Field. The resulting multiple segmentation hypotheses are post-processed by a connected component analysis and a grayscale consistency constraint algorithm. Finally, they are processed by an OCR software. A selection algorithm based on language modeling and OCR statistics chooses the text result from all the produced text strings. Additionally, methods for using temporal information of video text are investigated. A Monte Carlo video text segmentation method is proposed for adapting the segmentation parameters along temporal text frames. Furthermore, a ROVER (Recognizer Output Voting Error Reduction) algorithm is studied for improving the final recognition text string by voting the characters through temporal frames

Infoscience - École polytechnique fédérale de Lausanne