Search CORE

461 research outputs found

Text Recognition in Multimedia Documents: A Study of two Neural-based OCRs Using and Avoiding Character Segmentation

Author: A Dempster
C Garcia
Christophe Garcia
D Chen
Franck Mamalet
H Li
J Lim
J Weinman
K Jung
Khaoula Elagouni
L Bahl
M Li
Pascale Sébillot
Q Ye
R Casey
R Yager
S Lucas
T Sato
Y LeCun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/03/2014
Field of study

International audienceText embedded in multimedia documents represents an important semantic information that helps to automatically access the content. This paper proposes two neural-based OCRs that handle the text recognition problem in different ways. The first approach segments a text image into individual characters before recognizing them, while the second one avoids the segmentation step by integrating a multi-scale scanning scheme that allows to jointly localize and recognize characters at each position and scale. Some linguistic knowledge is also incorporated into the proposed schemes to remove errors due to recognition confusions. Both OCR systems are applied to caption texts embedded in videos and in natural scene images and provide outstanding results showing that the proposed approaches outperform the state-of-the-art methods

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL

Hal-Diderot

HAL-Rennes 1

Artificial neural network and its applications in quality process control, document recognition and biomedical imaging

Author: Islam Mohammed Jahirul
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2010
Field of study

In computer-vision based system a digital image obtained by a digital camera would usually have 24-bit color image. The analysis of an image with that many levels might require complicated image processing techniques and higher computational costs. But in real-time application, where a part has to be inspected within a few milliseconds, either we have to reduce the image to a more manageable number of gray levels, usually two levels (binary image), and at the same time retain all necessary features of the original image or develop a complicated technique. A binary image can be obtained by thresholding the original image into two levels. Therefore, thresholding of a given image into binary image is a necessary step for most image analysis and recognition techniques. In this thesis, we have studied the effectiveness of using artificial neural network (ANN) in pharmaceutical, document recognition and biomedical imaging applications for image thresholding and classification purposes. Finally, we have developed edge-based, ANN-based and region-growing based image thresholding techniques to extract low contrast objects of interest and classify them into respective classes in those applications. Real-time quality inspection of gelatin capsules in pharmaceutical applications is an important issue from the point of view of industry\u27s productivity and competitiveness. Computer vision-based automatic quality inspection and controller system is one of the solutions to this problem. Machine vision systems provide quality control and real-time feedback for industrial processes, overcoming physical limitations and subjective judgment of humans. In this thesis, we have developed an image processing system using edge-based image thresholding techniques for quality inspection that satisfy the industrial requirements in pharmaceutical applications to pass the accepted and rejected capsules. In document recognition application, success of OCR mostly depends on the quality of the thresholded image. Non-uniform illumination, low contrast and complex background make it challenging in this application. In this thesis, optimal parameters for ANN-based local thresholding approach for gray scale composite document image with non-uniform background is proposed. An exhaustive search was conducted to select the optimal features and found that pixel value, mean and entropy are the most significant features at window size 3x3 in this application. For other applications, it might be different, but the procedure to find the optimal parameters is same. The average recognition rate 99.25% shows that the proposed 3 features at window size 3x3 are optimal in terms of recognition rate and PSNR compare to the ANN-based thresholding technique with different parameters presented in the literature. In biomedical imaging application, breast cancer continues to be a public health problem. In this thesis we presented a computer aided diagnosis (CAD) system for mass detection and classification in digitized mammograms, which performs mass detection on regions of interest (ROI) followed by the benign-malignant classification on detected masses. Three layers ANN with seven features is proposed for classifying the marked regions into benign and malignant and 90.91% sensitivity and 83.87% specificity is achieved that is very much promising compare to the radiologist\u27s sensitivity 75%

Scholarship at UWindsor

Computer analysis of composite documents with non-uniform background.

Author: Alginahi Yasser
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2004
Field of study

The motivation behind most of the applications of off-line text recognition is to convert data from conventional media into electronic media. Such applications are bank cheques, security documents and form processing. In this dissertation a document analysis system is presented to transfer gray level composite documents with complex backgrounds and poor illumination into electronic format that is suitable for efficient storage, retrieval and interpretation. The preprocessing stage for the document analysis system requires the conversion of a paper-based document to a digital bit-map representation after optical scanning followed by techniques of thresholding, skew detection, page segmentation and Optical Character Recognition (OCR). The system as a whole operates in a pipeline fashion where each stage or process passes its output to the next stage. The success of each stage guarantees that the operation of the system as a whole with no failures that may reduce the character recognition rate. By designing this document analysis system a new local bi-level threshold selection technique was developed for gray level composite document images with non-uniform background. The algorithm uses statistical and textural feature measures to obtain a feature vector for each pixel from a window of size (2 n + 1) x (2n + 1), where n ≥ 1. These features provide a local understanding of pixels from their neighbourhoods making it easier to classify each pixel into its proper class. A Multi-Layer Perceptron Neural Network is then used to classify each pixel value in the image. The results of thresholding are then passed to the block segmentation stage. The block segmentation technique developed is a feature-based method that uses a Neural Network classifier to automatically segment and classify the image contents into text and halftone images. Finally, the text blocks are passed into a Character Recognition (CR) system to transfer characters into an editable text format and the recognition results were compared to those obtained from a commercial OCR. The OCR system implemented uses pixel distribution as features extracted from different zones of the characters. A correlation classifier is used to recognize the characters. For the application of cheque processing, this system was used to read the special numerals of the optical barcode found in bank cheques. The OCR system uses a fuzzy descriptive feature extraction method with a correlation classifier to recognize these special numerals, which identify the bank institute and provides personal information about the account holder. The new local thresholding scheme was tested on a variety of composite document images with complex backgrounds. The results were very good compared to the results from commercial OCR software. This proposed thresholding technique is not limited to a specific application. It can be used on a variety of document images with complex backgrounds and can be implemented in any document analysis system provided that sufficient training is performed.Dept. of Electrical and Computer Engineering. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .A445. Source: Dissertation Abstracts International, Volume: 66-02, Section: B, page: 1061. Advisers: Maher Sid-Ahmed; Majid Ahmadi. Thesis (Ph.D.)--University of Windsor (Canada), 2004

Scholarship at UWindsor

A fuzzy approach to segment touching characters

Author: AIRO' FARULLA Giuseppe
Murru Nadir
Rossini Rosaria
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017
Field of study

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Institutional Research Information System University of Turin

Symbolic and Visual Retrieval of Mathematical Notation using Formula Graph Symbol Pair Matching and Structural Alignment

Author: Davila Castellanos Kenny
Publication venue: RIT Scholar Works
Publication date: 01/07/2017
Field of study

Large data collections containing millions of math formulae in different formats are available on-line. Retrieving math expressions from these collections is challenging. We propose a framework for retrieval of mathematical notation using symbol pairs extracted from visual and semantic representations of mathematical expressions on the symbolic domain for retrieval of text documents. We further adapt our model for retrieval of mathematical notation on images and lecture videos. Graph-based representations are used on each modality to describe math formulas. For symbolic formula retrieval, where the structure is known, we use symbol layout trees and operator trees. For image-based formula retrieval, since the structure is unknown we use a more general Line of Sight graph representation. Paths of these graphs define symbol pairs tuples that are used as the entries for our inverted index of mathematical notation. Our retrieval framework uses a three-stage approach with a fast selection of candidates as the first layer, a more detailed matching algorithm with similarity metric computation in the second stage, and finally when relevance assessments are available, we use an optional third layer with linear regression for estimation of relevance using multiple similarity scores for final re-ranking. Our model has been evaluated using large collections of documents, and preliminary results are presented for videos and cross-modal search. The proposed framework can be adapted for other domains like chemistry or technical diagrams where two visually similar elements from a collection are usually related to each other

RIT Scholar Works

Advanced Image Acquisition, Processing Techniques and Applications

Author
Publication venue: 'IntechOpen'
Publication date: 20/04/2021
Field of study

"Advanced Image Acquisition, Processing Techniques and Applications" is the first book of a series that provides image processing principles and practical software implementation on a broad range of applications. The book integrates material from leading researchers on Applied Digital Image Acquisition and Processing. An important feature of the book is its emphasis on software tools and scientific computing in order to enhance results and arrive at problem solution

Directory of Open Access Books (DOAB)