806 research outputs found

Enhancing Energy Minimization Framework for Scene Text Recognition with Top-Down Cues

Author: Alahari Karteek
Jawahar C. V.
Mishra Anand
Publication venue: Elsevier BV
Publication date: 12/01/2016

Recognizing scene text is a challenging problem, even more so than the recognition of scanned documents. This problem has gained significant attention from the computer vision community in recent years, and several methods based on energy minimization frameworks and deep learning approaches have been proposed. In this work, we focus on the energy minimization framework and propose a model that exploits both bottom-up and top-down cues for recognizing cropped words extracted from street images. The bottom-up cues are derived from individual character detections in an image. We build a conditional random field model on these detections to jointly model the strength of the detections and the interactions between them. These interactions are top-down cues obtained from a lexicon-based prior, i.e., language statistics. The optimal word represented by the text image is obtained by minimizing the energy function corresponding to the random field model. We evaluate our proposed algorithm extensively on a number of cropped scene text benchmark datasets, namely Street View Text, ICDAR 2003, 2011 and 2013 datasets, and IIIT 5K-word, and show better performance than comparable methods. We perform a rigorous analysis of all the steps in our approach and analyze the results. We also show that state-of-the-art convolutional neural network features can be integrated in our framework to further improve the recognition performance.
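
The energy minimization idea in the abstract above can be illustrated with a minimal sketch. It assumes a simple chain of character positions, unary costs standing in for detection strength, and pairwise costs standing in for a bigram prior; all names and numbers here are hypothetical, and exact dynamic programming on a chain is a simplification that is not necessarily the inference procedure used by the authors.

```python
def min_energy_word(unary, pairwise, default_pair=5.0):
    """Return the word with the lowest total energy on a chain model.

    unary:    list of dicts {char: cost}, one dict per character position
    pairwise: dict {(prev_char, char): cost}; unseen pairs cost default_pair
    """
    # best[c] = (energy of the cheapest labelling ending in c, that labelling)
    best = {c: (cost, c) for c, cost in unary[0].items()}
    for node in unary[1:]:
        new_best = {}
        for c, u in node.items():
            # pick the predecessor minimizing accumulated energy + pairwise cost
            prev_energy, prev_word = min(
                ((e + pairwise.get((p, c), default_pair), w)
                 for p, (e, w) in best.items()),
                key=lambda t: t[0],
            )
            new_best[c] = (prev_energy + u, prev_word + c)
        best = new_best
    energy, word = min(best.values(), key=lambda t: t[0])
    return word, energy


# Hypothetical costs: the detector slightly prefers "o" at position 2,
# but the bigram prior makes "cat" cheaper overall than "cot".
unary = [{"c": 0.2, "e": 0.9}, {"a": 0.5, "o": 0.4}, {"t": 0.3, "l": 0.6}]
pairwise = {("c", "a"): 0.1, ("a", "t"): 0.1, ("c", "o"): 0.8, ("o", "t"): 0.9}
word, energy = min_energy_word(unary, pairwise)
```

In this toy setting the unary costs alone would favor "cot", while the pairwise (top-down) costs shift the minimum-energy labelling to "cat".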

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Three-stage ensemble of image net pre-trained networks for pneumonia detection

Author: Chawla Shailey
Feng Yang
Uddamvathanak Rom
Publication venue: Faculty of Medicine Siriraj Hospital, Mahidol University
Publication date: 01/01/2019

Focusing on the detection of pneumonia in chest X-ray images, this paper proposes a three-stage ensemble methodology utilizing multiple pre-trained Convolutional Neural Networks (CNNs). In the first-stage ensemble, k subsets of the training data are first randomly generated, each of which is then used to retrain a pre-trained CNN, producing k CNN models for the first-stage ensemble. In the second-stage ensemble, multiple ensemble CNN models based on multiple pre-trained CNNs are integrated to reduce variance and improve prediction performance. The third-stage ensemble is based on image augmentation, i.e., the original set of images is augmented to generate a few sets of additional images, after which each set of images is input to the ensemble models from the first two stages, and the outputs based on the multiple sets of images are then integrated. To integrate the outputs in each stage, four ensemble techniques are introduced: averaging, feed-forward neural network-based, decision tree-based, and majority voting. Thorough experiments were conducted on chest X-ray images from a Kaggle challenge, and the results showed the effectiveness of the proposed three-stage ensemble method in detecting pneumonia in the images.
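
Two of the four integration techniques named above, averaging and majority voting, can be sketched as follows. This is a minimal illustration that assumes each model outputs one pneumonia probability per image; the function names, the 0.5 threshold, and the sample numbers are hypothetical and are not taken from the paper.

```python
from collections import Counter


def average_ensemble(prob_lists, threshold=0.5):
    """Average per-image probabilities across models, then threshold.

    prob_lists: one list of probabilities per model, aligned by image.
    """
    n_models = len(prob_lists)
    means = [sum(per_image) / n_models for per_image in zip(*prob_lists)]
    return [int(m >= threshold) for m in means]


def majority_vote(label_lists):
    """Pick the most frequent hard label per image (use an odd model count)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*label_lists)]


# Hypothetical outputs of three models on three images.
probs = [
    [0.90, 0.40, 0.20],
    [0.60, 0.45, 0.70],
    [0.80, 0.90, 0.30],
]
hard = [[int(p >= 0.5) for p in model] for model in probs]

avg_labels = average_ensemble(probs)
vote_labels = majority_vote(hard)
```

On the second image the two rules disagree: one confident model pulls the average above the threshold, while two of the three hard votes are negative. That difference is one reason a study might compare several integration rules.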

ResearchOnline at James Cook University

Survey of review spam detection using machine learning techniques

Publication venue: Springer
Publication date: 05/10/2015

Springer - Publisher Connector

A limited-size ensemble of homogeneous CNN/LSTMs for high-performance word classification

Author: A Graves
A Vinciarelli
B Oommen
B Peleg
B Shi
B Stuner
C Wells
C-L Liu
CE Shannon
D Asonov
DM Ford
DR Hardoon
E Okafor
G Seni
GK Zipf
J Almazán
J Sueiras
JP Van Oosten
JT Favata
M Côté
M Stehlìk
MA Youssef Bassil
NQ Emlen
PE Bramall
R Ptucha
RA Wagner
RC Angell
RJ Plamondon
RT Schuh
S Günter
S He
S Hochreiter
SC Chantal Amrhein
T Van der Zant
TK Ho
U-V Marti
VI Levenshtein
VP Romesh Ranawana
Y LeCun
Publication venue: Springer Science and Business Media LLC
Publication date: 01/07/2021

The strength of long short-term memory neural networks (LSTMs), as applied so far, lies more in handling sequences of variable length than in handling geometric variability of the image patterns. In this paper, an end-to-end convolutional LSTM neural network is used to handle both geometric variation and sequence variability. The best results for LSTMs are often based on large-scale training of an ensemble of network instances. We show that high performance can be reached on a common benchmark set with just five such networks, given proper data augmentation, a proper coding scheme and a proper voting scheme. The networks have similar architectures (convolutional neural network (CNN): five layers; bidirectional LSTM (BiLSTM): three layers, followed by a connectionist temporal classification (CTC) processing step). The approach assumes differently scaled input images and different feature map sizes. Three datasets are used: the standard benchmark RIMES dataset (French); a historical handwritten dataset KdK (Dutch); the standard benchmark George Washington (GW) dataset (English). Final performance obtained for the word-recognition test of RIMES was 96.6%, a clear improvement over other state-of-the-art approaches which did not use a pre-trained network. On the KdK and GW datasets, our approach also shows good results. The proposed approach is deployed in the Monk search engine for historical-handwriting collections.
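
The CTC step and the voting over a small ensemble can be illustrated with a minimal sketch. It assumes best-path (greedy) CTC decoding, where repeated symbols are merged and blanks are then dropped, followed by a plain plurality vote over the decoded words. The abstract does not specify the coding or voting scheme, so the vote rule, the blank symbol, and the sample paths below are hypothetical.

```python
from collections import Counter
from itertools import groupby

BLANK = "-"  # hypothetical blank symbol for this sketch


def ctc_collapse(path):
    """Best-path CTC decoding: merge repeated symbols, then drop blanks."""
    return "".join(sym for sym, _ in groupby(path) if sym != BLANK)


def plurality_vote(hypotheses):
    """Return the word proposed by the largest number of networks."""
    return Counter(hypotheses).most_common(1)[0][0]


# Hypothetical frame-level outputs of five networks for one word image.
paths = ["hh-e-ll-lo", "h-e-l-lo--", "hhee-l-lo-", "h-a-l-lo", "hhe-lo"]
words = [ctc_collapse(p) for p in paths]
final_word = plurality_vote(words)
```

The blank between the two "l" frames is what lets CTC keep a doubled letter; without it, as in the last path, the repeated frames merge into a single character.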

Proceedings - University of Groningen

Crossref

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

WordFences: Text localization and recognition

Author: Polzounov Andrei
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2017

In collaboration with the Universitat de Barcelona (UB) and the Universitat Rovira i Virgili (URV). In recent years, text recognition has achieved remarkable success in recognizing scanned document text. However, word recognition in natural images is still an open problem, which generally requires time-consuming post-processing steps. We present a novel architecture for individual word detection in scene images based on semantic segmentation. Our contributions are twofold: the concept of WordFence, which detects border areas surrounding each individual word, and a unique pixelwise weighted softmax loss function which penalizes background and emphasizes small text regions. WordFence ensures that each word is detected individually, and the new loss function provides a strong training signal to both text and word border localization. The proposed technique avoids intensive post-processing by combining semantic word segmentation with a voting scheme for merging segmentations of multiple scales, producing an end-to-end word detection system. We achieve superior localization recall on common benchmark datasets: 92% recall on ICDAR11 and ICDAR13 and 63% recall on SVT. Furthermore, end-to-end word recognition achieves a state-of-the-art 86% F-score on ICDAR13.
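
A pixelwise weighted softmax loss of the kind mentioned above can be sketched in a few lines. This is a generic weighted cross-entropy, assuming one weight per pixel and normalization by the sum of the weights; how the thesis actually derives its weights is not stated in the abstract, so the weights and logits below are hypothetical.

```python
import math


def weighted_softmax_loss(logits, labels, weights):
    """Weighted mean of per-pixel softmax cross-entropy.

    logits:  per-pixel list of class scores
    labels:  true class index per pixel
    weights: non-negative weight per pixel (e.g. larger for small text regions)
    """
    total = 0.0
    for z, y, w in zip(logits, labels, weights):
        m = max(z)  # subtract the max score for numerical stability
        log_norm = m + math.log(sum(math.exp(v - m) for v in z))
        total += w * (log_norm - z[y])  # w * (-log softmax(z)[y])
    return total / sum(weights)


# Two pixels, two classes (0 = background, 1 = text); the second pixel is
# a misclassified text pixel, so up-weighting it raises the loss.
logits = [[2.0, 0.0], [2.0, 0.0]]
labels = [0, 1]
uniform = weighted_softmax_loss(logits, labels, [1.0, 1.0])
text_heavy = weighted_softmax_loss(logits, labels, [1.0, 3.0])
```

Raising the weight on hard or under-represented pixels increases their share of the gradient, which matches the stated aim of emphasizing small text regions relative to background.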

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC