2,320 research outputs found

Composition of Constraint, Hypothesis and Error Models to improve interaction in Human-Machine Interfaces

Author: Allauzen
Amengual
B. T. Al Azawi
Bastide
Berghel
Breuel
Brown
Eisner
Farooq
Garcia
Grande
Hall
Hassan
J. Ramon Navarro-Cerdan
Joaquim Arlandis
Juan-Carlos Perez-Cortes
Khaleghi
Llobet
Meyer
Mohri
Müller
Nelder
Neuhoff
Park
Perez-Cortes
Pérez-Cortes
Rafael Llobet
Raman
Riley
Vidal
Publication venue: 'Elsevier BV'
Publication date: 01/05/2016

We use Weighted Finite-State Transducers (WFSTs) to represent the different sources of information available: the initial hypotheses, the possible errors, the constraints imposed by the task (interaction language) and the user input. The fusion of these models to find the most probable output string can be performed efficiently by using carefully selected transducer operations. The proposed system initially suggests an output based on the hypothesis, error and constraint models. Then, if human intervention is needed, a multimodal approach, in which the user input is combined with the aforementioned models, is applied to produce the desired output with minimum user effort. This approach offers the practical advantages of a decoupled model (e.g. input system + parameterized rules + post-processor) while keeping the error-recovery power of an integrated approach, in which all the steps of the process are performed in the same formal machine (as in a typical HMM in speech recognition), so that an error at a given step does not become unrecoverable in subsequent steps. After presenting the theoretical basis of the proposed multi-source information system, we address its application to two real-world problems as examples of the possibilities of this architecture. The experimental results demonstrate that significant user effort can be saved with the proposed procedure. A simple demonstration, intended to help readers understand and evaluate the proposed system, is available at https://demos.iti.upv.es/hi/. (C) 2015 Elsevier B.V. All rights reserved.

Navarro Cerdan, JR.; Llobet Azpitarte, R.; Arlandis, J.; Perez-Cortes, J. (2016). Composition of Constraint, Hypothesis and Error Models to improve interaction in Human-Machine Interfaces. Information Fusion. 29:1-13. doi:10.1016/j.inffus.2015.09.001
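The fusion of hypothesis, error and constraint models described in this abstract can be illustrated with a toy sketch. This is not the authors' WFST implementation: it is a brute-force stand-in that assumes substitution-only errors, a small enumerable lexicon as the constraint model, and invented costs (negative log probabilities). The names `hypotheses`, `error_cost`, `lexicon`, `path_cost` and `best_output` are all hypothetical. A real system would compose the transducers and run a shortest-path search instead of enumerating the lexicon.

```python
import math

# Hypothesis model (assumed toy data): for each character position, the
# candidate characters proposed by a recognizer, with costs (lower is better).
hypotheses = [
    {"c": 0.2, "e": 1.8},
    {"a": 0.3, "o": 1.4},
    {"l": 0.9, "t": 0.6},
]

# Error model (assumed): cost of the recognizer reporting `seen`
# when the true character is `true`. Only substitutions are modelled.
def error_cost(seen, true):
    if seen == true:
        return 0.0
    confusable = {("l", "t"), ("t", "l"), ("c", "e"), ("e", "c")}
    return 1.0 if (seen, true) in confusable else 4.0

# Constraint model (assumed): the strings allowed by the task language.
lexicon = {"cat", "cot", "eat", "dog"}

def path_cost(word):
    """Cheapest way to explain `word` given the hypotheses and the error model."""
    if len(word) != len(hypotheses):
        return math.inf
    total = 0.0
    for slot, true_char in zip(hypotheses, word):
        total += min(cost + error_cost(seen, true_char)
                     for seen, cost in slot.items())
    return total

def best_output(prefix=""):
    """Lowest-cost allowed string; `prefix` stands in for user input."""
    candidates = [w for w in lexicon if w.startswith(prefix)]
    return min(candidates, key=path_cost, default=None)

print(best_output())            # initial suggestion
print(best_output(prefix="e"))  # suggestion after the user fixes the first character
```

Passing a prefix mimics the interactive step: the user's correction acts as one more constraint, and the remaining models are re-evaluated under it rather than the earlier decision being patched after the fact.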

Crossref

RiuNet

Visual Re-ranking with Natural Language Understanding for Text Spotting

Author: Moreno-Noguer Francesc
Padró Lluís
Sabir Ahmed
Publication date: 01/01/2018

Many scene text recognition approaches are based on purely visual information and ignore the semantic relation between scene and text. In this paper, we tackle this problem from a natural language processing perspective to fill the gap between language and vision. We propose a post-processing approach to improve scene text recognition accuracy by using occurrence probabilities of words (unigram language model) and the semantic correlation between scene and text. For this, we initially rely on an off-the-shelf deep neural network, already trained with a large amount of data, which provides a series of text hypotheses per input image. These hypotheses are then re-ranked using word frequencies and semantic relatedness with objects or scenes in the image. As a result of this combination, the performance of the original network is boosted with almost no additional cost. We validate our approach on the ICDAR'17 dataset.

Comment: Accepted by ACCV 2018. arXiv admin note: substantial text overlap with arXiv:1810.0977
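The re-ranking idea in this abstract can be sketched as a weighted sum of three scores per hypothesis: the recognizer's own confidence, a unigram word probability, and a relatedness score between the word and the detected scene. The abstract does not state the exact combination rule, so the log-linear form below, the weights `alpha` and `beta`, the `floor` probability for unknown words, and all the numbers are assumptions made purely for illustration.

```python
import math

def rerank(hypotheses, unigram_prob, relatedness, scene,
           alpha=1.0, beta=1.0, floor=1e-9):
    """Sort (word, visual_prob) pairs by a combined score, best first.

    unigram_prob: dict word -> probability (assumed toy language model)
    relatedness:  dict (word, scene) -> score in [0, 1] (assumed lookup table)
    """
    def score(item):
        word, visual_prob = item
        return (math.log(visual_prob)
                + alpha * math.log(unigram_prob.get(word, floor))
                + beta * relatedness.get((word, scene), 0.0))
    return sorted(hypotheses, key=score, reverse=True)

# Toy data: the recognizer slightly prefers a non-word.
hyps = [("bnead", 0.40), ("bread", 0.35), ("broad", 0.25)]
unigrams = {"bread": 1e-4, "broad": 2e-4}
related = {("bread", "bakery"): 0.9, ("broad", "bakery"): 0.1}

ranked = rerank(hyps, unigrams, related, scene="bakery")
print([word for word, _ in ranked])
```

In this toy run the non-word drops to the bottom because of the unigram floor, and the scene term is what separates the two real words.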

arXiv.org e-Print Archive

UPCommons. Portal del coneixement obert de la UPC

Digital.CSIC

Automated Error Detection in Digitized Cultural Heritage Documents

Author: Gábor Kata
Sagot Benoît
Publication venue: HAL CCSD
Publication date: 26/04/2014

The work reported in this paper aims at performance optimization in the digitization of documents pertaining to the cultural heritage domain. A hybrid method is proposed, combining statistical classification algorithms and linguistic knowledge to automate post-OCR error detection and correction. The current paper deals with the integration of linguistic modules and their impact on error detection.

INRIA a CCSD electronic archive server

Hal-Diderot

Visual re-ranking with natural language understanding for text spotting

Author: Moreno-Noguer Francesc
Padró Lluís
Sabir Ahmed
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

The final publication is available at link.springer.com

Many scene text recognition approaches are based on purely visual information and ignore the semantic relation between scene and text. In this paper, we tackle this problem from a natural language processing perspective to fill the gap between language and vision. We propose a post-processing approach to improve scene text recognition accuracy by using occurrence probabilities of words (unigram language model) and the semantic correlation between scene and text. For this, we initially rely on an off-the-shelf deep neural network, already trained with a large amount of data, which provides a series of text hypotheses per input image. These hypotheses are then re-ranked using word frequencies and semantic relatedness with objects or scenes in the image. As a result of this combination, the performance of the original network is boosted with almost no additional cost. We validate our approach on the ICDAR'17 dataset.

Peer Reviewed. Postprint (author's final draft)

UPCommons. Portal del coneixement obert de la UPC

Image speech combination for interactive computer assisted transcription of handwritten documents

Author: Granell Emilio
Martínez-Hinarejos Carlos-D.
Romero Verónica
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019

Handwritten document transcription aims to obtain the contents of a document to provide efficient information access to, among others, digitised historical documents. The increasing number of historical documents published by libraries and archives makes this an important task. In this context, the use of image processing and understanding techniques in conjunction with assistive technologies reduces the time and human effort required to obtain the final perfect transcription. The assistive transcription system proposes a hypothesis, usually derived from a recognition process applied to the handwritten text image. The professional transcriber's feedback can then be used to obtain an improved hypothesis and speed up the final transcription. In this framework, a speech signal corresponding to the dictation of the handwritten text can be used as an additional source of information. This multimodal approach, which combines the image of the handwritten text with the speech of the dictation of its contents, could improve the hypotheses (initial and improved) offered to the transcriber. In this paper we study the feasibility of a multimodal interactive transcription system for an assistive paradigm known as Computer Assisted Transcription of Text Images. Different techniques are tested for obtaining the multimodal combination in this framework. The proposed multimodal approach yields a significant reduction of transcription effort with some multimodal combination techniques, allowing for a faster transcription process.

Work partially supported by projects READ-674943 (European Union's H2020), SmartWays-RTC-2014-1466-4 (MINECO, Spain), and CoMUN-HaT-TIN2015-70924-C2-1-R (MINECO/FEDER), and by Generalitat Valenciana (GVA), Spain under reference PROMETEOII/2014/030.

Granell, E.; Romero, V.; Martínez-Hinarejos, C. (2019). Image speech combination for interactive computer assisted transcription of handwritten documents. Computer Vision and Image Understanding. 180:74-83. https://doi.org/10.1016/j.cviu.2019.01.009

RiuNet

Optical character recognition with neural networks and post-correction with finite state methods

Author: Drobac Senka
Linden Krister
Publication date: 01/12/2020

The optical character recognition (OCR) quality of the historical part of the Finnish newspaper and journal corpus is too low for reliable search and scientific research on the OCRed data. The estimated character error rate (CER) of the corpus, achieved with commercial software, is between 8 and 13%. There have been earlier attempts to train high-quality OCR models with open-source software, like Ocropy (https://github.com/tmbdev/ocropy) and Tesseract (https://github.com/tesseract-ocr/tesseract), but so far, none of the methods have managed to successfully train a mixed model that recognizes all of the data in the corpus, which would be essential for an efficient re-OCRing of the corpus. The difficulty lies in the fact that the corpus is printed in the two main languages of Finland (Finnish and Swedish) and in two font families (Blackletter and Antiqua). In this paper, we explore the training of a variety of OCR models with deep neural networks (DNN). First, we find an optimal DNN for our data and, with additional training data, successfully train high-quality mixed-language models. Furthermore, we revisit the effect of confidence voting on the OCR results with different model combinations. Finally, we perform post-correction on the new OCR results and perform error analysis. The results show a significant boost in accuracy, resulting in 1.7% CER on the Finnish and 2.7% CER on the Swedish test set. The greatest accomplishment of the study is the successful training of one mixed-language model for the entire corpus and finding a voting setup that further improves the results.

Peer reviewed
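The character error rate quoted in this abstract is commonly computed as the Levenshtein edit distance between the OCR output and the ground truth, divided by the length of the ground truth. The sketch below assumes that standard definition; the abstract does not spell out the exact normalisation or any text preprocessing the authors used, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into the empty string costs i deletions
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # delete r
                            curr[j - 1] + 1,           # insert h
                            prev[j - 1] + (r != h)))   # substitute or match
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    """Character error rate of `hyp` against the ground truth `ref`."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)

# One substituted character out of eleven.
print(f"{cer('sanomalehti', 'sanomalehtl'):.3f}")
```

Because insertions count as errors, CER under this definition can exceed 1.0 when the OCR output is much longer than the reference.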

Helsingin yliopiston digitaalinen arkisto