1,806 research outputs found
Learning Spatial-Semantic Context with Fully Convolutional Recurrent Network for Online Handwritten Chinese Text Recognition
Online handwritten Chinese text recognition (OHCTR) is a challenging problem
as it involves a large-scale character set, ambiguous segmentation, and
variable-length input sequences. In this paper, we exploit the outstanding
capability of path signature to translate online pen-tip trajectories into
informative signature feature maps using a sliding window-based method,
successfully capturing the analytic and geometric properties of pen strokes
with strong local invariance and robustness. A multi-spatial-context fully
convolutional recurrent network (MCFCRN) is proposed to exploit the multiple
spatial contexts from the signature feature maps and generate a prediction
sequence while completely avoiding the difficult segmentation problem.
Furthermore, an implicit language model is developed to make predictions based
on semantic context within a predicting feature sequence, providing a new
perspective for incorporating lexicon constraints and prior knowledge about a
certain language in the recognition procedure. Experiments on two standard
benchmarks, Dataset-CASIA and Dataset-ICDAR, yielded outstanding results, with
correct rates of 97.10% and 97.15%, respectively, which are significantly
better than the best result reported thus far in the literature.
Comment: 14 pages, 9 figure
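To make the path-signature idea in the abstract above more concrete, here is a minimal sketch that computes the level-1 and level-2 signature terms of a piecewise-linear 2-D pen trajectory and evaluates them over sliding windows. The function names, the window length, and the truncation at level 2 are assumptions chosen for illustration; the abstract does not state which truncation level or window size the authors use.

```python
def signature_level2(points):
    """Level-1 and level-2 path signature of a piecewise-linear 2-D path.

    points: list of (x, y) pen-tip coordinates in writing order.
    Returns (s1, s2): s1 is the total displacement, s2 the 2x2 matrix
    of second-order iterated integrals.
    """
    s1 = [0.0, 0.0]
    s2 = [[0.0, 0.0], [0.0, 0.0]]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        d = (x1 - x0, y1 - y0)
        for i in range(2):
            for j in range(2):
                # Chen's identity for appending one straight segment:
                # cross term with the path so far, plus the segment's own term.
                s2[i][j] += s1[i] * d[j] + 0.5 * d[i] * d[j]
        s1[0] += d[0]
        s1[1] += d[1]
    return s1, s2


def sliding_signatures(points, window=3, step=1):
    """Signature of every window of `window` consecutive points (assumed scheme)."""
    return [signature_level2(points[k:k + window])
            for k in range(0, len(points) - window + 1, step)]


# An L-shaped stroke: right one unit, then up one unit.
s1, s2 = signature_level2([(0, 0), (1, 0), (1, 1)])
levy_area = 0.5 * (s2[0][1] - s2[1][0])  # signed area between path and chord
```

The antisymmetric part of the level-2 term is the signed area enclosed between the stroke and its chord, which is one way such features capture the geometric properties of a pen stroke.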
Off-line Arabic Handwriting Recognition System Using Fast Wavelet Transform
In this research, an off-line handwriting recognition system for the Arabic alphabet is
introduced. The system contains three main stages: preprocessing, segmentation and
recognition. In the preprocessing stage, the Radon transform was used in the design
of algorithms for page, line and word skew correction as well as for word slant
correction. In the segmentation stage, a Hough transform approach was used for line
extraction. For line-to-word and word-to-character segmentation, a statistical method
using a mathematical representation of the binary images of lines and words was used.
Unlike most current handwriting recognition systems, our system simulates the
human mechanism for image recognition, where images are encoded and saved in
memory as groups according to their similarity to each other. Characters are
decomposed into coefficient vectors using the fast wavelet transform; the vectors
that represent a character in its different possible shapes are then saved as groups, with one
representative for each group. Recognition is achieved by comparing the vector of
the character to be recognized with the group representatives.
Experiments showed that the proposed system is able to achieve the recognition task
with 90.26% accuracy. The system needs at most 3.41 seconds to recognize a
single character in a text of 15 lines where each line has 10 words on average.
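The recognition step described above (wavelet coefficient vector compared against group representatives) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the wavelet family or the distance measure, so the Haar wavelet and Euclidean distance are used here, and `haar_fwt`, `recognize`, and the toy four-pixel patterns are hypothetical.

```python
import math


def haar_fwt(signal):
    """Full orthonormal Haar fast wavelet transform.

    signal: list of numbers whose length is a power of two.
    """
    coeffs = list(signal)
    n = len(coeffs)
    while n > 1:
        half = n // 2
        # Pairwise scaled averages (approximation) and differences (detail).
        avg = [(coeffs[2 * i] + coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        diff = [(coeffs[2 * i] - coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        coeffs[:n] = avg + diff
        n = half  # recurse on the approximation part only
    return coeffs


def recognize(pixels, representatives):
    """Return the label whose representative vector is closest (Euclidean, assumed)."""
    vec = haar_fwt(pixels)
    return min(representatives, key=lambda label: math.dist(vec, representatives[label]))


# Toy "characters" flattened to four pixels each; one representative per group.
representatives = {
    "A": haar_fwt([1, 1, 0, 0]),
    "B": haar_fwt([0, 0, 1, 1]),
}
label = recognize([1, 0.9, 0.1, 0], representatives)
```

Comparing against one representative per group instead of every stored shape keeps the number of distance computations equal to the number of groups.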
Named Entity Recognition in multilingual handwritten texts
[EN] In our work we present a single deep-learning-based model for the automatic transcription and Named Entity Recognition of handwritten texts. This model leverages the generalization capabilities of recognition systems, combining artificial neural networks and character n-gram models. The evaluation of this system is discussed and, as a consequence, a new evaluation metric is proposed. To improve the results with respect to this metric, different error correction strategies are assessed.
Villanova Aparisi, D. (2021). Named Entity Recognition in multilingual handwritten texts. Universitat Politècnica de València. http://hdl.handle.net/10251/174942
HMM word graph based keyword spotting in handwritten document images
[EN] Line-level keyword spotting (KWS) is presented on the basis of frame-level word posterior
probabilities. These posteriors are obtained using word graphs derived from the recognition
process of a full-fledged handwritten text recognizer based on hidden Markov models
and N-gram language models. This approach has several advantages. First, since it uses
a holistic, segmentation-free technology, it does not require any kind of word or character
segmentation. Second, the use of language models allows the context of each spotted
word to be taken into account, thereby considerably increasing KWS accuracy. And third,
the proposed KWS scores are based on true posterior probabilities, taking into account
all (or most) possible word segmentations of the input image. These scores are properly
bounded and normalized. This mathematically clean formulation lends itself to smooth,
threshold-based keyword queries which, in turn, permit comfortable trade-offs between
search precision and recall. Experiments are carried out on several historic collections of
handwritten text images, as well as a well-known data set of modern English handwritten
text. According to the empirical results, the proposed approach achieves KWS results
comparable to those obtained with the recently-introduced "BLSTM neural networks KWS"
approach and clearly outperforms the popular, state-of-the-art "Filler HMM" KWS method.
Overall, the results clearly support all the above-claimed advantages of the proposed
approach.
This work has been partially supported by the Generalitat Valenciana under the Prometeo/2009/014 project grant ALMA-MATER, and through the EU projects: HIMANIS (JPICH programme, Spanish grant Ref. PCIN-2015-068) and READ (Horizon 2020 programme, grant Ref. 674943).
Toselli, AH.; Vidal, E.; Romero, V.; Frinken, V. (2016). HMM word graph based keyword spotting in handwritten document images. Information Sciences. 370:497-518. https://doi.org/10.1016/j.ins.2016.07.063
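The threshold-based querying mentioned in the abstract above can be illustrated with a small sketch. It assumes that a line-level score is obtained as the maximum of the frame-level posteriors of the keyword (the abstract does not spell out the aggregation), and the posterior values and relevance labels below are invented toy data, not results from the paper.

```python
def line_score(frame_posteriors):
    """Line-level score of a keyword: maximum frame-level posterior (assumed rule)."""
    return max(frame_posteriors)


def precision_recall(scored_lines, threshold):
    """scored_lines: list of (score, is_relevant) pairs, one per text line."""
    retrieved = [relevant for score, relevant in scored_lines if score >= threshold]
    relevant_total = sum(1 for _, relevant in scored_lines if relevant)
    hits = sum(retrieved)
    precision = hits / len(retrieved) if retrieved else 1.0
    recall = hits / relevant_total if relevant_total else 1.0
    return precision, recall


# Toy data: per-frame posteriors of one keyword on four lines, with ground truth.
lines = [
    (line_score([0.1, 0.9, 0.3]), True),
    (line_score([0.2, 0.6]), True),
    (line_score([0.5, 0.4]), False),
    (line_score([0.05, 0.1]), False),
]
strict = precision_recall(lines, threshold=0.8)    # fewer, more reliable hits
lenient = precision_recall(lines, threshold=0.45)  # more hits, some false alarms
```

Because the scores are bounded in [0, 1], sweeping the threshold from high to low moves smoothly from high precision to high recall.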
Handwritten Music Recognition for Mensural notation with convolutional recurrent neural networks
[EN] Optical Music Recognition is the technology that allows computers to read music notation; it is also referred to as Handwritten Music Recognition when applied to handwritten notation. This technology aims at efficiently transcribing written music into a representation that can be further processed by a computer. This is of special interest for transcribing the large amount of music written in early notations, such as Mensural notation, since it represents a largely unexplored heritage for the musicological community. Traditional approaches to this problem are based on complex strategies with many explicit rules that only work for one particular type of manuscript. Machine learning approaches offer the promise of generalizable solutions, based on learning from labelled examples alone. However, previous research has not achieved acceptable results for handwritten Mensural notation. In this work we propose the use of deep neural networks, namely convolutional recurrent neural networks, which have proved effective in similar domains such as handwritten text recognition. Our experiments achieve, for the first time, recognition results that can be considered effective for transcribing handwritten Mensural notation, decreasing the symbol-level error rate of previous approaches from 25.7% to 7.0%. (C) 2019 Elsevier B.V. All rights reserved.
First author thanks the support from the Spanish Ministry "HISPAMUS" project (TIN2017-86576-R), partially funded by the EU. The other authors were supported by the European Union's H2020 grant "Recognition and Enrichment of Archival Documents" (Ref. 674943), by the BBVA Foundation through the 2017-2018 and 2018-2019 Digital Humanities research grants "Carabela" and "HistWeather - Dos Siglos de Datos Climaticos", and by EU JPICH project "HOME - History Of Medieval Europe" (Spanish PEICTI Ref. PCI2018-093122).
Calvo-Zaragoza, J.; Toselli, AH.; Vidal, E. (2019). Handwritten Music Recognition for Mensural notation with convolutional recurrent neural networks. Pattern Recognition Letters. 128:115-121. https://doi.org/10.1016/j.patrec.2019.08.021
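As an illustration of the symbol-level error rate quoted in the abstract above, the sketch below assumes the common definition: the edit distance between the recognized and reference symbol sequences, divided by the reference length. The abstract does not give the exact formula, and the symbol names in the example are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1]


def symbol_error_rate(ref, hyp):
    """Symbol-level error rate in percent (assumed definition)."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)


reference = ["clef.C", "note.semibreve", "note.minim", "rest.minim"]
hypothesis = ["clef.C", "note.minim", "note.minim"]
ser = symbol_error_rate(reference, hypothesis)  # one substitution + one deletion

# Relative reduction implied by the figures in the abstract (25.7% -> 7.0%).
relative_reduction = (25.7 - 7.0) / 25.7
```

With the reported figures, the relative reduction is about 73%.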
Multimodal Interactive Transcription of Handwritten Text Images
This thesis presents a new interactive, multimodal framework for the transcription of
handwritten documents. Rather than providing a complete transcription, this approach
aims to assist the expert in the hard task of transcribing.
To date, the available handwritten text recognition systems do not provide
transcriptions that users find acceptable, and human intervention is generally required
to correct the transcriptions obtained. These systems have proved
really useful in restricted applications with limited vocabularies (such as
the recognition of postal addresses or of numerical amounts on bank checks),
achieving acceptable results in this kind of task. However, when working
with unconstrained handwritten documents (such as old
manuscripts or spontaneous text), current technology only achieves unacceptable results.
The interactive scenario studied in this thesis allows a more effective solution. In this
scenario, the recognition system and the user cooperate to generate the final transcription
of the text image. The system uses the text image and a previously validated part of the
transcription (the prefix) to propose a possible continuation. The user then
finds and corrects the next error produced by the system, thereby generating a new, longer
prefix. This new prefix is used by the system to suggest a new hypothesis. The
technology used is based on hidden Markov models and n-grams. These models are
used here in the same way as in automatic speech recognition. Some
modifications to the conventional definition of n-grams were needed to take
user feedback into account in this system.
Romero Gómez, V. (2010). Multimodal Interactive Transcription of Handwritten Text Images [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8541
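The prefix-based interaction described in this abstract (the system proposes a continuation of the validated prefix, the user corrects the next error, and the prefix grows) can be simulated with a short sketch. Everything here is an assumption made for illustration: `toy_propose` stands in for the HMM and n-gram recognizer, the user is simulated by comparing against a known reference, and corrections are made at word level.

```python
def first_error(hypothesis, reference):
    """Index of the first reference word the hypothesis gets wrong, or None."""
    for i, word in enumerate(reference):
        if i >= len(hypothesis) or hypothesis[i] != word:
            return i
    return None


def interactive_transcribe(reference, propose):
    """Simulate the system/user loop and count the user's corrections."""
    prefix, corrections = [], 0
    while True:
        hypothesis = prefix + propose(prefix)  # system continues the prefix
        i = first_error(hypothesis, reference)
        if i is None:
            # Nothing left to fix; extra trailing words are dropped in this toy setup.
            return hypothesis[:len(reference)], corrections
        # Validated words plus the user's correction form the new, longer prefix.
        prefix = reference[:i + 1]
        corrections += 1


# Hypothetical recognizer with a fixed raw output; a real system would
# re-decode the image conditioned on the prefix instead.
RAW_OUTPUT = ["the", "quick", "brown", "fax", "jumps", "over", "teh", "dog"]


def toy_propose(prefix):
    return RAW_OUTPUT[len(prefix):]


truth = ["the", "quick", "brown", "fox", "jumps", "over", "the", "dog"]
final, n_corrections = interactive_transcribe(truth, toy_propose)
```

The number of corrections gives a rough measure of user effort, which can be compared with the number of errors a user would fix when post-editing a fully automatic transcription.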
Proceedings of the 4th International Workshop on Reading Music Systems
The International Workshop on Reading Music Systems (WoRMS) is a workshop
that tries to connect researchers who develop systems for reading music, such
as in the field of Optical Music Recognition, with other researchers and
practitioners that could benefit from such systems, like librarians or
musicologists.
The relevant topics of interest for the workshop include, but are not limited
to: Music reading systems; Optical music recognition; Datasets and performance
evaluation; Image processing on music scores; Writer identification; Authoring,
editing, storing and presentation systems for music scores; Multi-modal
systems; Novel input-methods for music to produce written music; Web-based
Music Information Retrieval services; Applications and projects; Use-cases
related to written music.
These are the proceedings of the 4th International Workshop on Reading Music
Systems, held online on Nov. 18th 2022.
Comment: Proceedings edited by Jorge Calvo-Zaragoza, Alexander Pacha and Elona
Shatr