
    A limited-size ensemble of homogeneous CNN/LSTMs for high-performance word classification

    The strength of long short-term memory neural networks (LSTMs) lies more in handling sequences of variable length than in handling the geometric variability of image patterns. In this paper, an end-to-end convolutional LSTM neural network is used to handle both geometric variation and sequence variability. The best results for LSTMs are often based on large-scale training of an ensemble of network instances. We show that high performance can be reached on a common benchmark set with just five such networks, given proper data augmentation, a proper coding scheme and a proper voting scheme. The networks have similar architectures: a five-layer convolutional neural network (CNN), followed by a three-layer bidirectional LSTM (BiLSTM) and a connectionist temporal classification (CTC) processing step. The approach assumes differently scaled input images and different feature map sizes. Three datasets are used: the standard benchmark RIMES dataset (French); a historical handwritten dataset, KdK (Dutch); and the standard benchmark George Washington (GW) dataset (English). The final performance obtained on the RIMES word-recognition test was 96.6%, a clear improvement over other state-of-the-art approaches that did not use a pre-trained network. On the KdK and GW datasets, our approach also shows good results. The proposed approach is deployed in the Monk search engine for historical-handwriting collections.
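
    The abstract does not spell out the voting scheme, so the following is only a minimal sketch of how word hypotheses from a small ensemble could be combined. It assumes plain plurality voting with an optional confidence-based tie-break; the function name `plurality_vote` and the confidence values are hypothetical and not taken from the paper.

```python
from collections import Counter


def plurality_vote(hypotheses, confidences=None):
    """Pick one word label from the outputs of several recognisers.

    hypotheses:  list of word strings, one per network (assumed input format).
    confidences: optional list of floats, one per network, used only to
                 break ties between equally frequent words.
    """
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")

    counts = Counter(hypotheses)
    top = max(counts.values())
    tied = [word for word, n in counts.items() if n == top]

    # A unique winner, or no confidences to break the tie: take the
    # first word reaching the top count (Counter keeps insertion order).
    if len(tied) == 1 or confidences is None:
        return tied[0]

    # Tie-break: sum the confidences of the networks backing each tied word.
    support = {word: 0.0 for word in tied}
    for word, conf in zip(hypotheses, confidences):
        if word in support:
            support[word] += conf
    return max(tied, key=lambda word: support[word])


# Five networks, as in the abstract; the words are made-up examples.
winner = plurality_vote(["maison", "maison", "raison", "maison", "saison"])
```

    With five voters, a word needs at most three votes to win outright, which is one reason an odd ensemble size is convenient for this kind of scheme.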

    Handwriting style classification

    This paper describes an independent handwriting style classifier that has been designed to select the best recognizer for a given style of writing. For this purpose, handwriting legibility has been defined and a method implemented that can predict it. The technique consists of two phases. In the feature-extraction phase, a set of 36 features is extracted from the image contour. In the classification phase, two nonparametric classification techniques are applied to the extracted features in order to compare their effectiveness in classifying words into legible, illegible, and middle classes. In the first method, multiple discriminant analysis (MDA) is used to transform the 36-dimensional feature space into an optimal discriminant space for a nearest-mean classifier. In the second method, a probabilistic neural network (PNN) based on the Bayes strategy and nonparametric estimation of the probability density function is used. The experimental results show that the PNN method gives superior classification results compared with the MDA method. For the two-class problems, the method achieves correct classification rates of 86.5% (legible/illegible), 65.5% (legible/middle), and 90.5% (middle/illegible). For the three-class legibility classification, the rate of correct classification is 67.33% using a PNN classifier.
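
    To illustrate the PNN idea mentioned above, here is a minimal sketch of a Parzen-window classifier with Gaussian kernels and equal class priors. It is not the authors' implementation: the smoothing parameter `sigma`, the equal priors, and the two-dimensional toy features (instead of the 36 contour features) are all assumptions made for illustration.

```python
import math


def pnn_classify(x, train_X, train_y, sigma=1.0):
    """Classify x with a probabilistic neural network (Parzen-window) rule.

    Each training pattern contributes a Gaussian kernel centred on itself;
    the class whose average kernel response at x is largest wins.
    Equal class priors are assumed.
    """
    kernel_sum = {}   # class label -> summed kernel responses
    n_patterns = {}   # class label -> number of training patterns

    for xi, yi in zip(train_X, train_y):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
        response = math.exp(-sq_dist / (2.0 * sigma ** 2))
        kernel_sum[yi] = kernel_sum.get(yi, 0.0) + response
        n_patterns[yi] = n_patterns.get(yi, 0) + 1

    # Estimated class-conditional density at x, up to a shared constant.
    density = {c: kernel_sum[c] / n_patterns[c] for c in kernel_sum}
    return max(density, key=density.get)


# Toy 2-D feature vectors; the labels follow the abstract's class names.
train_X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
train_y = ["legible", "legible", "illegible", "illegible"]
label = pnn_classify((0.2, 0.3), train_X, train_y)
```

    The only free parameter in this sketch is `sigma`; in practice it would be tuned on held-out data, since too small a value makes the rule behave like nearest-neighbour matching and too large a value blurs the classes together.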

    Spectral Graph-based Features for Recognition of Handwritten Characters: A Case Study on Handwritten Devanagari Numerals

    Interpreting different writing styles, unconstrained cursiveness and the relationships between different primitive parts is an essential and challenging task in the recognition of handwritten characters. Because feature representations are often inadequate, an appropriate interpretation or description of handwritten characters is difficult to obtain. Although existing research on handwritten characters is extensive, obtaining an effective representation of characters in feature space remains a challenge. In this paper, we attempt to circumvent these problems by proposing an approach that exploits a robust graph representation and the concept of spectral graph embedding to characterise and effectively represent handwritten characters, taking into account writing styles, cursiveness and relationships. To corroborate the efficacy of the proposed method, extensive experiments were carried out on the standard handwritten numeral dataset of the Computer Vision and Pattern Recognition Unit of the Indian Statistical Institute, Kolkata. The experimental results demonstrate promising findings, which can be used in future studies. Comment: 16 pages, 8 figures.

    An investigation into the use of linguistic context in cursive script recognition by computer

    The automatic recognition of hand-written text has been a goal for over thirty-five years. The highly ambiguous nature of cursive writing (with high variability not only between different writers, but even between different samples from the same writer) means that systems based only on visual information are prone to errors. It is suggested that the application of linguistic knowledge to the recognition task may improve recognition accuracy. If a low-level (pattern recognition based) recogniser produces a candidate lattice (i.e. a directed graph giving a number of alternatives at each word position in a sentence), then linguistic knowledge can be used to find the 'best' path through the lattice. There are many forms of linguistic knowledge that may be used to this end. This thesis looks specifically at the use of collocation as a source of linguistic knowledge. Collocation describes the statistical tendency of certain words to co-occur in a language, within a defined range. It is suggested that this tendency may be exploited to aid automatic text recognition. The construction and use of a post-processing system incorporating collocational knowledge is described, as are a number of experiments designed to test the effectiveness of collocation as an aid to text recognition. The results of these experiments suggest that collocational statistics may be a useful form of knowledge for this application and that further research may produce a system of real practical use.
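
    The lattice post-processing idea described above can be sketched as follows. This is a hypothetical illustration, not the thesis's system: the lattice format, the additive scoring, the `window` and `weight` parameters, and the exhaustive search over paths are all assumptions chosen to keep the example short.

```python
from itertools import product


def best_path(lattice, colloc, window=2, weight=1.0):
    """Choose the word sequence that best balances recogniser and
    collocation evidence.

    lattice: list of word positions; each position is a list of
             (word, recogniser_score) alternatives.
    colloc:  dict mapping frozenset({w1, w2}) to an association score
             for words that tend to co-occur.
    window:  how many following positions count as "within range".
    """
    best_words, best_score = None, float("-inf")

    # Exhaustive search is acceptable for a short sentence; a real
    # system would need dynamic programming or beam search.
    for path in product(*lattice):
        words = [word for word, _ in path]
        score = sum(s for _, s in path)

        # Reward every pair of words that collocate within the window.
        for i in range(len(words)):
            for j in range(i + 1, min(i + 1 + window, len(words))):
                pair = frozenset((words[i], words[j]))
                score += weight * colloc.get(pair, 0.0)

        if score > best_score:
            best_words, best_score = words, score

    return best_words, best_score


# The recogniser slightly prefers "string ten", but collocation
# evidence favours "strong tea" (all scores are made up).
lattice = [
    [("strong", 0.5), ("string", 0.6)],
    [("tea", 0.5), ("ten", 0.55)],
]
colloc = {frozenset(("strong", "tea")): 1.0}
words, score = best_path(lattice, colloc)
```

    The `weight` parameter controls how far collocational evidence may override the visual recogniser; setting it to zero reduces the sketch to picking the top-scoring alternative at each position.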

    Deep Neural Network Architectures for Large-scale, Robust and Small-Footprint Speaker and Language Recognition

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Tecnología Electrónica y de las Comunicaciones. Date of defense: 27-04-2017. Artificial neural networks are powerful learners of the information embedded in speech signals. They can provide compact, multi-level, nonlinear representations of temporal sequences and holistic optimization algorithms capable of surpassing former leading paradigms. Artificial neural networks are, therefore, a promising technology that can be used to enhance our ability to recognize speakers and languages, an ability increasingly in demand in the context of new, voice-enabled interfaces used today by millions of users. The aim of this thesis is to advance the state of the art of language and speaker recognition through the formulation, implementation and empirical analysis of novel approaches for large-scale and portable speech interfaces. Its major contributions are: (1) novel, compact network architectures for language and speaker recognition, including a variety of network topologies based on fully-connected, recurrent, convolutional, and locally connected layers; (2) a bottleneck combination strategy for classical and neural network approaches for long speech sequences; (3) the architectural design of the first public, multilingual, large-vocabulary continuous speech recognition system; and (4) a novel, end-to-end optimization algorithm for text-dependent speaker recognition that is applicable to a range of verification tasks.
Experimental results have demonstrated that artificial neural networks can substantially reduce the number of model parameters and surpass the performance of previous approaches to language and speaker recognition, particularly in the cases of long short-term memory recurrent networks (used to model the input speech signal), end-to-end optimization algorithms (used to predict languages or speakers), short testing utterances, and large training data collections.

    Multi-modal post-editing of machine translation

    As machine translation (MT) quality continues to improve, more and more translators switch from traditional translation from scratch to post-editing (PE) of MT output, which has been shown to save time and reduce errors. Instead of mainly generating text, translators are now asked to correct errors within otherwise helpful translation proposals, where repetitive MT errors make the process tiresome, while hard-to-spot errors make PE a cognitively demanding activity. Our contribution is three-fold. First, we explore whether interaction modalities other than mouse and keyboard could well support PE by creating and testing the MMPE translation environment. MMPE allows translators to cross out or hand-write text, drag and drop words for reordering, use spoken commands or hand gestures to manipulate text, or combine any of these input modalities. Second, our interviews revealed that translators see value in automatically receiving additional translation support when a high cognitive load (CL) is detected during PE. We therefore developed a sensor framework using a wide range of physiological and behavioral data to estimate perceived CL and tested it in three studies, showing that multi-modal eye, heart, and skin measures can be used to make translation environments cognition-aware. Third, we present two multi-encoder Transformer architectures for automatic post-editing (APE) and discuss how these can adapt MT output to a domain and thereby avoid correcting repetitive MT errors.
Funding: Deutsche Forschungsgemeinschaft (DFG), project MMP

    Irish Machine Vision and Image Processing Conference Proceedings 2017


    Systematic literature review of hand gestures used in human computer interaction interfaces

    Gestures, widely accepted as humans' natural mode of interaction with their surroundings, have been considered for use in human-computer interfaces since the early 1980s. They have been explored and implemented, with a range of success and maturity levels, in a variety of fields, facilitated by a multitude of technologies. The underpinning gesture theory, however, focuses on gestures performed simultaneously with speech, and the majority of gesture-based interfaces are supported by other modes of interaction. This article reports the results of a systematic review undertaken to identify characteristics of touchless/in-air hand gestures used in interaction interfaces. A total of 148 articles reporting on gesture-based interaction interfaces were reviewed, identified by searching engineering and science databases (Engineering Village, ProQuest, ScienceDirect, Scopus and Web of Science). The goal of the review was to map the field of gesture-based interfaces, investigate the patterns in gesture use, and identify common combinations of gestures for different combinations of applications and technologies. The review suggests that the community is disparate, with little evidence of building upon prior work, and that a fundamental framework of gesture-based interaction is not evident. However, the findings can help inform future developments and provide valuable information about the benefits and drawbacks of different approaches. It was further found that the nature and appropriateness of the gestures used was not a primary factor in gesture elicitation when designing gesture-based systems, and that ease of technology implementation often took precedence.