241 research outputs found

    Contributions to Pen & Touch Human-Computer Interaction

    [EN] Computers are now present everywhere, but their potential is not fully exploited because of a lack of acceptance. This thesis adopts the pen computer paradigm, whose main idea is to replace all input devices with a pen and/or the fingers, on the premise that the rejection stems from unfriendly interaction devices that must be replaced by something easier for the user. This paradigm, which was proposed several years ago, has only recently been fully implemented in products such as smartphones. But computers are effectively illiterate: they do not understand gestures or handwriting, so a recognition step is required to "translate" the meaning of these interactions into a computer-understandable language. For this input modality to be actually usable, its recognition accuracy must be high enough. To think realistically about the broader deployment of pen computing, it is necessary to improve the accuracy of handwriting and gesture recognizers. This thesis is devoted to studying different approaches to improving the recognition accuracy of those systems. First, we investigate how to take advantage of interaction-derived information to improve the accuracy of the recognizer. In particular, we focus on interactive transcription of text images. Here the system initially proposes an automatic transcript. If necessary, the user can make some corrections, implicitly validating a correct part of the transcript. The system must then take this validated prefix into account to suggest a suitable new hypothesis. Given that in such an application the user is constantly interacting with the system, it makes sense to adapt this interactive application for use on a pen computer. User corrections are provided by means of pen strokes, and it is therefore necessary to introduce a recognizer in charge of decoding this kind of nondeterministic user feedback.
However, this recognizer's performance can be boosted by taking advantage of interaction-derived information, such as the user-validated prefix. Next, this thesis focuses on the study of human movements, in particular hand movements, from a generation point of view, by tapping into the kinematic theory of rapid human movements and the Sigma-Lognormal model. Understanding how the human body generates movements, and particularly understanding the origin of human movement variability, is important in the development of a recognition system. The contribution of this thesis to this topic is important, since it presents a new technique (which improves on previous results) for extracting the Sigma-Lognormal model parameters. Closely related to the previous work, this thesis studies the benefits of using synthetic data for training. The easiest way to train a recognizer is to provide "infinite" data representing all possible variations. In general, the more training data, the smaller the error. But it is usually not possible to increase the size of a training set indefinitely. Recruiting participants, collecting data, labeling, and the other steps needed to achieve this goal can be time-consuming and expensive. One way to overcome this problem is to create and use synthetically generated data that resemble human-produced data. We study how to create these synthetic data and explore different approaches to using them, both for handwriting and gesture recognition. The different contributions of this thesis have obtained good results, producing several publications in international conferences and journals. Finally, three applications related to the work of this thesis are presented. First, we created Escritorie, a digital desk prototype based on the pen computer paradigm for transcribing handwritten text images. Second, we developed "Gestures à Go Go", a web application for bootstrapping gestures. Finally, we studied another interactive application under the pen computer paradigm.
In this case, we study how translation reviewing can be done more ergonomically using a pen.

[ES] Nowadays computers are present everywhere, but their potential is not exploited because of the "fear" people have of them. This thesis adopts the pen computer paradigm, whose fundamental idea is to replace all input devices with an electronic pen or, directly, with the fingers. The rejection of computers stems from the use of interfaces that are unfriendly to humans. This paradigm dates back more than 40 years, but only recently has it begun to be implemented in mobile devices. The slow and late adoption is probably due to the need to include a recognizer that "translates" the user's strokes (handwritten text or gestures) into something the computer can understand. To think realistically about the adoption of the pen computer, the accuracy of text and gesture recognition must be improved. The aim of this thesis is to study different strategies for improving this accuracy. First, this thesis investigates how to exploit information derived from the interaction to improve recognition, specifically in the interactive transcription of images of handwritten text. In interactive transcription, the system and the user work "side by side" to produce the transcription. The user validates the system's output by providing certain corrections, as handwritten text, which the system must take into account to provide a better transcription. This handwritten text must be recognized before it can be used. This thesis proposes exploiting contextual information, such as the prefix validated by the user, to improve the quality of the recognition of the interaction. After this, the thesis focuses on the study of human movement, in particular hand movement, using the Kinematic Theory and its Sigma-Lognormal model. Understanding how the hands move when writing, and in particular understanding the origin of handwriting variability, is important for the development of a recognition system. The contribution of this thesis to this topic is important, since it presents a new technique (which improves on previous results) for extracting the Sigma-Lognormal model from handwritten strokes. Closely related to the previous work, the benefit of using synthetic data for training is studied. The easiest way to train a recognizer is to provide an "infinite" data set representing all possible variations. In general, the more training data, the lower the recognizer's error. However, it is often not possible to provide more data, or doing so is very expensive. For this reason, the thesis studies how to create and use synthetic data that resemble real data. The different contributions of this thesis have obtained good results, producing several publications in international conferences and journals. Finally, three applications related to the work of this thesis have also been explored. First, Escritorie was created, a digital desk prototype based on the pen computer paradigm for interactive transcription of handwritten documents. Second, "Gestures à Go Go" was developed, a web application for generating synthetic data and packaging them with a recognizer quickly and easily. Lastly, a real interactive system under the pen computer paradigm is presented. In this case, the thesis studies how the review of machine translations can be carried out more ergonomically.

[CA] Nowadays computers are present everywhere, and it is commonly accepted that their use brings benefits. Nevertheless, their potential is often not fully exploited. This thesis adopts the pen computer paradigm, whose fundamental idea is to replace all input devices with an electronic pen or, directly, with the fingers. This paradigm postulates that the rejection of computers stems from the use of interfaces that are unfriendly to humans and that must be replaced by something more familiar. Interaction with the computer under this paradigm is therefore carried out by means of handwritten text and/or gestures. This paradigm dates back more than 40 years, but only recently has it begun to be implemented in mobile devices. The slow and late adoption is probably due to the need to include a recognizer that "translates" the user's strokes (handwritten text or gestures) into something the computer can understand, and the result of this recognition is currently far from optimal. To think realistically about the adoption of the pen computer, the accuracy of text and gesture recognition must be improved. The aim of this thesis is to study different strategies for improving this accuracy. First, this thesis investigates how to exploit information derived from the interaction to improve recognition, specifically in the interactive transcription of images of handwritten text. In interactive transcription, the system and the user work "arm in arm" to produce the transcription. The user validates the system's output by giving certain corrections, which the system must use to improve the transcription. This thesis proposes using handwritten corrections, which the system must first recognize. The quality of the recognition of this interaction is improved by taking into account contextual information, such as the prefix validated by the user. After this, the thesis focuses on the study of human movement, in particular hand movement, from a generative point of view, using the Kinematic Theory and the Sigma-Lognormal model. Understanding how the hands move when writing is important for the development of a recognition system, in particular for understanding the origin of handwriting variability. The contribution of this thesis to this topic is important, since it presents a new technique (which improves on previous results) for extracting the Sigma-Lognormal model from handwritten strokes. Closely related to the previous work, the benefit of using synthetic data for training is studied. The easiest way to train a recognizer is to provide an "infinite" data set representing all possible variations. In general, the more training data, the lower the recognizer's error. However, it is often not possible to provide more data, or doing so is very expensive. For this reason, the thesis studies how to create and use synthetic data that resemble real data. The different contributions of this thesis have obtained good results, producing several publications in international conferences and journals. Finally, three applications related to the work of this thesis have also been explored. First, Escritorie was created, a digital desk prototype based on the pen computer paradigm for interactive transcription of handwritten documents. Second, "Gestures à Go Go" was developed, a web application for generating synthetic data and packaging them with a recognizer quickly and easily. Finally, another interactive system under the pen computer paradigm is presented. In this case, the thesis studies how the review of machine translations can be carried out more ergonomically.

Martín-Albo Simón, D. (2016). Contributions to Pen & Touch Human-Computer Interaction [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/68482
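
The abstract above refers to the Sigma-Lognormal model without stating its form. As a rough illustration, the Kinematic Theory describes the speed of a single stroke as a lognormal function of time scaled by the stroke's amplitude. The sketch below assumes that standard form; the function names, parameter values, and the numerical integration are illustrative choices and are not taken from the thesis, and the parameter extraction technique it contributes is not shown.

```python
import math


def lognormal_speed(t, D, t0, mu, sigma):
    """Speed of one stroke at time t (zero at or before the onset time t0).

    D is the stroke amplitude, mu and sigma are the log-time delay and
    log-response time of the lognormal profile.
    """
    if t <= t0:
        return 0.0
    dt = t - t0
    norm = sigma * math.sqrt(2.0 * math.pi) * dt
    return D / norm * math.exp(-((math.log(dt) - mu) ** 2) / (2.0 * sigma ** 2))


def stroke_distance(D, t0, mu, sigma, t_end=5.0, steps=20000):
    """Integrate the speed profile from t0 to t_end with the midpoint rule."""
    h = (t_end - t0) / steps
    total = sum(
        lognormal_speed(t0 + (i + 0.5) * h, D, t0, mu, sigma)
        for i in range(steps)
    )
    return total * h
```

Because the lognormal profile integrates to one, the distance travelled by a stroke approaches its amplitude `D` once the movement has died out, which gives a quick sanity check on any set of extracted parameters.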

    A discrete contextual stochastic model for the off-line recognition of handwritten Chinese characters

    We study a discrete contextual stochastic (CS) model for complex and variant patterns like handwritten Chinese characters. Three fundamental problems of using CS models for character recognition are discussed, and several practical techniques for solving these problems are investigated. A formulation for discriminative training of CS model parameters is also introduced and its practical usage investigated. To illustrate the characteristics of the various algorithms, comparative experiments are performed on a recognition task with a vocabulary consisting of 50 pairs of highly similar handwritten Chinese characters. The experimental results confirm the effectiveness of the discriminative training for improving recognition performance.

    Digit and command interpretation for electronic book using neural network and genetic algorithm

    Centre for Multimedia Signal Processing, Department of Electronic and Information Engineering. 2004-2005 > Academic research: refereed > Publication in refereed journal. Version of Record. Publishe

    Towards robust real-world historical handwriting recognition

    In this thesis, we build a bridge from the past to the future by using artificial-intelligence methods for text recognition in a historical Dutch collection of the Natuurkundige Commissie that explored Indonesia (1820-1850). In spite of the successes of systems like 'ChatGPT', reading historical handwriting is still quite challenging for AI. Whereas GPT-like methods work on digital texts, historical manuscripts are only available as extremely diverse collections of (pixel) images. Despite their great results, current deep learning (DL) methods are very data-greedy and time-consuming, depend heavily on human experts from the humanities for labeling, and require machine-learning experts to design the models. Ideally, the use of deep learning methods should require minimal human effort, have an algorithm observe the evolution of the training process, and avoid inefficient use of the already scarce labeled data. We present several approaches to dealing with these problems, aiming to improve the robustness of current methods and the autonomy of training. We applied our novel word and line text recognition approaches to nine data sets differing in time period, language, and difficulty: three locally collected historical Latin-based data sets from Naturalis, Leiden; four public Latin-based benchmark data sets for comparability with other approaches; and two Arabic data sets. Using ensemble voting of just five neural networks, we achieved a level of accuracy that required hundreds of neural networks in earlier studies. Moreover, we increased the speed of evaluation of each training epoch without the need for labeled data.
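
The abstract mentions ensemble voting over five networks but does not say how the votes are combined. A minimal sketch, assuming simple plurality voting over the word hypotheses returned by each network, could look as follows. The function name and the tie-breaking rule (the earliest-listed hypothesis wins) are assumptions made for illustration, not the thesis's method.

```python
from collections import Counter


def plurality_vote(predictions):
    """Return the most frequent prediction; ties go to the earliest one listed."""
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for prediction in predictions:
        if counts[prediction] == best:
            return prediction
```

With five hypothetical network outputs for one word image, `plurality_vote(["Java", "Jawa", "Java", "lava", "Java"])` selects the hypothesis that three of the five networks agree on.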

    On the application of reservoir computing networks for noisy image recognition

    Reservoir Computing Networks (RCNs) are a special type of single-layer recurrent neural network in which the input and recurrent connections are randomly generated and only the output weights are trained. Besides the ability to process temporal information, the key strengths of RCNs are easy training and robustness against noise. Recently, we introduced a simple strategy for tuning the parameters of RCNs. Evaluation in the domain of noise-robust speech recognition proved that this method was effective. The aim of this work is to extend that study to the field of image processing, by showing that the proposed parameter tuning procedure is equally valid there and confirming that RCNs are well suited to temporal modeling and are robust with respect to noise. In particular, we investigate the potential of RCNs to achieve competitive performance on the well-known MNIST dataset by following the aforementioned parameter optimization strategy. Moreover, we achieve good noise-robust recognition by using such a network to denoise images and supplying them to a recognizer that is trained solely on clean images. The experiments demonstrate that the proposed RCN-based handwritten digit recognizer achieves an error rate of 0.81 percent on the clean test data of the MNIST benchmark and that the proposed RCN-based denoiser can effectively reduce the error rate for various types of noise. (c) 2017 Elsevier B.V. All rights reserved.
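
To make the first sentence of this abstract concrete, the sketch below shows only the fixed, randomly generated part of such a network: input and recurrent weights are drawn once and never trained, and the reservoir state is updated one time step at a time. The weight range, the tanh nonlinearity, the function names, and the idea of feeding one input vector (for example one image column) per time step are assumptions for illustration. Training the output weights and the paper's parameter tuning strategy are not shown.

```python
import math
import random


def make_reservoir(n_inputs, n_units, scale=0.5, seed=0):
    """Draw fixed random input and recurrent weight matrices (never trained)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-scale, scale) for _ in range(n_inputs)]
            for _ in range(n_units)]
    w_rec = [[rng.uniform(-scale, scale) / n_units for _ in range(n_units)]
             for _ in range(n_units)]
    return w_in, w_rec


def run_reservoir(w_in, w_rec, inputs):
    """Feed a sequence of input vectors and collect the reservoir states."""
    state = [0.0] * len(w_rec)
    states = []
    for u in inputs:
        state = [
            math.tanh(
                sum(w * x for w, x in zip(w_in[k], u))
                + sum(w * s for w, s in zip(w_rec[k], state))
            )
            for k in range(len(w_rec))
        ]
        states.append(state)
    return states
```

A trained linear readout would then map the collected states to class scores or to denoised pixel values; only that readout needs fitting, which is what makes training comparatively easy.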

    Data display and analysis

    Graphical character recognizer and data displa

    Decoupled Attention Network for Text Recognition

    Text recognition has attracted considerable research interest because of its various applications. The cutting-edge text recognition methods are based on attention mechanisms. However, most attention methods suffer from a serious alignment problem caused by their recurrent alignment operation, in which the alignment relies on historical decoding results. To remedy this issue, we propose a decoupled attention network (DAN), which decouples the alignment operation from the use of historical decoding results. DAN is an effective, flexible and robust end-to-end text recognizer, which consists of three components: 1) a feature encoder that extracts visual features from the input image; 2) a convolutional alignment module that performs the alignment operation based on visual features from the encoder; and 3) a decoupled text decoder that makes the final prediction by jointly using the feature map and attention maps. Experimental results show that DAN achieves state-of-the-art performance on multiple text recognition tasks, including offline handwritten text recognition and regular/irregular scene text recognition. Comment: 9 pages, 8 figures, 6 tables, accepted by AAAI-202
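
The third component is described as jointly using the feature map and the attention maps. One common way to combine the two, assumed here rather than taken from the abstract, is to form a context vector for each decoding step as the attention-weighted sum of the feature vectors over all spatial positions. The function name and the nested-list layout are hypothetical.

```python
def context_vector(attention_map, feature_map):
    """Weighted sum of feature vectors, weighted by one step's attention map.

    attention_map: H x W nested lists of floats (one map per decoding step).
    feature_map:   H x W x C nested lists of floats from the encoder.
    """
    channels = len(feature_map[0][0])
    context = [0.0] * channels
    for att_row, feat_row in zip(attention_map, feature_map):
        for weight, features in zip(att_row, feat_row):
            for c in range(channels):
                context[c] += weight * features[c]
    return context
```

Because each attention map here depends only on the visual features, the context vectors for all decoding steps can be computed without waiting for earlier predictions, which is the decoupling the abstract emphasizes.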

    Fast Incremental Learning Strategy Driven by Confusion Reject for Online Handwriting Recognition

    In this paper, we present a new incremental learning strategy for handwritten character recognition systems. This learning strategy enables the recognition system to learn "rapidly" any new character from very few examples. The strategy is driven by a confusion detection mechanism that controls the learning process. Artificial character generation techniques are used to overcome the lack of learning data when introducing a new character from an unseen class. The results show that a good recognition rate (about 90%) is achieved after only 5 learning examples. Moreover, the rate quickly rises to 94% after 10 examples and to approximately 97% after 30 examples. An error reduction of 40% is obtained by using the artificial character generation techniques.
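
The abstract does not specify how confusion is detected. A minimal sketch of one plausible rule, assumed here for illustration only, flags a sample as confused when the two best class scores are closer than a chosen margin; such a flag could then trigger additional learning for the classes involved. The function name, the score dictionary, and the default margin are hypothetical.

```python
def is_confused(scores, margin=0.2):
    """Return True when the top two class scores differ by less than margin."""
    ranked = sorted(scores.values(), reverse=True)
    if len(ranked) < 2:
        return False  # a single class cannot be confused with another
    return ranked[0] - ranked[1] < margin
```

For example, scores of 0.50 and 0.45 for two competing characters would be flagged, while 0.9 against 0.1 would not.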