80 research outputs found

    Human Reading Based Strategies for off-line Arabic Word Recognition

    Get PDF
    International audienceThis paper summarizes some techniques proposed for off-line Arabic word recognition. The point of view developed here concerns the human reading favoring an interactive mechanism between global memorization and local checking making easier the recognition of complex scripts as Arabic. According to this consideration, some specific papers are analyzed and their strategies commente

    Couplage d'une vision locale par hmm et globale par RN pour la reconnaissance de mots manuscrits

    Get PDF
    Colloque avec actes et comité de lecture. internationale.International audienceNous présentons dans cet article une idée de combinaison de modèles à visions locale et globale. Séparément, ces deux types de modèles ont prouvé leurs capacités, et leur combinaison a été exploitée en utilisant les modèles globaux plutôt localement, et les modèles locaux pour synthétiser leurs résultats. Nous proposons une démarche inverse en utilisant les modèles locaux pour normaliser efficacement les images de manière non linéaire, et les modèles globaux pour assurer une cohérence sur l'ensemble de la forme. Les modèles locaux sont de type markovien, et les modèles globaux sont des réseaux de neurones. Le gain de la normalisation markovienne est de l'ordre de 3% par rapport à une normalisation linéaire classique (84.2%). Le gain apporté par la combinaison est de l'ordre de 2% par rapport à l'approche markovienne seule (85.3%

    Automatic Arabic Handwritten Check Recognition

    Get PDF

    Predicting Linguistic Structure with Incomplete and Cross-Lingual Supervision

    Get PDF
    Contemporary approaches to natural language processing are predominantly based on statistical machine learning from large amounts of text, which has been manually annotated with the linguistic structure of interest. However, such complete supervision is currently only available for the world's major languages, in a limited number of domains and for a limited range of tasks. As an alternative, this dissertation considers methods for linguistic structure prediction that can make use of incomplete and cross-lingual supervision, with the prospect of making linguistic processing tools more widely available at a lower cost. An overarching theme of this work is the use of structured discriminative latent variable models for learning with indirect and ambiguous supervision; as instantiated, these models admit rich model features while retaining efficient learning and inference properties. The first contribution to this end is a latent-variable model for fine-grained sentiment analysis with coarse-grained indirect supervision. The second is a model for cross-lingual word-cluster induction and the application thereof to cross-lingual model transfer. The third is a method for adapting multi-source discriminative cross-lingual transfer models to target languages, by means of typologically informed selective parameter sharing. The fourth is an ambiguity-aware self- and ensemble-training algorithm, which is applied to target language adaptation and relexicalization of delexicalized cross-lingual transfer parsers. The fifth is a set of sequence-labeling models that combine constraints at the level of tokens and types, and an instantiation of these models for part-of-speech tagging with incomplete cross-lingual and crowdsourced supervision. In addition to these contributions, comprehensive overviews are provided of structured prediction with no or incomplete supervision, as well as of learning in the multilingual and cross-lingual settings. Through careful empirical evaluation, it is established that the proposed methods can be used to create substantially more accurate tools for linguistic processing, compared to both unsupervised methods and to recently proposed cross-lingual methods. The empirical support for this claim is particularly strong in the latter case; our models for syntactic dependency parsing and part-of-speech tagging achieve the hitherto best published results for a wide number of target languages, in the setting where no annotated training data is available in the target language

    Machine Learning

    Get PDF
    Machine Learning can be defined in various ways related to a scientific domain concerned with the design and development of theoretical and implementation tools that allow building systems with some Human Like intelligent behavior. Machine learning addresses more specifically the ability to improve automatically through experience

    A cognitive ego-vision system for interactive assistance

    Get PDF
    With increasing computational power and decreasing size, computers nowadays are already wearable and mobile. They become attendant of peoples' everyday life. Personal digital assistants and mobile phones equipped with adequate software gain a lot of interest in public, although the functionality they provide in terms of assistance is little more than a mobile databases for appointments, addresses, to-do lists and photos. Compared to the assistance a human can provide, such systems are hardly to call real assistants. The motivation to construct more human-like assistance systems that develop a certain level of cognitive capabilities leads to the exploration of two central paradigms in this work. The first paradigm is termed cognitive vision systems. Such systems take human cognition as a design principle of underlying concepts and develop learning and adaptation capabilities to be more flexible in their application. They are embodied, active, and situated. Second, the ego-vision paradigm is introduced as a very tight interaction scheme between a user and a computer system that especially eases close collaboration and assistance between these two. Ego-vision systems (EVS) take a user's (visual) perspective and integrate the human in the system's processing loop by means of a shared perception and augmented reality. EVSs adopt techniques of cognitive vision to identify objects, interpret actions, and understand the user's visual perception. And they articulate their knowledge and interpretation by means of augmentations of the user's own view. These two paradigms are studied as rather general concepts, but always with the goal in mind to realize more flexible assistance systems that closely collaborate with its users. This work provides three major contributions. First, a definition and explanation of ego-vision as a novel paradigm is given. Benefits and challenges of this paradigm are discussed as well. Second, a configuration of different approaches that permit an ego-vision system to perceive its environment and its user is presented in terms of object and action recognition, head gesture recognition, and mosaicing. These account for the specific challenges identified for ego-vision systems, whose perception capabilities are based on wearable sensors only. Finally, a visual active memory (VAM) is introduced as a flexible conceptual architecture for cognitive vision systems in general, and for assistance systems in particular. It adopts principles of human cognition to develop a representation for information stored in this memory. So-called memory processes continuously analyze, modify, and extend the content of this VAM. The functionality of the integrated system emerges from their coordinated interplay of these memory processes. An integrated assistance system applying the approaches and concepts outlined before is implemented on the basis of the visual active memory. The system architecture is discussed and some exemplary processing paths in this system are presented and discussed. It assists users in object manipulation tasks and has reached a maturity level that allows to conduct user studies. Quantitative results of different integrated memory processes are as well presented as an assessment of the interactive system by means of these user studies

    Adaptation of Speaker and Speech Recognition Methods for the Automatic Screening of Speech Disorders using Machine Learning

    Get PDF
    This PhD thesis presented methods for exploiting the non-verbal communication of individuals suffering from specific diseases or health conditions aiming to reach an automatic screening of them. More specifically, we employed one of the pillars of non-verbal communication, paralanguage, to explore techniques that could be utilized to model the speech of subjects. Paralanguage is a non-lexical component of communication that relies on intonation, pitch, speed of talking, and others, which can be processed and analyzed in an automatic manner. This is called Computational Paralinguistics, which can be defined as the study of modeling non-verbal latent patterns within the speech of a speaker by means of computational algorithms; these patterns go beyond the linguistic} approach. By means of machine learning, we present models from distinct scenarios of both paralinguistics and pathological speech which are capable of estimating the health status of a given disease such as Alzheimer's, Parkinson's, and clinical depression, among others, in an automatic manner
    corecore