6,866 research outputs found

    DeepASL: Enabling Ubiquitous and Non-Intrusive Word and Sentence-Level Sign Language Translation

    Full text link
    There is an undeniable communication barrier between deaf people and people with normal hearing ability. Although innovations in sign language translation technology aim to tear down this communication barrier, the majority of existing sign language translation systems are either intrusive or constrained by resolution or ambient lighting conditions. Moreover, these existing systems can only perform single-sign ASL translation rather than sentence-level translation, making them much less useful in daily-life communication scenarios. In this work, we fill this critical gap by presenting DeepASL, a transformative deep learning-based sign language translation technology that enables ubiquitous and non-intrusive American Sign Language (ASL) translation at both word and sentence levels. DeepASL uses infrared light as its sensing mechanism to non-intrusively capture ASL signs. It incorporates a novel hierarchical bidirectional deep recurrent neural network (HB-RNN) and a probabilistic framework based on Connectionist Temporal Classification (CTC) for word-level and sentence-level ASL translation, respectively. To evaluate its performance, we collected 7,306 samples from 11 participants, covering 56 commonly used ASL words and 100 ASL sentences. DeepASL achieves an average 94.5% word-level translation accuracy and an average 8.2% word error rate on translating unseen ASL sentences. Given its promising performance, we believe DeepASL represents a significant step towards breaking the communication barrier between deaf people and the hearing majority, and thus has significant potential to fundamentally change deaf people's lives.
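    The abstract does not include code, but the CTC training pattern it refers to is compact enough to sketch. Below is a minimal illustration in PyTorch; the feature dimension, hidden size, and sequence lengths are made-up assumptions, not DeepASL's implementation — only the standard way CTC enables sentence-level training without per-frame word boundaries.

```python
# Minimal sketch of CTC-style sequence labeling for sentence-level sign
# translation. All shapes and hyperparameters here are illustrative
# assumptions, not the authors' implementation.
import torch
import torch.nn as nn

NUM_WORDS = 56          # word vocabulary size (from the paper's dataset)
BLANK = 0               # CTC blank label
FEATURE_DIM = 32        # per-frame feature size (assumed)

# A bidirectional RNN emits per-frame scores over {blank} + word vocabulary.
rnn = nn.LSTM(FEATURE_DIM, 64, bidirectional=True, batch_first=True)
head = nn.Linear(2 * 64, NUM_WORDS + 1)
ctc = nn.CTCLoss(blank=BLANK)

frames = torch.randn(4, 100, FEATURE_DIM)          # batch of 4 sign sequences
hidden, _ = rnn(frames)
log_probs = head(hidden).log_softmax(-1)           # (batch, time, classes)

targets = torch.randint(1, NUM_WORDS + 1, (4, 5))  # 5-word target sentences
input_lens = torch.full((4,), 100, dtype=torch.long)
target_lens = torch.full((4,), 5, dtype=torch.long)

# CTC marginalizes over all frame-to-word alignments, so training needs no
# per-frame annotation of where each sign starts and ends.
loss = ctc(log_probs.permute(1, 0, 2), targets, input_lens, target_lens)
loss.backward()
```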

    Advanced engineering - Supporting research and technology

    Get PDF
    Telemetry simulations, radar equipment and experiments, and related supporting research for Deep Space Network

    A physiologically inspired model for solving the cocktail party problem.

    Get PDF
    At a cocktail party, we can broadly monitor the entire acoustic scene to detect important cues (e.g., our names being called, or the fire alarm going off), or selectively listen to a target sound source (e.g., a conversation partner). It has recently been observed that individual neurons in the avian field L (analogous to the mammalian auditory cortex) can display broad spatial tuning to single targets and selective tuning to a target embedded in spatially distributed sound mixtures. Here, we describe a model inspired by these experimental observations and apply it to process mixtures of human speech sentences. This processing is realized in the neural spiking domain. It converts binaural acoustic inputs into cortical spike trains using a multi-stage model composed of a cochlear filter-bank, a midbrain spatial-localization network, and a cortical network. The output spike trains of the cortical network are then converted back into an acoustic waveform using a stimulus reconstruction technique. The intelligibility of the reconstructed output is quantified using an objective measure of speech intelligibility. We apply the algorithm to single- and multi-talker speech to demonstrate that the physiologically inspired algorithm is able to achieve intelligible reconstruction of an "attended" target sentence embedded in two other non-attended masker sentences. The algorithm is also robust to masker level and displays performance trends comparable to humans. The ideas from this work may help improve the performance of hearing assistive devices (e.g., hearing aids and cochlear implants), speech-recognition technology, and computational algorithms for processing natural scenes cluttered with spatially distributed acoustic objects. (R01 DC000100 - NIDCD NIH HHS. Published version.)
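    As a rough illustration of the analysis-selection-resynthesis chain described above, here is a deliberately simplified, non-spiking sketch: a Butterworth filter-bank stands in for the cochlear stage, a per-channel interaural-time-difference (ITD) estimate stands in for the midbrain spatial-localization network, and summing the selected channels stands in for stimulus reconstruction. Every parameter and the selection rule are assumptions, not the authors' model.

```python
# Simplified stand-in for the paper's chain:
# filter-bank analysis -> spatial selection -> waveform reconstruction.
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 16000

def filterbank(x, edges):
    """Split x into bandpassed channels (stand-in for a cochlear filter-bank)."""
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="band", fs=FS, output="sos")
        bands.append(sosfiltfilt(sos, x))
    return np.array(bands)

def itd_per_band(left_bands, right_bands, max_lag=32):
    """Estimate interaural time difference per channel via cross-correlation."""
    itds = []
    for l, r in zip(left_bands, right_bands):
        lags = list(range(-max_lag, max_lag + 1))
        corr = [np.dot(l, np.roll(r, k)) for k in lags]
        itds.append(lags[int(np.argmax(corr))])
    return np.array(itds)

def attend(left_bands, right_bands, target_itd=0, tol=4):
    """Keep channels whose ITD matches the attended direction, then
    resynthesize by summing them (stand-in for stimulus reconstruction)."""
    mask = np.abs(itd_per_band(left_bands, right_bands) - target_itd) <= tol
    return (left_bands * mask[:, None]).sum(axis=0)

# Toy binaural mixture: target at 0 ITD, masker delayed by 20 samples.
t = np.arange(FS) / FS
target = np.sin(2 * np.pi * 440 * t)
masker = np.sin(2 * np.pi * 1800 * t)
left, right = target + masker, target + np.roll(masker, 20)

edges = np.geomspace(100, 6000, 16)
out = attend(filterbank(left, edges), filterbank(right, edges))
```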

    Data display and analysis

    Get PDF
    Graphical character recognizer and data display

    Machine learning approaches to video activity recognition: from computer vision to signal processing

    Get PDF
    The research presented focuses on classification techniques for two different, though related, tasks, such that the second can be considered part of the first: human action recognition in videos and sign language recognition. In the first part, the starting hypothesis is that transforming the signals of a video with the Common Spatial Patterns algorithm (CSP, commonly used in electroencephalography systems) can yield new features that are useful for the subsequent classification of the videos with supervised classifiers. Various experiments were carried out on several databases, including one created during this research from the point of view of a humanoid robot, with the intention of deploying the developed recognition system to improve human-robot interaction. In the second part, the previously developed techniques were applied to sign language recognition; in addition, a method based on decomposing the signs is proposed for recognizing them, which adds the possibility of better explainability. The final goal is to develop a sign language tutor capable of guiding users through the learning process, making them aware of the errors they make and the reasons for those errors.
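    The CSP transform at the heart of this thesis is compact enough to sketch. The following is a minimal, generic CSP implementation in Python; the data shapes and log-variance features are standard CSP practice, not the thesis's exact code.

```python
# Minimal sketch of Common Spatial Patterns (CSP), the EEG-style transform
# the thesis applies to video-derived signals.
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, n_pairs=3):
    """Compute CSP spatial filters from two classes of (channels, time) trials."""
    cov_a = np.mean([np.cov(x) for x in trials_a], axis=0)
    cov_b = np.mean([np.cov(x) for x in trials_b], axis=0)
    # Generalized eigenproblem: maximize class-A variance relative to A+B.
    vals, vecs = eigh(cov_a, cov_a + cov_b)
    order = np.argsort(vals)
    # Filters from both ends of the spectrum are the most discriminative.
    picks = np.concatenate([order[:n_pairs], order[-n_pairs:]])
    return vecs[:, picks].T

def csp_features(trial, filters):
    """Normalized log-variance of the filtered trial, the usual CSP feature."""
    var = (filters @ trial).var(axis=1)
    return np.log(var / var.sum())

# Toy example: 8 "channels" (e.g., per-region video signals), 200 samples.
rng = np.random.default_rng(0)
class_a = [rng.standard_normal((8, 200)) for _ in range(20)]
class_b = [rng.standard_normal((8, 200)) * 2 for _ in range(20)]
filters = csp_filters(class_a, class_b)
features = csp_features(class_a[0], filters)  # feed to any supervised classifier
```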

    Neuronal Correlates of Diacritics and an Optimization Algorithm for Brain Mapping and Detecting Brain Function by way of Functional Magnetic Resonance Imaging

    Get PDF
    The purpose of this thesis is threefold: 1) a behavioral examination of the role of diacritics in Arabic, 2) a functional magnetic resonance imaging (fMRI) investigation of diacritics in Arabic, and 3) an optimization algorithm for brain mapping and detecting brain function. Firstly, the role of diacritics in Arabic was examined behaviorally. The stimulus was a lexical decision task (LDT) consisting of low-, mid-, and high-frequency words and nonwords, with and without diacritics. Results showed that the presence of vowel diacritics slowed reaction time but did not affect word recognition accuracy. The longer reaction times for words with diacritics versus without diacritics suggest that the diacritics may contribute to differences in word recognition strategies. Secondly, an event-related fMRI experiment of lexical decisions associated with real words with versus without diacritics in Arabic readers was conducted. Real words with no diacritics yielded shorter response times and stronger activation than real words with diacritics in the hippocampus and middle temporal gyrus, possibly reflecting a search among multiple meanings associated with these words in a semantic store. In contrast, real words with diacritics had longer response times than real words without diacritics and activated the insula and frontal areas, suggestive of phonological and semantic mediation in lexical retrieval. Both the behavioral and fMRI results in this study appear to support a role for diacritics in reading Arabic. The third research work in this thesis is an optimization algorithm for fMRI data analysis. Current data-driven approaches for fMRI data analysis, such as independent component analysis (ICA), rely on algorithms that may have low computational expense but are much more prone to suboptimal results. In this work, a genetic algorithm (GA) based on a clustering technique was designed, developed, and implemented for fMRI ICA data analysis. Results showed that although the algorithm, GAICA, may be computationally expensive, it provides convergence to globally optimal results. Therefore, GAICA can be used as a complementary or supplementary technique for brain mapping and detecting brain function by way of fMRI.
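    The general shape of a genetic algorithm applied to a clustering objective, in the style of GAICA, can be sketched briefly. The encoding, fitness function, and operator parameters below are all illustrative assumptions; the abstract does not specify the thesis's actual design, only that a GA searches a clustering-based objective globally at extra computational cost.

```python
# Generic sketch of a genetic algorithm optimizing a clustering objective.
# Everything here (encoding, fitness, operators) is an assumption for
# illustration, not the thesis's GAICA implementation.
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((200, 2))  # stand-in for component features
K, POP, GENS = 3, 40, 60

def fitness(centroids):
    """Negative within-cluster sum of distances (higher is better)."""
    d = np.linalg.norm(data[:, None, :] - centroids[None], axis=2)
    return -d.min(axis=1).sum()

# Each individual encodes K centroids as one flat vector.
pop = rng.standard_normal((POP, K * 2))

for _ in range(GENS):
    scores = np.array([fitness(ind.reshape(K, 2)) for ind in pop])
    # Tournament selection: keep the better of two random individuals.
    a, b = rng.integers(0, POP, (2, POP))
    parents = pop[np.where(scores[a] > scores[b], a, b)]
    # Uniform crossover between consecutive parents.
    mask = rng.random((POP, K * 2)) < 0.5
    children = np.where(mask, parents, np.roll(parents, 1, axis=0))
    # Gaussian mutation keeps the search global, at extra compute cost.
    children += 0.1 * rng.standard_normal(children.shape)
    # Elitism: always carry the best individual forward unchanged.
    children[0] = pop[int(np.argmax(scores))]
    pop = children

best_centroids = pop[0].reshape(K, 2)
```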

    Artificial Intelligence for Multimedia Signal Processing

    Get PDF
    Artificial intelligence technologies are being actively applied to broadcasting and multimedia processing. Research has been conducted in a wide variety of fields such as content creation, transmission, and security, and over the past two to three years these efforts have aimed to improve the compression efficiency of image, video, speech, and other data in areas related to MPEG media processing technology. Additionally, media creation, processing, editing, and scenario generation are important areas of research in multimedia processing and engineering. This book collects topics spanning advanced computational intelligence algorithms and technologies for emerging multimedia signal processing, including computer vision, speech/sound/text processing, and content analysis/information mining.