    Voice Activated Appliances for Severely Disabled Persons

    DeepASL: Enabling Ubiquitous and Non-Intrusive Word and Sentence-Level Sign Language Translation

    There is an undeniable communication barrier between deaf people and people with normal hearing ability. Although innovations in sign language translation technology aim to tear down this communication barrier, the majority of existing sign language translation systems are either intrusive or constrained by resolution or ambient lighting conditions. Moreover, these existing systems can only perform single-sign ASL translation rather than sentence-level translation, making them much less useful in daily-life communication scenarios. In this work, we fill this critical gap by presenting DeepASL, a transformative deep learning-based sign language translation technology that enables ubiquitous and non-intrusive American Sign Language (ASL) translation at both word and sentence levels. DeepASL uses infrared light as its sensing mechanism to non-intrusively capture ASL signs. It incorporates a novel hierarchical bidirectional deep recurrent neural network (HB-RNN) for word-level ASL translation and a probabilistic framework based on Connectionist Temporal Classification (CTC) for sentence-level ASL translation. To evaluate its performance, we collected 7,306 samples from 11 participants, covering 56 commonly used ASL words and 100 ASL sentences. DeepASL achieves an average word-level translation accuracy of 94.5% and an average word error rate of 8.2% on translating unseen ASL sentences. Given its promising performance, we believe DeepASL represents a significant step towards breaking the communication barrier between deaf people and the hearing majority, and thus has significant potential to fundamentally change deaf people's lives.
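
    A minimal sketch of the two modeling ideas the abstract names: a hierarchical bidirectional RNN that encodes each hand separately before fusing them for word-level classification, and CTC training for sentence-level translation without frame-level alignments. All layer sizes, the GRU cell choice, and the skeletal-joint input format are illustrative assumptions, not the authors' configuration.

```python
# Hedged sketch of an HB-RNN with CTC, not the authors' implementation.
import torch
import torch.nn as nn

class HBRNN(nn.Module):
    def __init__(self, joint_dim=63, hidden=128, vocab=56):
        super().__init__()
        # Lower level: one bidirectional GRU per hand over its joint trajectory.
        self.left = nn.GRU(joint_dim, hidden, bidirectional=True, batch_first=True)
        self.right = nn.GRU(joint_dim, hidden, bidirectional=True, batch_first=True)
        # Upper level: a bidirectional GRU over the fused two-hand representation.
        self.fused = nn.GRU(4 * hidden, hidden, bidirectional=True, batch_first=True)
        # One extra output class for the CTC blank symbol.
        self.out = nn.Linear(2 * hidden, vocab + 1)

    def forward(self, left_joints, right_joints):
        l, _ = self.left(left_joints)                # (B, T, 2*hidden)
        r, _ = self.right(right_joints)
        h, _ = self.fused(torch.cat([l, r], dim=-1))
        return self.out(h)                           # per-frame logits (B, T, vocab+1)

model = HBRNN()
B, T = 4, 90                                         # 4 sequences of 90 frames (assumed)
logits = model(torch.randn(B, T, 63), torch.randn(B, T, 63))

# Sentence-level training with CTC: only the target word sequence is needed,
# no frame-level alignment of signs to frames.
ctc = nn.CTCLoss(blank=56)
log_probs = logits.log_softmax(-1).transpose(0, 1)   # CTC expects (T, B, C)
targets = torch.randint(0, 56, (B, 7))               # dummy 7-word target sentences
loss = ctc(log_probs, targets,
           input_lengths=torch.full((B,), T),
           target_lengths=torch.full((B,), 7))
loss.backward()
```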

    Integration of a voice recognition system in a social robot

    Human-Robot Interaction (HRI) is one of the main fields in the study and research of robotics. Within this field, dialogue systems and interaction by voice play a very important role. When speaking about natural human-robot dialogue, we assume that the robot can accurately recognize what the human wants to convey verbally, and even its semantic meaning, but this is not always achieved. In this paper we describe the steps and requirements we went through in order to endow the personal social robot Maggie, developed at the University Carlos III of Madrid, with the capability of understanding natural language spoken by any human. We analyzed the different possibilities offered by current software and hardware alternatives by testing them in real environments. We obtained accurate data on speech recognition capabilities in different environments, using the most modern audio acquisition systems and analyzing less typical parameters such as user age, sex, intonation, volume and language. Finally, we propose a new model to classify recognition results as accepted or rejected, based on a second ASR opinion. This new approach takes into account the pre-calculated success rate in noise intervals for each recognition framework, decreasing the false-positive and false-negative rates. The funds were provided by the Spanish Government through the project 'Peer to Peer Robot-Human Interaction' (R2H) of MEC (Ministry of Science and Education) and the project 'A new approach to social robotics' (AROS) of MICINN (Ministry of Science and Innovation). The research leading to these results has received funding from the RoboCity2030-II-CM project (S2009/DPI-1559), funded by Programas de Actividades I+D en la Comunidad de Madrid and co-funded by Structural Funds of the EU.
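
    The accept/reject idea lends itself to a short sketch: a primary hypothesis is validated against a second recognizer's output, with each engine's confidence weighted by its pre-measured success rate in the current noise interval. Engine names, the success-rate table, and the 0.5 cutoff below are invented placeholders, not values from the paper.

```python
# Hedged sketch of second-opinion ASR validation; all numbers are illustrative.
from dataclasses import dataclass

# Hypothetical table: empirical recognition success rate per engine,
# measured offline in three ambient-noise intervals (dB SPL).
SUCCESS_RATE = {
    "engine_a": {(0, 40): 0.95, (40, 60): 0.85, (60, 90): 0.60},
    "engine_b": {(0, 40): 0.90, (40, 60): 0.80, (60, 90): 0.70},
}

@dataclass
class Hypothesis:
    engine: str
    text: str
    confidence: float          # engine-reported confidence in [0, 1]

def rate_for(engine: str, noise_db: float) -> float:
    for (lo, hi), rate in SUCCESS_RATE[engine].items():
        if lo <= noise_db < hi:
            return rate
    return 0.0

def accept(primary: Hypothesis, second: Hypothesis, noise_db: float) -> bool:
    """Accept the primary hypothesis using the second ASR as a referee."""
    # Weight each engine's confidence by its success rate in this noise band.
    p = primary.confidence * rate_for(primary.engine, noise_db)
    s = second.confidence * rate_for(second.engine, noise_db)
    if primary.text == second.text:
        return True                      # both engines agree: accept
    # On disagreement, trust the primary only if it clearly outscores the referee.
    return p > s and p > 0.5

print(accept(Hypothesis("engine_a", "hello maggie", 0.9),
             Hypothesis("engine_b", "hello maggie", 0.7), noise_db=45))
```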

    Discriminatively trained features using fMPE for multi-stream audio-visual speech recognition

    fMPE is a recently introduced discriminative training technique that uses the Minimum Phone Error (MPE) criterion to train a feature-level transformation. In this paper we investigate fMPE-trained audio-visual features for multi-stream HMM-based audio-visual speech recognition. A flexible, layer-based implementation of fMPE allows us to combine the visual information with the audio stream within the discriminative training process and dispense with the multiple-stream approach. Experiments are reported on the IBM infrared headset audio-visual database. Averaged over one hour of speaker-independent test data from 20 speakers, the fMPE-trained acoustic features achieve a 33% relative gain. Adding video layers on top of the audio layers gives an additional 10% gain over fMPE-trained features from the audio stream alone. The fMPE-trained visual features achieve a 14% relative gain, while decision fusion of the audio and visual streams with fMPE-trained features achieves a 29% relative gain. However, the fMPE-trained models do not improve over the original models on mismatched noisy test data.
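
    The core of fMPE is a feature-level transform that offsets each acoustic frame by a trained projection of a high-dimensional vector of Gaussian posteriors, y_t = x_t + M h_t. The sketch below illustrates only this forward transform under assumed dimensions; the MPE-criterion training of M, and the layer stacking used here to fold in the visual stream, are not reproduced.

```python
# Schematic numpy sketch of the fMPE forward transform; dimensions and the
# random Gaussian pool are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
D, G, T = 39, 1000, 200        # feature dim, Gaussian pool size, frames

x = rng.normal(size=(T, D))            # baseline acoustic features
means = rng.normal(size=(G, D))        # pooled Gaussian means (unit variance)
M = np.zeros((D, G))                   # fMPE projection, learned by MPE training

def posteriors(frame):
    """Posterior of each pooled Gaussian given one frame (unit covariance)."""
    logp = -0.5 * ((means - frame) ** 2).sum(axis=1)
    logp -= logp.max()
    p = np.exp(logp)
    return p / p.sum()

# The transform: a per-frame additive, discriminatively trained offset.
h = np.stack([posteriors(f) for f in x])     # (T, G) posterior vectors
y = x + h @ M.T                              # (T, D) fMPE features

# With M at zero the features are unchanged; MPE training moves M away from
# zero only where doing so reduces phone errors, and further layers (e.g. for
# the visual stream) can be stacked as additional such transforms.
assert np.allclose(y, x)
```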

    Conceptual Design of Speech Recognition in Aircraft Communication to Reduce Flight Communication Errors

    The main cause of aviation accidents is human error (55%), and one of its factors is miscommunication. Miscommunication has been involved in some of the world's worst accidents, including the collision between Pan Am and KLM, Garuda flight 152 (the biggest accident in Indonesia), and the recent incident between Batik Air and TransNusa. Moreover, KNKT (Indonesia's National Transportation Safety Committee) was confused when investigating the Air Asia QZ8501 cockpit recorder. When miscommunication occurs, instructions are often hard to understand and must be repeated, narrowing the time available to take action. This study develops a speech recognition concept with a voice, sign and text system for the flight crew's communication-checking procedure. The method combines a literature review with system design. The result is an additional communication procedure based on signs and text obtained through speech recognition: speech is transcribed into text by the recognizer, the text is then converted into a sign, and both the sign and the text are displayed so the pilot can read them directly. By adding the sign and text system alongside voice communication, errors are expected to be minimized, decisions can be made faster, and accidents can be avoided.
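
    The proposed pipeline reduces to three steps: transcribe the radio message, map the transcript to a standard-phraseology sign, and display both to the pilot alongside the voice channel. The sketch below is purely illustrative of that flow; the recognizer call and the phrase-to-sign table are invented placeholders, since the paper is a conceptual design rather than an implementation.

```python
# Hedged sketch of the speech -> text -> sign display flow; all names are
# hypothetical placeholders.
from typing import Optional

# Hypothetical mapping from standard radiotelephony phrases to display signs.
PHRASE_SIGNS = {
    "cleared for takeoff": "SIGN_TAKEOFF_CLEARED",
    "hold position": "SIGN_HOLD",
    "go around": "SIGN_GO_AROUND",
}

def transcribe(audio: bytes) -> str:
    """Placeholder for a real ASR engine call (assumed, not specified)."""
    raise NotImplementedError

def phrase_to_sign(transcript: str) -> Optional[str]:
    for phrase, sign in PHRASE_SIGNS.items():
        if phrase in transcript.lower():
            return sign
    return None                      # unknown phrase: show text only

def display_to_pilot(transcript: str, sign: Optional[str]) -> None:
    # In the proposed design both channels appear on a cockpit display, so a
    # misheard instruction can be caught by reading rather than by asking for
    # a repeat over the radio.
    print(f"TEXT: {transcript}")
    print(f"SIGN: {sign or '(none)'}")

def on_radio_message(transcript: str) -> None:
    display_to_pilot(transcript, phrase_to_sign(transcript))

on_radio_message("Garuda one five two, hold position")
```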

    Hands-Free Gesture and Voice Control for System Interfacing

    The proposed system is a simple prototype for real-time tracking of a human head, combined with speech recognition, for a hands-free mouse. It uses a simple yet effective face-tracking algorithm: a Haar classifier detects the face in each frame, and the Lucas-Kanade algorithm marks and tracks features of the face. The general requirements of a real-time tracking algorithm are that it should be computationally economical, should perform in diverse environments, and should run with very minimal prior knowledge about the presence of faces. The system also makes use of Microsoft Speech SDK 5.1 for speech recognition, which is composed of two fundamental components: a voice recognizer and a speech synthesizer. The voice recognizer captures the input voice signals, and the speech synthesizer is responsible for lexicon management.
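
    The tracking loop described above maps closely onto standard OpenCV calls: a Haar cascade (re)detects the face, and pyramidal Lucas-Kanade optical flow tracks corner features seeded inside it. The camera index, re-detection policy, and feature-count threshold below are assumptions; the Microsoft Speech SDK side is Windows/COM-specific and omitted.

```python
# Condensed OpenCV sketch of Haar detection plus Lucas-Kanade tracking;
# thresholds and policies are illustrative, not the paper's.
import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)                 # assumed camera index
prev_gray, points = None, None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if points is None or len(points) < 10:
        # (Re)detect the face and seed good features to track inside it.
        faces = face_cascade.detectMultiScale(gray, 1.3, 5)
        if len(faces):
            x, y, w, h = faces[0]
            mask = np.zeros_like(gray)
            mask[y:y + h, x:x + w] = 255
            points = cv2.goodFeaturesToTrack(gray, 50, 0.01, 7, mask=mask)
    elif prev_gray is not None:
        # Track the seeded features with pyramidal Lucas-Kanade flow.
        new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, points, None)
        points = new_pts[status.flatten() == 1].reshape(-1, 1, 2)
        # The mean feature displacement would drive the mouse cursor here.
    prev_gray = gray
    cv2.imshow("head tracking", frame)
    if cv2.waitKey(1) == 27:              # Esc to quit
        break
cap.release()
```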

    A HoloLens Application to Aid People who are Visually Impaired in Navigation Tasks

    Day-to-day activities such as navigation and reading can be particularly challenging for people with visual impairments. Reading text on signs may be especially difficult for people who are visually impaired because signs have variable color, contrast, and size. Indoors, signage may include office, classroom, restroom, and fire evacuation signs. Outdoors, it may include street signs, bus numbers, and store signs. Depending on the level of visual impairment, just identifying where signs exist can be a challenge. Using Microsoft's HoloLens, an augmented reality device, I designed and implemented the TextSpotting application, which helps those with low vision identify and read indoor signs so that they can navigate text-heavy environments. The application can provide both visual and auditory information. In addition to developing the application, I conducted a user study to test its effectiveness. Participants were asked to find a room in an unfamiliar hallway. Those who used the TextSpotting application completed the task less quickly yet reported higher levels of ease, comfort, and confidence, indicating both the application's limitations and its potential to provide an effective means of navigating unknown environments via signage.

    A Situative Space Model for Mobile Mixed-Reality Computing
