
    SLU for Voice Command in Smart Home: Comparison of Pipeline and End-to-End Approaches

    Spoken Language Understanding (SLU) is typically performed through automatic speech recognition (ASR) and natural language understanding (NLU) in a pipeline. However, errors at the ASR stage have a negative impact on NLU performance. Hence, there is rising interest in End-to-End (E2E) SLU, which performs ASR and NLU jointly. Although E2E models have shown superior performance to modular approaches in many NLP tasks, current E2E SLU models have still not definitively superseded pipeline approaches. In this paper, we present a comparison of the pipeline and E2E approaches for the task of voice command in smart homes. Since no large non-English domain-specific data sets are available, although they are needed for an E2E model, we tackle the lack of such data by combining Natural Language Generation (NLG) and text-to-speech (TTS) to generate French training data. The trained models were evaluated on voice commands acquired in a real smart home with several speakers. Results show that the E2E approach can reach performance similar to a state-of-the-art pipeline SLU despite a higher WER than the pipeline approach. Furthermore, the E2E model can benefit from artificially generated data to exhibit lower Concept Error Rates than the pipeline baseline for slot recognition.
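    As a rough illustration of the two architectures compared in this abstract, the sketch below contrasts a pipeline SLU (ASR followed by NLU) with an end-to-end model that maps audio directly to slot concepts. The model classes, method names, and the toy slot-level error measure are hypothetical placeholders for illustration, not the systems evaluated in the paper.

    # Illustrative sketch only: pipeline SLU (ASR -> NLU) vs. end-to-end SLU.
    # All model objects and method names here are hypothetical placeholders.

    from typing import Dict, List


    def pipeline_slu(audio: bytes, asr_model, nlu_model) -> Dict[str, str]:
        """Pipeline approach: transcribe first, then extract slots from text.

        ASR errors propagate into the NLU stage, which is the weakness the
        paper measures by comparing WER and Concept Error Rate.
        """
        transcript: str = asr_model.transcribe(audio)        # e.g. "allume la lumiere du salon"
        slots: Dict[str, str] = nlu_model.parse(transcript)  # e.g. {"action": "turn_on", "device": "light"}
        return slots


    def end_to_end_slu(audio: bytes, e2e_model) -> Dict[str, str]:
        """End-to-end approach: a single model maps audio to slot concepts.

        No intermediate transcript is needed, so a higher implicit WER can
        still coexist with a lower Concept Error Rate, as the abstract reports.
        """
        return e2e_model.predict_slots(audio)


    def concept_error_rate(reference: List[str], hypothesis: List[str]) -> float:
        """Toy slot-level error rate: fraction of reference concepts missed or wrong."""
        errors = sum(1 for ref, hyp in zip(reference, hypothesis) if ref != hyp)
        errors += abs(len(reference) - len(hypothesis))
        return errors / max(len(reference), 1)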

    ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications

    Personal assistants, automatic speech recognizers and dialogue understanding systems are becoming more critical in our interconnected digital world. A clear example is air traffic control (ATC) communications. ATC aims at guiding aircraft and controlling the airspace in a safe and optimal manner. These voice-based dialogues are carried out between an air traffic controller (ATCO) and pilots via very-high-frequency radio channels. In order to incorporate these novel technologies into ATC (a low-resource domain), large-scale annotated datasets are required to develop data-driven AI systems. Two examples are automatic speech recognition (ASR) and natural language understanding (NLU). In this paper, we introduce the ATCO2 corpus, a dataset that aims at fostering research on the challenging ATC field, which has lagged behind due to a lack of annotated data. The ATCO2 corpus covers 1) data collection and pre-processing, 2) pseudo-annotation of speech data, and 3) extraction of ATC-related named entities. The ATCO2 corpus is split into three subsets. 1) The ATCO2-test-set corpus contains 4 hours of ATC speech with manual transcripts and a subset with gold annotations for named-entity recognition (callsign, command, value). 2) The ATCO2-PL-set corpus consists of 5281 hours of unlabeled ATC data enriched with automatic transcripts from an in-domain speech recognizer, contextual information, speaker turn information, a signal-to-noise ratio estimate and an English language detection score per sample. Both are available for purchase through ELDA at http://catalog.elra.info/en-us/repository/browse/ELRA-S0484. 3) The ATCO2-test-set-1h corpus is a one-hour subset of the original test set corpus that we offer for free at https://www.atco2.org/data. We expect the ATCO2 corpus will foster research on robust ASR and NLU not only in the field of ATC communications but also in the general research community.
    Comment: Manuscript under review; the code will be available at https://github.com/idiap/atco2-corpu
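    To make the per-sample annotations listed above concrete, here is a hypothetical sketch of how one pseudo-annotated ATCO2 utterance might be represented in code. The field names mirror the metadata described in the abstract (transcript, speaker turns, SNR estimate, English language detection score, contextual information, and gold named entities for the test set), but the dataclass layout and the example values are assumptions for illustration, not the corpus's actual distribution format.

    # Hypothetical representation of one ATCO2 utterance; the real corpus
    # format may differ from this sketch.

    from dataclasses import dataclass, field
    from typing import List, Optional


    @dataclass
    class NamedEntity:
        label: str   # "callsign", "command", or "value"
        text: str    # surface form in the transcript


    @dataclass
    class ATCO2Sample:
        audio_path: str
        transcript: str                       # automatic or manual transcript
        speaker_turns: List[str]              # e.g. ["ATCO", "pilot"]
        snr_db: float                         # signal-to-noise ratio estimate
        english_lid_score: float              # English language detection score
        context: Optional[str] = None         # surveillance/contextual information
        entities: List[NamedEntity] = field(default_factory=list)  # gold NER (test set only)


    # Illustrative instance with made-up values.
    example = ATCO2Sample(
        audio_path="sample_0001.wav",
        transcript="lufthansa three two one descend flight level eight zero",
        speaker_turns=["ATCO"],
        snr_db=12.5,
        english_lid_score=0.97,
        entities=[
            NamedEntity("callsign", "lufthansa three two one"),
            NamedEntity("command", "descend"),
            NamedEntity("value", "flight level eight zero"),
        ],
    )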

    Evaluation of a context-aware voice interface for Ambient Assisted Living: qualitative user study vs. quantitative system evaluation

    This paper presents an experiment with seniors and people with visual impairment in a voice-controlled smart home using the SWEET-HOME system. The experiment shows some weaknesses in automatic speech recognition which must be addressed, as well as the need for better adaptation to the user and the environment. Indeed, users were disturbed by the rigid structure of the grammar and were eager to adapt it to their own preferences. Surprisingly, while no humanoid aspect was introduced in the system, the senior participants were inclined to embody the system. Despite these areas for improvement, the system was favourably assessed as diminishing most participants' fears related to the loss of autonomy.

    Developing a Speech-Based Interface for Field Data Collection

    This work explores the use of speech as an interface for efficient field data collection. The exploration was conducted using the open-source data collection application Field Book as a platform. The augmented interface was created using PocketSphinx speech recognition and Android text-to-speech to enable hands-free operation of a small set of commands. At the completion of this work, the interface concept holds promise, but it has some practical limitations that need to be addressed prior to effective use.
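    The general pattern behind such an interface is a keyword-spotting front end constrained to a small command grammar, with recognized phrases routed to data-collection actions. The sketch below illustrates only that dispatch pattern; it is not the Field Book or PocketSphinx code, and the command phrases and handler functions are hypothetical placeholders.

    # Illustrative command dispatcher for a small, closed vocabulary of spoken
    # field-data-collection commands. Phrases and handlers are hypothetical.

    from typing import Callable, Dict


    def next_plot() -> str:
        return "moved to next plot"


    def previous_plot() -> str:
        return "moved to previous plot"


    def record_value(value: str) -> str:
        return f"recorded value {value}"


    # Single-word commands map directly to handlers; "enter <value>" carries a payload.
    COMMANDS: Dict[str, Callable[[], str]] = {
        "next": next_plot,
        "previous": previous_plot,
    }


    def handle_phrase(phrase: str) -> str:
        words = phrase.lower().split()
        if not words:
            return "no command recognized"
        if words[0] in COMMANDS:
            return COMMANDS[words[0]]()
        if words[0] == "enter" and len(words) > 1:
            return record_value(" ".join(words[1:]))
        return "unknown command"


    if __name__ == "__main__":
        for spoken in ["next", "enter twelve point five", "share"]:
            print(spoken, "->", handle_phrase(spoken))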

    Lessons Learned in ATCO2: 5000 hours of Air Traffic Control Communications for Robust Automatic Speech Recognition and Understanding

    Voice communication between air traffic controllers (ATCos) and pilots is critical for ensuring safe and efficient air traffic control (ATC). This task requires high levels of awareness from ATCos and can be tedious and error-prone. Recent attempts have been made to integrate artificial intelligence (AI) into ATC in order to reduce the workload of ATCos. However, the development of data-driven AI systems for ATC demands large-scale annotated datasets, which are currently lacking in the field. This paper explores the lessons learned from the ATCO2 project, a project that aimed to develop a unique platform to collect and preprocess large amounts of ATC data from airspace in real time. Audio and surveillance data were collected from publicly accessible radio frequency channels with VHF receivers owned by a community of volunteers and later uploaded to Opensky Network servers, which can be considered an "unlimited source" of data. In addition, this paper reviews previous work from ATCO2 partners, including (i) robust automatic speech recognition, (ii) natural language processing, (iii) English language identification of ATC communications, and (iv) the integration of surveillance data such as ADS-B. We believe that the pipeline developed during the ATCO2 project, along with the open-sourcing of its data, will encourage research in the ATC field. A sample of the ATCO2 corpus is available on the following website: https://www.atco2.org/data, while the full corpus can be purchased through ELDA at http://catalog.elra.info/en-us/repository/browse/ELRA-S0484. We demonstrated that ATCO2 is an appropriate dataset for developing ASR engines when little to no in-domain ATC data is available. For instance, with the CNN-TDNNf Kaldi model, we reached word error rates as low as 17.9% and 24.9% on public ATC datasets, which is 6.6% / 7.6% better than an "out-of-domain" but supervised CNN-TDNNf model.
    Comment: Manuscript under revie
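    The 17.9% and 24.9% figures above are word error rates. For readers unfamiliar with the metric, a minimal sketch of the standard WER computation (word-level Levenshtein distance normalized by reference length) is shown below; it is a generic illustration, not the ATCO2 project's evaluation code, and the example utterance is made up.

    # Minimal word error rate (WER): edit distance over word sequences divided
    # by the number of reference words. Generic illustration only.


    def wer(reference: str, hypothesis: str) -> float:
        ref = reference.split()
        hyp = hypothesis.split()
        # dp[i][j] = edit distance between ref[:i] and hyp[:j]
        dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            dp[i][0] = i
        for j in range(len(hyp) + 1):
            dp[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
                deletion = dp[i - 1][j] + 1
                insertion = dp[i][j - 1] + 1
                dp[i][j] = min(substitution, deletion, insertion)
        return dp[len(ref)][len(hyp)] / max(len(ref), 1)


    # One substitution out of five reference words -> WER = 0.2
    print(wer("descend flight level eight zero", "descend flight level eight four"))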

    Visualization and Human-Machine Interaction

    The digital age offers many challenges in the field of visualization. Visual imagery has been used effectively to communicate messages through the ages, to express both abstract and concrete ideas. Today, visualization has ever-expanding applications in science, engineering, education, medicine, entertainment and many other areas. Different areas of research contribute to innovation in the field of interactive visualization, such as data science, visual technology, the Internet of Things and many more. Among them, two areas of particular importance are Augmented Reality and Visual Analytics.

    This thesis presents my research in the fields of visualization and human-machine interaction. The purpose of the proposed work is to investigate existing solutions in the area of Augmented Reality (AR) for maintenance. A smaller section of this thesis presents a minor research project on an equally important theme, Visual Analytics. Overall, the main goal is to identify the most important existing problems and then design and develop innovative solutions to address them. The maintenance application domain has been chosen since it is historically one of the first fields of application for Augmented Reality and it presents most of the common and important challenges that AR can raise, as described in chapter 2. Since one of the main problems in AR application deployment is reconfigurability of the application, a framework has been designed and developed that allows the user to create, deploy and update AR applications in real time. Furthermore, the research focused on the problems related to hands-free interaction, thus investigating the area of speech-recognition interfaces and designing innovative solutions to address the problems of intuitiveness and robustness of the interface.

    On the other hand, the area of Visual Analytics has been investigated: among the different areas of research, multidimensional data visualization, similarly to AR, poses specific problems related to the interaction between the user and the machine. An analysis of the existing solutions has been carried out in order to identify their limitations and to point out possible improvements. Since this analysis identifies the scatterplot as a well-established visualization tool worthy of further research, different techniques for adapting its usage to multidimensional data are analyzed. A multidimensional scatterplot has been designed and developed in order to perform a comparison with another multidimensional visualization tool, ScatterDice.

    The first chapters of this thesis describe my investigations in the area of Augmented Reality for maintenance. Chapter 1 provides definitions for the most important terms and an introduction to AR. The second chapter focuses on maintenance, depicting the motivations that led to choosing this application domain. Moreover, the analysis concerning open problems and related works is described along with the methodology adopted to design and develop the proposed solutions. The third chapter illustrates how the adopted methodology has been applied in order to address the problems described in the previous one. Chapter 4 describes the methodology adopted to carry out the tests and outlines the experimental results, whereas the fifth chapter illustrates the conclusions and points out possible future developments. Chapter 6 describes the analysis and research work performed in the field of Visual Analytics, more specifically on multidimensional data visualizations.

    Overall, this thesis illustrates how the proposed solutions address common problems of visualization and human-machine interaction, such as interface design, robustness of the interface and acceptance of new technology, whereas other problems are related to the specific research domain, such as pose tracking and reconfigurability of the procedure for the AR domain.
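    As a small illustration of one way scatterplots are adapted to multidimensional data (the general idea behind the comparison with ScatterDice mentioned above), the sketch below renders a basic scatterplot matrix with matplotlib. It is a generic example on random data, not the tool developed in the thesis.

    # Generic scatterplot-matrix sketch for multidimensional data, using only
    # numpy and matplotlib; not the multidimensional scatterplot from the thesis.

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    data = rng.normal(size=(200, 4))        # 200 samples, 4 dimensions
    labels = ["dim 1", "dim 2", "dim 3", "dim 4"]

    n = data.shape[1]
    fig, axes = plt.subplots(n, n, figsize=(8, 8))
    for i in range(n):
        for j in range(n):
            ax = axes[i, j]
            if i == j:
                ax.hist(data[:, i], bins=20)             # diagonal: per-dimension histogram
            else:
                ax.scatter(data[:, j], data[:, i], s=5)  # off-diagonal: pairwise scatterplot
            if i == n - 1:
                ax.set_xlabel(labels[j])
            if j == 0:
                ax.set_ylabel(labels[i])
    plt.tight_layout()
    plt.show()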