4,477 research outputs found

    A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond

    Non-autoregressive (NAR) generation, which was first proposed in neural machine translation (NMT) to speed up inference, has attracted much attention in both the machine learning and natural language processing communities. While NAR generation can significantly accelerate inference for machine translation, the speedup comes at the cost of lower translation accuracy compared to its counterpart, autoregressive (AR) generation. In recent years, many new models and algorithms have been proposed to bridge the accuracy gap between NAR and AR generation. In this paper, we conduct a systematic survey with comparisons and discussions of various non-autoregressive translation (NAT) models from different aspects. Specifically, we categorize the efforts of NAT into several groups, including data manipulation, modeling methods, training criteria, decoding algorithms, and the benefit from pre-trained models. Furthermore, we briefly review other applications of NAR models beyond machine translation, such as dialogue generation, text summarization, grammar error correction, semantic parsing, speech synthesis, and automatic speech recognition. In addition, we discuss potential directions for future exploration, including removing the dependency on knowledge distillation (KD), dynamic length prediction, pre-training for NAR, and wider applications. We hope this survey can help researchers capture the latest progress in NAR generation, inspire the design of advanced NAR models and algorithms, and enable industry practitioners to choose appropriate solutions for their applications. The web page of this survey is at https://github.com/LitterBrother-Xiao/Overview-of-Non-autoregressive-Applications. Comment: 25 pages, 11 figures, 4 tables

    A multimodal conversational coach for active ageing based on sentient computing and m-health

    As life expectancy increases, it has become more necessary to find ways to support healthy ageing. A number of active ageing initiatives are being developed nowadays to foster healthy habits in the population. This paper presents our contribution to these initiatives in the form of a multimodal conversational agent that acts as a coach for physical activities. The agent can be developed as an Android app running on smartphones and coupled with cheap, widely available sport sensors in order to provide meaningful coaching. It can be employed to prepare exercise sessions, provide feedback during the sessions, and discuss the results after the exercise. It incorporates an affective component that informs dynamic user models to produce adaptive interaction strategies. Spanish project, Grant/Award Number: TEC2017-88048-C2-2-R and TRA2016-78886-C3-1-

    Prototyping a Chatbot for Student Supervision in a Pre-registration Process

    Developing a chatbot becomes a challenging task when it is built from scratch and independent of any Software as a Service (SaaS). Inspired by the idea of freeing lecturers from the burden of repeatedly answering the same questions during the pre-registration process, this research has succeeded in building a text-based chatbot system. Further, this research shows that combining a keyword-spotting technique for the Language Understanding component, a Finite-State Transducer (FST) for Dialogue Management, rule-based keyword matching for language generation, and the system-in-the-loop paradigm for system validation can produce an efficient chatbot. The chatbot's efficiency is high: its Concept Efficiency (CE) score reaches 0.946, which shows that users do not need to repeat their utterances several times to be understood. The chatbot's performance in recognizing new concepts introduced by users is also more than satisfactory, as indicated by its Query Density (QD) score of 0.80.
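The pipeline named in this abstract (keyword spotting for understanding, a finite-state machine for dialogue management, rule-based responses) can be illustrated with a minimal Python sketch. Everything below is hypothetical and chosen only for illustration: the keywords, concept labels, state names, and response strings are assumptions, not taken from the paper, and the transition table is a simplified stand-in for the authors' FST.

```python
# Hypothetical keyword-to-concept table (keyword spotting).
KEYWORDS = {
    "deadline": "ASK_DEADLINE",
    "due": "ASK_DEADLINE",
    "credits": "ASK_CREDITS",
    "bye": "BYE",
}

# Hypothetical transition table: (state, concept) -> (next_state, response).
TRANSITIONS = {
    ("START", "ASK_DEADLINE"): ("ANSWERED", "Please check the faculty calendar for the deadline."),
    ("START", "ASK_CREDITS"): ("ANSWERED", "Your supervisor confirms the credit limit."),
    ("ANSWERED", "ASK_DEADLINE"): ("ANSWERED", "Please check the faculty calendar for the deadline."),
    ("ANSWERED", "ASK_CREDITS"): ("ANSWERED", "Your supervisor confirms the credit limit."),
    ("ANSWERED", "BYE"): ("END", "Goodbye."),
}

FALLBACK = "Sorry, I did not understand. Could you rephrase?"


def spot_concept(utterance):
    """Return the concept of the first known keyword in the utterance, or None."""
    for token in utterance.lower().split():
        word = token.strip(".,!?")
        if word in KEYWORDS:
            return KEYWORDS[word]
    return None


def step(state, utterance):
    """Advance the dialogue by one turn; stay in the same state on a miss."""
    concept = spot_concept(utterance)
    return TRANSITIONS.get((state, concept), (state, FALLBACK))


if __name__ == "__main__":
    state = "START"
    for text in ["When is the deadline?", "What about lunch?", "Bye!"]:
        state, reply = step(state, text)
        print(f"{text!r} -> [{state}] {reply}")
```

A table-driven design keeps the dialogue logic inspectable: an utterance that matches no keyword leaves the state unchanged and triggers the fallback prompt.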

    Dialogue Management and Language Generation for a Robust Conversational Virtual Coach: Validation and User Study

    Designing human–machine interactive systems requires cooperation between different disciplines. In this work, we present a Dialogue Manager and a Language Generator that are the core modules of a voice-based Spoken Dialogue System (SDS) capable of carrying out challenging, long and complex coaching conversations. We also develop an efficient integration procedure for the whole system, which acts as an intelligent and robust Virtual Coach. The coaching task differs significantly from the classical applications of SDSs, resulting in a much higher degree of complexity and difficulty. The Virtual Coach has been successfully tested and validated in a user study with independent elderly people in three different countries with three different languages and cultures: Spain, France and Norway. The research presented in this paper has been conducted as part of the project EMPATHIC that has received funding from the European Union’s Horizon 2020 research and innovation programme under Grant No. 769872. Additionally, this work has been partially funded by projects BEWORD and AMIC-PC of the Minister of Science of Technology, under Grant Nos. PID2021-126061OB-C42 and PDC2021-120846-C43, respectively. Vázquez and López Zorrilla received a PhD scholarship from the Basque Government, with Grant Nos. PRE 2020 1 0274 and PRE 2017 1 0357, respectively

    Translating Neurocognitive Models of Auditory-Verbal Hallucinations into Therapy: Using Real-time fMRI-Neurofeedback to Treat Voices

    Auditory-verbal hallucinations (AVHs) are frequent and disabling symptoms, which can be refractory to conventional psychopharmacological treatment in more than 25% of the cases. Recent advances in brain imaging allow for a better understanding of the neural underpinnings of AVHs. These findings strengthened transdiagnostic neurocognitive models that characterize these frequent and disabling experiences. At the same time, technical improvements in real-time functional magnetic resonance imaging (fMRI) enabled the development of innovative and non-invasive methods with the potential to relieve psychiatric symptoms, such as fMRI-based neurofeedback (fMRI-NF). During fMRI-NF, brain activity is measured and fed back in real time to the participant in order to help subjects to progressively achieve voluntary control over their own neural activity. Precisely defining the target brain area/network(s) appears critical in fMRI-NF protocols. After reviewing the available neurocognitive models for AVHs, we elaborate on how recent findings in the field may help to develop strong a priori strategies for fMRI-NF target localization. The first approach relies on imaging-based “trait markers” (i.e., persistent traits or vulnerability markers that can also be detected in the presymptomatic and remitted phases of AVHs). The goal of such strategies is to target areas that show aberrant activations during AVHs or are known to be involved in compensatory activation (or resilience processes). Brain regions, from which the NF signal is derived, can be based on structural MRI and neurocognitive knowledge, or functional MRI information collected during specific cognitive tasks. Because hallucinations are acute and intrusive symptoms, a second strategy focuses more on “state markers.” In this case, the signal of interest relies on fMRI capture of the neural networks exhibiting increased activity during AVHs occurrences, by means of multivariate pattern recognition methods. 
The fine-grained activity patterns concomitant to hallucinations can then be fed back to the patients for therapeutic purposes. Considering the potential cost of implementing fMRI-NF, proof-of-concept studies are urgently required to define the optimal strategy for application in patients with AVHs. This technique has the potential to establish a new brain imaging-guided psychotherapy for patients who do not respond to conventional treatments and to take functional neuroimaging to therapeutic applications

    A Study of Accommodation of Prosodic and Temporal Features in Spoken Dialogues in View of Speech Technology Applications

    Inter-speaker accommodation is a well-known property of human speech and human interaction in general. Broadly, it refers to the behavioural patterns of two (or more) interactants and the effect of the (verbal and non-verbal) behaviour of each on that of the other(s). Implementation of this behaviour in spoken dialogue systems is desirable as an improvement on the naturalness of human-machine interaction. However, traditional qualitative descriptions of accommodation phenomena do not provide sufficient information for such an implementation. Therefore, a quantitative description of inter-speaker accommodation is required. This thesis proposes a methodology for monitoring accommodation during a human or human-computer dialogue, which utilizes a moving average filter over sequential frames for each speaker. These frames are time-aligned across the speakers, hence the name Time Aligned Moving Average (TAMA). Analysis of spontaneous human dialogue recordings by means of the TAMA methodology reveals ubiquitous accommodation of prosodic features (pitch, intensity and speech rate) across interlocutors, and allows for statistical (time series) modeling of the behaviour, in a way which is meaningful for implementation in spoken dialogue system (SDS) environments. In addition, a novel dialogue representation is proposed that provides an additional point of view to that of TAMA in monitoring accommodation of temporal features (inter-speaker pause length and overlap frequency). This representation is a percentage turn distribution of individual speaker contributions in a dialogue frame, which circumvents strict attribution of speaker turns by considering both interlocutors as synchronously active. Both TAMA and turn distribution metrics indicate that correlation of average pause length and overlap frequency between speakers can be attributed to accommodation (a debated issue), and point to possible improvements in SDS “turn-taking” behaviour.
Although the findings of the prosodic and temporal analyses can directly inform SDS implementations, further work is required in order to describe inter-speaker accommodation sufficiently, as well as to develop an adequate testing platform for evaluating the magnitude of perceived improvement in human-machine interaction. Therefore, this thesis constitutes a first step towards a convincingly useful implementation of accommodation in spoken dialogue systems
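The idea behind TAMA, as described in this abstract, is a moving average over sequential frames that share the same time boundaries for both speakers. A minimal Python sketch follows. The frame length, step size, sample format `(time_sec, value)`, and function names are assumptions made for illustration, not the thesis's actual parameters or implementation.

```python
from statistics import mean


def frame_means(samples, duration, frame_len, step):
    """Average a feature (e.g. pitch) inside overlapping frames.

    samples: list of (time_sec, value) pairs for one speaker.
    Returns one mean per frame, or None for frames with no samples.
    """
    means = []
    start = 0.0
    while start < duration:
        values = [v for t, v in samples if start <= t < start + frame_len]
        means.append(mean(values) if values else None)
        start += step
    return means


def tama(speaker_a, speaker_b, frame_len=20.0, step=10.0):
    """Compute frame means for two speakers on a shared, time-aligned grid."""
    duration = max(t for t, _ in speaker_a + speaker_b)
    return (
        frame_means(speaker_a, duration, frame_len, step),
        frame_means(speaker_b, duration, frame_len, step),
    )


if __name__ == "__main__":
    a = [(0, 100), (5, 110), (15, 120)]  # hypothetical pitch samples, speaker A
    b = [(2, 200), (12, 220)]            # hypothetical pitch samples, speaker B
    series_a, series_b = tama(a, b, frame_len=10.0, step=5.0)
    print(series_a)
    print(series_b)
```

Because both speakers are averaged over the same frame boundaries, the two output series have equal length and can be compared frame by frame, for example with time-series correlation.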

    Technical Workshop: Advanced Helicopter Cockpit Design

    Information processing demands on both civilian and military aircrews have increased enormously as rotorcraft have come to be used for adverse weather, day/night, and remote area missions. Applied psychology, engineering, or operational research for future helicopter cockpit design criteria were identified. Three areas were addressed: (1) operational requirements, (2) advanced avionics, and (3) man-system integration

    Secured vocal access to telephone servers

    A number of applications of man-machine interaction over the telephone require a combination of speech recognition and speaker verification. This paper describes current work carried out at IDIAP in the framework of national and European projects. A generic Interactive Voice Server (IVS) is described by means of a graphical formalism. It includes speech recognition based on speaker-independent flexible vocabulary technology and speaker verification performed by a number of techniques executed in parallel and combined for an optimal decision