SLU FOR VOICE COMMAND IN SMART HOME: COMPARISON OF PIPELINE AND END-TO-END APPROACHES

Abstract

Spoken Language Understanding (SLU) is typically performed through automatic speech recognition (ASR) and natural language understanding (NLU) in a pipeline. However, errors at the ASR stage have a negative impact on NLU performance. Hence, there is rising interest in End-to-End (E2E) SLU, which performs ASR and NLU jointly. Although E2E models have shown superior performance to modular approaches in many NLP tasks, current E2E SLU models have not yet definitively superseded pipeline approaches. In this paper, we present a comparison of the pipeline and E2E approaches for the task of voice command in smart homes. Since E2E models need large domain-specific data sets and none are available for languages other than English, we tackle the lack of such data by combining Natural Language Generation (NLG) and text-to-speech (TTS) to generate French training data. The trained models were evaluated on voice commands acquired in a real smart home with several speakers. Results show that the E2E approach can reach performance similar to a state-of-the-art pipeline SLU despite a higher word error rate (WER) than the pipeline approach. Furthermore, the E2E model can benefit from artificially generated data to exhibit lower Concept Error Rates than the pipeline baseline for slot recognition.
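The two metrics named in the abstract, WER and Concept Error Rate, are commonly computed as an edit distance between a reference and a hypothesis sequence, divided by the reference length. The sketch below illustrates that common definition only; the exact scoring procedure used in the paper is not stated in the abstract, and the function names, the English utterance, and the concept labels are all hypothetical examples invented here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalised by the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical utterance: one substituted word out of five.
ref_words = "turn on the kitchen light".split()
hyp_words = "turn on the chicken light".split()
wer = error_rate(ref_words, hyp_words)

# Hypothetical concept sequences: the location slot is missed.
ref_concepts = ["action=turn_on", "device=light", "location=kitchen"]
hyp_concepts = ["action=turn_on", "device=light"]
cer = error_rate(ref_concepts, hyp_concepts)

print(f"WER = {wer:.2f}, concept error rate = {cer:.2f}")
```

Because the two rates are computed over different sequences (words versus concepts), a system can have a higher WER and still reach a comparable or lower concept error rate, which is the situation the abstract reports for the E2E model.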
