Dialogue history integration into end-to-end signal-to-concept spoken language understanding systems
This work investigates the embeddings for representing dialog history in
spoken language understanding (SLU) systems. We focus on the scenario when the
semantic information is extracted directly from the speech signal by means of a
single end-to-end neural network model. We propose to integrate dialogue
history into an end-to-end signal-to-concept SLU system. The dialog history is
represented in the form of dialog history embedding vectors (so-called
h-vectors) and is provided as additional information to end-to-end SLU
models in order to improve system performance. The following three types of
h-vectors are proposed and experimentally evaluated in this paper: (1)
supervised-all embeddings predicting bag-of-concepts expected in the answer of
the user from the last dialog system response; (2) supervised-freq embeddings
focusing on predicting only a selected set of semantic concepts (corresponding
to the most frequent errors in our experiments); and (3) unsupervised
embeddings. Experiments on the MEDIA corpus for the semantic slot filling task
demonstrate that the proposed h-vectors improve the model performance.
Comment: Accepted for ICASSP 2020 (Submitted: October 21, 2019)
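The bag-of-concepts target mentioned for the supervised h-vectors can be illustrated with a minimal sketch. It assumes the target is a multi-hot vector over a fixed concept inventory; the concept names and the function `bag_of_concepts` are hypothetical and are not taken from the paper or from the MEDIA annotation scheme.

```python
from typing import Iterable, List

# Hypothetical concept inventory (assumption); the real MEDIA tag set is larger.
CONCEPTS = ["command-task", "location-city", "time-date", "number-room", "response"]


def bag_of_concepts(concepts_in_answer: Iterable[str],
                    inventory: List[str] = CONCEPTS) -> List[int]:
    """Return a multi-hot vector: 1 if a concept occurs in the answer, else 0.

    Order and repetition of concepts in the answer are ignored, which is
    what makes this a "bag" representation.
    """
    present = set(concepts_in_answer)
    return [1 if concept in present else 0 for concept in inventory]


# "supervised-all" style target: every concept in the inventory is predicted.
all_target = bag_of_concepts(["location-city", "time-date", "time-date"])

# "supervised-freq" style target: only a selected subset of concepts is kept.
freq_target = bag_of_concepts(["response"], inventory=["response", "time-date"])
```

Under this reading, a model trained to predict such a vector from the last system response would expose a hidden representation that could serve as the dialog history embedding; how the paper actually extracts the h-vector is not specified in the abstract.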
A Deep-Parsing Approach to Natural Language Understanding in Dialogue System: Results of a Corpus-Based Evaluation
This paper presents an approach to dialogue understanding based on deep parsing and rule-based semantic analysis. Its performance in the semantic evaluation performed in the framework of the EVALDA/MEDIA campaign is encouraging. The MEDIA project aims to evaluate natural language understanding systems for French on a hotel reservation task (Devillers et al., 2004). For the evaluation, five participating teams had to produce an annotated version of the input utterances in compliance with a commonly agreed format (the MEDIA formalism). An approach based on symbolic processing was not straightforward given the conditions of the evaluation, but we achieved a score close to that of statistical systems without needing an annotated corpus. Although the architecture was designed for this campaign, which was exclusively dedicated to spoken dialogue understanding, we believe that our approach, based on an LTAG parser and two ontologies, can be used in real dialogue systems, providing quite robust speech understanding and facilities for interfacing with a dialogue manager and the application itself.
Effective Spoken Language Labeling with Deep Recurrent Neural Networks
Understanding spoken language is a highly complex problem, which can be
decomposed into several simpler tasks. In this paper, we focus on Spoken
Language Understanding (SLU), the module of spoken dialog systems responsible
for extracting a semantic interpretation from the user utterance. The task is
treated as a labeling problem. In the past, SLU has been performed with a wide
variety of probabilistic models. The rise of neural networks, in the last
couple of years, has opened new interesting research directions in this domain.
Recurrent Neural Networks (RNNs) in particular are able not only to represent
several pieces of information as embeddings but also, thanks to their recurrent
architecture, to encode as embeddings relatively long contexts. Such long
contexts are in general out of reach for models previously used for SLU. In
this paper we propose novel RNN architectures for SLU that outperform
previous ones. Starting from a published idea as a base block, we design new deep
RNNs achieving state-of-the-art results on two widely used corpora for SLU:
ATIS (Air Travel Information System), in English, and MEDIA (Hotel
information and reservation in France), in French.
Comment: 8 pages. Rejected from IJCAI 2017, good remarks overall, but slightly
off-topic as from global meta-reviews. Recommendations: 8, 6, 6, 4. arXiv
admin note: text overlap with arXiv:1706.0174
Benchmarking Transformers-based models on French Spoken Language Understanding tasks
In the last five years, the rise of self-attention-based Transformer
architectures has led to state-of-the-art performance on many natural language
tasks. Although these approaches are increasingly popular, they require large
amounts of data and computational resources. There is still a substantial need
for benchmarking methodologies on under-resourced languages in
data-scarce application conditions. Most pre-trained language models have been
studied extensively on English, and only a few of them have been
evaluated on French. In this paper, we propose a unified benchmark focused on
evaluating model quality and ecological impact on two well-known French
spoken language understanding tasks. Specifically, we benchmark thirteen
well-established Transformer-based models on the two available spoken language
understanding tasks for French: MEDIA and ATIS-FR. Within this framework, we
show that compact models can reach comparable results to bigger ones while
their ecological impact is considerably lower. However, this conclusion is
nuanced and depends on the compression method considered.
Comment: Accepted paper at INTERSPEECH 202
LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech
Self-Supervised Learning (SSL) using huge amounts of unlabeled data has been
successfully explored for image and natural language processing. Recent work
has also investigated SSL from speech, with notable success in improving
performance on downstream tasks such as automatic speech recognition (ASR).
While these works suggest it is possible to reduce dependence on labeled data
for building efficient speech systems, their evaluation was mostly made on ASR
and using multiple and heterogeneous experimental settings (most of them for
English). This calls into question the objective comparison of SSL approaches and the
evaluation of their impact on building speech systems. In this paper, we
propose LeBenchmark: a reproducible framework for assessing SSL from speech. It
not only includes ASR (high and low resource) tasks but also spoken language
understanding, speech translation and emotion recognition. We also focus on
speech technologies in a language other than English: French. SSL models of
different sizes are trained from carefully sourced and documented datasets.
Experiments show that SSL is beneficial for most but not all tasks, which
confirms the need for exhaustive and reliable benchmarks to evaluate its real
impact. LeBenchmark is shared with the scientific community for reproducible
research in SSL from speech.
Comment: Will be presented at Interspeech 202
Label-Dependencies Aware Recurrent Neural Networks
In the last few years, Recurrent Neural Networks (RNNs) have proved effective
on several NLP tasks. Despite such great success, their ability to model
sequence labeling is still limited. This led research toward solutions
where RNNs are combined with models that have already proved effective in this
domain, such as CRFs. In this work we propose a far simpler but very
effective solution: an evolution of the simple Jordan RNN, where labels are re-injected
as input into the network, and converted into embeddings, in the same way as
words. We compare this RNN variant to the other RNN models (Elman and
Jordan RNNs, LSTM and GRU) on two well-known tasks of Spoken Language
Understanding (SLU). Thanks to label embeddings and their combination at the
hidden layer, the proposed variant, which uses more parameters than Elman and
Jordan RNNs but far fewer than LSTM and GRU, is not only more effective than other
RNNs but also outperforms sophisticated CRF models.
Comment: 22 pages, 3 figures. Accepted at CICLing 2017 conference. Best
Verifiability, Reproducibility, and Working Description award
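The idea of re-injecting predicted labels as embedded inputs can be sketched as follows. This is only an illustration under stated assumptions, not the authors' architecture: the class `LabelAwareTagger`, its dimensions, the single-label context, the `<start>` symbol, and the greedy decoding are all hypothetical, and the weights are random and untrained.

```python
import math
import random
from typing import List


def make_matrix(rows: int, cols: int, rng: random.Random) -> List[List[float]]:
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix: List[List[float]], vector: List[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LabelAwareTagger:
    """Toy tagger whose previous predicted label is embedded like a word."""

    def __init__(self, vocab: List[str], labels: List[str],
                 emb_dim: int = 4, hidden_dim: int = 6, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.labels = labels
        self.unk = [0.0] * emb_dim  # embedding used for unknown words
        self.word_emb = {w: [rng.uniform(-0.5, 0.5) for _ in range(emb_dim)]
                         for w in vocab}
        # Labels get their own embedding table, plus a start-of-sequence symbol.
        self.label_emb = {l: [rng.uniform(-0.5, 0.5) for _ in range(emb_dim)]
                          for l in labels + ["<start>"]}
        self.w_hidden = make_matrix(hidden_dim, 2 * emb_dim, rng)
        self.w_out = make_matrix(len(labels), hidden_dim, rng)

    def tag(self, words: List[str]) -> List[str]:
        prev_label = "<start>"
        tags = []
        for word in words:
            # Word embedding and previous-label embedding are concatenated
            # and combined at the hidden layer.
            x = self.word_emb.get(word, self.unk) + self.label_emb[prev_label]
            hidden = [math.tanh(a) for a in matvec(self.w_hidden, x)]
            scores = matvec(self.w_out, hidden)
            # Greedy choice; the predicted label is fed back at the next step.
            prev_label = self.labels[scores.index(max(scores))]
            tags.append(prev_label)
        return tags


tagger = LabelAwareTagger(vocab=["flights", "to", "boston"], labels=["O", "B-city"])
predicted = tagger.tag(["flights", "to", "boston"])
```

Because the weights are untrained, the predicted tags carry no meaning; the sketch only shows the data flow in which each predicted label is looked up in an embedding table and fed back as input for the next position.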