Language modeling and transcription of the TED corpus lectures
Transcribing lectures is a challenging task, both in acoustic and in language modeling. In this work, we present our first results on the automatic transcription of lectures from the TED corpus, recently released by ELRA and LDC. In particular, we concentrated our effort on language modeling. Baseline acoustic and language models were developed using, respectively, 8 hours of TED transcripts and various types of text: conference proceedings, lecture transcripts, and conversational speech transcripts. Then, adaptation of the language model to single speakers was investigated by exploiting different kinds of information: automatic transcripts of the talk, the title of the talk, the abstract and, finally, the paper. In the last case, a 39.2% WER was achieved.
Dialogue history integration into end-to-end signal-to-concept spoken language understanding systems
This work investigates embeddings for representing dialog history in
spoken language understanding (SLU) systems. We focus on the scenario in which
semantic information is extracted directly from the speech signal by means of a
single end-to-end neural network model. We propose to integrate dialogue
history into an end-to-end signal-to-concept SLU system. The dialog history is
represented in the form of dialog history embedding vectors (so-called
h-vectors) and is provided as additional information to end-to-end SLU
models in order to improve system performance. The following three types of
h-vectors are proposed and experimentally evaluated in this paper: (1)
supervised-all embeddings, which predict the bag of concepts expected in the
user's answer from the last dialog system response; (2) supervised-freq embeddings,
which focus on predicting only a selected set of semantic concepts (corresponding
to the most frequent errors in our experiments); and (3) unsupervised
embeddings. Experiments on the MEDIA corpus for the semantic slot filling task
demonstrate that the proposed h-vectors improve model performance.
Comment: Accepted for ICASSP 2020 (Submitted: October 21, 2019)
A Contextual Study of Semantic Speech Editing in Radio Production
Radio production involves editing speech-based audio using tools
that represent sound using simple waveforms. Semantic speech editing systems allow users to edit audio using an automatically generated
transcript, which has the potential to improve the production workflow. To investigate this, we developed a semantic audio editor based
on a pilot study. Through a contextual qualitative study of five professional radio producers at the BBC, we examined the existing radio
production process and evaluated our semantic editor by using it to
create programmes that were later broadcast.
We observed that the participants in our study wrote detailed notes
about their recordings and used annotation to mark which parts they
wanted to use. They collaborated closely with the presenter of their
programme to structure the contents and write narrative elements.
Participants reported that they often work away from the office to
avoid distractions, and print transcripts so they can work away from
screens. They also emphasised that listening is an important part
of production, to ensure high sound quality. We found that semantic speech editing with automated speech recognition can be used to improve the radio production workflow, but that annotation, collaboration, portability and listening were not well supported by current
semantic speech editing systems. In this paper, we make recommendations on how future semantic speech editing systems can better
support the requirements of radio production.
Leveraging study of robustness and portability of spoken language understanding systems across languages and domains: the PORTMEDIA corpora
The PORTMEDIA project is intended to develop new corpora for the evaluation of spoken language understanding systems. The newly collected data are in the field of human-machine dialogue systems for tourist information in French, in line with the MEDIA corpus. Transcriptions and semantic annotations, obtained by low-cost procedures, are provided to allow a thorough evaluation of the systems' capabilities in terms of robustness and portability across languages and domains. A new test set with some adaptation data is prepared for each case: Italian as an example of a new language, and ticket reservation as an example of a new domain. Finally, the work is complemented by the proposal of a new high-level semantic annotation scheme well suited to dialogue data.