58 research outputs found
A Comparison between Deep Neural Nets and Kernel Acoustic Models for Speech Recognition
We study large-scale kernel methods for acoustic modeling and compare to DNNs
on performance metrics related to both acoustic modeling and recognition.
Measuring perplexity and frame-level classification accuracy, kernel-based
acoustic models are as effective as their DNN counterparts. However, on
token-error-rates DNN models can be significantly better. We have discovered
that this might be attributed to DNN's unique strength in reducing both the
perplexity and the entropy of the predicted posterior probabilities. Motivated
by our findings, we propose a new technique, entropy regularized perplexity,
for model selection. This technique can noticeably improve the recognition
performance of both types of models, and reduces the gap between them. While
effective on Broadcast News, this technique could be also applicable to other
tasks.Comment: arXiv admin note: text overlap with arXiv:1411.400
The 2015 Sheffield System for Longitudinal Diarisation of Broadcast Media
Speaker diarisation is the task of answering "who spoke when" within a multi-speaker audio recording. Diarisation of broadcast media typically operates on individual television shows, and is a particularly difficult task, due to a high number of speakers and challenging background conditions. Using prior knowledge, such as that from previous shows in a series, can improve performance. Longitudinal diarisation allows to use knowledge from previous audio files to improve performance, but requires finding matching speakers across consecutive files. This paper describes the University of Sheffield system for participation in the 2015 Multi-Genre Broadcast (MGB) challenge. The challenge required longitudinal diarisation of data from BBC archives, under very constrained resource settings. Our system consists of three main stages: speech activity detection using DNNs with novel adaptation and decoding methods; speaker segmentation and clustering, with adaptation of the DNN-based clustering models; and finally speaker linking to match speakers across shows. The final result on the development set of 19 shows from five different television series provided a Diarisation Error Rate of 50.77% in the diarisation and linking task
Contextual Sequence Modeling for Recommendation with Recurrent Neural Networks
Recommendations can greatly benefit from good representations of the user
state at recommendation time. Recent approaches that leverage Recurrent Neural
Networks (RNNs) for session-based recommendations have shown that Deep Learning
models can provide useful user representations for recommendation. However,
current RNN modeling approaches summarize the user state by only taking into
account the sequence of items that the user has interacted with in the past,
without taking into account other essential types of context information such
as the associated types of user-item interactions, the time gaps between events
and the time of day for each interaction. To address this, we propose a new
class of Contextual Recurrent Neural Networks for Recommendation (CRNNs) that
can take into account the contextual information both in the input and output
layers and modifying the behavior of the RNN by combining the context embedding
with the item embedding and more explicitly, in the model dynamics, by
parametrizing the hidden unit transitions as a function of context information.
We compare our CRNNs approach with RNNs and non-sequential baselines and show
good improvements on the next event prediction task
A Dialogue-Act Taxonomy for a Virtual Coach Designed to Improve the Life of Elderly
This paper presents a dialogue act taxonomy designed for the development of a conversational agent for elderly. The main goal of this conversational agent is to improve life quality of the user by means of coaching sessions in different topics. In contrast to other approaches such as task-oriented dialogue systems and chit-chat implementations, the agent should display a pro-active attitude, driving the conversation to reach a number of diverse coaching goals. Therefore, the main characteristic of the introduced dialogue act taxonomy is its capacity for supporting a communication based on the GROW model for coaching. In addition, the taxonomy has a hierarchical structure between the tags and it is multimodal. We use the taxonomy to annotate a Spanish dialogue corpus collected from a group of elder people. We also present a preliminary examination of the annotated corpus and discuss on the multiple possibilities it presents for further research.The research presented in this paper is conducted as part of the project EMPATHIC that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 769872. The authors would also like to thank the support by the Basque Government through the project IT-1244-19
- …