
    A Comparison between Deep Neural Nets and Kernel Acoustic Models for Speech Recognition

    We study large-scale kernel methods for acoustic modeling and compare them to DNNs on performance metrics related to both acoustic modeling and recognition. Measured by perplexity and frame-level classification accuracy, kernel-based acoustic models are as effective as their DNN counterparts. However, in terms of token error rate, DNN models can be significantly better. We have discovered that this might be attributed to the DNN's unique strength in reducing both the perplexity and the entropy of the predicted posterior probabilities. Motivated by our findings, we propose a new technique, entropy regularized perplexity, for model selection. This technique can noticeably improve the recognition performance of both types of models and reduces the gap between them. While shown to be effective on Broadcast News, this technique could also be applicable to other tasks. Comment: arXiv admin note: text overlap with arXiv:1411.400
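    The two quantities behind the proposed model-selection criterion can be sketched in Python. Perplexity is computed here as the exponential of the mean negative log posterior of the reference label, and entropy as the mean Shannon entropy of each predicted posterior distribution (natural logarithms). The abstract does not give the exact form of entropy regularized perplexity, so the combination `log(perplexity) + beta * mean_entropy`, the weight `beta`, and all function names below are hypothetical assumptions for illustration only.

```python
import math
from typing import Sequence

def perplexity(posteriors: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """exp of the mean negative log posterior assigned to the reference label.
    Assumes every reference posterior is strictly positive."""
    nll = -sum(math.log(p[y]) for p, y in zip(posteriors, labels)) / len(labels)
    return math.exp(nll)

def mean_entropy(posteriors: Sequence[Sequence[float]]) -> float:
    """Average Shannon entropy (in nats) of the predicted posterior distributions."""
    total = 0.0
    for p in posteriors:
        total += -sum(q * math.log(q) for q in p if q > 0)
    return total / len(posteriors)

def entropy_regularized_score(posteriors, labels, beta: float = 1.0) -> float:
    """HYPOTHETICAL criterion: log perplexity plus beta times mean entropy.
    Lower is better; this is not necessarily the paper's exact definition."""
    return math.log(perplexity(posteriors, labels)) + beta * mean_entropy(posteriors)
```

A lower score favours models whose posteriors are both accurate and sharp; under this assumption, model selection would pick the checkpoint with the smallest score on held-out frames rather than the one with the lowest perplexity alone.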

    The 2015 Sheffield System for Longitudinal Diarisation of Broadcast Media

    Speaker diarisation is the task of answering "who spoke when" within a multi-speaker audio recording. Diarisation of broadcast media typically operates on individual television shows and is a particularly difficult task due to the high number of speakers and challenging background conditions. Using prior knowledge, such as that from previous shows in a series, can improve performance. Longitudinal diarisation allows knowledge from previous audio files to be used to improve performance, but requires finding matching speakers across consecutive files. This paper describes the University of Sheffield system for participation in the 2015 Multi-Genre Broadcast (MGB) challenge. The challenge required longitudinal diarisation of data from BBC archives under very constrained resource settings. Our system consists of three main stages: speech activity detection using DNNs with novel adaptation and decoding methods; speaker segmentation and clustering, with adaptation of the DNN-based clustering models; and finally speaker linking to match speakers across shows. On the development set of 19 shows from five different television series, the final system achieved a Diarisation Error Rate of 50.77% on the diarisation and linking task.
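    The Diarisation Error Rate quoted above is conventionally defined as the fraction of reference speech time that is missed, falsely detected as speech, or attributed to the wrong speaker. The sketch below computes it at frame level. It assumes one speaker per frame, hypothesis labels already mapped to reference speaker names, and no scoring collar; standard scoring tools additionally search for the optimal speaker mapping and handle overlap. The function name `frame_der` is hypothetical.

```python
from typing import Optional, Sequence

def frame_der(ref: Sequence[Optional[str]], hyp: Sequence[Optional[str]]) -> float:
    """Frame-level DER; None marks a non-speech frame.
    Assumes hyp speaker labels are already mapped onto ref speaker names."""
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same number of frames")
    ref_speech = sum(1 for r in ref if r is not None)
    if ref_speech == 0:
        raise ValueError("reference contains no speech")
    # Speech in the reference that the hypothesis labels as non-speech.
    missed = sum(1 for r, h in zip(ref, hyp) if r is not None and h is None)
    # Non-speech in the reference that the hypothesis labels as speech.
    false_alarm = sum(1 for r, h in zip(ref, hyp) if r is None and h is not None)
    # Speech in both, but attributed to the wrong speaker.
    confusion = sum(1 for r, h in zip(ref, hyp)
                    if r is not None and h is not None and r != h)
    return (missed + false_alarm + confusion) / ref_speech
```

For example, `frame_der(["A", "A", "B", "B", None], ["A", None, "A", "B", "B"])` counts one missed, one false-alarm, and one confused frame over four reference speech frames, giving 0.75. Because false alarms are not bounded by the amount of reference speech, DER can exceed 100%.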

    Contextual Sequence Modeling for Recommendation with Recurrent Neural Networks

    Recommendations can greatly benefit from good representations of the user state at recommendation time. Recent approaches that leverage Recurrent Neural Networks (RNNs) for session-based recommendations have shown that Deep Learning models can provide useful user representations for recommendation. However, current RNN modeling approaches summarize the user state by only taking into account the sequence of items that the user has interacted with in the past, without taking into account other essential types of context information, such as the associated types of user-item interactions, the time gaps between events, and the time of day of each interaction. To address this, we propose a new class of Contextual Recurrent Neural Networks for Recommendation (CRNNs) that take contextual information into account both in the input and output layers, and that modify the behavior of the RNN by combining the context embedding with the item embedding and, more explicitly, in the model dynamics, by parametrizing the hidden unit transitions as a function of context information. We compare our CRNN approach with RNNs and non-sequential baselines and show good improvements on the next event prediction task.
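    The two mechanisms described above, combining a context embedding with the item embedding at the input and making the hidden-to-hidden transition depend on context, can be illustrated with a toy recurrent cell in plain Python. This is a minimal sketch under assumptions that are not from the paper: the context is a single discrete value (for example an interaction type), item and context embeddings are combined by element-wise multiplication, the transition is parametrized by selecting one hidden-to-hidden matrix per context value, weights are random and untrained, and the class and method names are hypothetical. The output layer and training are omitted.

```python
import math
import random

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class ToyContextualRNN:
    """Hypothetical toy cell: context modulates both the input and the transition."""

    def __init__(self, n_items, n_contexts, emb_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.item_emb = rand_matrix(n_items, emb_dim, rng)
        self.ctx_emb = rand_matrix(n_contexts, emb_dim, rng)
        self.W_in = rand_matrix(hidden_dim, emb_dim, rng)
        # Assumption: one hidden-to-hidden matrix per discrete context value.
        self.U = [rand_matrix(hidden_dim, hidden_dim, rng) for _ in range(n_contexts)]
        self.hidden_dim = hidden_dim

    def step(self, h_prev, item_id, ctx_id):
        # Input: item embedding combined element-wise with the context embedding.
        x = [a * b for a, b in zip(self.item_emb[item_id], self.ctx_emb[ctx_id])]
        # Dynamics: the recurrent matrix is selected by the current context.
        pre = [a + b for a, b in zip(matvec(self.W_in, x),
                                     matvec(self.U[ctx_id], h_prev))]
        return [math.tanh(p) for p in pre]

    def encode(self, events):
        """Run over (item_id, ctx_id) pairs and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        for item_id, ctx_id in events:
            h = self.step(h, item_id, ctx_id)
        return h
```

`encode` returns the final hidden state, which would serve as the user representation for next-event prediction. Selecting a full matrix per context value is the simplest context-dependent transition, but its parameter count grows with the number of context values; a factorized parametrization would be one way to reduce that cost.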

    A Parallel Recurrent Neural Network for Language Modeling with POS Tags

    A Dialogue-Act Taxonomy for a Virtual Coach Designed to Improve the Life of Elderly

    This paper presents a dialogue act taxonomy designed for the development of a conversational agent for the elderly. The main goal of this conversational agent is to improve the user's quality of life by means of coaching sessions on different topics. In contrast to other approaches, such as task-oriented dialogue systems and chit-chat implementations, the agent should display a proactive attitude, driving the conversation to reach a number of diverse coaching goals. Therefore, the main characteristic of the introduced dialogue act taxonomy is its capacity to support communication based on the GROW model for coaching. In addition, the taxonomy has a hierarchical structure between the tags and is multimodal. We use the taxonomy to annotate a Spanish dialogue corpus collected from a group of elderly people. We also present a preliminary examination of the annotated corpus and discuss the multiple possibilities it presents for further research. The research presented in this paper is conducted as part of the project EMPATHIC that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 769872. The authors would also like to thank the Basque Government for its support through the project IT-1244-19.