
58 research outputs found

A Comparison between Deep Neural Nets and Kernel Acoustic Models for Speech Recognition

Authors: Bellet Aurelien; Collins Michael; Fan Linxi; Garakani Alireza Bagheri; Guo Dong; Kingsbury Brian; Liu Kuan; Lu Zhiyun; May Avner; Picheny Michael; Sha Fei
Publication date: 18/03/2016

We study large-scale kernel methods for acoustic modeling and compare them to DNNs on performance metrics related to both acoustic modeling and recognition. Measured by perplexity and frame-level classification accuracy, kernel-based acoustic models are as effective as their DNN counterparts. However, on token error rates, DNN models can be significantly better. We have discovered that this might be attributed to the DNNs' unique strength in reducing both the perplexity and the entropy of the predicted posterior probabilities. Motivated by our findings, we propose a new technique, entropy regularized perplexity, for model selection. This technique can noticeably improve the recognition performance of both types of models and reduces the gap between them. While effective on Broadcast News, this technique could also be applicable to other tasks. Comment: arXiv admin note: text overlap with arXiv:1411.400
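The two quantities the abstract contrasts, perplexity and the entropy of the predicted posteriors, can be illustrated with a minimal sketch using their standard definitions. The function names and the toy posteriors are assumptions made for illustration. The abstract does not state how the two terms are combined into entropy regularized perplexity, so no combined criterion is shown here.

```python
import math


def perplexity(posteriors, labels):
    """Exponential of the mean negative log-probability given to the true label."""
    nll = -sum(math.log(p[y]) for p, y in zip(posteriors, labels)) / len(labels)
    return math.exp(nll)


def mean_entropy(posteriors):
    """Mean Shannon entropy (in nats) of the predicted posterior distributions."""
    total = 0.0
    for p in posteriors:
        # Zero-probability classes contribute nothing to the entropy.
        total -= sum(q * math.log(q) for q in p if q > 0.0)
    return total / len(posteriors)


# Hypothetical posteriors over 3 classes for 2 frames (not data from the paper).
labels = [0, 1]
model_a = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]]
model_b = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0]]

print(perplexity(model_a, labels), mean_entropy(model_a))
print(perplexity(model_b, labels), mean_entropy(model_b))
```

In this toy case both models give the true label the same probability, so their perplexities match, yet the second model spreads its remaining mass over fewer classes and so has lower entropy. This is only meant to show that the two measures can diverge.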

arXiv.org e-Print Archive

HAL - Lille 3

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

The 2015 Sheffield System for Longitudinal Diarisation of Broadcast Media

Authors: Deena S.; Doulaty M.; Hain T.; Milner R.; Ng R.; Saz O.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2015

Speaker diarisation is the task of answering "who spoke when" within a multi-speaker audio recording. Diarisation of broadcast media typically operates on individual television shows and is a particularly difficult task, due to a high number of speakers and challenging background conditions. Using prior knowledge, such as that from previous shows in a series, can improve performance. Longitudinal diarisation allows knowledge from previous audio files to be used to improve performance, but requires finding matching speakers across consecutive files. This paper describes the University of Sheffield system for participation in the 2015 Multi-Genre Broadcast (MGB) challenge. The challenge required longitudinal diarisation of data from BBC archives, under very constrained resource settings. Our system consists of three main stages: speech activity detection using DNNs with novel adaptation and decoding methods; speaker segmentation and clustering, with adaptation of the DNN-based clustering models; and finally speaker linking to match speakers across shows. The final result on the development set of 19 shows from five different television series was a Diarisation Error Rate of 50.77% in the diarisation and linking task.
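For readers unfamiliar with the metric, Diarisation Error Rate is conventionally the sum of missed speech, false-alarm speech and speaker-confusion time, divided by the total reference speech time. The sketch below follows that common definition. The function name and the durations are hypothetical, and the abstract does not describe the exact scoring setup used in the challenge.

```python
def diarisation_error_rate(missed, false_alarm, speaker_error, total_speech):
    """Return DER as a fraction of the total reference speech time.

    All arguments are durations in seconds.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + speaker_error) / total_speech


# Hypothetical durations, not taken from the paper.
der = diarisation_error_rate(
    missed=30.0, false_alarm=20.0, speaker_error=50.0, total_speech=400.0
)
print(f"DER = {der:.2%}")
```

Because false alarms are counted in the numerator but not in the denominator, the rate can exceed 100% on difficult material.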

White Rose Research Online

Contextual Sequence Modeling for Recommendation with Recurrent Neural Networks

Authors: Ha David; Hidasi Balázs; Inan Hakan; Kiros Ryan; Sutskever Ilya; Wu Yuhuai
Publication date: 23/06/2017

Recommendations can greatly benefit from good representations of the user state at recommendation time. Recent approaches that leverage Recurrent Neural Networks (RNNs) for session-based recommendations have shown that Deep Learning models can provide useful user representations for recommendation. However, current RNN modeling approaches summarize the user state by taking into account only the sequence of items that the user has interacted with in the past, ignoring other essential types of context information such as the associated types of user-item interactions, the time gaps between events and the time of day of each interaction. To address this, we propose a new class of Contextual Recurrent Neural Networks for Recommendation (CRNNs) that can take contextual information into account both in the input and output layers, modifying the behavior of the RNN by combining the context embedding with the item embedding, and, more explicitly, in the model dynamics, by parametrizing the hidden unit transitions as a function of context information. We compare our CRNN approach with RNNs and non-sequential baselines and show good improvements on the next event prediction task.
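One way to picture the input-side idea of combining a context embedding with an item embedding is a plain recurrent step whose input is the concatenation of the two vectors. This is only a sketch under that assumption: the abstract does not say which combination operator or cell type the authors use, and the name `rnn_step`, the weights and the embeddings are all hypothetical. The context-dependent hidden transitions mentioned in the abstract are not modelled here.

```python
import math


def rnn_step(hidden, item_emb, context_emb, w_in, w_rec):
    """One tanh RNN step on the concatenation [item_emb, context_emb].

    w_in has one row per hidden unit and one column per input dimension;
    w_rec is square with one row and one column per hidden unit.
    """
    x = item_emb + context_emb  # list concatenation, an assumed combination
    new_hidden = []
    for i in range(len(hidden)):
        s = sum(w_in[i][j] * x[j] for j in range(len(x)))
        s += sum(w_rec[i][k] * hidden[k] for k in range(len(hidden)))
        new_hidden.append(math.tanh(s))
    return new_hidden


# Hypothetical 1-d item and context embeddings and a 2-unit hidden state.
w_in = [[0.5, 0.5], [0.0, 1.0]]
w_rec = [[0.0, 0.0], [0.0, 0.0]]
h0 = [0.0, 0.0]

print(rnn_step(h0, [1.0], [0.0], w_in, w_rec))  # context switched off
print(rnn_step(h0, [1.0], [1.0], w_in, w_rec))  # same item, different context
```

The same item produces a different hidden state once the context vector changes, which is the effect the input-side combination is meant to achieve.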

arXiv.org e-Print Archive

Crossref

A Parallel Recurrent Neural Network for Language Modeling with POS Tags

Authors: Guo Yuhang; Huang Heyan; Shi Shumin; Su Chao; Wu Hao
Publication venue: the National University (Philippines)
Publication date: 01/01/2017

Waseda University Repository

A Dialogue-Act Taxonomy for a Virtual Coach Designed to Improve the Life of Elderly

Authors: Austin; Bunt; Graham; Grant; Keizer; Lowe; López Zorrilla; Passmore; Passmore; Popescu-Belis; Sayas; Sayas; Sayas; Serban; Tur; Vukotic; Whitemore; Zhang
Publication venue: 'MDPI AG'
Publication date: 01/01/2019

This paper presents a dialogue act taxonomy designed for the development of a conversational agent for the elderly. The main goal of this conversational agent is to improve the quality of life of the user by means of coaching sessions on different topics. In contrast to other approaches such as task-oriented dialogue systems and chit-chat implementations, the agent should display a proactive attitude, driving the conversation to reach a number of diverse coaching goals. Therefore, the main characteristic of the introduced dialogue act taxonomy is its capacity to support communication based on the GROW model for coaching. In addition, the taxonomy has a hierarchical structure between the tags and is multimodal. We use the taxonomy to annotate a Spanish dialogue corpus collected from a group of elderly people. We also present a preliminary examination of the annotated corpus and discuss the multiple possibilities it presents for further research. The research presented in this paper is conducted as part of the project EMPATHIC, which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 769872. The authors would also like to acknowledge the support of the Basque Government through the project IT-1244-19.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Archivo Digital para la Docencia y la Investigación