375 research outputs found
Zero-Shot Learning for Semantic Utterance Classification
We propose a novel zero-shot learning method for semantic utterance
classification (SUC). It learns a classifier for problems where
none of the semantic categories are present in the training set. The
framework uncovers the link between categories and utterances using a semantic
space. We show that this semantic space can be learned by deep neural networks
trained on large amounts of search engine query log data. More precisely, we
propose a novel method that can learn discriminative semantic features without
supervision. It uses the zero-shot learning framework to guide the learning of
the semantic features. We demonstrate the effectiveness of the zero-shot
semantic learning algorithm on the SUC dataset collected by (Tur, 2012).
Furthermore, we achieve state-of-the-art results by combining the semantic
features with a supervised method.
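The core zero-shot mechanism, classifying an utterance by its proximity to category embeddings in a shared semantic space, can be sketched as follows; the vectors and category names below are invented for illustration and are not from the paper:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def zero_shot_classify(utterance_vec, category_vecs):
    """Assign the utterance to the nearest category embedding in the
    shared semantic space. This works even for categories with no
    training utterances, which is the zero-shot setting."""
    return max(category_vecs, key=lambda c: cosine(utterance_vec, category_vecs[c]))

# Toy 3-d "semantic space"; in the paper such embeddings come from deep
# networks trained on query-log data, here they are hand-picked numbers.
categories = {"restaurants": [0.9, 0.1, 0.0], "weather": [0.0, 0.8, 0.2]}
utterance = [0.85, 0.2, 0.05]   # e.g. embedding of "book a table for two"
print(zero_shot_classify(utterance, categories))  # → restaurants
```

The supervised combination mentioned in the abstract would simply add these similarity scores as extra features to a conventional classifier.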
Real-time decoding of question-and-answer speech dialogue using human cortical activity.
Natural communication often occurs in dialogue, differentially engaging auditory and sensorimotor brain regions during listening and speaking. However, previous attempts to decode speech directly from the human brain typically consider listening or speaking tasks in isolation. Here, human participants listened to questions and responded aloud with answers while we used high-density electrocorticography (ECoG) recordings to detect when they heard or said an utterance and to then decode the utterance's identity. Because certain answers were only plausible responses to certain questions, we could dynamically update the prior probabilities of each answer using the decoded question likelihoods as context. We decode produced and perceived utterances with accuracy rates as high as 61% and 76%, respectively (chance is 7% and 20%). Contextual integration of decoded question likelihoods significantly improves answer decoding. These results demonstrate real-time decoding of speech in an interactive, conversational setting, which has important implications for patients who are unable to communicate.
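The contextual-integration step amounts to a Bayesian update: the decoded question likelihoods reweight the prior over plausible answers before being combined with the answer decoder's own evidence. A minimal sketch, in which all probabilities, question names, and answer names are invented for illustration:

```python
def contextual_answer_posterior(answer_likelihood, question_likelihood, plausibility):
    """answer_likelihood[a]  : answer-decoder evidence for answer a
    question_likelihood[q]   : question-decoder evidence for question q
    plausibility[q][a]       : P(answer a | question was q)
    Returns a normalised posterior over answers."""
    posterior = {}
    for a, la in answer_likelihood.items():
        # Context-derived prior: marginalise plausibility over questions.
        prior = sum(lq * plausibility[q].get(a, 0.0)
                    for q, lq in question_likelihood.items())
        posterior[a] = la * prior
    z = sum(posterior.values()) or 1.0
    return {a: p / z for a, p in posterior.items()}

# The answer decoder alone favours "dark", but the decoded question
# context makes "bright" the more plausible reply.
answers   = {"bright": 0.4, "dark": 0.6}
questions = {"how_is_the_light": 0.9, "how_do_you_feel": 0.1}
plaus     = {"how_is_the_light": {"bright": 0.8, "dark": 0.2},
             "how_do_you_feel":  {"bright": 0.1, "dark": 0.9}}
post = contextual_answer_posterior(answers, questions, plaus)
```

In this toy case the context flips the decision, which is exactly the effect the abstract reports as improved answer decoding.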
A Dialogue-Act Taxonomy for a Virtual Coach Designed to Improve the Life of Elderly
This paper presents a dialogue act taxonomy designed for the development of a conversational agent for the elderly. The main goal of this conversational agent is to improve the user's quality of life by means of coaching sessions on different topics. In contrast to other approaches, such as task-oriented dialogue systems and chit-chat implementations, the agent should display a proactive attitude, driving the conversation to reach a number of diverse coaching goals. Therefore, the main characteristic of the introduced dialogue act taxonomy is its capacity to support communication based on the GROW model for coaching. In addition, the taxonomy has a hierarchical tag structure and is multimodal. We use the taxonomy to annotate a Spanish dialogue corpus collected from a group of elderly people. We also present a preliminary examination of the annotated corpus and discuss the multiple possibilities it presents for further research. The research presented in this paper is conducted as part of the EMPATHIC project, which has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 769872. The authors would also like to thank the Basque Government for its support through project IT-1244-19.
Out-of-vocabulary spoken term detection
Spoken term detection (STD) is a fundamental task in multimedia information
retrieval. A major challenge for an STD system is the severe performance degradation
when detecting out-of-vocabulary (OOV) terms. The difficulties arise not only
from the absence of pronunciations for such terms in the system dictionaries, but also
from intrinsic uncertainty in their pronunciations, significant diversity in term
properties, and weakness in the acoustic and language models.
To tackle the OOV issue, we first apply the joint-multigram model to predict
pronunciations for OOV terms stochastically. Building on this, we propose a stochastic
pronunciation model that considers all possible pronunciations of an OOV term,
compensating for the high pronunciation uncertainty.
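The marginalisation behind such a stochastic pronunciation model can be sketched as follows; the ARPAbet-style phone strings, probabilities, and the `detect_score` lookup are all hypothetical stand-ins for the grapheme-to-phoneme output and lattice-search evidence a real system would produce:

```python
def stochastic_pronunciation_score(pron_candidates, detect_score):
    """pron_candidates : list of (pronunciation, P(pron | spelling)) pairs,
    e.g. from a joint-multigram grapheme-to-phoneme model.
    detect_score(pron) : detection evidence found for that pronunciation.
    The term-level score marginalises over all candidate pronunciations,
    so no single (possibly wrong) pronunciation dominates the decision."""
    return sum(p * detect_score(pron) for pron, p in pron_candidates)

# Hypothetical OOV term with two competing predicted pronunciations.
candidates = [("n ih r eh l", 0.7), ("n iy r ah l", 0.3)]
lattice_evidence = {"n ih r eh l": 0.2, "n iy r ah l": 0.9}
total = stochastic_pronunciation_score(candidates, lattice_evidence.get)
# 0.7 * 0.2 + 0.3 * 0.9 = 0.41
```

A single-best-pronunciation system would have committed to the 0.7-probability variant and scored only 0.2 here; the marginalised score retains the evidence from the alternative.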
Furthermore, to deal with the diversity in term properties, we propose a term-dependent
discriminative decision strategy that employs discriminative models to
integrate multiple informative factors and confidence measures into a classification
probability, yielding a minimum decision cost.
In addition, to address the weakness in acoustic and language modelling, we propose
a direct posterior confidence measure which replaces the generative models with
a discriminative model, such as a multi-layer perceptron (MLP), to obtain a robust
confidence for OOV term detection.
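As a rough illustration of the direct posterior confidence idea, a one-hidden-layer MLP can map detection features straight to P(correct | features); the feature set, architecture, and all weights below are invented for illustration, whereas in the thesis the model is trained discriminatively on labelled detections:

```python
import math

def mlp_confidence(features, w_hidden, b_hidden, w_out, b_out):
    """One-hidden-layer MLP forward pass mapping detection features
    (e.g. lattice posterior, duration, LM score) to a posterior
    confidence in (0, 1), replacing a generative confidence measure."""
    hidden = [math.tanh(sum(w * x for w, x in zip(ws, features)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))   # sigmoid output unit

# Hypothetical features and weights for a single candidate detection.
conf = mlp_confidence([0.8, 0.3, -0.5],
                      w_hidden=[[1.0, 0.5, 0.2], [-0.3, 0.8, 0.1]],
                      b_hidden=[0.0, 0.1],
                      w_out=[1.2, -0.7], b_out=0.2)
```

Because the output is a discriminatively trained posterior rather than a generative likelihood ratio, it stays meaningful even when the acoustic and language models are weak, which is the motivation stated above.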
With these novel techniques, the STD performance on OOV terms improved
substantially and significantly in our experiments on meeting speech data.
Characterizing Spoken Discourse in Individuals with Parkinson Disease Without Dementia
Background: The effects of Parkinson disease (PD) on cognition, word retrieval, syntax, and speech/voice processes may interact to manifest uniquely in spoken language tasks. A handful of studies have explored spoken discourse production in PD and, though not uniformly, have reported a number of impairments, including reduced words per minute, reduced grammatical complexity, reduced informativeness, and increased verbal disruption. Methodological differences have impeded cross-study comparisons, so the profile of spoken language impairments in PD remains ambiguous.
Method: A prospective, cross-sectional, between-groups study using cross-genre, multi-level discourse analysis was conducted with 19 participants with PD (mean age = 70.74; mean UPDRS-III = 30.26) and 19 healthy controls (mean age = 68.16), all without dementia. The extensive protocol included a battery of cognitive, language, and speech measures in addition to four discourse tasks, two from each of two discourse genres (picture sequence description; story retelling). Discourse samples were analysed using both microlinguistic and macrostructural measures, and the discourse variables were statistically reduced to a core set used to distinguish the spoken discourse of PD participants from that of controls.
Results: Participants with PD differed significantly from controls across productivity, grammar, informativeness, and verbal disruption domains (α = .10): total words, F(1,36) = 3.87, p = .06; words/minute, F(1,36) = 7.74, p = .01; % grammatical utterances, F(1,36) = 11.92, p = .001; total CIUs (Correct Information Units), F(1,36) = 13.30, p = .001; % CIUs, F(1,36) = 9.35, p = .004; CIUs/minute, F(1,36) = 14.06, p = .001; and verbal disruptions/100 words, F(1,36) = 3.87, p = .06. Discriminant function analyses showed that optimally weighted discourse variables discriminated the spoken discourse of PD vs. controls with 81.6% sensitivity and 86.8% specificity. For both discourse genres, discourse performance showed robust, positive correlations with global cognition. In PD (picture sequence description), more impaired discourse performance correlated significantly with more severe motor impairment, more advanced disease staging, and higher doses of PD medications.
Conclusions: The spoken discourse of individuals with PD without dementia differs significantly and predictably from that of controls. The results have both research and clinical implications.
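The discriminant-function step reported above can be sketched as an optimally weighted sum of discourse variables compared against a cutoff; the variable names, weights, and threshold below are illustrative only, not the fitted values from the study:

```python
def discourse_discriminant(measures, weights, threshold):
    """Linear discriminant: a weighted sum of discourse variables
    (words/minute, % CIUs, % grammatical utterances, ...) compared
    against a cutoff to classify a speaker as PD or control.
    Lower scores reflect the reduced productivity and informativeness
    reported for the PD group."""
    score = sum(weights[k] * v for k, v in measures.items())
    return "PD" if score < threshold else "control"

# Hypothetical discourse measures for one speaker.
sample  = {"words_per_min": 95.0, "pct_cius": 0.62, "pct_grammatical": 0.78}
weights = {"words_per_min": 0.01, "pct_cius": 1.5,  "pct_grammatical": 1.2}
label = discourse_discriminant(sample, weights, threshold=2.5)
```

In the study, the weights and cutoff were estimated from the data, yielding the reported 81.6% sensitivity and 86.8% specificity; the sketch only shows the form of the classifier.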
Detecting emotions from speech using machine learning techniques
D.Phil. (Electronic Engineering)