37 research outputs found

    Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding

    Full text link
    Spoken dialogue systems (SDS) typically require a predefined semantic ontology to train a spoken language understanding (SLU) module. In addition to the annotation cost, a key challenge for designing such an ontology is to define a coherent slot set while considering their complex relations. This paper introduces a novel matrix factorization (MF) approach to learn latent feature vectors for utterances and semantic elements without the need of corpus annotations. Specifically, our model learns the semantic slots for a domain-specific SDS in an unsupervised fashion, and carries out semantic parsing using latent MF techniques. To further consider the global semantic structure, such as inter-word and inter-slot relations, we augment the latent MF-based model with a knowledge graph propagation model based on a slot-based semantic graph and a word-based lexical graph. Our experiments show that the proposed MF approaches produce better SLU models that are able to predict semantic slots and word patterns taking into account their relations and domain-specificity in a joint manner.
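    The core idea of the abstract above, learning a low-rank factorization over a joint utterance-by-(word, slot) matrix, can be sketched minimally as follows. This is an illustrative toy example, not the paper's implementation: the feature names, the tiny matrix, and the use of truncated SVD as the factorizer are all assumptions made for clarity.

    ```python
    import numpy as np

    # Hypothetical toy data: rows are utterances, columns are observed word
    # patterns plus candidate semantic slots (names are illustrative only).
    features = ["show", "flights", "cheap", "slot:flight", "slot:price"]
    # 1 = feature observed (or induced) for the utterance, 0 = unobserved.
    M = np.array([
        [1, 1, 0, 1, 0],   # "show flights"
        [0, 1, 1, 1, 1],   # "cheap flights"
        [1, 0, 1, 0, 1],   # "show cheap"
    ], dtype=float)

    # Low-rank factorization M ~ U @ V.T via truncated SVD; the latent
    # dimensions jointly embed utterances, words, and slots.
    k = 2
    U_full, s, Vt = np.linalg.svd(M, full_matrices=False)
    U = U_full[:, :k] * s[:k]   # latent utterance vectors
    V = Vt[:k, :].T             # latent feature (word/slot) vectors

    # Reconstructed scores estimate how strongly each slot applies to each
    # utterance, including slots never explicitly observed in it.
    scores = U @ V.T
    ```

    The point of the factorization is the last step: after training, `scores` assigns a real-valued strength to every slot for every utterance, so slots absent from the surface form can still be predicted through the shared latent space.
    
    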

    N-best speech hypotheses reordering using linear regression

    No full text
    We propose a hypothesis reordering technique to improve speech recognition accuracy in a dialog system. For such systems, additional information external to the decoding process itself is available, in particular features derived from the parse and the dialog. Such features can be combined with recognizer features by means of a linear regression model to predict the most likely entry in the hypothesis list. We introduce the use of concept error rate as an alternative accuracy measurement and compare it with the use of word error rate. The proposed model performs better than human subjects performing the same hypothesis reordering task.
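    The reordering scheme described above, combining recognizer, parse, and dialog features in a linear regression and picking the top-scoring hypothesis, can be sketched roughly as below. All feature names, numbers, and the least-squares fit are assumptions chosen for illustration; they are not the paper's actual features or data.

    ```python
    import numpy as np

    # Hypothetical training data: each row is one N-best hypothesis with
    # features [recognizer score, parse score, dialog-state score]; the
    # target is a quality label (e.g. derived from concept error rate).
    X = np.array([
        [0.9, 0.2, 0.7],
        [0.8, 0.9, 0.6],
        [0.4, 0.1, 0.2],
        [0.7, 0.8, 0.9],
    ])
    y = np.array([0.6, 0.9, 0.1, 0.95])

    # Fit linear regression weights by least squares, with a bias term.
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

    # Rerank a new N-best list: score each hypothesis, pick the best.
    nbest = np.array([
        [0.85, 0.3, 0.5],   # hypothesis 0
        [0.80, 0.9, 0.8],   # hypothesis 1
    ])
    scores = np.hstack([nbest, np.ones((2, 1))]) @ w
    best = int(np.argmax(scores))
    ```

    The design choice worth noting is that the regression only reorders an existing N-best list; it never proposes new hypotheses, so its ceiling is the oracle accuracy of the list itself.
    
    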

    A Comparison of Speech vs Typed Input

    No full text
    We conducted a series of empirical experiments in which users were asked to enter digit strings into the computer by voice or keyboard. Two different ways of verifying and correcting the spoken input were examined. Extensive timing analyses were performed to determine which aspects of the interface were critical to speedy completion of the task. The results show that speech is preferable for strings that require more than a few keystrokes. The results emphasize the need for fast and accurate speech recognition, but also demonstrate how error correction and input validation are crucial for an effective speech interface.