2,608 research outputs found

    A prosody-based vector-space model of dialog activity for information retrieval

    Search in audio archives is a challenging problem. Using prosodic information to help find relevant content has been proposed as a complement to word-based retrieval, but its utility has been an open question. We propose a new way to use prosodic information in search, based on a vector-space model, where each point in time maps to a point in a vector space whose dimensions are derived from numerous prosodic features of the local context. Point pairs that are close in this vector space are frequently similar, not only in terms of the dialog activities, but also in topic. Using proximity in this space as an indicator of similarity, we built support for a query-by-example function. Searchers were happy to use this function, and it provided value on a large test set. Prosody-based retrieval did not perform as well as word-based retrieval, but the two sources of information were often non-redundant, and in combination they sometimes performed better than either separately.

    We thank Martha Larson, Alejandro Vega, Steve Renals, Khiet Truong, Olac Fuentes, David Novick, Shreyas Karkhedkar, Luis F. Ramirez, Elizabeth E. Shriberg, Catharine Oertel, Louis-Philippe Morency, Tatsuya Kawahara, Mary Harper, and the anonymous reviewers. This work was supported in part by the National Science Foundation under Grants IIS-0914868 and IIS-1241434 and by the Spanish MEC under contract TIN2011-28169-C05-01.

    Ward, N.G.; Werner, S.D.; García-Granada, F.; Sanchís Arnal, E. (2015). A prosody-based vector-space model of dialog activity for information retrieval. Speech Communication 68:85-96. doi:10.1016/j.specom.2015.01.004
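    The core idea above — mapping each point in time to a vector of local prosodic features and treating proximity in that space as similarity for query-by-example search — can be sketched as follows. This is a minimal illustration with a hypothetical five-dimensional feature set (the paper derives its dimensions from many more prosodic features of the local context); the function names and window parameters are assumptions, not the authors' implementation.

```python
import numpy as np

def prosodic_vectors(pitch, energy, frame_rate=100, window_s=2.0):
    """Map each point in time to a vector of prosodic features computed
    over its local context. Hypothetical feature set: pitch mean/spread/
    range and energy mean/spread over a sliding window."""
    win = int(window_s * frame_rate)
    vecs = []
    for start in range(0, len(pitch) - win, win // 2):  # 50% overlap
        p = pitch[start:start + win]
        e = energy[start:start + win]
        vecs.append([p.mean(), p.std(), np.ptp(p), e.mean(), e.std()])
    v = np.array(vecs)
    # Normalise each dimension so no single feature dominates distance.
    return (v - v.mean(axis=0)) / (v.std(axis=0) + 1e-9)

def query_by_example(vectors, query_idx, k=3):
    """Return the k time points closest (Euclidean) to the example point,
    i.e. the regions most prosodically similar to the query region."""
    d = np.linalg.norm(vectors - vectors[query_idx], axis=1)
    d[query_idx] = np.inf  # exclude the query itself
    return np.argsort(d)[:k]
```

    In this framing, a searcher marks a region of interest, its feature vector is computed, and the nearest vectors in the archive are returned as candidate hits — which can then be fused with word-based retrieval scores.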

    An End-to-End Conversational Style Matching Agent

    We present an end-to-end voice-based conversational agent that is able to engage in naturalistic multi-turn dialogue and align with the interlocutor's conversational style. The system uses a series of deep neural network components for speech recognition, dialogue generation, prosodic analysis and speech synthesis to generate language and prosodic expression with qualities that match those of the user. We conducted a user study (N=30) in which participants talked with the agent for 15 to 20 minutes, resulting in over 8 hours of natural interaction data. Users with high-consideration conversational styles reported the agent to be more trustworthy when it matched their conversational style, whereas users with high-involvement conversational styles were indifferent. Finally, we provide design guidelines for multi-turn dialogue interactions using conversational style adaptation.
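    The style-alignment step described above can be illustrated with a deliberately simplified rule: measure the user's prosodic style (e.g. speaking rate and mean pitch) and move the agent's synthesis parameters part of the way toward it. This is a hypothetical rule-based blend for intuition only — the paper's system uses deep neural network components, not this formula, and the parameter names are assumptions.

```python
def match_style(agent_params, user_rate_wps, user_pitch_hz, alpha=0.5):
    """Move the agent's prosodic synthesis parameters a fraction alpha
    of the way toward the user's measured style (hypothetical scheme:
    alpha=0 keeps the agent's default style, alpha=1 fully mirrors
    the user)."""
    return {
        "rate_wps": (1 - alpha) * agent_params["rate_wps"] + alpha * user_rate_wps,
        "pitch_hz": (1 - alpha) * agent_params["pitch_hz"] + alpha * user_pitch_hz,
    }
```

    A partial blend (alpha well below 1) is the usual design choice in entrainment work, since fully mirroring the user can sound like mimicry rather than rapport.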

    DCU at the NTCIR-11 SpokenQuery&Doc task

    We describe DCU's participation in the NTCIR-11 SpokenQuery&Doc task. We participated in the spoken query spoken content retrieval (SQ-SCR) subtask, using the slide group segments as basic indexing and retrieval units. Our approach integrates normalised prosodic features into a standard BM25 weighting function to increase the weights of terms that are prominent in speech. Text queries and relevance assessment data from the NTCIR-10 SpokenDoc-2 passage retrieval task were used to train the prosody-based models. Evaluation results indicate that our prosody-based retrieval models do not provide significant improvements over a text-based BM25 model, but suggest that they can be useful for certain queries.
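    The idea of folding prosodic prominence into BM25 can be sketched as follows: boost each matching term's frequency by a normalised prominence score before applying the usual BM25 saturation and length normalisation. The boost formula `tf * (1 + prominence)` is a hypothetical scheme for illustration — the paper's exact integration of prosodic features into the weighting function may differ.

```python
import math
from collections import Counter

def bm25_prosodic(query_terms, doc_terms, prominence, df, n_docs,
                  avg_len, k1=1.2, b=0.75):
    """BM25 score where each term's raw frequency is boosted by a
    normalised prosodic prominence score in [0, 1] (hypothetical:
    prominence maps a term to its average acoustic prominence in
    this document; 0 reduces to plain BM25)."""
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for t in query_terms:
        if t not in tf:
            continue
        # Prosodically prominent terms count as if they occurred more often.
        f = tf[t] * (1.0 + prominence.get(t, 0.0))
        d = df.get(t, 0)
        idf = math.log(1 + (n_docs - d + 0.5) / (d + 0.5))
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * doc_len / avg_len))
    return score
```

    Because the boost only inflates term frequency, BM25's saturation (the `k1` term) still caps the effect, which keeps the prosodic signal from overwhelming the text evidence.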

    Suprasegmental speech perception, working memory and reading comprehension in Cantonese-English bilingual children

    This study set out to examine (a) lexical tone and stress perception by bilingual and monolingual children, (b) interrelationships between lexical pitch perception, general acoustic mechanisms and working memory, and (c) the association between lexical tone awareness and Chinese text reading comprehension. Experiment 1 tested and compared the perception of Cantonese lexical tones, English lexical stress and non-linguistic pitch between Cantonese-English bilingual and English monolingual children. The relationships between linguistic pitch perception, non-linguistic pitch perception and working memory were also examined among the Cantonese-English bilingual children. Experiment 2 explored the relationship between Cantonese tone awareness and Chinese text reading comprehension skills. Results illustrate differential performance in tone perception but similar performance in stress perception between bilinguals and monolinguals. In addition, inter-correlations were found between linguistic pitch perception, general acoustic mechanisms, working memory and reading comprehension. These findings provide new insight into native and non-native perception of lexical pitch, and demonstrate an important link between lexical tone awareness and reading comprehension.