2,608 research outputs found

    A prosody-based vector-space model of dialog activity for information retrieval

    Search in audio archives is a challenging problem. Using prosodic information to help find relevant content has been proposed as a complement to word-based retrieval, but its utility has been an open question. We propose a new way to use prosodic information in search, based on a vector-space model, where each point in time maps to a point in a vector space whose dimensions are derived from numerous prosodic features of the local context. Point pairs that are close in this vector space are frequently similar, not only in terms of the dialog activities, but also in topic. Using proximity in this space as an indicator of similarity, we built support for a query-by-example function. Searchers were happy to use this function, and it provided value on a large test set. Prosody-based retrieval did not perform as well as word-based retrieval, but the two sources of information were often non-redundant, and in combination they sometimes performed better than either separately.

    We thank Martha Larson, Alejandro Vega, Steve Renals, Khiet Truong, Olac Fuentes, David Novick, Shreyas Karkhedkar, Luis F. Ramirez, Elizabeth E. Shriberg, Catharine Oertel, Louis-Philippe Morency, Tatsuya Kawahara, Mary Harper, and the anonymous reviewers. This work was supported in part by the National Science Foundation under Grants IIS-0914868 and IIS-1241434 and by the Spanish MEC under contract TIN2011-28169-C05-01.

    Ward, N.G.; Werner, S.D.; García-Granada, F.; Sanchís Arnal, E. (2015). A prosody-based vector-space model of dialog activity for information retrieval. Speech Communication 68:85-96. doi:10.1016/j.specom.2015.01.004
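    The core idea above — mapping each point in time to a vector of local prosodic features and treating proximity in that space as similarity for query-by-example search — can be sketched as follows. This is a minimal illustration with a hypothetical five-dimensional feature set (the paper derives its dimensions from many more prosodic features of the local context); the function names and window parameters are assumptions, not the authors' implementation.

```python
import numpy as np

def prosodic_vectors(pitch, energy, frame_rate=100, window_s=2.0):
    """Map each point in time to a vector of prosodic features computed
    over its local context. Hypothetical feature set: pitch mean/spread/
    range and energy mean/spread over a sliding window."""
    win = int(window_s * frame_rate)
    vecs = []
    for start in range(0, len(pitch) - win, win // 2):  # 50% overlap
        p = pitch[start:start + win]
        e = energy[start:start + win]
        vecs.append([p.mean(), p.std(), np.ptp(p), e.mean(), e.std()])
    v = np.array(vecs)
    # Normalise each dimension so no single feature dominates distance.
    return (v - v.mean(axis=0)) / (v.std(axis=0) + 1e-9)

def query_by_example(vectors, query_idx, k=3):
    """Return the k time points closest (Euclidean) to the example point,
    i.e. the regions most prosodically similar to the query region."""
    d = np.linalg.norm(vectors - vectors[query_idx], axis=1)
    d[query_idx] = np.inf  # exclude the query itself
    return np.argsort(d)[:k]
```

    In this framing, a searcher marks a region of interest, its feature vector is computed, and the nearest vectors in the archive are returned as candidate hits — which can then be fused with word-based retrieval scores.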

    An End-to-End Conversational Style Matching Agent

    We present an end-to-end voice-based conversational agent that is able to engage in naturalistic multi-turn dialogue and align with the interlocutor's conversational style. The system uses a series of deep neural network components for speech recognition, dialogue generation, prosodic analysis and speech synthesis to generate language and prosodic expression with qualities that match those of the user. We conducted a user study (N=30) in which participants talked with the agent for 15 to 20 minutes, resulting in over 8 hours of natural interaction data. Users with high-consideration conversational styles reported the agent to be more trustworthy when it matched their conversational style, whereas users with high-involvement conversational styles were indifferent. Finally, we provide design guidelines for multi-turn dialogue interactions using conversational style adaptation.
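    The style-alignment step described above can be illustrated with a deliberately simplified rule: measure the user's prosodic style (e.g. speaking rate and mean pitch) and move the agent's synthesis parameters part of the way toward it. This is a hypothetical rule-based blend for intuition only — the paper's system uses deep neural network components, not this formula, and the parameter names are assumptions.

```python
def match_style(agent_params, user_rate_wps, user_pitch_hz, alpha=0.5):
    """Move the agent's prosodic synthesis parameters a fraction alpha
    of the way toward the user's measured style (hypothetical scheme:
    alpha=0 keeps the agent's default style, alpha=1 fully mirrors
    the user)."""
    return {
        "rate_wps": (1 - alpha) * agent_params["rate_wps"] + alpha * user_rate_wps,
        "pitch_hz": (1 - alpha) * agent_params["pitch_hz"] + alpha * user_pitch_hz,
    }
```

    A partial blend (alpha well below 1) is the usual design choice in entrainment work, since fully mirroring the user can sound like mimicry rather than rapport.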

    DCU at the NTCIR-11 SpokenQuery&Doc task

    We describe DCU's participation in the NTCIR-11 SpokenQuery&Doc task. We participated in the spoken query spoken content retrieval (SQ-SCR) subtask, using the slide group segments as basic indexing and retrieval units. Our approach integrates normalised prosodic features into a standard BM25 weighting function to increase the weights of terms that are prominent in speech. Text queries and relevance assessment data from the NTCIR-10 SpokenDoc-2 passage retrieval task were used to train the prosody-based models. Evaluation results indicate that our prosody-based retrieval models do not provide significant improvements over a text-based BM25 model, but suggest that they can be useful for certain queries.
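    The idea of folding prosodic prominence into BM25 can be sketched as follows: boost each matching term's frequency by a normalised prominence score before applying the usual BM25 saturation and length normalisation. The boost formula `tf * (1 + prominence)` is a hypothetical scheme for illustration — the paper's exact integration of prosodic features into the weighting function may differ.

```python
import math
from collections import Counter

def bm25_prosodic(query_terms, doc_terms, prominence, df, n_docs,
                  avg_len, k1=1.2, b=0.75):
    """BM25 score where each term's raw frequency is boosted by a
    normalised prosodic prominence score in [0, 1] (hypothetical:
    prominence maps a term to its average acoustic prominence in
    this document; 0 reduces to plain BM25)."""
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for t in query_terms:
        if t not in tf:
            continue
        # Prosodically prominent terms count as if they occurred more often.
        f = tf[t] * (1.0 + prominence.get(t, 0.0))
        d = df.get(t, 0)
        idf = math.log(1 + (n_docs - d + 0.5) / (d + 0.5))
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * doc_len / avg_len))
    return score
```

    Because the boost only inflates term frequency, BM25's saturation (the `k1` term) still caps the effect, which keeps the prosodic signal from overwhelming the text evidence.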

    Suprasegmental speech perception, working memory and reading comprehension in Cantonese-English bilingual children

    This study set out to examine (a) lexical tone and stress perception by bilingual and monolingual children, (b) interrelationships between lexical pitch perception, general acoustic mechanisms and working memory, and (c) the association between lexical tone awareness and Chinese text reading comprehension. Experiment 1 tested and compared the perception of Cantonese lexical tones, English lexical stress and non-linguistic pitch between Cantonese-English bilingual and English monolingual children. The relationships between linguistic pitch perception, non-linguistic pitch perception and working memory were also examined among the Cantonese-English bilingual children. Experiment 2 explored the relationship between Cantonese tone awareness and Chinese text reading comprehension skills. Results illustrate differential performance in tone perception but similar performance in stress perception between bilinguals and monolinguals. In addition, inter-correlations were found between linguistic pitch perception, general acoustic mechanisms, working memory and reading comprehension. These findings provide new insight into native and non-native perception of lexical pitch, and demonstrate an important link between lexical tone awareness and reading comprehension.