3 research outputs found

    Modeling Users' Information Needs in a Document Recommender for Meetings

    People are surrounded by an unprecedented wealth of information. Access to it depends on the availability of suitable search engines, but even when these are available, people often do not initiate a search, because their current activity does not allow them to, or because they are not aware that the information exists. Just-in-time retrieval brings a radical change to the process of query-based retrieval, by proactively retrieving documents relevant to users' current activities, in an easily accessible and non-intrusive manner. This thesis presents a novel set of methods intended to improve the relevance of a just-in-time retrieval system, specifically a document recommender system designed for conversations, in terms of precision and diversity of results. Additionally, we designed an evaluation protocol that uses crowdsourcing to compare the methods proposed in the thesis with other ones. In contrast to previous systems, which model users' information needs by extracting keywords from clean and well-structured texts, this system models them from the conversation transcripts, which contain noise from automatic speech recognition (ASR) and have a free structure, often switching between several topics. To deal with these issues, we first propose a novel keyword extraction method which preserves both the relevance and the diversity of topics of the conversation, to properly capture users' possible needs with minimum ASR noise. Implicit queries are then built from these keywords. However, the presence of multiple unrelated topics in one query introduces significant noise into the retrieval results. To reduce this effect, we separate users' needs by topically clustering keyword sets into several subsets or implicit queries. We introduce a merging method which combines the results of multiple queries prepared from the users' conversation to generate a concise, diverse and relevant list of documents.
    This method ensures that the system does not distract its users from their current conversation by frequently recommending them a large number of documents. Moreover, we address the problem of explicit queries that may be asked by users during a conversation. We introduce a query refinement method which leverages the conversation context to answer the users' information needs without asking for additional clarifications, and therefore, again, avoids distracting users during their conversation. Finally, we implemented the end-to-end document recommender system by integrating the ideas proposed in this thesis and then proposed an evaluation scenario with human users in a brainstorming meeting.
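To make the idea of merging results from several implicit queries concrete, here is a minimal Python sketch. It is not the merging method of the thesis, which the abstract does not specify; it assumes a simple round-robin interleaving with duplicate removal and a cap on the list length. The function name `merge_ranked_lists`, the parameter `max_docs`, and the document IDs are all hypothetical.

```python
from itertools import zip_longest


def merge_ranked_lists(result_lists, max_docs=5):
    """Interleave several ranked lists round-robin, dropping duplicates.

    Illustrative stand-in only: the thesis abstract does not describe
    its actual merging algorithm.
    """
    if max_docs <= 0:
        return []
    merged, seen = [], set()
    # Each "tier" holds the i-th result of every query (None once a list runs out).
    for tier in zip_longest(*result_lists):
        for doc in tier:
            if doc is None or doc in seen:
                continue
            seen.add(doc)
            merged.append(doc)
            if len(merged) >= max_docs:
                return merged
    return merged


# Hypothetical results for three implicit queries, one per topic cluster.
lists = [["d1", "d2", "d3"], ["d2", "d4"], ["d5"]]
merged = merge_ranked_lists(lists, max_docs=4)
```

Taking the top result of each query before any second-ranked result keeps every topic represented, and the cap keeps the list short, which matches the abstract's stated goal of a concise and diverse list.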

    Zero Resource Spoken Audio Corpus Analysis

    Zero-resource speech processing involves the automatic analysis of a collection of speech data in a completely unsupervised fashion without the benefit of any transcriptions or annotations of the data. In this paper, our zero-resource system seeks to automatically discover important words, phrases and topical themes present in an audio corpus. This system employs a segmental dynamic time warping (S-DTW) algorithm for acoustic pattern discovery in conjunction with a probabilistic model which treats the topic and pseudo-word identity of each discovered pattern as hidden variables. By applying an Expectation-Maximization (EM) algorithm, our system estimates the latent probability distributions over the pseudo-words and topics associated with the discovered patterns. Using this information, we produce acoustic summaries of the dominant topical themes of the audio document collection. Index Terms: zero-resource speech processing, spoken term discovery, speech summarization.
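As background for the pattern-discovery step, the following Python sketch shows plain dynamic time warping, the alignment technique that S-DTW builds on. It is not the segmental variant used in the paper, which searches for matching sub-segments rather than aligning whole sequences. For brevity it assumes sequences of scalar values with absolute difference as the local cost, whereas speech systems typically compare acoustic feature vectors. The name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two sequences of numbers.

    Assumptions: scalar frames and absolute-difference local cost.
    This is full-sequence DTW, not the segmental S-DTW of the paper.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a, step in b, or step in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# The second sequence repeats one frame; warping absorbs the repetition.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

Because the warping path may stretch or compress either sequence, two utterances of the same word spoken at different speeds can still align at low cost, which is what makes this family of methods usable without transcriptions.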