Assessed Relevance and Stylistic Variation
Texts exhibit considerable stylistic variation. This paper reports an
experiment where a large corpus of documents is analyzed using various
simple stylistic metrics. A subset of the corpus has been previously
assessed to be relevant for answering given information retrieval
queries. The experiment shows that this subset differs significantly from
the rest of the corpus in terms of the stylistic metrics studied.
Stylistic Variation in an Information Retrieval Experiment
Texts exhibit considerable stylistic variation. This paper reports an
experiment where a corpus of documents (N = 75 000) is analyzed using various
simple stylistic metrics. A subset (n = 1000) of the corpus has been previously
assessed to be relevant for answering given information retrieval queries. The
experiment shows that this subset differs significantly from the rest of the
corpus in terms of the stylistic metrics studied.
Comment: Proceedings of NEMLAP-
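The abstract does not say which stylistic metrics were used, so the following is only a minimal sketch under assumed metrics (average word length, average sentence length, and type-token ratio). The function names `stylistic_metrics` and `mean_metric` are hypothetical and chosen for illustration; they show how a relevant subset could be compared with the rest of a corpus on one such metric.

```python
import re


def stylistic_metrics(text):
    """Return a few simple, illustrative stylistic metrics for one document.

    The choice of metrics is an assumption; the abstract does not list them.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words or not sentences:
        return {"avg_word_len": 0.0, "avg_sent_len": 0.0, "type_token_ratio": 0.0}
    return {
        # Mean number of characters per word.
        "avg_word_len": sum(len(w) for w in words) / len(words),
        # Mean number of words per sentence.
        "avg_sent_len": len(words) / len(sentences),
        # Distinct words divided by total words.
        "type_token_ratio": len(set(words)) / len(words),
    }


def mean_metric(docs, name):
    """Average one named metric over a non-empty list of documents."""
    values = [stylistic_metrics(d)[name] for d in docs]
    return sum(values) / len(values)
```

Calling `mean_metric` once on the relevant subset and once on the remaining documents gives two numbers per metric; whether their difference is significant would still need a proper statistical test, which this sketch leaves out.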
Using term clouds to represent segment-level semantic content of podcasts
Spoken audio, like any time-continuous medium, is notoriously difficult to browse or skim without the support of an interface providing semantically annotated jump points to signal the user where to listen in. Creation of time-aligned metadata by human annotators is prohibitively expensive, motivating the investigation of representations of segment-level semantic content based on transcripts generated by automatic speech recognition (ASR). This paper examines the feasibility of using term clouds to provide users with a structured representation of the semantic content of podcast episodes. Podcast episodes are visualized as a series of sub-episode segments, each represented by a term cloud derived from a transcript generated by automatic speech recognition (ASR). Quality of segment-level term clouds is measured quantitatively and their utility is investigated using a small-scale user study based on human-labeled segment boundaries. Since the segment-level clouds generated from ASR transcripts prove useful, we examine an adaptation of text tiling techniques to speech in order to be able to generate segments as part of a completely automated indexing and structuring system for browsing of spoken audio. Results demonstrate that the segments generated are comparable with human-selected segment boundaries.
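As a rough illustration of a segment-level term cloud, the sketch below picks the top-weighted terms of each transcript segment using a TF-IDF weighting computed across the segments of one episode. The weighting scheme, the tiny stopword list, and the names `tokenize` and `term_clouds` are all assumptions made for this example; the abstract does not describe how the paper's clouds were actually built.

```python
import math
import re
from collections import Counter

# A deliberately tiny, assumed stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


def term_clouds(segments, top_k=5):
    """Return, for each segment, its top_k (term, weight) pairs.

    Weight is term frequency times (log(n / df) + 1), where n is the number
    of segments and df the number of segments containing the term.
    """
    token_lists = [tokenize(s) for s in segments]
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    n = len(segments)
    clouds = []
    for tokens in token_lists:
        tf = Counter(tokens)
        weights = {t: c * (math.log(n / df[t]) + 1.0) for t, c in tf.items()}
        # Sort by descending weight, then alphabetically for a stable order.
        ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
        clouds.append(ranked[:top_k])
    return clouds
```

In an interface, each weight could drive the font size of its term, so that the heaviest terms of a segment stand out as jump-point cues.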
Bridging the Gap Between Retrieval and Summarization
Information Retrieval is, at its core, a field focused on providing information to users to fulfill an information need. One of the most common use cases of Information Retrieval is document-level retrieval, which seeks to provide a collection of documents to the user that addresses their needs. In contrast, single document retrieval instead seeks to provide the user with a single document comprised of all required information. We seek to extend single document retrieval to single document generation, in which we use multiple source documents to create a new document which directly addresses the information need.
Utilizing sub-topical structure of documents for information retrieval.
Text segmentation in natural language processing typically refers to the process of decomposing a document into constituent subtopics. Our work centers on the application of text segmentation techniques within information retrieval (IR) tasks. Examples include scoring a document by combining the retrieval scores of its constituent segments, exploiting the proximity of query terms in documents for ad hoc search, and question answering (QA), where retrieved passages from multiple documents are aggregated and presented as a single document to a searcher. Feedback in the ad hoc IR task is shown to benefit from the use of extracted sentences instead of terms from the pseudo-relevant documents for query expansion. Retrieval effectiveness for the patent prior art search task is enhanced by applying text segmentation to the patent queries. Another aspect of our work involves augmenting text segmentation techniques to produce segments which are more readable, with fewer unresolved anaphora. This is particularly useful for QA and snippet generation tasks, where the objective is, on the one hand, to aggregate relevant and novel information from multiple documents that satisfies the user's information need and, on the other, to ensure that the automatically generated content presented to the user is easily readable without reference to the original source document.
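The idea of scoring a document by combining the retrieval scores of its segments can be sketched as follows. This is a toy illustration only: the per-segment score (normalized query term frequency), the interpolation between the best segment and the segment average, and the names `segment_score`, `document_score`, and `alpha` are assumptions for this example, not the method used in the work described.

```python
def segment_score(query_terms, segment):
    """Query term occurrences in the segment, normalized by segment length."""
    tokens = segment.lower().split()
    if not tokens:
        return 0.0
    return sum(tokens.count(q) for q in query_terms) / len(tokens)


def document_score(query, segments, alpha=0.5):
    """Combine segment scores into one document score.

    alpha weights the best-matching segment; (1 - alpha) weights the mean
    over all segments. Both the formula and alpha are illustrative choices.
    """
    query_terms = query.lower().split()
    scores = [segment_score(query_terms, s) for s in segments]
    if not scores:
        return 0.0
    return alpha * max(scores) + (1 - alpha) * sum(scores) / len(scores)
```

Setting `alpha` close to 1 rewards documents with one highly focused segment, while values close to 0 favor documents that match the query throughout.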
Retrieval through explanation: an abductive inference approach to relevance feedback
Relevance feedback techniques are designed to automatically improve a system's representation of a query by using documents the user has marked as relevant. However, traditional relevance feedback models suffer from a number of limitations that restrict their potential in supporting information seeking. One of the major limitations of relevance feedback is that it does not incorporate behavioural aspects of information seeking: how and why users assess relevance. We propose that relevance feedback should be viewed as a process of explanation and demonstrate how this limitation of relevance feedback techniques can be overcome by a theory of relevance feedback based on abductive inference.