    Assessed Relevance and Stylistic Variation

    Texts exhibit considerable stylistic variation. This paper reports an experiment in which a large corpus of documents is analyzed using various simple stylistic metrics. A subset of the corpus has previously been assessed as relevant for answering given information retrieval queries. The experiment shows that this subset differs significantly from the rest of the corpus in terms of the stylistic metrics studied.

    Stylistic Variation in an Information Retrieval Experiment

    Texts exhibit considerable stylistic variation. This paper reports an experiment in which a corpus of documents (N = 75 000) is analyzed using various simple stylistic metrics. A subset (n = 1000) of the corpus has previously been assessed as relevant for answering given information retrieval queries. The experiment shows that this subset differs significantly from the rest of the corpus in terms of the stylistic metrics studied. Comment: Proceedings of NEMLAP-

    Using term clouds to represent segment-level semantic content of podcasts

    Spoken audio, like any time-continuous medium, is notoriously difficult to browse or skim without the support of an interface that provides semantically annotated jump points to signal to the user where to listen in. Creation of time-aligned metadata by human annotators is prohibitively expensive, which motivates the investigation of representations of segment-level semantic content based on transcripts generated by automatic speech recognition (ASR). This paper examines the feasibility of using term clouds to provide users with a structured representation of the semantic content of podcast episodes. Podcast episodes are visualized as a series of sub-episode segments, each represented by a term cloud derived from an ASR transcript. The quality of segment-level term clouds is measured quantitatively, and their utility is investigated in a small-scale user study based on human-labeled segment boundaries. Since the segment-level clouds generated from ASR transcripts prove useful, we examine an adaptation of text tiling techniques to speech in order to generate segments as part of a completely automated indexing and structuring system for browsing spoken audio. Results demonstrate that the generated segments are comparable with human-selected segment boundaries.

    Bridging the Gap Between Retrieval and Summarization

    Information Retrieval is, at its core, a field focused on providing information to users to fulfill an information need. One of the most common use cases of Information Retrieval is document-level retrieval, which seeks to provide the user with a collection of documents that addresses their needs. In contrast, single document retrieval seeks to provide the user with a single document comprising all the required information. We seek to extend single document retrieval to single document generation, in which we use multiple source documents to create a new document that directly addresses the information need.

    Utilizing sub-topical structure of documents for information retrieval

    Text segmentation in natural language processing typically refers to the process of decomposing a document into constituent subtopics. Our work centers on the application of text segmentation techniques within information retrieval (IR) tasks. Examples include scoring a document by combining the retrieval scores of its constituent segments, exploiting the proximity of query terms in documents for ad hoc search, and question answering (QA), where retrieved passages from multiple documents are aggregated and presented to a searcher as a single document. Feedback in the ad hoc IR task is shown to benefit from using extracted sentences, rather than terms, from the pseudo-relevant documents for query expansion. Retrieval effectiveness for the patent prior art search task is enhanced by applying text segmentation to the patent queries. Another aspect of our work involves augmenting text segmentation techniques to produce segments that are more readable and contain less unresolved anaphora. This is particularly useful for QA and snippet generation tasks, where the objective is both to aggregate relevant and novel information from multiple documents that satisfies the user's information need and to ensure that the automatically generated content presented to the user is easily readable without reference to the original source document.

    Retrieval through explanation: an abductive inference approach to relevance feedback

    Relevance feedback techniques are designed to automatically improve a system's representation of a query by using documents the user has marked as relevant. However, traditional relevance feedback models suffer from a number of limitations that restrict their potential in supporting information seeking. One of the major limitations of relevance feedback is that it does not incorporate behavioural aspects of information seeking: how and why users assess relevance. We propose that relevance feedback should be viewed as a process of explanation, and we demonstrate how this limitation can be overcome by a theory of relevance feedback based on abductive inference.