
    Question-answering, relevance feedback and summarisation : TREC-9 interactive track report

    In this paper we report on the effectiveness of query-biased summaries for a question-answering task. Our summarisation system presents searchers with short summaries of documents, each composed of a series of highly matching sentences extracted from the document. These summaries are also used as evidence for a query expansion algorithm, to test their value for interactive and automatic query expansion.

    A study on the use of summaries and summary-based query expansion for a question-answering task

    In this paper we report an initial study on the effectiveness of query-biased summaries for a question-answering task. Our summarisation system presents searchers with short summaries of documents. The summaries are composed of a set of sentences that highlight the main points of the document as they relate to the query. These summaries are also used as evidence for a query expansion algorithm, to test their value for interactive and automatic query expansion. We present the results of a set of experiments testing these two approaches and discuss the relative success of the techniques.
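
The two abstracts above describe summaries built from the sentences that best match the query. A minimal sketch of that general idea follows, assuming a plain term-overlap score and a naive sentence splitter; the function names, the scoring rule, and the `max_sentences` parameter are illustrative assumptions, not the authors' system.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase the text and keep alphanumeric runs as terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def query_biased_summary(document, query, max_sentences=2):
    """Return up to max_sentences sentences that share the most terms with the query.

    The overlap count is a stand-in score for illustration only; the
    abstracts do not state how sentences were actually scored.
    """
    query_terms = set(tokenize(query))
    scored = []
    for position, sentence in enumerate(split_sentences(document)):
        overlap = len(query_terms & set(tokenize(sentence)))
        if overlap > 0:
            scored.append((overlap, position, sentence))
    # Highest overlap first; ties go to the sentence that appears earlier.
    top = sorted(scored, key=lambda item: (-item[0], item[1]))[:max_sentences]
    # Present the selected sentences in their original document order.
    return [sentence for _, _, sentence in sorted(top, key=lambda item: item[1])]


if __name__ == "__main__":
    doc = ("Paris is the capital of France. The weather was mild. "
           "France borders Spain and Italy. Cats sleep a lot.")
    for line in query_biased_summary(doc, "capital of France"):
        print(line)
```

The selected sentences could then serve as the source of candidate expansion terms, in place of the full document text.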

    Search in the Universe of Big Networks and Data

    Searching the Internet for an object characterised by its attributes in the form of data, such as a hotel in a certain city whose price is below a given amount, is one of our most common activities when we access the Web. We discuss this problem in a general setting, and compute the average amount of time and energy it takes to find an object in an infinitely large search space. We consider the use of N search agents which act concurrently. We treat both the case where the search agent knows which way it needs to go to find the object, and the case where the search agent is perfectly ignorant and may even head away from the object being sought. We show that under mild conditions regarding the randomness of the search and the use of a time-out, the search agent will always find the object despite the fact that the search space is infinite. We obtain a formula for the average search time and the average energy expended by N search agents acting concurrently and independently of each other. We see that the time-out itself can be used to minimise the search time and the amount of energy that is consumed to find an object. An approximate formula is derived for the number of search agents that can help us guarantee that an object is found in a given time, and we discuss how the competition between search agents and other agents that try to hide the data object can be used by opposing parties to guarantee their own success. Comment: IEEE Network Magazine - Special Issue on Networking for Big Data, July-August 201

    Utilizing sub-topical structure of documents for information retrieval.

    Text segmentation in natural language processing typically refers to the process of decomposing a document into constituent subtopics. Our work centers on the application of text segmentation techniques within information retrieval (IR) tasks. Examples include scoring a document by combining the retrieval scores of its constituent segments, exploiting the proximity of query terms in documents for ad hoc search, and question answering (QA), where retrieved passages from multiple documents are aggregated and presented as a single document to a searcher. Feedback in the ad hoc IR task is shown to benefit from using extracted sentences, instead of terms, from the pseudo-relevant documents for query expansion. Retrieval effectiveness for the patent prior art search task is enhanced by applying text segmentation to the patent queries. Another aspect of our work involves augmenting text segmentation techniques to produce segments which are more readable and contain fewer unresolved anaphora. This is particularly useful for QA and snippet generation tasks, where the objective is both to aggregate relevant and novel information from multiple documents that satisfies the user's information need, and to ensure that the automatically generated content presented to the user is easily readable without reference to the original source document.
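
The abstract above mentions scoring a document by combining the retrieval scores of its segments. The sketch below illustrates one way this could look, assuming the segments are already given, a simple matched-term fraction as the per-segment score, and `max` or `mean` as the combination rule. All of these are assumptions for illustration; the abstract does not specify the scoring function or how scores are combined.

```python
def segment_score(segment, query_terms):
    """Fraction of the segment's tokens that are query terms (illustrative score)."""
    tokens = segment.lower().split()
    if not tokens:
        return 0.0
    matches = sum(1 for token in tokens if token in query_terms)
    return matches / len(tokens)


def document_score(segments, query, combine="max"):
    """Combine per-segment scores into one document score.

    combine="max" rewards a document with one strongly matching subtopic;
    combine="mean" rewards a document that matches throughout.
    Both rules are assumptions, not the method from the abstract.
    """
    query_terms = set(query.lower().split())
    scores = [segment_score(segment, query_terms) for segment in segments]
    if not scores:
        return 0.0
    if combine == "max":
        return max(scores)
    if combine == "mean":
        return sum(scores) / len(scores)
    raise ValueError(f"unknown combine rule: {combine!r}")


if __name__ == "__main__":
    segments = ["patent search methods", "weather report today sunny"]
    print(document_score(segments, "patent search", combine="max"))
    print(document_score(segments, "patent search", combine="mean"))
```

With `max`, a long document that covers the query in a single subtopic is not penalised for its unrelated segments, which is one motivation for segment-level scoring.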