Question-answering, relevance feedback and summarisation : TREC-9 interactive track report
In this paper we report on the effectiveness of query-biased summaries for a question-answering task. Our summarisation system presents searchers with short summaries of documents, each composed of a series of highly matching sentences extracted from the document. These summaries also serve as input to a query expansion algorithm, which lets us test their value as evidence for interactive and automatic query expansion.
A study on the use of summaries and summary-based query expansion for a question-answering task
In this paper we report an initial study on the effectiveness of query-biased summaries for a question-answering task. Our summarisation system presents searchers with short summaries of documents. The summaries are composed of a set of sentences that highlight the main points of the document as they relate to the query. These summaries also serve as input to a query expansion algorithm, which lets us test their value as evidence for interactive and automatic query expansion. We present the results of a set of experiments that test these two approaches and discuss the relative success of the techniques.
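The abstract does not give the sentence-scoring or expansion rules, so the following is only a minimal sketch of the general idea, assuming that sentences are ranked by how many distinct query terms they contain and that expansion terms are the most frequent non-query, non-stopword terms in the selected sentences. The function names, the tiny stopword list, and the scoring rule are all illustrative assumptions, not the authors' system.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the source).
STOPWORDS = {"the", "a", "an", "on", "of", "at", "in", "and", "to"}


def tokenize(text):
    """Lowercase alphanumeric tokens; deliberately simple."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(document):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [part for part in parts if part]


def query_biased_summary(document, query, max_sentences=2):
    """Pick the sentences sharing the most distinct terms with the query.

    The selected sentences are returned in their original document order.
    """
    query_terms = set(tokenize(query))
    scored = []
    for position, sentence in enumerate(split_sentences(document)):
        overlap = len(query_terms & set(tokenize(sentence)))
        if overlap > 0:
            scored.append((overlap, position, sentence))
    # Highest overlap first; earlier sentences win ties.
    top = sorted(scored, key=lambda item: (-item[0], item[1]))[:max_sentences]
    return [sentence for _, _, sentence in sorted(top, key=lambda item: item[1])]


def expansion_terms(summary_sentences, query, k=3):
    """Suggest up to k expansion terms drawn only from the summary sentences."""
    query_terms = set(tokenize(query))
    counts = Counter(
        term
        for sentence in summary_sentences
        for term in tokenize(sentence)
        if term not in query_terms and term not in STOPWORDS
    )
    # Most frequent first; alphabetical order breaks ties deterministically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]
```

In an interactive setting the terms returned by `expansion_terms` would be offered to the searcher for selection; in an automatic setting they would be appended to the query directly.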
Search in the Universe of Big Networks and Data
Searching the Internet for some object characterised by its attributes in
the form of data, such as a hotel in a certain city whose price is less than
something, is one of our most common activities when we access the Web. We
discuss this problem in a general setting, and compute the average amount of
time and the energy it takes to find an object in an infinitely large search
space. We consider the use of N search agents which act concurrently, covering
both the case where the search agent knows which way it needs to go to find the
object and the case where the search agent is perfectly ignorant and may even
head away from the object being sought. We show that under mild conditions regarding
the randomness of the search and the use of a time-out, the search agent will
always find the object despite the fact that the search space is infinite. We
obtain a formula for the average search time and the average energy expended by
N search agents acting concurrently and independently of each other. We see
that the time-out itself can be used to minimise the search time and the amount
of energy that is consumed to find an object. An approximate formula is derived
for the number of search agents that can help us guarantee that an object is
found in a given time, and we discuss how the competition between search agents
and other agents that try to hide the data object, can be used by opposing
parties to guarantee their own success.
Comment: IEEE Network Magazine - Special Issue on Networking for Big Data,
July-August 201
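The abstract does not state the formulas it derives, so the sketch below is not a reproduction of them. It is a toy Monte Carlo model under loudly stated assumptions: the search space is a one-dimensional line, the object sits `distance` steps from the start, each step moves toward the object with probability `p_toward`, and an agent that has not succeeded within `timeout` steps restarts from the origin. All names and parameters are hypothetical.

```python
import random


def single_agent_search_time(distance, p_toward, timeout, rng):
    """Total steps one agent needs to reach the object (toy 1-D model).

    After `timeout` unsuccessful steps the agent restarts from the origin;
    elapsed time keeps accumulating across restarts.
    """
    if timeout < distance:
        raise ValueError("timeout must be at least the distance to the object")
    elapsed = 0
    while True:
        position = 0
        for _ in range(timeout):
            # Step toward the object with probability p_toward, else away.
            position += 1 if rng.random() < p_toward else -1
            elapsed += 1
            if position == distance:
                return elapsed


def concurrent_search_time(n_agents, distance, p_toward, timeout, rng):
    """Time until the first of n independent agents finds the object."""
    return min(
        single_agent_search_time(distance, p_toward, timeout, rng)
        for _ in range(n_agents)
    )


def mean_search_time(n_agents, distance, p_toward, timeout, trials=300, seed=0):
    """Monte Carlo estimate of the average search time for n agents."""
    rng = random.Random(seed)
    total = sum(
        concurrent_search_time(n_agents, distance, p_toward, timeout, rng)
        for _ in range(trials)
    )
    return total / trials
```

If every agent is assumed to spend one unit of energy per step until the first success, the energy in this toy model is simply `n_agents` times the search time, which makes the trade-off between more agents (faster) and more energy visible. Calling `mean_search_time` with different `timeout` values gives a rough feel for how the time-out influences the average.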
Utilizing sub-topical structure of documents for information retrieval.
Text segmentation in natural language processing typically refers to the process of decomposing a document into constituent subtopics. Our work centers on the application of text segmentation techniques within information retrieval (IR) tasks. Examples include scoring a document by combining the retrieval scores of its constituent segments, exploiting the proximity of query terms in documents for ad hoc search, and question answering (QA), where retrieved passages from multiple documents are aggregated and presented as a single document to a searcher. Feedback in the ad hoc IR task is shown to benefit from using sentences extracted from the pseudo-relevant documents, rather than individual terms, for query expansion. Retrieval effectiveness for the patent prior art search task is enhanced by applying text segmentation to the patent queries. Another aspect of our work involves augmenting text segmentation techniques to produce segments which are more readable and contain fewer unresolved anaphora. This is particularly useful for QA and snippet generation tasks, where the objective is both to aggregate relevant and novel information from multiple documents that satisfies the user's information need and to ensure that the automatically generated content presented to the user is easily readable without reference to the original source document.
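As a rough illustration of scoring a document by combining the retrieval scores of its segments, the sketch below makes two assumptions that are not taken from the abstract: fixed-size word windows stand in for a real text segmentation algorithm, and a segment's score is the fraction of its tokens that are query terms. The function names and the choice of combiner are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase alphanumeric tokens; deliberately simple."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fixed_window_segments(tokens, size):
    """Stand-in for text segmentation: consecutive windows of `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def segment_score(segment, query_terms):
    """Fraction of tokens in the segment that are query terms."""
    if not segment:
        return 0.0
    matches = sum(1 for token in segment if token in query_terms)
    return matches / len(segment)


def document_score(document, query, size=5, combine=max):
    """Score a document by combining the scores of its segments.

    `combine` receives the list of segment scores; `max` rewards a single
    highly relevant segment, while a mean rewards relevance spread
    throughout the document.
    """
    query_terms = set(tokenize(query))
    segments = fixed_window_segments(tokenize(document), size)
    scores = [segment_score(segment, query_terms) for segment in segments]
    return combine(scores) if scores else 0.0
```

Passing `combine=lambda scores: sum(scores) / len(scores)` switches from best-segment scoring to average-segment scoring, which penalises long documents that mention the query terms in only one place.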