11,523 research outputs found
Investigating sentence weighting components for automatic summarisation
The work described here initially formed part of a triangulation exercise to establish the effectiveness of the Query Term Order algorithm. The methodology produced subsequently proved to be a reliable indicator of quality for summarising English web documents. We utilised the human summaries from the Document Understanding Conference data, and generated queries automatically for testing the QTO algorithm. Six sentence weighting schemes that made use of Query Term Frequency and QTO were constructed to produce system summaries, and this paper explains the process of combining and balancing the weighting components. We also examined the five automatically generated query terms in their different permutations to check if the automatic generation of query terms resulting bias. The summaries produced were evaluated by the ROUGE-1 metric, and the results showed that using QTO in a weighting combination resulted in the best performance. We also found that using a combination of more weighting components always produced improved performance compared to any single weighting component
A simulated study of implicit feedback models
In this paper we report on a study of implicit feedback models for unobtrusively tracking the information needs of searchers. Such models use relevance information gathered from searcher interaction and can be a potential substitute for explicit relevance feedback. We introduce a variety of implicit feedback models designed to enhance an Information Retrieval (IR) system's representation of searchers' information needs. To benchmark their performance we use a simulation-centric evaluation methodology that measures how well each model learns relevance and improves search effectiveness. The results show that a heuristic-based binary voting model and one based on Jeffrey's rule of conditioning [5] outperform the other models under investigation
CRL at Ntcir2
We have developed systems of two types for NTCIR2. One is an enhenced version
of the system we developed for NTCIR1 and IREX. It submitted retrieval results
for JJ and CC tasks. A variety of parameters were tried with the system. It
used such characteristics of newspapers as locational information in the CC
tasks. The system got good results for both of the tasks. The other system is a
portable system which avoids free parameters as much as possible. The system
submitted retrieval results for JJ, JE, EE, EJ, and CC tasks. The system
automatically determined the number of top documents and the weight of the
original query used in automatic-feedback retrieval. It also determined
relevant terms quite robustly. For EJ and JE tasks, it used document expansion
to augment the initial queries. It achieved good results, except on the CC
tasks.Comment: 11 pages. Computation and Language. This paper describes our results
of information retrieval in the NTCIR2 contes
Embedding Web-based Statistical Translation Models in Cross-Language Information Retrieval
Although more and more language pairs are covered by machine translation
services, there are still many pairs that lack translation resources.
Cross-language information retrieval (CLIR) is an application which needs
translation functionality of a relatively low level of sophistication since
current models for information retrieval (IR) are still based on a
bag-of-words. The Web provides a vast resource for the automatic construction
of parallel corpora which can be used to train statistical translation models
automatically. The resulting translation models can be embedded in several ways
in a retrieval model. In this paper, we will investigate the problem of
automatically mining parallel texts from the Web and different ways of
integrating the translation models within the retrieval process. Our
experiments on standard test collections for CLIR show that the Web-based
translation models can surpass commercial MT systems in CLIR tasks. These
results open the perspective of constructing a fully automatic query
translation device for CLIR at a very low cost.Comment: 37 page
Question-answering, relevance feedback and summarisation : TREC-9 interactive track report
In this paper we report on the effectiveness of query-biased summaries for a question-answering task. Our summarisation system presents searchers with short summaries of documents, composed of a series of highly matching sentences extracted from the documents. These summaries are also used as evidence for a query expansion algorithm to test the use of summaries as evidence for interactive and automatic query expansion
- …