37,530 research outputs found

    Exploring sentence level query expansion in language modeling based information retrieval

    Get PDF
    We introduce two novel methods for query expansion in information retrieval (IR). The basis of these methods is to add the most similar sentences extracted from pseudo-relevant documents to the original query. The first method adds a fixed number of sentences to the original query, the second a progressively decreasing number of sentences. We evaluate these methods on the English and Bengali test collections from the FIRE workshops. The major findings of this study are that: i) performance is similar for both English and Bengali; ii) employing a smaller context (similar sentences) yields a considerably higher mean average precision (MAP) compared to extracting terms from full documents (up to 5.9% improvemnent in MAP for English and 10.7% for Bengali compared to standard Blind Relevance Feedback (BRF); iii) using a variable number of sentences for query expansion performs better and shows less variance in the best MAP for different parameter settings; iv) query expansion based on sentences can improve performance even for topics with low initial retrieval precision where standard BRF fails

    Investigating sentence weighting components for automatic summarisation

    No full text
    The work described here initially formed part of a triangulation exercise to establish the effectiveness of the Query Term Order algorithm. The methodology produced subsequently proved to be a reliable indicator of quality for summarising English web documents. We utilised the human summaries from the Document Understanding Conference data, and generated queries automatically for testing the QTO algorithm. Six sentence weighting schemes that made use of Query Term Frequency and QTO were constructed to produce system summaries, and this paper explains the process of combining and balancing the weighting components. We also examined the five automatically generated query terms in their different permutations to check if the automatic generation of query terms resulting bias. The summaries produced were evaluated by the ROUGE-1 metric, and the results showed that using QTO in a weighting combination resulted in the best performance. We also found that using a combination of more weighting components always produced improved performance compared to any single weighting component

    Clustering the information space using top-ranking sentences : a study of user interaction

    Get PDF
    By considering sentences selected by a query-biased sentence extraction model from the top-retrieved documents, we create a personalised information space which is characterised by the presence of search terms. We cluster this information space, and enable searchers to interact with the resulting clusters. In order to examine whether users can recognise, and benefit from, the clustered organisation, we compare user interaction and performance between an actual clustering and a pseudo-clustering of the information space for completing information seeking tasks. The results provide evidence for the utility and meaningfulness of the clustered organisation

    Using COTS Search Engines and Custom Query Strategies at CLEF

    Get PDF
    This paper presents a system for bilingual information retrieval using commercial off-the-shelf search engines (COTS). Several custom query construction, expansion and translation strategies are compared. We present the experiments and the corresponding results for the CLEF 2004 event

    Exploring accumulative query expansion for relevance feedback

    Get PDF
    For the participation of Dublin City University (DCU) in the Relevance Feedback (RF) track of INEX 2010, we investigated the relation between the length of relevant text passages and the number of RF terms. In our experiments, relevant passages are segmented into non-overlapping windows of xed length which are sorted by similarity with the query. In each retrieval iteration, we extend the current query with the most frequent terms extracted from these word windows. The number of feedback terms corresponds to a constant number, a number proportional to the length of relevant passages, and a number inversely proportional to the length of relevant passages, respectively. Retrieval experiments show a signicant increase in MAP for INEX 2008 training data and improved precisions at early recall levels for the 2010 topics as compared to the baseline Rocchio feedback

    Effective pattern discovery for text mining

    Get PDF
    Many data mining techniques have been proposed for mining useful patterns in text documents. However, how to effectively use and update discovered patterns is still an open research issue, especially in the domain of text mining. Since most existing text mining methods adopted term-based approaches, they all suffer from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern (or phrase) based approaches should perform better than the term-based ones, but many experiments did not support this hypothesis. This paper presents an innovative technique, effective pattern discovery which includes the processes of pattern deploying and pattern evolving, to improve the effectiveness of using and updating discovered patterns for finding relevant and interesting information. Substantial experiments on RCV1 data collection and TREC topics demonstrate that the proposed solution achieves encouraging performance

    Concept-based Interactive Query Expansion Support Tool (CIQUEST)

    Get PDF
    This report describes a three-year project (2000-03) undertaken in the Information Studies Department at The University of Sheffield and funded by Resource, The Council for Museums, Archives and Libraries. The overall aim of the research was to provide user support for query formulation and reformulation in searching large-scale textual resources including those of the World Wide Web. More specifically the objectives were: to investigate and evaluate methods for the automatic generation and organisation of concepts derived from retrieved document sets, based on statistical methods for term weighting; and to conduct user-based evaluations on the understanding, presentation and retrieval effectiveness of concept structures in selecting candidate terms for interactive query expansion. The TREC test collection formed the basis for the seven evaluative experiments conducted in the course of the project. These formed four distinct phases in the project plan. In the first phase, a series of experiments was conducted to investigate further techniques for concept derivation and hierarchical organisation and structure. The second phase was concerned with user-based validation of the concept structures. Results of phases 1 and 2 informed on the design of the test system and the user interface was developed in phase 3. The final phase entailed a user-based summative evaluation of the CiQuest system. The main findings demonstrate that concept hierarchies can effectively be generated from sets of retrieved documents and displayed to searchers in a meaningful way. The approach provides the searcher with an overview of the contents of the retrieved documents, which in turn facilitates the viewing of documents and selection of the most relevant ones. Concept hierarchies are a good source of terms for query expansion and can improve precision. The extraction of descriptive phrases as an alternative source of terms was also effective. With respect to presentation, cascading menus were easy to browse for selecting terms and for viewing documents. In conclusion the project dissemination programme and future work are outlined
    corecore