
    Exploring accumulative query expansion for relevance feedback

    For the participation of Dublin City University (DCU) in the Relevance Feedback (RF) track of INEX 2010, we investigated the relation between the length of relevant text passages and the number of RF terms. In our experiments, relevant passages are segmented into non-overlapping windows of fixed length, which are sorted by similarity with the query. In each retrieval iteration, we extend the current query with the most frequent terms extracted from these word windows. The number of feedback terms is set in three ways: as a constant, as a number proportional to the length of the relevant passages, and as a number inversely proportional to that length. Retrieval experiments show a significant increase in MAP for the INEX 2008 training data and improved precision at early recall levels for the 2010 topics, compared to the baseline Rocchio feedback.
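The feedback loop described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the cosine similarity on raw term counts, the function names, and the constants in `n_feedback_terms` are all assumptions chosen for the example.

```python
from collections import Counter
import math


def windows(tokens, size):
    """Split tokens into non-overlapping windows of `size` tokens (the last may be shorter)."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def cosine(a, b):
    """Cosine similarity between two token lists, using raw term counts (an assumption)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def n_feedback_terms(passage_len, scheme, c=5, k=0.1, m=50.0):
    """Number of feedback terms under the three schemes; c, k and m are hypothetical constants."""
    length = max(passage_len, 1)
    if scheme == "constant":
        return c
    if scheme == "proportional":
        return max(1, round(k * length))
    if scheme == "inverse":
        return max(1, round(m / length))
    raise ValueError(f"unknown scheme: {scheme}")


def expand_query(query, passage, window_size=4, top_windows=2, n_terms=2):
    """Extend `query` with the most frequent new terms from the windows most similar to it."""
    ranked = sorted(windows(passage, window_size),
                    key=lambda w: cosine(query, w), reverse=True)
    counts = Counter(t for w in ranked[:top_windows] for t in w if t not in query)
    return list(query) + [t for t, _ in counts.most_common(n_terms)]


query = ["relevance", "feedback"]
passage = ["relevance", "feedback", "term", "expansion",
           "cats", "sleep", "all", "day",
           "feedback", "improves", "term", "expansion"]
expanded = expand_query(query, passage)
```

Calling `expand_query` repeatedly on its own output gives the accumulative behaviour the title refers to: each iteration starts from the query produced by the previous one.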

    Exploring a Multidimensional Representation of Documents and Queries (extended version)

    In Information Retrieval (IR), whether implicitly or explicitly, queries and documents are often represented as vectors. However, it may be more beneficial to consider documents and/or queries as multidimensional objects. We believe this would allow building "truly" interactive IR systems, i.e., systems where interaction is fully incorporated in the IR framework. The probabilistic formalism of quantum physics represents events and densities as multidimensional objects. This paper presents our first step towards building an interactive IR framework upon this formalism, by stating how the first interaction of the retrieval process, when the user types a query, can be formalised. Our framework depends on a number of parameters affecting the final document ranking. In this paper we experimentally investigate the effect of these parameters, showing that the proposed representation of documents and queries as multidimensional objects can compete with standard approaches, with the additional prospect of being applied to interactive retrieval.

    A Graph-based approach for text query expansion using pseudo relevance feedback and association rules mining

    Pseudo-relevance feedback is a query expansion approach whose terms are selected from a set of top-ranked documents retrieved in response to the original query. However, the selected terms will not be related to the query if the top retrieved documents are irrelevant. As a result, retrieval performance for the expanded query does not improve on that of the original one. This paper suggests using the documents selected by pseudo-relevance feedback to generate association rules. To this end, an algorithm based on dominance relations is applied. Strong correlations between the query and other terms are then detected, and an oriented, weighted graph called Pseudo-Graph Feedback is constructed. This graph is used to expand original queries with semantically related terms selected by the user. The experimental results on a Text Retrieval Conference (TREC) collection are significant, and the proposed approach achieves the best results compared to both the baseline system and an existing technique.

    HyTEXPROS: a hypermedia information retrieval system

    A hypermedia information retrieval system augments information retrieval operations with the specific capabilities of hypermedia systems and provides a new kind of information management tool. It combines hypermedia and information retrieval to offer end users the possibility of navigating, browsing and searching a large collection of documents to satisfy an information need. TEXPROS is an intelligent document processing and retrieval system that supports storing, extracting, classifying, categorizing, retrieving and browsing enterprise information, which makes it an ideal application for hypermedia information retrieval techniques. In this dissertation, we extend TEXPROS to a hypermedia information retrieval system called HyTEXPROS with hypertext functionalities such as nodes, typed and weighted links, anchors, guided tours, network overviews, bookmarks, annotations and comments, and an external linkbase. HyTEXPROS describes the whole information base, including the metadata and the original documents, as network nodes connected by links. Through hypertext functionalities, a user can dynamically construct an information path by browsing through pieces of the information base. Adding hypertext functionalities changes the working domain of TEXPROS from a personal document processing domain to a personal library domain, accompanied by citation techniques for processing original documents. A four-level conceptual architecture is presented as the system architecture of HyTEXPROS; this architecture is also referred to as the reference model of HyTEXPROS. A detailed description of HyTEXPROS using first-order logic calculus is also proposed. An early version of a prototype is briefly described.

    Agente Fenix: sistema de filtragem personalizada de informações

    Nowadays the Internet offers such an extensive amount of information that it becomes very difficult for users to take advantage of it. In search of a solution, the present work proposes a methodology for implementing an intelligent agent for personalized information filtering. The idea is to develop a set of autonomous, non-mobile, adaptive agents. The learning mechanism adopted by the agents is relevance feedback. The search space from which information is retrieved consists of the articles and works found in WWW pages, selected and classified according to subjects of interest. The system was developed to help the academic public, composed of teachers and undergraduate and master's degree students. The initial results were quite promising. The proposed agent proved to be a powerful information filtering tool, reducing the time spent on that activity.
    Currently, the Internet makes an extensive amount of information available to a wide range of users, which makes it difficult to handle. Motivated by this difficulty, the present work proposes the use of intelligent agents for personalized information filtering. The idea is to develop a set of autonomous, fixed, adaptive agents with the goal of satisfying the user's information needs. Information will be represented using the vector space model [Salton83], and the agents' learning mechanism will be relevance feedback [Frakes92]. This article describes the system and the results of a first stage, in which tests were run in a simulated environment. In a next stage, tests will be conducted with real users. The search space from which information will be retrieved consists of documents on WWW pages. The system was developed for single-user environments, and the target audience is the teachers, students and staff of a university.

    Applying summarization techniques for term selection in relevance feedback


    Intelligent Query Answering Through Rule Learning and Generalization

    The Department of Defense (DoD) relies heavily on information systems to complete a myriad of tasks, from day-to-day personnel actions to mission-critical imagery retrieval, intelligence analysis, and mission planning. The astronomical growth in the size and performance of data storage systems leads to problems in processing the amount of data returned by any given query. Typical relational database systems return a set of unordered records. This approach is acceptable in small information systems, but in large systems, such as military image retrieval systems with more than 1 million records, it takes considerable time (often hours to days) to sort through thousands of records and select the relevant ones for analysis. This research introduces Intelligent Query Answering (IQA) as a novel approach to information retrieval. IQA implements the FOIL algorithm to learn rules based upon user feedback [QUI90]. The Winnow algorithm adjusts rule weights based on user classification to improve document orderings [BLU97]. A semantic tree specific to the domain allows rule generalization across the domain. Testing shows a document sort accuracy rate of 63-93% against a controlled test dataset and a 78-89% accuracy rate on a subset of declassified National Air Intelligence Center imagery metadata. These results demonstrate that this research provides groundwork for future efforts in rule learning and rule generalization in the information retrieval field.
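The weight adjustment attributed to Winnow above can be illustrated with the textbook multiplicative update. This is a minimal sketch of the standard algorithm only: the abstract does not say how IQA wires Winnow to the FOIL rules, so treating each rule as a binary feature (1 if the rule fires on a document) and the values of `threshold` and `alpha` are assumptions.

```python
def winnow_predict(weights, fired, threshold):
    """Predict 'relevant' when the total weight of the rules that fired reaches the threshold."""
    return sum(w for w, f in zip(weights, fired) if f) >= threshold


def winnow_update(weights, fired, label, threshold, alpha=2.0):
    """Return new weights after one user judgement.

    On a mistake, the weights of the rules that fired are multiplied by alpha
    (missed relevant document) or divided by alpha (false alarm).
    Correct predictions leave the weights unchanged.
    """
    if winnow_predict(weights, fired, threshold) == label:
        return list(weights)
    factor = alpha if label else 1.0 / alpha
    return [w * factor if f else w for w, f in zip(weights, fired)]


# Three hypothetical rules, all starting at weight 1, threshold equal to the rule count.
weights = [1.0, 1.0, 1.0]
threshold = 3.0

# Rules 0 and 1 fire on a document the user marks relevant; the model missed it.
promoted = winnow_update(weights, [1, 1, 0], True, threshold)

# The same rules fire on a document the user marks irrelevant; the model now over-predicts.
demoted = winnow_update(promoted, [1, 1, 0], False, threshold)
```

Because updates are multiplicative and happen only on mistakes, rules that repeatedly fire on documents the user rejects lose influence quickly, which is one way such weights could drive a document ordering.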

    Axiomatic analysis of smoothing methods in language models for pseudo-relevance feedback

    Pseudo-Relevance Feedback (PRF) is an important general technique for improving retrieval effectiveness without requiring any user effort. Several state-of-the-art PRF models are based on the language modeling approach, where a query language model is learned from feedback documents. In all these models, feedback documents are represented with unigram language models smoothed with a collection language model. While collection language model-based smoothing has proven both effective and necessary in using language models for retrieval, we use axiomatic analysis to show that this smoothing scheme inherently causes the feedback model to favor frequent terms and thus violates the IDF constraint needed to ensure selection of discriminative feedback terms. To address this problem, we propose replacing collection language model-based smoothing in the feedback stage with additive smoothing, which is analytically shown to select more discriminative terms. Empirical evaluation further confirms that additive smoothing indeed significantly outperforms collection-based smoothing methods in multiple language model-based PRF models.
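The contrast between the two smoothing schemes can be seen in a small numeric sketch. It assumes Jelinek-Mercer interpolation as one common form of collection-based smoothing (the abstract does not name a specific method), and the counts, collection probabilities, and parameter values are invented for illustration. It is not the paper's axiomatic analysis.

```python
def collection_smoothed(count, doc_len, p_collection, lam=0.5):
    """Jelinek-Mercer estimate: interpolate the document model with the collection model."""
    return (1 - lam) * count / doc_len + lam * p_collection


def additive_smoothed(count, doc_len, vocab_size, delta=1.0):
    """Additive estimate: add the same pseudo-count delta to every vocabulary term."""
    return (count + delta) / (doc_len + delta * vocab_size)


# Two terms occur equally often in a 100-word feedback document,
# but one is common in the collection and the other is rare.
count, doc_len, vocab_size = 3, 100, 10_000
p_common, p_rare = 0.05, 0.0001

jm_common = collection_smoothed(count, doc_len, p_common)
jm_rare = collection_smoothed(count, doc_len, p_rare)

add_common = additive_smoothed(count, doc_len, vocab_size)
add_rare = additive_smoothed(count, doc_len, vocab_size)
```

Under the interpolated estimate the common term receives more probability mass than the rare one even though both have the same count in the feedback document, whereas the additive estimate depends only on the count and so treats them identically.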