Exploring accumulative query expansion for relevance feedback
For the participation of Dublin City University (DCU) in the Relevance Feedback (RF) track of INEX 2010, we investigated the relation between the length of relevant text passages and the number of RF terms. In our experiments, relevant passages are segmented into non-overlapping windows of fixed length which are sorted by similarity with the query. In each retrieval iteration, we extend the current query with the most frequent terms extracted from these word windows. The number of feedback terms corresponds to a constant number, a number proportional to the length of relevant passages, and a number inversely proportional to the length of relevant passages, respectively. Retrieval experiments show a significant increase in MAP for INEX 2008 training data and improved precisions at early recall levels for the 2010 topics as compared to the baseline Rocchio feedback.
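The window-based expansion step described in this abstract can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names, the term-overlap similarity, the whitespace tokenizer, and the default parameter values are all hypothetical, and the number of feedback terms is held constant here (only the first of the three settings the abstract mentions).

```python
from collections import Counter


def split_into_windows(tokens, window_size):
    """Cut a token list into non-overlapping windows of fixed length."""
    return [tokens[i:i + window_size] for i in range(0, len(tokens), window_size)]


def overlap_similarity(window, query_terms):
    """Toy similarity (an assumption): count window tokens that are query terms."""
    return sum(1 for tok in window if tok in query_terms)


def expand_query(query_terms, relevant_passage, window_size=5, top_windows=2, num_terms=2):
    """Extend the query with the most frequent new terms taken from
    the windows that are most similar to the current query."""
    query_set = set(query_terms)
    windows = split_into_windows(relevant_passage.lower().split(), window_size)
    # Sort windows by similarity to the query, most similar first.
    ranked = sorted(windows, key=lambda w: overlap_similarity(w, query_set), reverse=True)
    # Count candidate terms in the best windows, skipping terms already in the query.
    counts = Counter(
        tok for w in ranked[:top_windows] for tok in w if tok not in query_set
    )
    new_terms = [term for term, _ in counts.most_common(num_terms)]
    return list(query_terms) + new_terms


passage = ("query expansion adds feedback terms feedback terms improve "
           "retrieval quality cats sleep all day long today")
expanded = expand_query(["query", "expansion"], passage)
```

Calling `expand_query` again with `expanded` as the query would correspond to a further feedback iteration, so that terms accumulate across iterations.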
Exploring a Multidimensional Representation of Documents and Queries (extended version)
In Information Retrieval (IR), whether implicitly or explicitly, queries and
documents are often represented as vectors. However, it may be more beneficial
to consider documents and/or queries as multidimensional objects. Our belief is
this would allow building "truly" interactive IR systems, i.e., where
interaction is fully incorporated in the IR framework.
The probabilistic formalism of quantum physics represents events and
densities as multidimensional objects. This paper presents our first step
towards building an interactive IR framework upon this formalism, by stating
how the first interaction of the retrieval process, when the user types a
query, can be formalised. Our framework depends on a number of parameters
affecting the final document ranking. In this paper we experimentally
investigate the effect of these parameters, showing that the proposed
representation of documents and queries as multidimensional objects can compete
with standard approaches, with the additional prospect of being applied to
interactive retrieval.
A Graph-based approach for text query expansion using pseudo relevance feedback and association rules mining
Pseudo-relevance feedback is a query expansion approach whose terms are selected from a set of top-ranked documents retrieved in response to the original query. However, the selected terms will not be related to the query if the top retrieved documents are irrelevant. As a result, retrieval performance for the expanded query is not improved compared to the original one. This paper suggests using the documents selected by Pseudo Relevance Feedback to generate association rules, to which an algorithm based on dominance relations is applied. The strong correlations between the query and other terms are then detected, and an oriented and weighted graph called Pseudo-Graph Feedback is constructed. This graph is used to expand the original queries with semantically related terms selected by the user. The experimental results on a Text Retrieval Conference (TREC) collection are significant: the proposed approach achieves the best results compared to both the baseline system and an existing technique.
Machine Learning the Harness Track: A Temporal Investigation of Race History on Prediction
Machine learning techniques have shown their usefulness in accurately predicting greyhound races. Many of the studies within this domain focus on two things: win-only wagers and a very particular combination of race history. Our study investigates altering these properties and studying the results. In particular, we found a race history combination that optimizes our S&C Racing system’s predictions on seven different wager types. From this, S&C Racing posted an impressive 50.44% accuracy in selecting winning wagers with a payout of 10.06 per dollar wagered.
Hytexpros : a hypermedia information retrieval system
The Hypermedia information retrieval system combines the specific capabilities of hypermedia systems with information retrieval operations and provides a new kind of information management tool. It combines both hypermedia and information retrieval to offer end-users the possibility of navigating, browsing and searching a large collection of documents to satisfy an information need. TEXPROS is an intelligent document processing and retrieval system that supports storing, extracting, classifying, categorizing, retrieving and browsing enterprise information. TEXPROS is well suited to applying hypermedia information retrieval techniques. In this dissertation, we extend TEXPROS to a hypermedia information retrieval system called HyTEXPROS with hypertext functionalities, such as nodes, typed and weighted links, anchors, guided tours, network overview, bookmarks, annotations and comments, and an external linkbase. It describes the whole information base, including the metadata and the original documents, as network nodes connected by links. Through hypertext functionalities, a user can dynamically construct an information path by browsing through pieces of the information base. By adding hypertext functionalities to TEXPROS, HyTEXPROS is created. It changes its working domain from a personal document processing domain to a personal library domain, accompanied by citation techniques to process original documents. A four-level conceptual architecture is presented as the system architecture of HyTEXPROS. This architecture is also referred to as the reference model of HyTEXPROS. A detailed description of HyTEXPROS, using the First Order Logic Calculus, is also given. An early version of a prototype is briefly described.
Agente Fenix: sistema de filtragem personalizada de informações
Nowadays the Internet offers such an extensive amount of information that it becomes very difficult for users to take advantage of it. In search of a solution, the present work proposes a methodology for implementing an Intelligent Agent for personalized information filtering. The idea is to develop a set of autonomous, non-mobile, adaptive agents. The learning mechanism adopted by the agents is "relevance feedback". The search space from which the information is retrieved consists of the articles and works found in WWW pages, selected and classified according to subjects of interest. The system was developed to help the academic public, composed of teachers and undergraduate and master's students. The initial results were quite promising. The proposed agent proved to be a powerful tool for information filtering, reducing the time spent on that activity.
Currently, the Internet makes an extensive amount of information available to a wide range of users, which makes it difficult to handle. Motivated by this difficulty, the present work proposes the use of Intelligent Agents for personalized information filtering. The idea is to develop a set of autonomous, stationary and adaptive agents with the goal of satisfying the user's information needs. Information is represented with the vector space model [Salton83], and the agents' learning mechanism is relevance feedback [Frakes92]. This article presents a description of the system and the results of a first stage, in which tests were carried out in a simulated environment. In a next stage, tests will be conducted with real users. The search space from which information is retrieved consists of documents on WWW pages. The system was developed for single-user environments, and the target audience is the professors, students and staff of a university.
Intelligent Query Answering Through Rule Learning and Generalization
The Department of Defense (DoD) relies heavily on information systems to complete a myriad of tasks, from day-to-day personnel actions to mission-critical imagery retrieval, intelligence analysis, and mission planning. The astronomical growth in size and performance of data storage systems leads to problems in processing the amount of data returned on any given query. Typical relational database systems return a set of unordered records. This approach is acceptable in small information systems, but in large systems, such as military image retrieval systems with more than 1 million records, it requires considerable time (often hours to days) to sort through thousands of records and select the relevant ones for analysis. This research introduces Intelligent Query Answering (IQA) as a novel approach to information retrieval. IQA implements the FOIL algorithm to learn rules based upon user feedback [QUI90]. The Winnow algorithm adjusts rule weights based on user classification, for improved document orderings [BLU97]. A semantic tree specific to the domain allows rule generalization across the domain. Testing shows a document sort accuracy rate of 63-93% against a controlled test dataset and a 78-89% accuracy rate on a subset of declassified National Air Intelligence Center imagery metadata. These results demonstrate that this research provides groundwork for future efforts in rule learning and rule generalization in the information retrieval field.
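As a rough illustration of how a Winnow-style learner adjusts weights from user classifications, here is a minimal sketch of the textbook multiplicative update (the variant that demotes by division). It is not the IQA system's code: treating each learned rule as a binary feature that either fires on a document or not, as well as the function names, the threshold, and the factor `alpha`, are assumptions made for the example.

```python
def winnow_predict(weights, features, threshold):
    """Predict 1 (relevant) if the summed weights of the active features reach the threshold."""
    score = sum(w for w, x in zip(weights, features) if x)
    return 1 if score >= threshold else 0


def winnow_update(weights, features, label, threshold, alpha=2.0):
    """Mistake-driven update: multiply the weights of active features by alpha
    after a missed positive, divide them by alpha after a false positive."""
    if winnow_predict(weights, features, threshold) == label:
        return list(weights)  # correct prediction: weights are unchanged
    factor = alpha if label == 1 else 1.0 / alpha
    return [w * factor if x else w for w, x in zip(weights, features)]


# Three hypothetical rules, all starting with weight 1; threshold set to the number of rules.
weights = [1.0, 1.0, 1.0]
threshold = 3.0

# The user marks a document as relevant on which rules 0 and 2 fire.
weights = winnow_update(weights, [1, 0, 1], label=1, threshold=threshold)
after_promotion = list(weights)

# The user marks a document as not relevant on which rules 0 and 1 fire.
weights = winnow_update(weights, [1, 1, 0], label=0, threshold=threshold)
after_demotion = list(weights)
```

Documents could then be ordered by their weighted score, so that rules the user has repeatedly confirmed contribute more to the ranking.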
Axiomatic analysis of smoothing methods in language models for pseudo-relevance feedback
Pseudo-Relevance Feedback (PRF) is an important general technique for improving retrieval effectiveness without requiring any user effort. Several state-of-the-art PRF models are based on the language modeling approach where a query language model is learned based on feedback documents. In all these models, feedback documents are represented with unigram language models smoothed with a collection language model. While collection language model-based smoothing has proven both effective and necessary in using language models for retrieval, we use axiomatic analysis to show that this smoothing scheme inherently causes the feedback model to favor frequent terms and thus violates the IDF constraint needed to ensure selection of discriminative feedback terms. To address this problem, we propose replacing collection language model-based smoothing in the feedback stage with additive smoothing, which is analytically shown to select more discriminative terms. Empirical evaluation further confirms that additive smoothing indeed significantly outperforms collection-based smoothing methods in multiple language model-based PRF models
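The contrast between the two smoothing schemes can be illustrated with a small sketch. It covers only the smoothing of a single feedback document's unigram model, not the full PRF models analyzed in the paper. Jelinek-Mercer interpolation is used here as one example of collection language model-based smoothing, and the toy counts, collection probabilities, vocabulary size, and the parameters `lam` and `delta` are assumptions chosen for the example.

```python
from collections import Counter


def jelinek_mercer(term, doc_counts, doc_len, collection_prob, lam=0.5):
    """Collection-based smoothing: interpolate the document model with the collection model."""
    return (1 - lam) * doc_counts[term] / doc_len + lam * collection_prob[term]


def additive(term, doc_counts, doc_len, vocab_size, delta=1.0):
    """Additive smoothing: add the same pseudo-count delta to every vocabulary term."""
    return (doc_counts[term] + delta) / (doc_len + delta * vocab_size)


# Toy feedback document: "the" and "rocchio" occur equally often in it.
doc_counts = Counter({"the": 2, "rocchio": 2, "model": 1})
doc_len = sum(doc_counts.values())

# Hypothetical collection statistics: "the" is very common, "rocchio" is rare.
collection_prob = {"the": 0.5, "rocchio": 0.001, "model": 0.01}
vocab_size = 1000

jm_common = jelinek_mercer("the", doc_counts, doc_len, collection_prob)
jm_rare = jelinek_mercer("rocchio", doc_counts, doc_len, collection_prob)
add_common = additive("the", doc_counts, doc_len, vocab_size)
add_rare = additive("rocchio", doc_counts, doc_len, vocab_size)
```

With equal in-document counts, the interpolated model assigns more probability to the collection-frequent term, while additive smoothing assigns both terms the same probability, which is the intuition behind the abstract's point about favoring frequent terms.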