13 research outputs found

    The troubles with using a logical model of IR on a large collection of documents

    This is a paper of two halves. First, a description of a logical model of IR known as imaging will be presented. Unfortunately, due to constraints of time and computing resources, this model was not implemented in time for this round of TREC. Therefore, this paper's second half describes the more conventional IR model and system used to generate the Glasgow IR result set (glair1).

    A New Lattice-Based Information Retrieval Theory

    Logic-based Information Retrieval (IR) models represent the retrieval decision as an implication d → q between a document d and a query q, where d and q are logical sentences. However, d → q is a binary decision; we thus need a measure to estimate the degree to which d implies q, denoted P(d → q). The main problems in logic-based IR models are the difficulty of implementing the decision algorithms and of defining the uncertainty measure P as part of the logic. In this study, we chose Propositional Logic (PL) as the underlying framework. We propose to replace the implication d → q by the material implication d ⊃ q. We know that there is a mapping between PL and lattice theory. In addition, Knuth [13] introduced the notion of degree of inclusion to quantify the ordering relations defined on lattices. Therefore, we position documents and queries on a lattice, where the ordering relation is equivalent to the material implication. In this case, the implication d → q is replaced by an ordering relation between documents and queries, and the uncertainty P(d → q) is redefined using the degree of inclusion measure. This new IR model is: (1) general, in that most classical IR models can be instantiated from our lattice-based model; (2) capable of formally proving van Rijsbergen's intuition about replacing P(d → q) by P(q|d); and (3) easy to implement.
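The idea of grading an implication can be illustrated with a small sketch. This is not the construction from the paper: it assumes, purely for illustration, that each propositional sentence is represented by its set of satisfying truth assignments, and that the degree to which d implies q is the fraction of d's models that also satisfy q. Under a uniform measure this behaves like P(q|d), and it equals 1 exactly when d ⊃ q holds in every interpretation. The names `models` and `degree_of_inclusion` are hypothetical.

```python
from itertools import product


def models(formula, atoms):
    """Return the set of truth assignments (as tuples) that satisfy `formula`."""
    return {
        values
        for values in product([False, True], repeat=len(atoms))
        if formula(dict(zip(atoms, values)))
    }


def degree_of_inclusion(d, q, atoms):
    """Fraction of d's models that are also models of q (illustrative measure)."""
    models_d = models(d, atoms)
    models_q = models(q, atoms)
    if not models_d:
        # An unsatisfiable d materially implies every q.
        return 1.0
    return len(models_d & models_q) / len(models_d)


# Hypothetical index terms used as propositional atoms.
atoms = ["a", "b", "c"]
doc = lambda v: v["a"] and v["b"]               # document: a AND b
query_entailed = lambda v: v["a"]               # implied by the document
query_partial = lambda v: v["a"] and v["c"]     # implied in half of d's models
query_disjoint = lambda v: not v["a"]           # incompatible with the document

scores = [
    degree_of_inclusion(doc, q, atoms)
    for q in (query_entailed, query_partial, query_disjoint)
]
```

Enumerating truth assignments is exponential in the number of atoms, so this only serves to make the ordering-relation view concrete on toy inputs.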

    RELIEFS : un système d'inspiration cognitive pour le filtrage adaptatif de documents textuels

    This article presents a new system named RELIEFS (RELevance Information Extraction Fuzzy System) for the adaptive filtering of textual documents. The system's main operating principles are inspired by cognitive mechanisms involved in information selection processes. More precisely, our research starts from the analysis of models of semantic memory (access to and organization of knowledge in memory) and of models that account for attentional phenomena (selection of information coming from the environment). Strong links are drawn between these models and models traditionally used in IR. A new interpretation of the notion of relevance is proposed. The analysis leads us to extract a set of basic mechanisms, based on the notions of activation and spreading activation, for selecting "relevant" information. These mechanisms are implemented and successfully tested on the TREC9 adaptive filtering task.

    A survey on the use of relevance feedback for information access systems

    Users of online search engines often find it difficult to express their need for information in the form of a query. However, if the user can identify examples of the kind of documents they require then they can employ a technique known as relevance feedback. Relevance feedback covers a range of techniques intended to improve a user's query and facilitate retrieval of information relevant to a user's information need. In this paper we survey relevance feedback techniques. We study both automatic techniques, in which the system modifies the user's query, and interactive techniques, in which the user has control over query modification. We also consider specific interfaces to relevance feedback systems and characteristics of searchers that can affect the use and success of relevance feedback systems
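As a concrete instance of an automatic technique in which the system modifies the user's query, the sketch below uses the classic Rocchio update. The abstract does not name a specific method, so Rocchio is chosen here only as a familiar example; the function name, the sparse-dictionary representation, and the default weights `alpha`, `beta`, and `gamma` are assumptions for illustration.

```python
from collections import defaultdict


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a reweighted query from judged documents.

    `query` and each document are sparse vectors: dicts mapping term -> weight.
    The new query is alpha * query + beta * mean(relevant) - gamma * mean(nonrelevant).
    """
    updated = defaultdict(float)
    for term, weight in query.items():
        updated[term] += alpha * weight
    for docs, coeff in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, weight in doc.items():
                # Dividing by len(docs) turns the sum into a centroid.
                updated[term] += coeff * weight / len(docs)
    # Terms pushed to zero or below are dropped from the query.
    return {term: weight for term, weight in updated.items() if weight > 0}


# Toy data: the user marks one document relevant and one nonrelevant.
new_query = rocchio(
    query={"jaguar": 1.0},
    relevant=[{"jaguar": 1.0, "cat": 1.0}],
    nonrelevant=[{"jaguar": 1.0, "car": 1.0}],
)
```

In this toy case the query gains the term "cat" from the relevant document, while "car" receives a negative weight and is dropped.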

    Application of aboutness to functional benchmarking in information retrieval

    Experimental approaches are widely employed to benchmark the performance of an information retrieval (IR) system. Measurements in terms of recall and precision are computed as performance indicators. Although they are good at assessing the retrieval effectiveness of an IR system, they fail to explore deeper aspects such as its underlying functionality and explain why the system shows such performance. Recently, inductive (i.e., theoretical) evaluation of IR systems has been proposed to circumvent the controversies of the experimental methods. Several studies have adopted the inductive approach, but they mostly focus on theoretical modeling of IR properties by using some metalogic. In this article, we propose to use inductive evaluation for functional benchmarking of IR models as a complement of the traditional experiment-based performance benchmarking. We define a functional benchmark suite in two stages: the evaluation criteria based on the notion of "aboutness," and the formal evaluation methodology using the criteria. The proposed benchmark has been successfully applied to evaluate various well-known classical and logic-based IR models. The functional benchmarking results allow us to compare and analyze the functionality of the different IR models

    Towards a belief revision based adaptive and context sensitive information retrieval system

    In an adaptive information retrieval (IR) setting, the information seekers' beliefs about which terms are relevant or nonrelevant will naturally fluctuate. This article investigates how the theory of belief revision can be used to model adaptive IR. More specifically, belief revision logic provides a rich representation scheme to formalize retrieval contexts so as to disambiguate vague user queries. In addition, belief revision theory underpins the development of an effective mechanism to revise user profiles in accordance with information seekers' changing information needs. It is argued that information retrieval contexts can be extracted by means of the information-flow text mining method so as to realize a highly autonomous adaptive IR system. An added benefit of a belief-based IR model is that its retrieval behavior is more predictable and explainable. Our initial experiments show that the belief-based adaptive IR system is as effective as a classical adaptive IR system. To the best of our knowledge, this is the first successful implementation and evaluation of a logic-based adaptive IR model that can efficiently process large IR collections.

    An effective Chinese indexing method based on partitioned signature files.

    Wong Chi Yin.
    Thesis (M.Phil.)--Chinese University of Hong Kong, 1998.
    Includes bibliographical references (leaves 107-114).
    Abstract also in Chinese.

    Abstract
    Acknowledgements
    Chapter 1 --- Introduction
        Chapter 1.1 --- Introduction to Chinese IR
        Chapter 1.2 --- Contributions
        Chapter 1.3 --- Organization of this Thesis
    Chapter 2 --- Background
        Chapter 2.1 --- Indexing methods
            Chapter 2.1.1 --- Full-text scanning
            Chapter 2.1.2 --- Inverted files
            Chapter 2.1.3 --- Signature files
            Chapter 2.1.4 --- Clustering
        Chapter 2.2 --- Information Retrieval Models
            Chapter 2.2.1 --- Boolean model
            Chapter 2.2.2 --- Vector space model
            Chapter 2.2.3 --- Probabilistic model
            Chapter 2.2.4 --- Logical model
    Chapter 3 --- Investigation of Segmentation on the Vector Space Retrieval Model
        Chapter 3.1 --- Segmentation of Chinese Texts
            Chapter 3.1.1 --- Character-based segmentation
            Chapter 3.1.2 --- Word-based segmentation
            Chapter 3.1.3 --- N-Gram segmentation
        Chapter 3.2 --- Performance Evaluation of Three Segmentation Approaches
            Chapter 3.2.1 --- Experimental Setup
            Chapter 3.2.2 --- Experimental Results
            Chapter 3.2.3 --- Discussion
    Chapter 4 --- Signature File Background
        Chapter 4.1 --- Superimposed coding
        Chapter 4.2 --- False drop probability
    Chapter 5 --- Partitioned Signature File Based On Chinese Word Length
        Chapter 5.1 --- Fixed Weight Block (FWB) Signature File
        Chapter 5.2 --- Overview of PSFC
        Chapter 5.3 --- Design Considerations
    Chapter 6 --- New Hashing Techniques for Partitioned Signature Files
        Chapter 6.1 --- Direct Division Method
        Chapter 6.2 --- Random Number Assisted Division Method
        Chapter 6.3 --- Frequency-based hashing method
        Chapter 6.4 --- Chinese character-based hashing method
    Chapter 7 --- Experiments and Results
        Chapter 7.1 --- Performance evaluation of partitioned signature file based on Chinese word length
            Chapter 7.1.1 --- Retrieval Performance
            Chapter 7.1.2 --- Signature Reduction Ratio
            Chapter 7.1.3 --- Storage Requirement
            Chapter 7.1.4 --- Discussion
        Chapter 7.2 --- Performance evaluation of different dynamic signature generation methods
            Chapter 7.2.1 --- Collision
            Chapter 7.2.2 --- Retrieval Performance
            Chapter 7.2.3 --- Discussion
    Chapter 8 --- Conclusions and Future Work
        Chapter 8.1 --- Conclusions
        Chapter 8.2 --- Future work
    Chapter A --- Notations of Signature Files
    Chapter B --- False Drop Probability
    Chapter C --- Experimental Results
    Bibliography

    A study of the kinematics of probabilities in information retrieval

    In Information Retrieval (IR), probabilistic modelling is related to the use of a model that ranks documents in decreasing order of their estimated probability of relevance to a user's information need expressed by a query. In an IR system based on a probabilistic model, the user is guided to examine first the documents that are the most likely to be relevant to their need. If the system performs well, these documents should be at the top of the retrieved list. In mathematical terms the problem consists of estimating the probability P(R | q,d), that is, the probability of relevance given a query q and a document d. This estimation should be performed for every document in the collection, and documents should then be ranked according to this measure. For this evaluation the system should make use of all the information available in the indexing term space. This thesis contains a study of the kinematics of probabilities in probabilistic IR. The aim is to gain better insight into the behaviour of the probabilistic models of IR currently in use and to propose new and more effective models by exploiting different kinematics of probabilities. The study is performed both from a theoretical and an experimental point of view. Theoretically, the thesis explores the use of the probability of a conditional, namely P(d → q), to estimate the conditional probability P(R | q,d). This is achieved by interpreting the term space in the context of the "possible worlds semantics". Previous approaches in this direction had as their basic assumption the consideration that "a document is a possible world". In this thesis a different approach is adopted, based on the assumption that "a term is a possible world". This approach enables the exploitation of term-term semantic relationships in the term space, estimated using an information theoretic measure. This form of information is rarely used in IR at retrieval time.
Two new models of IR are proposed, based on two different ways of estimating P(d → q) using a logical technique called Imaging. The first model is called Retrieval by Logical Imaging; the second is called Retrieval by General Logical Imaging, being a generalisation of the first model. The probability kinematics of these two models is compared with that of two other proposed models: the Retrieval by Joint Probability model and the Retrieval by Conditional Probability model. These last two models mimic the probability kinematics of the Vector Space model and of the Probabilistic Retrieval model. Experimentally, the retrieval effectiveness of the above four models is analysed and compared using five test collections of different sizes and characteristics. The results of this experimentation depend heavily on the choice of term weight and term similarity measures adopted. The most important conclusion of this thesis is that theoretically a probability transfer that takes into account the semantic similarity between the probability-donor and the probability-recipient is more effective than a probability transfer that does not take that into account. In the context of IR this is equivalent to saying that models that exploit the semantic similarity between terms in the term space at retrieval time are more effective than models that do not. Unfortunately, while the experimental investigation carried out using small test collections provides evidence supporting this conclusion, experiments performed using larger test collections do not provide as much supporting evidence (although they do not provide contrasting evidence either). The peculiar characteristics of the term space of different collections play an important role in shaping the effects that different probability kinematics have on the effectiveness of the retrieval process.
The above result suggests the necessity and the usefulness of further investigations into more complex and optimised models of probabilistic IR, where probability kinematics follows non-classical approaches. The models proposed in this thesis are just two such approaches; other ones can be developed using recent results achieved in other fields, such as non-classical logics and belief revision theory
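The probability transfer described in this abstract can be made concrete with a minimal sketch. It assumes the simplest imaging variant under the "a term is a possible world" view: every term carries a prior probability, each term absent from the document donates all of its probability to its single most similar term in the document, and P(d → q) is the posterior mass on the terms that also occur in the query. The function names, the prior values, and the similarity table are invented for illustration and are not taken from the thesis.

```python
def imaging_score(prior, similarity, doc_terms, query_terms):
    """Estimate P(d -> q) by imaging on d, treating each term as a possible world."""
    doc_terms = set(doc_terms)
    query_terms = set(query_terms)
    if not doc_terms:
        return 0.0
    posterior = {term: 0.0 for term in doc_terms}
    for term, probability in prior.items():
        if term in doc_terms:
            # Terms in the document keep their own probability.
            posterior[term] += probability
        else:
            # Donor term: move its probability to the closest document term.
            # Sorting makes tie-breaking deterministic.
            recipient = max(sorted(doc_terms), key=lambda t: similarity(term, t))
            posterior[recipient] += probability
    return sum(p for term, p in posterior.items() if term in query_terms)


# Hypothetical prior over the term space (sums to 1).
prior = {"cat": 0.4, "dog": 0.3, "car": 0.2, "bus": 0.1}

# Hypothetical symmetric term-term similarities; unlisted pairs score 0.
_table = {
    ("dog", "cat"): 0.9,
    ("dog", "car"): 0.1,
    ("bus", "car"): 0.8,
    ("bus", "cat"): 0.1,
}


def similarity(a, b):
    return _table.get((a, b), _table.get((b, a), 0.0))


score = imaging_score(prior, similarity, doc_terms={"cat", "car"}, query_terms={"cat", "dog"})
```

Here "dog" transfers its probability to "cat" and "bus" to "car", so the document's match with the query benefits from a query term it does not contain, which is the effect of exploiting term similarity at retrieval time. A generalised variant would spread each donor's probability over several recipients instead of one.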