203 research outputs found

    Document ranking with quantum probabilities

    In this thesis we investigate the use of quantum probability theory for ranking documents. Quantum probability theory is used to estimate the probability of relevance of a document given a user's query. We posit that quantum probability theory can lead to a better estimation of the probability of a document being relevant to a user's query than the common approach, i.e. the Probability Ranking Principle (PRP), which is based upon Kolmogorovian probability theory. Following our hypothesis, we formulate an analogy between the document retrieval scenario and a physical scenario, that of the double slit experiment. Through the analogy, we propose a novel ranking approach, the quantum probability ranking principle (qPRP). Key to our proposal is the presence of quantum interference. Mathematically, this is the statistical deviation between empirical observations and the expected values predicted by the Kolmogorovian rule of additivity of probabilities of disjoint events in configurations such as that of the double slit experiment. We propose an interpretation of quantum interference in the document ranking scenario, and examine how quantum interference can be effectively estimated for document retrieval. To validate our proposal and to gain more insights about approaches for document ranking, we (1) analyse PRP, qPRP and other ranking approaches, exposing the assumptions underlying their ranking criteria and formulating the conditions for the optimality of the two ranking principles, (2) empirically compare three ranking principles (i.e. PRP, interactive PRP, and qPRP) and two state-of-the-art ranking strategies in two retrieval scenarios, those of ad-hoc retrieval and diversity retrieval, (3) analytically contrast the ranking criteria of the examined approaches, exposing similarities and differences, and (4) study the ranking behaviours of approaches alternative to PRP in terms of the kinematics they impose on relevant documents, i.e. by considering the extent and direction of the movements of relevant documents across the ranking recorded when comparing PRP against its alternatives. Our findings show that the effectiveness of the examined ranking approaches strongly depends upon the evaluation context. In the traditional evaluation context of ad-hoc retrieval, PRP is empirically shown to be better than or comparable to alternative ranking approaches. However, when we turn to examine evaluation contexts that account for interdependent document relevance (i.e. when the relevance of a document is assessed also with respect to other retrieved documents, as is the case in the diversity retrieval scenario), the use of quantum probability theory and thus of qPRP is shown to improve retrieval and ranking effectiveness over the traditional PRP and alternative ranking strategies, such as Maximal Marginal Relevance, Portfolio theory, and Interactive PRP. This work represents a significant step forward regarding the use of quantum theory in information retrieval. It demonstrates in fact that the application of quantum theory to problems within information retrieval can lead to improvements both in modelling power and retrieval effectiveness, allowing the construction of models that capture the complexity of information retrieval situations. Furthermore, the thesis opens up a number of lines for future research. These include (1) investigating estimations and approximations of quantum interference in qPRP, (2) exploiting complex numbers for the representation of documents and queries, and (3) applying the concepts underlying qPRP to tasks other than document ranking.
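The contrast between PRP and an interference-aware ranking can be illustrated with a minimal sketch. It assumes the standard quantum-probability form of the interference term, 2·sqrt(P(A)·P(B))·cos(θ), and, purely as an assumption of this sketch, approximates cos(θ) by the negated similarity between two documents. The abstract does not specify how interference is estimated, and the names `PROBS`, `toy_similarity`, `prp_rank` and `qprp_rank` are hypothetical.

```python
import math

# Hypothetical relevance estimates P(d) for three documents.
PROBS = {"d1": 0.9, "d2": 0.85, "d3": 0.5}

# Hypothetical pairwise similarities in [0, 1]; d1 and d2 are near-duplicates.
_SIMS = {frozenset(("d1", "d2")): 0.9}


def toy_similarity(a, b):
    """Similarity lookup; unlisted pairs are treated as dissimilar (0.0)."""
    return _SIMS.get(frozenset((a, b)), 0.0)


def interference(p_a, p_b, cos_theta):
    """Interference term: the deviation from Kolmogorovian additivity."""
    return 2.0 * math.sqrt(p_a * p_b) * cos_theta


def prp_rank(probs):
    """PRP: order documents by decreasing probability of relevance."""
    return sorted(probs, key=probs.get, reverse=True)


def qprp_rank(probs, similarity):
    """Greedy interference-aware ranking (a sketch, not the thesis's code).

    At each step, pick the document maximising its own probability plus
    the interference with every document already ranked. cos(theta) is
    approximated by -similarity, which is an assumption of this example.
    """
    remaining = set(probs)
    ranking = []
    while remaining:
        def score(d):
            return probs[d] + sum(
                interference(probs[d], probs[r], -similarity(d, r))
                for r in ranking
            )
        # Iterate in sorted order so that ties break deterministically.
        best = max(sorted(remaining), key=score)
        ranking.append(best)
        remaining.remove(best)
    return ranking
```

With these toy values, `prp_rank(PROBS)` keeps the near-duplicate `d2` in second place, whereas `qprp_rank(PROBS, toy_similarity)` demotes it below `d3`, which is the kind of behaviour a diversity-oriented evaluation would reward.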

    Advances in formal models of search and search behaviour

    Searching is performed in the context of a task, and as such the value of the information found is relative to that task. Recently, there has been a drive to develop formal models of information seeking and retrieval that consider the costs and benefits arising through the interaction with the interface/system and the information surfaced during that interaction. In this full-day tutorial we will focus on describing and explaining some of the most recent formal models of Information Seeking and Retrieval. The tutorial is structured into two parts. In the first part we will present a series of models that have been developed based on: (i) economic theory, (ii) decision theory, (iii) game theory and (iv) optimal foraging theory. The second part of the day will be dedicated to building models, where we will discuss different techniques to build and develop models from which we can draw testable hypotheses. During the tutorial participants will be challenged to develop various formal models, applying the techniques learnt during the day. We will then conclude with presentations on solutions followed by a summary and overview of challenges and future directions. This tutorial is aimed at participants wanting to know more about the various formal models of information seeking, search and retrieval that have been proposed. The tutorial will be presented at an intermediate level, and is designed to support participants who want to be able to understand and build such models.

    ChatGPT Hallucinates when Attributing Answers

    Can ChatGPT provide evidence to support its answers? Does the evidence it suggests actually exist and does it really support its answer? We investigate these questions using a collection of domain-specific knowledge-based questions, specifically prompting ChatGPT to provide both an answer and supporting evidence in the form of references to external sources. We also investigate how different prompts impact answers and evidence. We find that ChatGPT provides correct or partially correct answers in about half of the cases (50.6% of the time), but its suggested references exist only 14% of the time. We further provide insights on the generated references that reveal common traits among the references that ChatGPT generates, and show that even if a reference provided by the model does exist, this reference often does not support the claims ChatGPT attributes to it. Our findings are important because (1) they are the first systematic analysis of the references created by ChatGPT in its answers; (2) they suggest that the model may leverage good quality information in producing correct answers, but is unable to attribute real evidence to support its answers. Prompts, raw result files and manual analysis are made publicly available.

    Collagen Fiber Re-Alignment and Uncrimping in Response to Loading: Determining Structure-Function Relationships Using a Developmental Tendon Mouse Model

    Collagen fiber re-alignment and uncrimping are postulated mechanisms of structural response to load. It has been suggested that fibers re-orient in the direction of load and then uncrimp before collagen is tensioned and that in general, the structure is a result of the function tendons perform. However, little is known about how fiber re-alignment and uncrimping change in response to load, how this change relates to tendon mechanical properties, and if these changes are dependent on the underlying structure. Throughout postnatal development, dramatic structural and compositional changes occur in tendon. Postnatal tendons, with immature collagen networks, may respond to load in a different manner and timescale than mature collagen networks. Therefore, the overall objective of this study was to quantify the mechanical properties and structural response to load in a developmental mouse tendon model at 4, 10, 28 and 90 days old. Local collagen fiber re-alignment and crimp frequency were quantified throughout mechanical testing and local mechanical properties were measured. Throughout development, fiber re-alignment occurred at different points in the mechanical testing protocol. At early development, re-alignment was not identified until the linear (4 days) or toe-regions (10 days) of the mechanical test suggesting that fibers required a prolonged exposure to mechanical load before responding and that the immature collagen network present may delay re-alignment. The uncrimping of collagen fibers was identified during the toe-region of the mechanical test at all ages suggesting that crimp contributes to tendon nonlinear behavior. Additionally, results at 28 and 90 days suggested that collagen fiber crimp frequency decreased with increasing number of preconditioning cycles, which may affect toe-region properties. Mechanical properties and cross-sectional area increased throughout development. 
    The insertion site demonstrated lower moduli values and a more disorganized fiber distribution compared to the midsubstance at all ages suggesting it experiences multi-axial loads. Further, the tendon locations demonstrated different re-alignment and crimp behaviors suggesting that locations may respond to load differently and develop at different rates. Results from this study suggest that structure affects the tendon's ability to respond to load and that the loading protocol applied may affect the measurement of mechanical properties.

    How to Forget Clients in Federated Online Learning to Rank?

    Data protection legislation like the European Union's General Data Protection Regulation (GDPR) establishes the "right to be forgotten": a user (client) can request contributions made using their data to be removed from learned models. In this paper, we study how to remove the contributions made by a client participating in a Federated Online Learning to Rank (FOLTR) system. In a FOLTR system, a ranker is learned by aggregating local updates to the global ranking model. Local updates are learned in an online manner at a client-level using queries and implicit interactions that have occurred within that specific client. By doing so, each client's local data is not shared with other clients or with a centralised search service, while at the same time clients can benefit from an effective global ranking model learned from contributions of each client in the federation. In this paper, we study an effective and efficient unlearning method that can remove a client's contribution without compromising the overall ranker effectiveness and without needing to retrain the global ranker from scratch. A key challenge is how to measure whether the model has unlearned the contributions from the client c* that has requested removal. For this, we instruct c* to perform a poisoning attack (adding noise to this client's updates) and then we measure whether the impact of the attack is lessened when the unlearning process has taken place. Through experiments on four datasets, we demonstrate the effectiveness and efficiency of the unlearning strategy under different combinations of parameter settings. Comment: Accepted in ECIR 202

    QUT ielab at CLEF 2017 e-Health IR Task: Knowledge Base Retrieval for Consumer Health Search

    In this paper we describe our participation in the CLEF 2017 e-Health IR Task [6]. This track aims to evaluate and advance search technologies aimed at supporting consumers to find health advice online. Our solution addressed this challenge by developing a knowledge base (KB) query expansion method. We found that the two best KB query expansion methods are mapping entity mentions to KB entities by exact matching of entity mentions to the KB aliases (EM-Aliases) and by multi-matching of entity mentions to all KB features (Title, Categories, Links, Aliases, and Body) (EM-All). Once the mapping between entity mentions and KB entities was established, we found the Title of the mapped KB entities to be the best source of expansion terms compared to the aliases or a combination of both features. Finally, we also found that Relevance Feedback and Pseudo Relevance Feedback are effective in further improving query effectiveness.
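The EM-Aliases idea described above (exact matching of entity mentions against KB aliases, then expanding with the Title of each mapped entity) can be sketched as follows. This is an illustrative toy under stated assumptions, not the authors' implementation: the knowledge base `TOY_KB`, the word n-gram mention extraction, and the function names are all hypothetical.

```python
# Hypothetical toy knowledge base: entity Title -> list of aliases.
TOY_KB = {
    "Myocardial infarction": ["heart attack", "mi"],
    "Hypertension": ["high blood pressure"],
}


def build_alias_index(kb):
    """Map each lower-cased alias to the Title of its KB entity."""
    index = {}
    for title, aliases in kb.items():
        for alias in aliases:
            index[alias.lower()] = title
    return index


def candidate_mentions(tokens, max_len=3):
    """Yield word n-grams (longest first) as candidate entity mentions.

    Using n-grams as mentions is an assumption of this sketch; the
    abstract does not say how mentions are extracted.
    """
    for n in range(max_len, 0, -1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def expand_query(query, kb, max_len=3):
    """Append the Titles of entities whose aliases exactly match a mention."""
    alias_index = build_alias_index(kb)
    tokens = query.lower().split()
    titles = []
    for mention in candidate_mentions(tokens, max_len):
        title = alias_index.get(mention)
        if title is not None and title not in titles:
            titles.append(title)
    if not titles:
        return query  # no entity matched; leave the query unchanged
    return query + " " + " ".join(titles)
```

For example, `expand_query("signs of heart attack", TOY_KB)` appends the Title "Myocardial infarction", while a query with no matching alias is returned unchanged.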

    Choices in Knowledge-Base Retrieval for Consumer Health Search

    This paper investigates how retrieval using knowledge bases can be effectively translated to the consumer health search (CHS) domain. We posit that using knowledge bases for query reformulation may help to overcome some of the challenges in CHS. However, translating and implementing such approaches is nontrivial in CHS as it involves many design choices. We empirically evaluated the impact these different choices had on retrieval effectiveness. A state-of-the-art knowledge-base retrieval model, the Entity Query Feature Expansion model, was used to evaluate the following design choices: which knowledge base to use (specialised vs. generic), how to construct the knowledge base, how to extract entities from queries and map them to entities in the knowledge base, what part of the knowledge base to use for query expansion, and whether to augment the KB search process with relevance feedback. While knowledge base retrieval has been proposed as a solution for CHS, this paper delves into the finer details of doing this effectively, highlighting both pitfalls and payoffs. It aims to provide some lessons to others in advancing the state-of-the-art in CHS.