51,455 research outputs found

    Looking at Vector Space and Language Models for IR using Density Matrices

    In this work, we conduct a joint analysis of both Vector Space and Language Models for IR using the mathematical framework of Quantum Theory. We shed light on how both models allocate the space of density matrices. A density matrix is shown to be a general representational tool able to leverage the capabilities of both VSM and LM representations, thus paving the way for a new generation of retrieval models. We analyze the possible implications suggested by our findings. Comment: In Proceedings of Quantum Interaction 201
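
    As a rough illustration of the objects named in this abstract, the sketch below builds two density matrices (symmetric, positive semi-definite, unit trace) in plain Python. Mapping a normalized term vector to a rank-one matrix and a term probability distribution to a diagonal matrix is an assumed reading of "VSM" and "LM" representations, not something the abstract states; the function names are hypothetical.

```python
import math

def pure_state(vector):
    """Rank-one density matrix |v><v| from a term-weight vector (assumed VSM-style view)."""
    norm = math.sqrt(sum(x * x for x in vector))
    unit = [x / norm for x in vector]
    return [[a * b for b in unit] for a in unit]

def diagonal_state(probabilities):
    """Diagonal density matrix from a term distribution (assumed LM-style view)."""
    n = len(probabilities)
    return [[probabilities[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def trace(matrix):
    """Sum of the diagonal entries; equals 1 for any density matrix."""
    return sum(matrix[i][i] for i in range(len(matrix)))

rho_vsm = pure_state([3.0, 4.0])       # off-diagonal terms are non-zero
rho_lm = diagonal_state([0.25, 0.75])  # off-diagonal terms are zero
```

    Both results have unit trace, so they live in the same space and can be compared or mixed, which is the sense in which a density matrix can generalize the two representations.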

    A survey on the use of relevance feedback for information access systems

    Users of online search engines often find it difficult to express their need for information in the form of a query. However, if the user can identify examples of the kind of documents they require then they can employ a technique known as relevance feedback. Relevance feedback covers a range of techniques intended to improve a user's query and facilitate retrieval of information relevant to a user's information need. In this paper we survey relevance feedback techniques. We study both automatic techniques, in which the system modifies the user's query, and interactive techniques, in which the user has control over query modification. We also consider specific interfaces to relevance feedback systems and characteristics of searchers that can affect the use and success of relevance feedback systems
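
    To make the idea of automatic query modification concrete, here is a minimal sketch of Rocchio-style feedback, a classic technique of this kind. The abstract does not name Rocchio; it is used here only as an example, and the function name, the dictionary representation, and the default weights are assumptions.

```python
from collections import defaultdict

def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a modified query (term -> weight) from judged example documents.

    query, and each document in relevant / non_relevant, is a dict of term weights.
    alpha, beta, gamma are illustrative defaults, not values from the survey.
    """
    updated = defaultdict(float)
    for term, weight in query.items():
        updated[term] += alpha * weight
    # Add the centroid of relevant documents, subtract that of non-relevant ones.
    for docs, coeff in ((relevant, beta), (non_relevant, -gamma)):
        if not docs:
            continue
        for doc in docs:
            for term, weight in doc.items():
                updated[term] += coeff * weight / len(docs)
    # Terms pushed to zero or below are dropped from the new query.
    return {term: weight for term, weight in updated.items() if weight > 0}

new_query = rocchio(
    {"jaguar": 1.0},
    relevant=[{"jaguar": 1.0, "cat": 1.0}],
    non_relevant=[{"jaguar": 1.0, "car": 1.0}],
)
```

    In an interactive variant, the system would instead show the candidate terms (here "cat") and let the user decide which ones to add.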

    Exploiting semantics for improving clinical information retrieval

    Clinical information retrieval (IR) presents several challenges, including terminology mismatch and granularity mismatch. One of the main objectives in clinical IR is to bridge the semantic gap between queries and documents and to go beyond keyword matching. To address these issues, we use semantic information to improve the performance of clinical IR systems by representing queries in an expressive and meaningful context. We propose query context modeling, with two novel approaches to modeling medical query contexts. The first approach models medical query contexts by mining semantic-based AR to improve clinical text retrieval. The query context is derived from the rules that cover the query, which are then weighted according to their semantic relatedness to the query concepts. In our second approach, we model a representative query context by developing a query domain ontology. To develop the query domain ontology, we extract all the concepts that have a semantic relationship with the query concept(s) in UMLS ontologies. The query context consists of the concepts extracted from the query domain ontology, weighted according to their semantic relatedness to the query concept(s). The query context is then exploited in query expansion and re-ranking over patient records to improve clinical retrieval performance. We evaluate this approach on the TREC Medical Records dataset. Results show that our proposed approach significantly improves retrieval performance compared to a classic keyword-based IR model

    Trust and Risk Relationship Analysis on a Workflow Basis: A Use Case

    Trust and risk are often seen in proportion to each other; as such, high trust may induce low risk and vice versa. However, recent research argues that the relationship between trust and risk is implicit rather than proportional. Considering that trust and risk are implicit, this paper proposes for the first time a novel approach to viewing trust and risk on the basis of the W3C PROV provenance data model, applied in the healthcare domain. We argue that, in the healthcare domain, high trust can be placed in data despite its high risk, and that low-trust data can have low risk, depending on data quality attributes and provenance. This is demonstrated by our trust and risk models applied to the BII case study data. The proposed theoretical approach first calculates risk values at each workflow step considering PROV concepts, and second, aggregates the final risk score for the whole provenance chain. Unlike the risk model, the trust of a workflow is derived by applying the DS/AHP method. The results confirm our assumption that the relationship between trust and risk is implicit

    Entity Query Feature Expansion Using Knowledge Base Links

    Recent advances in automatic entity linking and knowledge base construction have resulted in entity annotations for document and query collections, for example annotations of entities from large general-purpose knowledge bases such as Freebase and the Google Knowledge Graph. Understanding how to leverage these entity annotations of text to improve ad hoc document retrieval is an open research area. Query expansion is a commonly used technique to improve retrieval effectiveness. Most previous query expansion approaches focus on text, mainly using unigram concepts. In this paper, we propose a new technique, called entity query feature expansion (EQFE), which enriches the query with features from entities and their links to knowledge bases, including structured attributes and text. We experiment using both explicit query entity annotations and latent entities. We evaluate our technique on TREC text collections automatically annotated with knowledge base entity links, including the Google Freebase Annotations (FACC1) data. We find that entity-based feature expansion results in significant improvements in retrieval effectiveness over state-of-the-art text expansion approaches

    What makes ISAF s/tick: An investigation of the politics of coalition burden-sharing

    This paper is interested in conceptualising the often raised issue of over- and under-contributing in coalition operations: how and why members of complex coalitions may be punching above or below their weight. To this end, the first section presents a parsimonious baseline assumption regarding which variables may fundamentally inform coalition burden-sharing, and then discusses how much of a role each of these is found to play in the Afghanistan context. The second section elaborates on this by assessing how coalition member countries perceive and interpret threats related to Afghanistan, as this bears on the prioritisation of other variables within the scheme outlined in the previous section. The third and fourth sections then examine and further enrich the existing literature on coalition burden-sharing, and provide further insights into the operations of the International Security Assistance Force–Afghanistan and into ISAF member-country decision-making. The objective here is to generate further refined assumptions that permit a preliminary assessment of the phenomenon of uneven burden-sharing in ISAF, complementing the initial baseline expectations

    Adapting Learned Sparse Retrieval for Long Documents

    Learned sparse retrieval (LSR) is a family of neural retrieval methods that transform queries and documents into sparse weight vectors aligned with a vocabulary. While LSR approaches like Splade work well for short passages, it is unclear how well they handle longer documents. We investigate existing aggregation approaches for adapting LSR to longer documents and find that proximal scoring is crucial for LSR to handle long documents. To leverage this property, we propose two adaptations of the Sequential Dependence Model (SDM) to LSR: ExactSDM and SoftSDM. ExactSDM assumes only exact query term dependence, while SoftSDM uses potential functions that model the dependence of query terms and their expansion terms (i.e., terms identified using a transformer's masked language modeling head). Experiments on the MSMARCO Document and TREC Robust04 datasets demonstrate that both ExactSDM and SoftSDM outperform existing LSR aggregation approaches for different document length constraints. Surprisingly, SoftSDM does not provide any performance benefits over ExactSDM. This suggests that soft proximity matching is not necessary for modeling term dependence in LSR. Overall, this study provides insights into handling long documents with LSR, proposing adaptations that improve its performance. Comment: SIGIR 202
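
    The sparse-vector scoring described in this abstract can be sketched in a few lines. The example below scores a query against a document as a dot product over shared vocabulary terms, and handles a long document by taking the best-scoring segment. The max-over-segments rule is only one simple aggregation chosen for illustration; it is not ExactSDM or SoftSDM, and all names and weights here are hypothetical.

```python
def sparse_dot(query_vec, doc_vec):
    """Dot product of two sparse vectors stored as term -> weight dicts."""
    # Iterate over the smaller vector and look terms up in the larger one.
    if len(query_vec) > len(doc_vec):
        query_vec, doc_vec = doc_vec, query_vec
    return sum(w * doc_vec[t] for t, w in query_vec.items() if t in doc_vec)

def score_long_document(query_vec, segment_vecs):
    """Score a long document as the maximum score over its segments (assumed baseline)."""
    return max((sparse_dot(query_vec, seg) for seg in segment_vecs), default=0.0)

query = {"sparse": 1.0, "retrieval": 2.0}
segments = [
    {"sparse": 0.5},                  # segment 1 matches one query term
    {"retrieval": 1.5, "long": 0.3},  # segment 2 matches the heavier term
]
document_score = score_long_document(query, segments)
```

    Because each segment is scored independently, this baseline ignores how close matched terms are to one another, which is the kind of proximity signal the abstract reports as crucial for long documents.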