217 research outputs found

    Search Agent Model: a Conceptual Framework for Search by Algorithms and Agent Systems

    No abstract available

    Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture

    We present the architecture behind Twitter's real-time related query suggestion and spelling correction service. Although these tasks have received much attention in the web search literature, the Twitter context introduces a real-time "twist": after significant breaking news events, we aim to provide relevant results within minutes. This paper provides a case study illustrating the challenges of real-time data processing in the era of "big data". We tell the story of how our system was built twice: our first implementation was built on a typical Hadoop-based analytics stack, but was later replaced because it did not meet the latency requirements necessary to generate meaningful real-time results. The second implementation, which is the system deployed in production, is a custom in-memory processing engine specifically designed for the task. This experience taught us that the current typical usage of Hadoop as a "big data" platform, while great for experimentation, is not well suited to low-latency processing, and points the way to future work on data analytics platforms that can handle "big" as well as "fast" data.

    Review of Erosion and Sedimentation Control Programs in the Piscataqua Region

    The Piscataqua Region Estuaries Partnership (PREP) seeks to minimize adverse impacts to water resources associated with construction site development activities. In order to achieve this goal, PREP must understand the strengths and weaknesses of existing erosion and sedimentation control (E&SC) programs in the 52 municipalities of the PREP watershed (Figure 1-1). A detailed understanding of the existing E&SC programs will enable PREP and other stakeholders to identify and implement actions to improve E&SC programs and minimize adverse impacts. This report provides a review and assessment of existing erosion and sedimentation control programs and a set of recommendations for improving these programs. Our approach in conducting the review was to obtain available federal, state and municipal programs data and to interview people who work with E&SC programs on a daily basis, including state, municipal, construction contractor and site inspector staff. A statement of the problem, an introduction to applicable regulations, and a description of our project approach are provided below.

    Generative and Pseudo-Relevant Feedback for Sparse, Dense and Learned Sparse Retrieval

    Pseudo-relevance feedback (PRF) is a classical approach to address lexical mismatch by enriching the query using first-pass retrieval. Moreover, recent work on generative-relevance feedback (GRF) shows that query expansion models using text generated from large language models can improve sparse retrieval without depending on first-pass retrieval effectiveness. This work extends GRF to dense and learned sparse retrieval paradigms with experiments over six standard document ranking benchmarks. We find that GRF improves over comparable PRF techniques by around 10% on both precision and recall-oriented measures. Nonetheless, query analysis shows that GRF and PRF have contrasting benefits, with GRF providing external context not present in first-pass retrieval, whereas PRF grounds the query to the information contained within the target corpus. Thus, we propose combining generative and pseudo-relevance feedback ranking signals to achieve the benefits of both feedback classes, which significantly increases recall over PRF methods on 95% of experiments.

    DREQ: Document Re-Ranking Using Entity-based Query Understanding

    While entity-oriented neural IR models have advanced significantly, they often overlook a key nuance: the varying degrees of influence individual entities within a document have on its overall relevance. Addressing this gap, we present DREQ, an entity-oriented dense document re-ranking model. Uniquely, we emphasize the query-relevant entities within a document's representation while simultaneously attenuating the less relevant ones, thus obtaining a query-specific entity-centric document representation. We then combine this entity-centric document representation with the text-centric representation of the document to obtain a "hybrid" representation of the document. We learn a relevance score for the document using this hybrid representation. Using four large-scale benchmarks, we show that DREQ outperforms state-of-the-art neural and non-neural re-ranking methods, highlighting the effectiveness of our entity-oriented representation approach. Comment: To be presented as a full paper at ECIR 2024 in Glasgow, UK.

    Local and global query expansion for hierarchical complex topics

    In this work we study local and global methods for query expansion for multifaceted complex topics. We study word-based and entity-based expansion methods and extend these approaches to complex topics using fine-grained expansion on different elements of the hierarchical query structure. For a source of hierarchical complex topics we use the TREC Complex Answer Retrieval (CAR) benchmark data collection. We find that leveraging the hierarchical topic structure is needed for both local and global expansion methods to be effective. Further, the results demonstrate that entity-based expansion methods show significant gains over word-based models alone, with local feedback providing the largest improvement. The results on the CAR paragraph retrieval task demonstrate that expansion models that incorporate both the hierarchical query structure and entity-based expansion result in a greater than 20% improvement over word-based expansion approaches.