217 research outputs found
Search Agent Model: a Conceptual Framework for Search by Algorithms and Agent Systems
No abstract available
Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture
We present the architecture behind Twitter's real-time related query
suggestion and spelling correction service. Although these tasks have received
much attention in the web search literature, the Twitter context introduces a
real-time "twist": after significant breaking news events, we aim to provide
relevant results within minutes. This paper provides a case study illustrating
the challenges of real-time data processing in the era of "big data". We tell
the story of how our system was built twice: our first implementation was built
on a typical Hadoop-based analytics stack, but was later replaced because it
did not meet the latency requirements necessary to generate meaningful
real-time results. The second implementation, which is the system deployed in
production, is a custom in-memory processing engine specifically designed for
the task. This experience taught us that the current typical usage of Hadoop as
a "big data" platform, while great for experimentation, is not well suited to
low-latency processing, and points the way to future work on data analytics
platforms that can handle "big" as well as "fast" data
Search Agent Model: a Conceptual Framework for Search by Algorithms and Agent Systems
No abstract available
Review of Erosion and Sedimentation Control Programs in the Piscataqua Region
The Piscataqua Region Estuaries Partnership (PREP) seeks to minimize adverse impacts to water resources associated with construction site development activities. In order to achieve this goal, PREP must understand the strengths and weaknesses of existing erosion and sedimentation control (E&SC) programs in the 52 municipalities of the PREP water shed (Figure 1-1). A detailed understanding of the existing E&SC programs will enable PREP and other stakeholders to identify and implement actions to improve E&SC programs and minimize adverse impacts. This report provides a review and assessment of existing erosion and sedimentation control programs and a set of recommendations for improving these programs. Our approach in conducting the review was to obtain available federal, state and municipal programs data and to interview people who work with E&SC programs on a daily basis, including state, municipal, construction contractor and site inspector staff. A statement of the problem, an introduction to applicable regulations, and a description of our project approach are provided below
Generative and Pseudo-Relevant Feedback for Sparse, Dense and Learned Sparse Retrieval
Pseudo-relevance feedback (PRF) is a classical approach to address lexical mismatch by enriching the query using first-pass retrieval. Moreover, recent work on generative-relevance feedback (GRF) shows that query expansion models using text generated from large language models can improve sparse retrieval without depending on first-pass retrieval effectiveness. This work extends GRF to dense and learned sparse retrieval paradigms with experiments over six standard document ranking benchmarks. We find that GRF improves over comparable PRF techniques by around 10% on both precision and recall-oriented measures. Nonetheless, query analysis shows that GRF and PRF have contrasting benefits, with GRF providing external context not present in first-pass retrieval, whereas PRF grounds the query to the information contained within the target corpus. Thus, we propose combining generative and pseudo-relevance feedback ranking signals to achieve the benefits of both feedback classes, which significantly increases recall over PRF methods on 95% of experiments
DREQ: Document Re-Ranking Using Entity-based Query Understanding
While entity-oriented neural IR models have advanced significantly, they
often overlook a key nuance: the varying degrees of influence individual
entities within a document have on its overall relevance. Addressing this gap,
we present DREQ, an entity-oriented dense document re-ranking model. Uniquely,
we emphasize the query-relevant entities within a document's representation
while simultaneously attenuating the less relevant ones, thus obtaining a
query-specific entity-centric document representation. We then combine this
entity-centric document representation with the text-centric representation of
the document to obtain a "hybrid" representation of the document. We learn a
relevance score for the document using this hybrid representation. Using four
large-scale benchmarks, we show that DREQ outperforms state-of-the-art neural
and non-neural re-ranking methods, highlighting the effectiveness of our
entity-oriented representation approach.Comment: To be presented as a full paper at ECIR 2024 in Glasgpow, U
DREQ: Document Re-Ranking Using Entity-based Query Understanding
While entity-oriented neural IR models have advanced significantly, they often overlook a key nuance: the varying degrees of influence individual entities within a document have on its overall relevance. Addressing this gap, we present DREQ, an entity-oriented dense document re-ranking model. Uniquely, we emphasize the query-relevant entities within a documentâs representation while simultaneously attenuating the less relevant ones, thus obtaining a query-specific entity-centric document representation. We then combine this entity-centric document representation with the text-centric representation of the document to obtain a âhybridâ representation of the document. We learn a relevance score for the document using this hybrid representation. Using four largescale benchmarks, we show that DREQ outperforms state-of-the-art neural and non-neural re-ranking methods, highlighting the effectiveness of our entity-oriented representation approach
Local and global query expansion for hierarchical complex topics
In this work we study local and global methods for query expansion for multifaceted complex topics. We study word-based and entity-based expansion methods and extend these approaches to complex topics using fine-grained expansion on different elements of the hierarchical query structure. For a source of hierarchical complex topics we use the TREC Complex Answer Retrieval (CAR) benchmark data collection. We find that leveraging the hierarchical topic structure is needed for both local and global expansion methods to be effective. Further, the results demonstrate that entity-based expansion methods show significant gains over word-based models alone, with local feedback providing the largest improvement. The results on the CAR paragraph retrieval task demonstrate that expansion models that incorporate both the hierarchical query structure and entity-based expansion result in a greater than 20% improvement over word-based expansion approaches
- âŠ