7,844 research outputs found
Off the Beaten Path: Let's Replace Term-Based Retrieval with k-NN Search
Retrieval pipelines commonly rely on a term-based search to obtain candidate
records, which are subsequently re-ranked. Some candidates are missed by this
approach, e.g., due to a vocabulary mismatch. We address this issue by
replacing the term-based search with a generic k-NN retrieval algorithm, where
a similarity function can take into account subtle term associations. While an
exact brute-force k-NN search using this similarity function is slow, we
demonstrate that an approximate algorithm can be nearly two orders of magnitude
faster at the expense of only a small loss in accuracy. A retrieval pipeline
using an approximate k-NN search can be more effective and efficient than the
term-based pipeline. This opens up new possibilities for designing effective
retrieval pipelines. Our software (including data-generating code) and
derivative data based on the Stack Overflow collection is available online
Deep integration of machine learning Into column stores
We leverage vectorized User-Defined Functions (UDFs) to efficiently integrate unchanged machine learning pipelines into an analytical data management system. The entire pipelines including data, models, parameters and evaluation outcomes are stored and executed inside the database system. Experiments using our MonetDB/Python UDFs show greatly improved performance due to reduced data movement and parallel processing opportunities. In addition, this integration enables meta-analysis of models using relational queries
Big Data Pipelines on the Computing Continuum: Tapping the Dark Data
The computing continuum enables new opportunities for managing big data pipelines concerning efficient management of heterogeneous and untrustworthy resources. We discuss the big data pipelines lifecycle on the computing continuum and its associated challenges, and we outline a future research agenda in this area.acceptedVersio
- …