Off the Beaten Path: Let's Replace Term-Based Retrieval with k-NN Search

Author: Andoni A.
Beyer K.
Broder A. Z.
Brown P. F.
Fried D.
Le Q.
Mikolov T.
Mu Y.
Muja M.
Petrović S.
Riezler S.
Salton G.
Wang J.
Weber R.
Yang L.
Yao X.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 30/10/2016

Retrieval pipelines commonly rely on a term-based search to obtain candidate records, which are subsequently re-ranked. Some candidates are missed by this approach, e.g., due to a vocabulary mismatch. We address this issue by replacing the term-based search with a generic k-NN retrieval algorithm, where a similarity function can take into account subtle term associations. While an exact brute-force k-NN search using this similarity function is slow, we demonstrate that an approximate algorithm can be nearly two orders of magnitude faster at the expense of only a small loss in accuracy. A retrieval pipeline using an approximate k-NN search can be more effective and efficient than the term-based pipeline. This opens up new possibilities for designing effective retrieval pipelines. Our software (including data-generating code) and derivative data based on the Stack Overflow collection are available online.

arXiv.org e-Print Archive

Crossref

Scipedia

Deep integration of machine learning into column stores

Author: Holanda P.T. (Pedro)
Manegold S. (Stefan)
Mühleisen H.F. (Hannes)
Raasveldt M. (Mark)
Publication date: 01/01/2018

We leverage vectorized User-Defined Functions (UDFs) to efficiently integrate unchanged machine learning pipelines into an analytical data management system. The entire pipelines, including data, models, parameters, and evaluation outcomes, are stored and executed inside the database system. Experiments using our MonetDB/Python UDFs show greatly improved performance due to reduced data movement and parallel processing opportunities. In addition, this integration enables meta-analysis of models using relational queries.

CWI's Institutional Repository

Leiden University Scholarly Publications

Big Data Pipelines on the Computing Continuum: Tapping the Dark Data

Author: Elvesæter Brian
Kharlamov Evgeny
Kimovski Dragi
Ledakis Giannis
Leotta Francesco
Marrella Andrea
Matskin Mihhail
Nikolov Nikolay
Prodan Radu
Roman Dumitru
Simonet-Boulogne Anthony
Song Hui
Soylu Ahmet
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2022

The computing continuum enables new opportunities for managing big data pipelines concerning efficient management of heterogeneous and untrustworthy resources. We discuss the big data pipelines lifecycle on the computing continuum and its associated challenges, and we outline a future research agenda in this area.

SINTEF Open

Transitioning to an Integrated Renewable Energy System in the Dutch North Sea

Author: Andreasson Malin
van Nieuwkoop Lisa
Publication venue: 'University of Groningen Press'
Publication date: 11/10/2022

Proceedings - University of Groningen

Dissertations of the University of Groningen