49 research outputs found

Split block Bloom filters

Author: Apple Jim
Publication date: 23/03/2021

This short note describes a Bloom filter variant that takes advantage of modern SIMD instructions to increase speed by 30%-450%. This filter, the split block Bloom filter, is used by Apache Impala, Apache Kudu, Apache Parquet, and Apache Arrow. Comment: 3 pages, 1 figure

arXiv.org e-Print Archive

Swarm v3: towards tera-scale amplicon clustering

Author: Birol Inanc
Czech Lucas
de Vargas Colomban
Dunthorn Micah
Mahé Frédéric
Quince Christopher
Rognes Torbjørn
Stamatakis Alexandros
Publication venue: Oxford University Press
Publication date: 13/12/2022

Motivation: Previously we presented swarm, an open-source amplicon clustering programme that produces fine-scale molecular operational taxonomic units (OTUs) that are free of arbitrary global clustering thresholds. Here, we present swarm v3 to address issues of contemporary datasets that are growing towards terabyte sizes. Results: When compared with previous swarm versions, swarm v3 has modernized C++ source code, reduced memory footprint by up to 50%, optimized CPU usage and multithreading (more than 7 times faster with default parameters), and it has been extensively tested for its robustness and logic.

KITopen

Opportunistic linked data querying through approximate membership metadata

Author: BH Bloom
C Buil-Aranda
E Oren
G Aluç
I Ermilov
I Filali
M Schmachtenberg
R Gallager
R Verborgh
X Zhang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Between URI dereferencing and the SPARQL protocol lies a largely unexplored axis of possible interfaces to Linked Data, each with its own combination of trade-offs. One of these interfaces is Triple Pattern Fragments, which allows clients to execute SPARQL queries against low-cost servers, at the cost of higher bandwidth. Increasing a client's efficiency means lowering the number of requests, which can be achieved, among other ways, through additional metadata in responses. We noted that typical SPARQL query evaluations against Triple Pattern Fragments require a significant portion of membership subqueries, which check the presence of a specific triple, rather than a variable pattern. This paper studies the impact of providing approximate membership functions, i.e., Bloom filters and Golomb-coded sets, as extra metadata. In addition to reducing HTTP requests, such functions make it possible to achieve full result recall earlier when temporarily allowing lower precision. Half of the tested queries from a WatDiv benchmark test set could be executed with up to a third fewer HTTP requests with only marginally higher server cost. Query times, however, did not improve, likely due to slower metadata generation and transfer. This indicates that approximate membership functions can partly improve the client-side query process with minimal impact on the server and its interface.
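
Illustrative sketch (not taken from the paper): the idea of answering membership subqueries from approximate membership metadata can be pictured with a plain Bloom filter. The class `BloomFilter`, the helper `needs_request`, the filter parameters, and the string encoding of a triple are all assumptions made for this example; the paper's actual metadata format and hashing scheme are not given in the abstract.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter (hypothetical helper, not the paper's implementation)."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def needs_request(metadata: BloomFilter, triple: tuple) -> bool:
    """Assumed client-side rule: only ask the server when the filter says 'maybe'."""
    return metadata.might_contain(" ".join(triple))


# Server side (assumed): build the filter over the triples matching a fragment.
triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
]
metadata = BloomFilter(num_bits=1024, num_hashes=4)
for t in triples:
    metadata.add(" ".join(t))

# Client side: a negative answer saves an HTTP request; a positive answer may be
# a false positive and still has to be confirmed against the server.
candidate = ("ex:alice", "foaf:knows", "ex:bob")
print(needs_request(metadata, candidate))
```

Because a Bloom filter never yields false negatives, a client following this rule loses no results; the false positives are what temporarily lowers precision until the server confirms them.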

Crossref

Ghent University Academic Bibliography

Optimizing approximate membership metadata in triple pattern fragments for clients and servers

Author: Taelman Ruben
Van Herwegen Joachim
Vander Sande Miel
Verborgh Ruben
Publication date: 01/01/2020

Ghent University Academic Bibliography