
7 research outputs found

Better bitmap performance with Roaring bitmaps

Author: Beyer
Colantonio
Culpepper
Fusco
Inoue
Kaser
Lemire
Warren
Publication venue: 'Wiley'
Publication date: 15/03/2016

Bitmap indexes are commonly used in databases and search engines. By exploiting bit-level parallelism, they can significantly accelerate queries. However, they can use a lot of memory, so we might prefer compressed bitmap indexes. Following Oracle's lead, bitmaps are often compressed using run-length encoding (RLE). Building on prior work, we introduce the Roaring compressed bitmap format: it uses packed arrays for compression instead of RLE. We compare it to two high-performance RLE-based bitmap encoding techniques: WAH (Word Aligned Hybrid compression scheme) and Concise (Compressed 'n' Composable Integer Set). On synthetic and real data, we find that Roaring bitmaps (1) often compress significantly better (e.g., 2 times) and (2) are faster than the compressed alternatives (up to 900 times faster for intersections). Our results challenge the view that RLE-based bitmap compression is best.

arXiv.org e-Print Archive

CiteSeerX

R-libre

Crossref
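The abstract describes Roaring only at a high level. The following toy Python class is a rough sketch of the two-level layout, assuming the design from the Roaring paper: each 32-bit value is split into a 16-bit chunk key (high bits) and a 16-bit low part, and each chunk is stored as a sorted array when it holds at most 4096 values and as a 65536-bit bitmap otherwise. The names `RoaringSketch`, `ARRAY_MAX`, and `_bits_to_array` are hypothetical. The class ignores the low-level optimizations a production library would use, and it is not a reference implementation.

```python
from __future__ import annotations

from bisect import bisect_left

ARRAY_MAX = 4096  # hypothetical name: array containers hold at most this many values


def _bits_to_array(bits: int) -> list[int]:
    # Expand a bitmap container (a Python int) into its sorted set-bit positions.
    s = bin(bits)[2:][::-1]  # least-significant bit first
    return [i for i, ch in enumerate(s) if ch == "1"]


class RoaringSketch:
    """Toy Roaring-style set of 32-bit unsigned integers (illustration only)."""

    def __init__(self) -> None:
        # Key: high 16 bits. Value: sorted list of low 16 bits (array container)
        # or an int whose bit i is set iff low value i is present (bitmap container).
        self.containers: dict[int, list[int] | int] = {}

    def add(self, x: int) -> None:
        if not 0 <= x < 1 << 32:
            raise ValueError("value must be a 32-bit unsigned integer")
        hi, lo = x >> 16, x & 0xFFFF
        c = self.containers.get(hi)
        if c is None:
            self.containers[hi] = [lo]
        elif isinstance(c, list):
            i = bisect_left(c, lo)
            if i < len(c) and c[i] == lo:
                return  # already present
            c.insert(i, lo)
            if len(c) > ARRAY_MAX:  # chunk became dense: switch to a bitmap
                bits = 0
                for v in c:
                    bits |= 1 << v
                self.containers[hi] = bits
        else:
            self.containers[hi] = c | (1 << lo)

    def __contains__(self, x: int) -> bool:
        c = self.containers.get(x >> 16)
        lo = x & 0xFFFF
        if c is None:
            return False
        if isinstance(c, list):
            i = bisect_left(c, lo)
            return i < len(c) and c[i] == lo
        return (c >> lo) & 1 == 1

    def __and__(self, other: RoaringSketch) -> RoaringSketch:
        out = RoaringSketch()
        # Only chunks present in both sets can contribute to the intersection.
        for hi in self.containers.keys() & other.containers.keys():
            a, b = self.containers[hi], other.containers[hi]
            if isinstance(a, int) and isinstance(b, int):
                bits = a & b  # bitmap AND bitmap
                card = bin(bits).count("1")
                if card > ARRAY_MAX:
                    out.containers[hi] = bits
                elif card:
                    out.containers[hi] = _bits_to_array(bits)
                continue
            if isinstance(a, int):
                a, b = b, a  # make `a` the array container
            if isinstance(b, int):
                lows = [v for v in a if (b >> v) & 1]  # array vs bitmap: probe bits
            else:
                bset = set(b)  # array vs array (real libraries merge or gallop)
                lows = [v for v in a if v in bset]
            if lows:
                out.containers[hi] = lows
        return out

    def to_list(self) -> list[int]:
        result: list[int] = []
        for hi in sorted(self.containers):
            c = self.containers[hi]
            lows = c if isinstance(c, list) else _bits_to_array(c)
            result.extend((hi << 16) | v for v in lows)
        return result
```

The container choice is the point of the sketch. A sparse chunk costs about 2 bytes per value as a packed array. A dense chunk costs a fixed 8 KiB as a bitmap. Intersections can then pick a routine per container pair instead of decoding runs.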

Towards an Objective Metric for the Performance of Exact Triangle Count

Author: Blanco Mark P.
Low Tze Meng
McMillan Scott
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 29/09/2021

The performance of graph algorithms is often measured in terms of the number of traversed edges per second (TEPS). However, this performance metric is inadequate for a graph operation such as exact triangle counting. In triangle counting, execution times on graphs with a similar number of edges can differ markedly, as results from past Graph Challenge entries demonstrate. We discuss the need for an objective performance metric for graph operations and the characteristics such a metric should have so that it more accurately captures the interactions between the amount of work performed and the capabilities of the hardware on which the code is executed. Using exact triangle counting as an example, we derive a metric that captures how certain techniques employed in many implementations improve performance. We demonstrate that our proposed metric can be used to evaluate and compare multiple approaches for triangle counting, using a SIMD approach as a case study against a scalar baseline.
Comment: 6 pages, 2020 IEEE High Performance Extreme Computing Conference (HPEC)

arXiv.org e-Print Archive

Crossref
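To see why edge counts alone say little about triangle-counting cost, the following minimal Python sketch counts triangles exactly and also tallies the comparisons made while intersecting sorted neighbor lists. That tally is only a crude work proxy assumed for illustration. It is not the metric derived in the paper. The function name `count_triangles` and the degree-based edge orientation are assumptions here, the latter a common heuristic rather than anything the abstract specifies.

```python
from collections import defaultdict


def count_triangles(edges):
    """Count triangles exactly and tally merge comparisons as a rough work proxy.

    Returns (triangles, comparisons). Vertex ids must be mutually comparable.
    """
    nbrs = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops; the set also drops duplicate edges
            nbrs[u].add(v)
            nbrs[v].add(u)

    # Orient every edge from lower to higher (degree, id) rank, so each
    # triangle is found exactly once and high-degree hubs get short lists.
    rank = {v: (len(ns), v) for v, ns in nbrs.items()}
    out = {
        v: sorted((w for w in ns if rank[w] > rank[v]), key=rank.__getitem__)
        for v, ns in nbrs.items()
    }

    triangles = comparisons = 0
    for u, out_u in out.items():
        for v in out_u:
            a, b = out_u, out[v]
            i = j = 0
            # Merge-style intersection of two rank-sorted lists.
            while i < len(a) and j < len(b):
                comparisons += 1
                if a[i] == b[j]:
                    triangles += 1
                    i += 1
                    j += 1
                elif rank[a[i]] < rank[b[j]]:
                    i += 1
                else:
                    j += 1
    return triangles, comparisons
```

For example, the complete graph on four vertices and a six-leaf star both have six edges. `count_triangles` returns `(4, 8)` for the first and `(0, 0)` for the second, so the same edge count hides very different amounts of work.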

Faster set intersection with SIMD instructions by reducing branch mispredictions

Author: Demaine E. D.
R.
Schlegel B.
Publication venue: 'VLDB Endowment'

Crossref

SIMD- and cache-friendly algorithm for sorting an array of structures

Author: Gedik B.
Knuth D. E.
Lacey S.
Publication venue: 'VLDB Endowment'

Crossref

Efficient External-Memory Algorithms for Graph Mining

Author: Cui Yi
Publication date: 16/01/2019

The explosion of big data in areas like the web and social networks has posed big challenges to research activities, including data mining, information retrieval, security, etc. This dissertation focuses on a particular area, graph mining, and specifically proposes several novel algorithms to solve the problems of triangle listing and computation of the neighborhood function in large-scale graphs. We first study the classic problem of triangle listing. We generalize the existing in-memory algorithms into a single framework of 18 triangle-search techniques. We then develop a novel external-memory approach, which we call Pruned Companion Files (PCF), that supports disk operation of all 18 algorithms. When compared to the state-of-the-art available implementations MGT and PDTL, PCF runs 5-10 times faster and performs orders of magnitude less I/O. We next focus on the I/O complexity of triangle listing. Recent work by Pagh et al. provides an appealing theoretical I/O complexity for triangle listing via graph partitioning by random coloring of nodes. Since no implementation of Pagh's method is available and little is known about how it compares with PCF, we carefully implement it, investigate the properties of these algorithms, model their I/O cost, identify their shortcomings, and shed light on the conditions under which each method outperforms the other. This insight leads us to develop a novel framework we call Trigon that surpasses the I/O performance of both techniques in all graphs and under all RAM conditions. We finally turn our attention to the neighborhood function. Exact computation of the neighborhood function is expensive in terms of CPU and I/O cost, and previous work mostly focuses on approximations. We show that our novel techniques developed for triangle listing can also be applied to this problem. We next study an application of the neighborhood function to ranking Internet hosts. Our method computes the neighborhood function of each host as an indication of its reputation. The evaluation shows that our method is robust to ranking manipulation and places less spam in its top-ranked list than PageRank and TrustRank.

Texas A&M Repository
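The dissertation's external-memory techniques are not described in enough detail here to reproduce. For reference only, the following minimal in-memory Python sketch computes the quantity itself exactly. It assumes the common definition: N(u, h) is the number of nodes within h hops of u, including u itself, and the graph-level N(h) sums N(u, h) over all nodes. The function names are hypothetical. Running one breadth-first search per node costs O(n(n + m)) time on a graph with n nodes and m edges, which illustrates why exact computation is expensive at scale.

```python
from collections import deque


def node_neighborhood(adj, source, max_h):
    """Return [N(source, 0), ..., N(source, max_h)], where N(source, h) is the
    number of nodes within h hops of source (source itself included)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_h:
            continue  # do not expand past the largest radius of interest
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    counts = [0] * (max_h + 1)
    for d in dist.values():
        counts[d] += 1
    for h in range(1, max_h + 1):  # turn "exactly h hops" into "within h hops"
        counts[h] += counts[h - 1]
    return counts


def neighborhood_function(adj, max_h):
    """Graph-level N(h): sum of N(u, h) over all nodes u (one BFS per node)."""
    nodes = set(adj)
    for targets in adj.values():
        nodes.update(targets)  # include nodes that only appear as link targets
    total = [0] * (max_h + 1)
    for u in nodes:
        for h, c in enumerate(node_neighborhood(adj, u, max_h)):
            total[h] += c
    return total
```

The adjacency mapping is treated as directed, which suits a host link graph. For an undirected graph, list each edge in both directions.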
