A New Data Layout For Set Intersection on GPUs
Set intersection is at the core of a variety of problems, e.g. frequent itemset
mining and sparse boolean matrix multiplication. It is well-known that large
speed gains can, for some computational problems, be obtained by using a
graphics processing unit (GPU) as a massively parallel computing device.
However, GPUs require highly regular control flow and memory access patterns,
and for this reason previous GPU methods for intersecting sets have used a
simple bitmap representation. This representation requires excessive space on
sparse data sets. In this paper we present a novel data layout, "BatMap", that
is particularly well suited for parallel processing, and is compact even for
sparse data.
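For contrast with BatMap (whose layout the abstract does not detail), the bitmap representation the paper improves on can be sketched as follows: intersection becomes a branch-free word-wise AND, which suits the regular control flow GPUs require, but space grows with the universe size rather than the set size. All names below are illustrative, not from the paper.

```python
def to_bitmap(s, universe_size, word_bits=64):
    """Pack a set of ints from [0, universe_size) into machine words."""
    n_words = (universe_size + word_bits - 1) // word_bits
    words = [0] * n_words
    for x in s:
        words[x // word_bits] |= 1 << (x % word_bits)
    return words

def bitmap_intersection_size(a_words, b_words):
    """Count common elements with a regular, data-independent loop (word-wise AND)."""
    return sum(bin(aw & bw).count("1") for aw, bw in zip(a_words, b_words))

a = {3, 17, 1000, 999_999}
b = {17, 42, 999_999}
U = 1_000_000
assert bitmap_intersection_size(to_bitmap(a, U), to_bitmap(b, U)) == 2
# Space: each bitmap uses U/64 = 15625 words even though the sets hold only
# a handful of elements -- the sparsity problem the paper targets.
```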
Frequent itemset mining is one of the most important applications of set
intersection. As a case-study on the potential of BatMaps we focus on frequent
pair mining, which is a core special case of frequent itemset mining. The main
finding is that our method is able to achieve speedups over both Apriori and
FP-growth when the number of distinct items is large, and the density of the
problem instance is above 1%. Previous implementations of frequent itemset
mining on GPU have not been able to show speedups over the best single-threaded
implementations. Comment: A version of this paper appears in Proceedings of IPDPS 201
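The reduction from pair mining to set intersection can be illustrated as follows (a minimal CPU sketch under our own naming, not the paper's GPU method): the support of a pair {i, j} is |T(i) ∩ T(j)|, where T(x) is the set of ids of transactions containing item x.

```python
from collections import defaultdict
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]

# Build the "tid list" T(x) for every item: item -> set of transaction ids.
tids = defaultdict(set)
for tid, items in enumerate(transactions):
    for item in items:
        tids[item].add(tid)

# Support of a pair is one set intersection; keep pairs meeting min_support.
min_support = 2
frequent_pairs = {
    frozenset((i, j)): len(tids[i] & tids[j])
    for i, j in combinations(sorted(tids), 2)
    if len(tids[i] & tids[j]) >= min_support
}
# Here all three pairs {bread,milk}, {bread,butter}, {milk,butter} have support 2.
```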
Faster Join-Projects and Sparse Matrix Multiplications
Computing an equi-join followed by a duplicate-eliminating projection is conventionally done by performing the two operations in serial. If some join attribute is projected away, the intermediate result may be much larger than both the input and the output, and the computation could therefore potentially be performed faster by a direct procedure that does not produce such a large intermediate result. We present a new algorithm that has smaller intermediate results on worst-case inputs, and in particular is more efficient in both the RAM and I/O models. It is easy to see that a join-project where the join attributes are projected away is equivalent to boolean matrix multiplication. Our results can therefore also be interpreted as improved sparse, output-sensitive matrix multiplication.
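The stated equivalence can be checked on a toy instance (illustrative code, not the paper's algorithm): joining R(a, b) with S(b, c) and projecting away b, with duplicate elimination, yields exactly the non-zero pattern of the boolean product of R and S viewed as 0/1 matrices.

```python
R = {(0, "x"), (0, "y"), (1, "y")}      # relation R(a, b)
S = {("x", 7), ("y", 7), ("y", 8)}      # relation S(b, c)

# Join on b, then project away b with duplicate elimination.
join_project = {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

# Same result via boolean matrix multiplication over the same index sets.
a_vals = sorted({a for a, _ in R})
b_vals = sorted({b for _, b in R} | {b for b, _ in S})
c_vals = sorted({c for _, c in S})
M = [[(a, b) in R for b in b_vals] for a in a_vals]   # R as a 0/1 matrix
N = [[(b, c) in S for c in c_vals] for b in b_vals]   # S as a 0/1 matrix
product = {
    (a_vals[i], c_vals[k])
    for i in range(len(a_vals))
    for k in range(len(c_vals))
    if any(M[i][j] and N[j][k] for j in range(len(b_vals)))
}
assert product == join_project  # {(0, 7), (0, 8), (1, 7), (1, 8)}
```

Even on this tiny instance the unprojected join produces five tuples against four output pairs; the intermediate-result blow-up the paper targets can be far larger.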
Better Size Estimation for Sparse Matrix Products
We consider the problem of doing fast and reliable estimation of the number z of non-zero entries in a sparse boolean matrix product. This problem has applications in databases and computer algebra. Let n denote the total number of non-zero entries in the input matrices. We show how to compute a 1 ± ε approximation of z (with small probability of error) in expected time O(n) for any ε > 4/z^(1/4). The previously best estimation algorithm, due to Cohen (JCSS 1997), uses time O(n/ε^2). We also present a variant using O(sort(n)) I/Os in expectation in the cache-oblivious model. In contrast to these results, the currently best algorithms for computing a sparse boolean matrix product use time ω(n^(4/3)) (resp. ω(n^(4/3)/B) I/Os), even if the result matrix has only z = O(n) non-zero entries. Our algorithm combines the size estimation technique of Bar-Yossef et al. (RANDOM 2002) with a particular class of pairwise independent hash functions that allows the sketch of a set of the form A×C to be computed in expected time O(|A| + |C|) and O(sort(|A| + |C|)) I/Os. We then describe how sampling can be used to maintain (independent) sketches of matrices that allow estimation to be performed in time o(n) if z is sufficiently large. This gives a simpler alternative to the sketching technique of Ganguly et al. (PODS 2005), and matches a space lower bound shown in that paper. Finally, we present experiments on real-world data sets that show the accuracy of both our methods to be significantly better than the worst-case analysis predicts.
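The general sketching idea can be illustrated with a generic k-minimum-values estimator in the spirit of Bar-Yossef et al. (this is not the paper's hash-function construction, and all names and parameters here are ours): hash every non-zero output position, keep the k smallest hash values, and estimate the distinct count from the k-th smallest.

```python
import hashlib

def h(x):
    """Deterministic hash of x to a float in (0, 1]."""
    d = hashlib.sha256(repr(x).encode()).digest()
    return (int.from_bytes(d[:8], "big") + 1) / 2**64

def kmv_estimate(items, k=64):
    """Estimate the number of distinct items from the k smallest hash values."""
    hashes = sorted({h(x) for x in items})[:k]
    if len(hashes) < k:
        return float(len(hashes))   # fewer than k distinct items: exact count
    return (k - 1) / hashes[-1]     # classic k-minimum-values estimator

# Toy stand-in for the non-zero positions of a boolean matrix product,
# listed with duplicates (as a join would produce them):
pairs = [(i % 100, j) for i in range(10_000) for j in (1, 2, 3)]
est = kmv_estimate(pairs, k=64)     # true distinct count is 100 * 3 = 300
```

The typical relative error of this estimator is on the order of 1/√k, so larger sketches trade space for accuracy; the paper's contribution is hash functions that make such sketches cheap to compute for sets of the form A×C.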