Porting Decision Tree Algorithms to Multicore using FastFlow

Author: A.C. Sodan
I. Park
J.E. Gehrke
J.R. Quinlan
K. Asanovic
M. Aldinucci
M. Cole
M. Coppola
M. Joshi
M. Vanneschi
M. Zaki
M.K. Sreenivas
R. Jin
R.D. Blumofe
S. Ruggieri
T. Lim
W. Thies
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

The computer hardware industry as a whole has embraced multicores. On these machines, extreme optimisation of sequential algorithms is no longer sufficient to harness the full power of the machine, which can be exploited only via thread-level parallelism. Decision tree algorithms exhibit natural concurrency that makes them suitable for parallelisation. This paper presents an approach for easy-yet-efficient porting of an implementation of the C4.5 algorithm to multicores. The parallel port requires minimal changes to the original sequential code, and it achieves up to a 7X speedup on an Intel dual-quad core machine. Comment: 18 pages + cove
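
The "natural concurrency" mentioned in the abstract can be illustrated with a minimal sketch: once a node's data has been split, each child can be expanded independently of its siblings. The Python code below is not the paper's FastFlow/C4.5 implementation. The names `expand`, `build_tree` and `predict` are hypothetical, and the split test (fewest misclassifications) is a toy stand-in for C4.5's gain-ratio criterion. Each level of the tree is handed to a thread pool, loosely mimicking a farm of workers. In CPython the global interpreter lock limits the real speedup for CPU-bound work, so this shows only the task structure.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def errors(rows):
    """Rows a majority-vote leaf would misclassify."""
    if not rows:
        return 0
    return len(rows) - Counter(y for _, y in rows).most_common(1)[0][1]

def majority(rows):
    return Counter(y for _, y in rows).most_common(1)[0][0]

def best_split(rows):
    """Toy criterion (an assumption, not C4.5's gain ratio)."""
    best = None
    for f in range(len(rows[0][0])):
        for t in sorted({x[f] for x, _ in rows}):
            left = [r for r in rows if r[0][f] <= t]
            right = [r for r in rows if r[0][f] > t]
            if not left or not right:
                continue
            cost = errors(left) + errors(right)
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    return best

def expand(rows):
    """Process one node; needs no data from sibling nodes."""
    split = best_split(rows) if errors(rows) > 0 else None
    if split is None:
        return {"leaf": majority(rows)}, []
    _, f, t, left, right = split
    return {"feature": f, "threshold": t, "children": [None, None]}, [left, right]

def build_tree(rows, workers=4):
    root = [None]
    frontier = [(root, 0, rows)]  # (parent slot list, slot index, rows)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            # All nodes of the current level are expanded concurrently.
            results = list(pool.map(expand, [r for _, _, r in frontier]))
            nxt = []
            for (parent, slot, _), (node, parts) in zip(frontier, results):
                parent[slot] = node
                for k, part in enumerate(parts):
                    nxt.append((node["children"], k, part))
            frontier = nxt
    return root[0]

def predict(tree, x):
    while "leaf" not in tree:
        tree = tree["children"][int(x[tree["feature"]] > tree["threshold"])]
    return tree["leaf"]
```

Expanding the tree level by level, rather than submitting nested tasks, avoids worker threads blocking on their own children in a bounded pool.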

arXiv.org e-Print Archive

CiteSeerX

Crossref

Archivio della Ricerca - Università di Pisa

UnipiEprints

Incidence Geometries and the Pass Complexity of Semi-Streaming Set Cover

Author: Chakrabarti Amit
Wirth Anthony
Publication date: 16/07/2015

Set cover, over a universe of size $n$, may be modelled as a data-streaming problem, where the $m$ sets that comprise the instance are to be read one by one. A semi-streaming algorithm is allowed only $O(n\, \mathrm{poly}\{\log n, \log m\})$ space to process this stream. For each $p \ge 1$, we give a very simple deterministic algorithm that makes $p$ passes over the input stream and returns an appropriately certified $(p+1)n^{1/(p+1)}$-approximation to the optimum set cover. More importantly, we proceed to show that this approximation factor is essentially tight, by showing that a factor better than $0.99\,n^{1/(p+1)}/(p+1)^2$ is unachievable for a $p$-pass semi-streaming algorithm, even allowing randomisation. In particular, this implies that achieving a $\Theta(\log n)$-approximation requires $\Omega(\log n/\log\log n)$ passes, which is tight up to the $\log\log n$ factor. These results extend to a relaxation of the set cover problem where we are allowed to leave an $\varepsilon$ fraction of the universe uncovered: the tight bounds on the best approximation factor achievable in $p$ passes turn out to be $\Theta_p(\min\{n^{1/(p+1)}, \varepsilon^{-1/p}\})$. Our lower bounds are based on a construction of a family of high-rank incidence geometries, which may be thought of as vast generalisations of affine planes. This construction, based on algebraic techniques, appears flexible enough to find other applications and is therefore interesting in its own right. Comment: 20 page
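
The abstract does not spell out the p-pass algorithm, so the following Python sketch is only one plausible reading: a progressive-threshold greedy that is consistent with the stated (p+1)n^{1/(p+1)} bound. In pass j it keeps any set that covers at least n^{1 - j/(p+1)} still-uncovered elements. Elements left over after the last pass are covered by one remembered "witness" set each. The function name, the witness array and the list-based stream model are assumptions made for illustration.

```python
def multipass_threshold_cover(n, sets, p):
    """Return indices of a cover of {0, ..., n-1}.

    `sets` is a list of element lists; iterating over it once models
    one pass over the stream. Working memory is O(n) words.
    """
    covered = [False] * n
    witness = [None] * n   # index of some set containing each element
    chosen = []
    for j in range(1, p + 1):
        tau = n ** (1 - j / (p + 1))          # threshold for pass j
        for i, s in enumerate(sets):          # one pass over the stream
            if j == 1:
                for x in s:
                    if witness[x] is None:
                        witness[x] = i
            gain = [x for x in s if not covered[x]]
            if gain and len(gain) >= tau:
                chosen.append(i)
                for x in gain:
                    covered[x] = True
    # Patch-up step: one remembered set per still-uncovered element.
    leftovers = {witness[x] for x in range(n) if not covered[x]}
    if None in leftovers:
        raise ValueError("some element belongs to no set")
    return chosen + sorted(leftovers)
```

Under this reading, each pass and the patch-up step add at most about n^{1/(p+1)} times the optimum number of sets, which is where the factor (p+1) would come from. For instance, `multipass_threshold_cover(4, [[0, 1], [2], [3], [0, 1, 2, 3]], 1)` uses the threshold 2 in its single pass.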

arXiv.org e-Print Archive

Crossref

Parallel Algorithms for Geometric Graph Problems

Author: Andoni Alexandr
Nikolov Aleksandar
Onak Krzysztof
Yaroslavtsev Grigory
Publication date: 01/01/2014

We give algorithms for geometric graph problems in the modern parallel models inspired by MapReduce. For example, for the Minimum Spanning Tree (MST) problem over a set of points in the two-dimensional space, our algorithm computes a $(1+\epsilon)$-approximate MST. Our algorithms work in a constant number of rounds of communication, while using total space and communication proportional to the size of the data (linear space and near linear time algorithms). In contrast, for general graphs, achieving the same result for MST (or even connectivity) remains a challenging open problem, despite drawing significant attention in recent years. We develop a general algorithmic framework that, besides MST, also applies to Earth-Mover Distance (EMD) and the transportation cost problem. Our algorithmic framework has implications beyond the MapReduce model. For example it yields a new algorithm for computing EMD cost in the plane in near-linear time, $n^{1+o_\epsilon(1)}$. We note that while recently Sharathkumar and Agarwal developed a near-linear time algorithm for $(1+\epsilon)$-approximating EMD, our algorithm is fundamentally different, and, for example, also solves the transportation (cost) problem, raised as an open question in their work. Furthermore, our algorithm immediately gives a $(1+\epsilon)$-approximation algorithm with $n^{\delta}$ space in the streaming-with-sorting model with $1/\delta^{O(1)}$ passes. As such, it is tempting to conjecture that the parallel models may also constitute a concrete playground in the quest for efficient algorithms for EMD (and other similar problems) in the vanilla streaming model, a well-known open problem.

arXiv.org e-Print Archive

CiteSeerX