16,751 research outputs found

    Porting Decision Tree Algorithms to Multicore using FastFlow

    Full text link
    The whole computer hardware industry embraced multicores. For these machines, the extreme optimisation of sequential algorithms is no longer sufficient to squeeze the real machine power, which can be only exploited via thread-level parallelism. Decision tree algorithms exhibit natural concurrency that makes them suitable to be parallelised. This paper presents an approach for easy-yet-efficient porting of an implementation of the C4.5 algorithm on multicores. The parallel porting requires minimal changes to the original sequential code, and it is able to exploit up to 7X speedup on an Intel dual-quad core machine.Comment: 18 pages + cove

    Incidence Geometries and the Pass Complexity of Semi-Streaming Set Cover

    Full text link
    Set cover, over a universe of size nn, may be modelled as a data-streaming problem, where the mm sets that comprise the instance are to be read one by one. A semi-streaming algorithm is allowed only O(npoly{logn,logm})O(n\, \mathrm{poly}\{\log n, \log m\}) space to process this stream. For each p1p \ge 1, we give a very simple deterministic algorithm that makes pp passes over the input stream and returns an appropriately certified (p+1)n1/(p+1)(p+1)n^{1/(p+1)}-approximation to the optimum set cover. More importantly, we proceed to show that this approximation factor is essentially tight, by showing that a factor better than 0.99n1/(p+1)/(p+1)20.99\,n^{1/(p+1)}/(p+1)^2 is unachievable for a pp-pass semi-streaming algorithm, even allowing randomisation. In particular, this implies that achieving a Θ(logn)\Theta(\log n)-approximation requires Ω(logn/loglogn)\Omega(\log n/\log\log n) passes, which is tight up to the loglogn\log\log n factor. These results extend to a relaxation of the set cover problem where we are allowed to leave an ε\varepsilon fraction of the universe uncovered: the tight bounds on the best approximation factor achievable in pp passes turn out to be Θp(min{n1/(p+1),ε1/p})\Theta_p(\min\{n^{1/(p+1)}, \varepsilon^{-1/p}\}). Our lower bounds are based on a construction of a family of high-rank incidence geometries, which may be thought of as vast generalisations of affine planes. This construction, based on algebraic techniques, appears flexible enough to find other applications and is therefore interesting in its own right.Comment: 20 page

    Parallel Algorithms for Geometric Graph Problems

    Full text link
    We give algorithms for geometric graph problems in the modern parallel models inspired by MapReduce. For example, for the Minimum Spanning Tree (MST) problem over a set of points in the two-dimensional space, our algorithm computes a (1+ϵ)(1+\epsilon)-approximate MST. Our algorithms work in a constant number of rounds of communication, while using total space and communication proportional to the size of the data (linear space and near linear time algorithms). In contrast, for general graphs, achieving the same result for MST (or even connectivity) remains a challenging open problem, despite drawing significant attention in recent years. We develop a general algorithmic framework that, besides MST, also applies to Earth-Mover Distance (EMD) and the transportation cost problem. Our algorithmic framework has implications beyond the MapReduce model. For example it yields a new algorithm for computing EMD cost in the plane in near-linear time, n1+oϵ(1)n^{1+o_\epsilon(1)}. We note that while recently Sharathkumar and Agarwal developed a near-linear time algorithm for (1+ϵ)(1+\epsilon)-approximating EMD, our algorithm is fundamentally different, and, for example, also solves the transportation (cost) problem, raised as an open question in their work. Furthermore, our algorithm immediately gives a (1+ϵ)(1+\epsilon)-approximation algorithm with nδn^{\delta} space in the streaming-with-sorting model with 1/δO(1)1/\delta^{O(1)} passes. As such, it is tempting to conjecture that the parallel models may also constitute a concrete playground in the quest for efficient algorithms for EMD (and other similar problems) in the vanilla streaming model, a well-known open problem
    corecore