
    Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable

    There has been significant recent interest in parallel graph processing due to the need to quickly analyze the large graphs available today. Many graph codes have been designed for distributed memory or external memory. However, today even the largest publicly-available real-world graph (the Hyperlink Web graph, with over 3.5 billion vertices and 128 billion edges) can fit in the memory of a single commodity multicore server. Nevertheless, most experimental work in the literature reports results on much smaller graphs, and the work on the Hyperlink graph uses distributed or external memory. It is therefore natural to ask whether we can efficiently solve a broad class of graph problems on this graph in memory. This paper shows that theoretically-efficient parallel graph algorithms can scale to the largest publicly-available graphs using a single machine with a terabyte of RAM, processing them in minutes. We give implementations of theoretically-efficient parallel algorithms for 20 important graph problems. We also present the optimizations and techniques used in our implementations, which were crucial in enabling us to process these large graphs quickly. We show that the running times of our implementations outperform existing state-of-the-art implementations on the largest real-world graphs. For many of the problems we consider, this is the first time they have been solved on graphs at this scale. We have made the implementations developed in this work publicly available as the Graph-Based Benchmark Suite (GBBS).

    Comment: This is the full version of the paper appearing in the ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 2018
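    A common pattern underlying implementations like these is frontier-based graph traversal. The sketch below is a minimal sequential illustration of that pattern in Python; the adjacency representation and function name are our own assumptions for the sketch, not GBBS's actual API, and a GBBS-style implementation would run the per-frontier work in parallel.

```python
# Minimal sequential sketch of frontier-based BFS, the pattern that
# suites like GBBS implement with theoretically-efficient parallelism.
# The representation and names here are illustrative assumptions.
def bfs_frontier(adj, source):
    """adj: dict mapping each vertex to a list of its neighbours."""
    parent = {source: source}
    frontier = [source]
    while frontier:
        next_frontier = []
        # In a parallel implementation this is a parallel map over the
        # frontier, with compare-and-swap resolving concurrent writes
        # to `parent`.
        for u in frontier:
            for v in adj[u]:
                if v not in parent:
                    parent[v] = u
                    next_frontier.append(v)
        frontier = next_frontier
    return parent

# Toy example: a 4-vertex graph.
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
print(bfs_frontier(adj, 0))  # {0: 0, 1: 0, 2: 0, 3: 1}
```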

    New Approximability Results for the Robust k-Median Problem

    We consider a robust variant of the classical $k$-median problem, introduced by Anthony et al. \cite{AnthonyGGN10}. In the \emph{Robust $k$-Median} problem, we are given an $n$-vertex metric space $(V,d)$ and $m$ client sets $\{S_i \subseteq V\}_{i=1}^m$. The objective is to open a set $F \subseteq V$ of $k$ facilities such that the worst-case connection cost over all client sets is minimized; in other words, minimize $\max_{i} \sum_{v \in S_i} d(F,v)$. Anthony et al.\ gave an $O(\log m)$-approximation algorithm for any metric and showed APX-hardness even in the case of the uniform metric. In this paper, we show that their algorithm is nearly tight by proving $\Omega(\log m/\log\log m)$ approximation hardness, unless ${\sf NP} \subseteq \bigcap_{\delta>0} {\sf DTIME}(2^{n^{\delta}})$. This hardness result holds even for uniform and line metrics. To our knowledge, this is one of the rare cases in which a problem on a line metric is hard to approximate to within a logarithmic factor. We complement the hardness result with an experimental evaluation of different heuristics, showing that very simple heuristics achieve good approximations for realistic classes of instances.

    Comment: 19 pages
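    To make the objective concrete, here is a minimal Python sketch of the robust cost together with a naive greedy facility-opening heuristic, in the spirit of the "very simple heuristics" the abstract mentions. All names and the particular greedy rule are our assumptions, not the heuristics the paper actually evaluates.

```python
# Robust k-median: minimize the worst-case client-set cost over open
# facilities F. This is an illustrative sketch, not the paper's method.
def robust_cost(F, client_sets, d):
    """max over client sets S_i of sum_{v in S_i} d(F, v)."""
    return max(sum(min(d[f][v] for f in F) for v in S) for S in client_sets)

def greedy_robust_kmedian(V, client_sets, d, k):
    """Open k facilities, each time picking the one that most reduces
    the worst-case cost (a hypothetical baseline heuristic)."""
    F = []
    for _ in range(k):
        best = min((v for v in V if v not in F),
                   key=lambda v: robust_cost(F + [v], client_sets, d))
        F.append(best)
    return F

# Example on a tiny line metric: V = {0,1,2,3}, d(i,j) = |i - j|.
V = [0, 1, 2, 3]
d = {i: {j: abs(i - j) for j in V} for i in V}
client_sets = [{0, 1}, {3}]
print(greedy_robust_kmedian(V, client_sets, d, k=2))  # -> [1, 2]
```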

    A New Framework for Distributed Submodular Maximization

    A wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. A lot of recent effort has been devoted to developing distributed algorithms for these problems. However, these results suffer from a high number of rounds, suboptimal approximation ratios, or both. We develop a framework for bringing existing algorithms in the sequential setting to the distributed setting, achieving near-optimal approximation ratios for many settings in only a constant number of MapReduce rounds. Our techniques also give a fast sequential algorithm for non-monotone maximization subject to a matroid constraint.
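    A common shape for such frameworks is "run greedy on each machine, then run greedy over the union of the local solutions". The sketch below shows that two-round pattern for a cardinality constraint; the paper's actual framework (its partitioning, number of rounds, and guarantees) differs, so treat this only as an illustration of the distributed structure.

```python
# Two-round MapReduce-style distributed submodular maximization sketch.
# f takes a list of elements and returns a real value; all names are
# our assumptions, not the paper's framework.
def greedy(f, ground, k):
    """Standard greedy for monotone submodular f, cardinality budget k."""
    S = []
    for _ in range(k):
        e_star = max((e for e in ground if e not in S),
                     key=lambda e: f(S + [e]) - f(S), default=None)
        if e_star is None:
            break
        S.append(e_star)
    return S

def two_round_distributed(f, ground, k, num_machines):
    # Round 1 ("map"): partition the ground set, run greedy locally.
    parts = [ground[i::num_machines] for i in range(num_machines)]
    local = [greedy(f, part, k) for part in parts]
    # Round 2 ("reduce"): run greedy on the union of local solutions.
    merged = [e for sol in local for e in sol]
    candidate = greedy(f, merged, k)
    # Return the best of the local solutions and the merged solution.
    return max(local + [candidate], key=f)
```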

    Incidence Geometries and the Pass Complexity of Semi-Streaming Set Cover

    Set cover, over a universe of size $n$, may be modelled as a data-streaming problem, where the $m$ sets that comprise the instance are to be read one by one. A semi-streaming algorithm is allowed only $O(n\, \mathrm{poly}\{\log n, \log m\})$ space to process this stream. For each $p \ge 1$, we give a very simple deterministic algorithm that makes $p$ passes over the input stream and returns an appropriately certified $(p+1)n^{1/(p+1)}$-approximation to the optimum set cover. More importantly, we proceed to show that this approximation factor is essentially tight, by showing that a factor better than $0.99\, n^{1/(p+1)}/(p+1)^2$ is unachievable for a $p$-pass semi-streaming algorithm, even allowing randomisation. In particular, this implies that achieving a $\Theta(\log n)$-approximation requires $\Omega(\log n/\log\log n)$ passes, which is tight up to the $\log\log n$ factor. These results extend to a relaxation of the set cover problem where we are allowed to leave an $\varepsilon$ fraction of the universe uncovered: the tight bounds on the best approximation factor achievable in $p$ passes turn out to be $\Theta_p(\min\{n^{1/(p+1)}, \varepsilon^{-1/p}\})$. Our lower bounds are based on a construction of a family of high-rank incidence geometries, which may be thought of as vast generalisations of affine planes. This construction, based on algebraic techniques, appears flexible enough to find other applications and is therefore interesting in its own right.

    Comment: 20 pages
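    One plausible reading of such a simple multi-pass algorithm is a threshold schedule: in each pass, keep any streamed set that covers many still-uncovered elements, and lower the threshold between passes. The sketch below uses the schedule $n^{1-j/p}$ in pass $j$; this schedule, and the omission of the certification step, are our assumptions, since the paper's certified thresholds are calibrated to the $(p+1)n^{1/(p+1)}$ guarantee and may differ.

```python
# Speculative sketch of a p-pass thresholding scheme for streaming set
# cover; illustrates the mechanism only, not the paper's algorithm.
def multipass_set_cover(stream_sets, universe, p):
    """stream_sets: list of sets, read in order once per pass."""
    n = len(universe)
    uncovered = set(universe)
    solution = []
    for j in range(1, p + 1):
        # Geometrically decreasing threshold; the final pass uses
        # threshold 1, so any set with new coverage is taken.
        threshold = n ** (1 - j / p)
        for i, S in enumerate(stream_sets):   # one pass over the stream
            if uncovered and len(uncovered & S) >= threshold:
                solution.append(i)
                uncovered -= S
    return solution

# Example: universe of 9 elements, two passes.
universe = set(range(9))
sets = [set(range(0, 5)), set(range(4, 9)), {0}, {8}]
print(multipass_set_cover(sets, universe, p=2))  # indices of chosen sets
```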

    TreeGrad: Transferring Tree Ensembles to Neural Networks

    Gradient Boosting Decision Trees (GBDT) are popular machine learning algorithms, with implementations such as LightGBM and those in popular machine learning toolkits like Scikit-Learn. Many implementations can only produce trees in an offline and greedy manner. We explore ways to convert existing GBDT implementations to known neural network architectures with minimal performance loss, in order to allow decision splits to be updated in an online manner, and we provide extensions that allow split points to be altered as a neural architecture search problem. We provide learning bounds for our neural network.

    Comment: Technical report on an implementation of the Deep Neural Decision Forests algorithm. To accompany the implementation here: https://github.com/chappers/TreeGrad. Update: Please cite as: Siu, C. (2019). "Transferring Tree Ensembles to Neural Networks". International Conference on Neural Information Processing. Springer, 2019. arXiv admin note: text overlap with arXiv:1909.1179
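    The basic device for making a tree differentiable, following the Deep Neural Decision Forests idea the comment cites, is to replace a hard split "x[f] < t" with a sigmoid gate, so that the threshold and the leaf values become trainable parameters. Below is a single-stump sketch; the names and the single-split scope are our simplification, not TreeGrad's API.

```python
# Soft (differentiable) decision stump: a sigmoid-gated mixture of the
# two leaf values, so t, leaf_left, leaf_right can be trained by gradient
# descent. Illustrative sketch only.
import math

def soft_stump(x, f, t, leaf_left, leaf_right, temperature=1.0):
    """Smooth relaxation of: leaf_left if x[f] < t else leaf_right."""
    gate = 1.0 / (1.0 + math.exp(-(t - x[f]) / temperature))  # ~1 if x[f] < t
    return gate * leaf_left + (1.0 - gate) * leaf_right

# Example: the stump "if x[0] < 2.0 then 1.0 else -1.0", softened.
print(soft_stump([1.5], f=0, t=2.0, leaf_left=1.0, leaf_right=-1.0))  # ~0.24
```

    Lowering the temperature sharpens the gate back toward the original hard split, which is how a trained soft tree can be read off as an ordinary decision tree again.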