
    Tree Contraction, Connected Components, Minimum Spanning Trees: a GPU Path to Vertex Fitting

    Standard parallel computing operations are considered in the context of algorithms for solving 3D graph problems which have applications, e.g., in vertex finding in HEP. Exploiting GPUs for tree-accumulation and graph algorithms is challenging: GPUs offer extreme computational power and high memory-access bandwidth, but their model of fine-grained parallelism is ill suited to the irregular, linked representations typical of graph data structures. Achieving data-race-free computation may demand serialization through atomic transactions, which inevitably degrades parallel performance. A Minimum Spanning Tree algorithm for GPUs is presented, its implementation discussed, and its efficiency evaluated on GPU and multicore architectures.
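
    Editorial note: as a point of reference for the contraction-based MST pattern that GPU implementations typically build on, below is a minimal sequential Python sketch of Boruvka-style MST construction. It is an illustrative assumption, not the paper's GPU algorithm; the edge-list format and the union-find helper are hypothetical choices.

    def boruvka_mst(num_vertices, edges):
        # edges: list of (weight, u, v) tuples; returns a list of MST edges.
        parent = list(range(num_vertices))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        mst = []
        components = num_vertices
        while components > 1:
            cheapest = {}  # component root -> lightest incident edge
            for w, u, v in edges:
                ru, rv = find(u), find(v)
                if ru == rv:
                    continue
                for r in (ru, rv):
                    if r not in cheapest or w < cheapest[r][0]:
                        cheapest[r] = (w, u, v)
            if not cheapest:
                break  # remaining components have no connecting edges (disconnected graph)
            for w, u, v in cheapest.values():
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv  # contract the two components
                    mst.append((u, v, w))
                    components -= 1
        return mst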

    A Faster Distributed Single-Source Shortest Paths Algorithm

    We devise new algorithms for the single-source shortest paths (SSSP) problem with non-negative edge weights in the CONGEST model of distributed computing. While close-to-optimal solutions, in terms of the number of rounds spent by the algorithm, have recently been developed for computing SSSP approximately, the fastest known exact algorithms are still far away from matching the lower bound of $\tilde\Omega(\sqrt{n} + D)$ rounds by Peleg and Rubinovich [SIAM Journal on Computing 2000], where $n$ is the number of nodes in the network and $D$ is its diameter. The state of the art is Elkin's randomized algorithm [STOC 2017] that performs $\tilde O(n^{2/3} D^{1/3} + n^{5/6})$ rounds. We significantly improve upon this upper bound with our two new randomized algorithms for polynomially bounded integer edge weights, the first performing $\tilde O(\sqrt{nD})$ rounds and the second performing $\tilde O(\sqrt{n}\, D^{1/4} + n^{3/5} + D)$ rounds. Our bounds also compare favorably to the independent result by Ghaffari and Li [STOC 2018]. As side results, we obtain a $(1+\epsilon)$-approximation $\tilde O((\sqrt{n}\, D^{1/4} + D)/\epsilon)$-round algorithm for directed SSSP and a new work/depth trade-off for exact SSSP on directed graphs in the PRAM model. Comment: Presented at the 59th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2018).
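
    Editorial note, a back-of-the-envelope comparison not taken from the abstract: for low-diameter networks with $D = \operatorname{polylog} n$, Elkin's bound is dominated by its $n^{5/6}$ term, whereas the first new bound becomes $\tilde O(\sqrt{n})$, matching the $\tilde\Omega(\sqrt{n} + D)$ lower bound up to polylogarithmic factors. For a larger diameter such as $D = n^{1/3}$, the second bound evaluates to $\tilde O(n^{7/12} + n^{3/5} + n^{1/3}) = \tilde O(n^{3/5})$, still well below Elkin's $\tilde O(n^{7/9} + n^{5/6}) = \tilde O(n^{5/6})$.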

    Near-Optimal Approximate Shortest Paths and Transshipment in Distributed and Streaming Models

    We present a method for solving the transshipment problem - also known as uncapacitated minimum cost flow - up to a multiplicative error of $1 + \varepsilon$ in undirected graphs with non-negative edge weights using a tailored gradient descent algorithm. Using $\tilde{O}(\cdot)$ to hide polylogarithmic factors in $n$ (the number of nodes in the graph), our gradient descent algorithm takes $\tilde O(\varepsilon^{-2})$ iterations, and in each iteration it solves an instance of the transshipment problem up to a multiplicative error of $\operatorname{polylog} n$. In particular, this allows us to perform a single iteration by computing a solution on a sparse spanner of logarithmic stretch. Using a randomized rounding scheme, we can further extend the method to finding approximate solutions for the single-source shortest paths (SSSP) problem. As a consequence, we improve upon prior work by obtaining the following results: (1) Broadcast CONGEST model: $(1 + \varepsilon)$-approximate SSSP using $\tilde{O}((\sqrt{n} + D)\varepsilon^{-3})$ rounds, where $D$ is the (hop) diameter of the network. (2) Broadcast congested clique model: $(1 + \varepsilon)$-approximate transshipment and SSSP using $\tilde{O}(\varepsilon^{-2})$ rounds. (3) Multipass streaming model: $(1 + \varepsilon)$-approximate transshipment and SSSP using $\tilde{O}(n)$ space and $\tilde{O}(\varepsilon^{-2})$ passes. The previously fastest SSSP algorithms for these models leverage sparse hop sets. We bypass the hop set construction; computing a spanner is sufficient with our method. The above bounds assume non-negative edge weights that are polynomially bounded in $n$; for general non-negative weights, running times scale with the logarithm of the maximum ratio between non-zero weights. Comment: Accepted to SIAM Journal on Computing. Preliminary version in DISC 2017. Abstract shortened to fit arXiv's limitation of 1920 characters.
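
    Editorial note: since the abstract observes that computing a sparse spanner of logarithmic stretch suffices, below is a minimal sequential Python sketch of the classic greedy $(2k-1)$-spanner construction. It is provided for orientation only, not as the paper's distributed procedure; the Dijkstra helper and function names are illustrative assumptions.

    import heapq

    def dijkstra_dist(adj, src, dst, cutoff):
        # Shortest src->dst distance in the current spanner; stops once keys exceed `cutoff`.
        dist = {src: 0.0}
        pq = [(0.0, src)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist.get(u, float('inf')):
                continue  # stale heap entry
            if u == dst:
                return d
            if d > cutoff:
                break
            for v, w in adj[u]:
                nd = d + w
                if nd < dist.get(v, float('inf')):
                    dist[v] = nd
                    heapq.heappush(pq, (nd, v))
        return dist.get(dst, float('inf'))

    def greedy_spanner(n, edges, k):
        # edges: list of (w, u, v); returns the edges of a (2k-1)-spanner.
        stretch = 2 * k - 1
        adj = [[] for _ in range(n)]
        spanner = []
        for w, u, v in sorted(edges):
            # Keep (u, v) only if the current spanner does not already approximate it.
            if dijkstra_dist(adj, u, v, stretch * w) > stretch * w:
                adj[u].append((v, w))
                adj[v].append((u, w))
                spanner.append((u, v, w))
        return spanner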

    Matching Is as Easy as the Decision Problem, in the NC Model

    Is matching in NC, i.e., is there a deterministic fast parallel algorithm for it? This has been an outstanding open question in TCS for over three decades, ever since the discovery of randomized NC matching algorithms [KUW85, MVV87]. Over the last five years, the theoretical computer science community has launched a relentless attack on this question, leading to the discovery of several powerful ideas. We give what appears to be the culmination of this line of work: an NC algorithm for finding a minimum-weight perfect matching in a general graph with polynomially bounded edge weights, provided it is given an oracle for the decision problem. Consequently, for settling the main open problem, it suffices to obtain an NC algorithm for the decision problem. We believe this new fact has qualitatively changed the nature of this open problem. All known efficient matching algorithms for general graphs follow one of two approaches, given by Edmonds [Edm65] and Lovász [Lov79]. Our oracle-based algorithm follows a new approach and uses many of the ideas discovered in the last five years. The difficulty of obtaining an NC perfect matching algorithm led researchers to study matching vis-a-vis clever relaxations of the class NC. In this vein, recently Goldwasser and Grossman [GG15] gave a pseudo-deterministic RNC algorithm for finding a perfect matching in a bipartite graph, i.e., an RNC algorithm with the additional requirement that on the same graph, it should return the same (i.e., unique) perfect matching for almost all choices of random bits. A corollary of our reduction is an analogous algorithm for general graphs. Comment: Appeared in ITCS 2020.
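
    Editorial note: to illustrate what "given an oracle for the decision problem" means, here is the folklore sequential search-to-decision self-reduction in Python. It is only a sketch under the assumption of an oracle `has_perfect_matching`; it is inherently sequential and does not capture the paper's NC (parallel) reduction.

    def find_perfect_matching(edges, has_perfect_matching):
        # edges: set of frozenset({u, v}); has_perfect_matching: decision oracle on an edge set.
        # Greedily delete edges while a perfect matching still exists; by minimality,
        # the surviving edge set is itself a perfect matching.
        if not has_perfect_matching(edges):
            return None
        current = set(edges)
        for e in list(edges):
            trial = current - {e}
            if has_perfect_matching(trial):
                current = trial  # edge e is not needed for some perfect matching
        return current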

    Execution replay and debugging

    As most parallel and distributed programs are internally non-deterministic -- consecutive runs with the same input might result in a different program flow -- vanilla cyclic debugging techniques as such are useless. In order to use cyclic debugging tools, we need a tool that records information about an execution so that it can be replayed for debugging. Because recording information interferes with the execution, we must limit the amount of information and keep the processing of the information fast. This paper contains a survey of existing execution replay techniques and tools. Comment: In M. Ducasse (ed), proceedings of the Fourth International Workshop on Automated Debugging (AADebug 2000), August 2000, Munich. cs.SE/001003
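
    Editorial note: a toy Python sketch of the record/replay idea for a single source of non-determinism (here, random choices), assuming a simple JSON trace file. Real record/replay tools intercept far more (message ordering, system calls, shared-memory races); all names below are invented for illustration.

    import json, random

    class Recorder:
        # Records every non-deterministic value observed during a run.
        def __init__(self):
            self.log = []
        def choice(self, options):
            value = random.choice(options)   # the non-deterministic decision
            self.log.append(value)
            return value
        def save(self, path):
            with open(path, 'w') as f:
                json.dump(self.log, f)

    class Replayer:
        # Feeds back the recorded values so a re-run follows the same program flow.
        def __init__(self, path):
            with open(path) as f:
                self.log = iter(json.load(f))
        def choice(self, options):
            return next(self.log)

    def run(source):
        # The program under debugging; its control flow depends on source.choice.
        total = 0
        for _ in range(5):
            total += source.choice([1, 2, 3])
        return total

    rec = Recorder()
    first = run(rec)
    rec.save('trace.json')
    assert run(Replayer('trace.json')) == first   # deterministic replay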

    Machine Learning at Microsoft with ML .NET

    Machine Learning is transitioning from an art and science into a technology available to every developer. In the near future, every application on every platform will incorporate trained models to encode data-based decisions that would be impossible for developers to author. This presents a significant engineering challenge, since currently data science and modeling are largely decoupled from standard software development processes. This separation makes incorporating machine learning capabilities inside applications unnecessarily costly and difficult, and furthermore discourages developers from embracing ML in the first place. In this paper we present ML .NET, a framework developed at Microsoft over the last decade in response to the challenge of making it easy to ship machine learning models in large software applications. We present its architecture, and illuminate the application demands that shaped it. Specifically, we introduce DataView, the core data abstraction of ML .NET which allows it to capture full predictive pipelines efficiently and consistently across training and inference lifecycles. We close the paper with a surprisingly favorable performance study of ML .NET compared to more recent entrants, and a discussion of some lessons learned.
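
    Editorial note: ML .NET itself is a C# framework; as a rough conceptual analogy only (in Python, not ML .NET's API), a lazily composed pipeline whose transforms are applied identically at training and inference time might look like the sketch below. All class and method names are invented for illustration.

    class Pipeline:
        def __init__(self, steps=None):
            self.steps = steps or []          # list of (fit, transform) pairs
            self.fitted = []
        def append(self, fit, transform):
            return Pipeline(self.steps + [(fit, transform)])
        def fit(self, rows):
            self.fitted = []
            for fit, transform in self.steps:
                state = fit(rows)             # learn this step's parameters from the data
                self.fitted.append((state, transform))
                rows = [transform(state, r) for r in rows]
            return self
        def transform(self, rows):
            # The same chain, with the learned state, is reused at inference time.
            for state, transform in self.fitted:
                rows = [transform(state, r) for r in rows]
            return rows

    # Example: a normalization step whose scale is learned during fit.
    pipe = Pipeline().append(
        fit=lambda rows: max(abs(r) for r in rows) or 1.0,
        transform=lambda scale, r: r / scale,
    )
    pipe.fit([2.0, -4.0, 1.0])
    print(pipe.transform([8.0]))   # uses the scale learned from the training data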

    An Efficient Multiway Mergesort for GPU Architectures

    Sorting is a primitive operation that is a building block for countless algorithms. As such, it is important to design sorting algorithms that approach peak performance on a range of hardware architectures. Graphics Processing Units (GPUs) are particularly attractive architectures as they provide massive parallelism and computing power. However, the intricacies of their compute and memory hierarchies make designing GPU-efficient algorithms challenging. In this work we present GPU Multiway Mergesort (MMS), a new GPU-efficient multiway mergesort algorithm. MMS employs a new partitioning technique that exposes the parallelism needed by modern GPU architectures. To the best of our knowledge, MMS is the first sorting algorithm for the GPU that is asymptotically optimal in terms of global memory accesses and that is completely free of shared memory bank conflicts. We realize an initial implementation of MMS, evaluate its performance on three modern GPU architectures, and compare it to competitive implementations available in state-of-the-art GPU libraries. Despite these implementations being highly optimized, MMS compares favorably, achieving performance improvements for most random inputs. Furthermore, unlike MMS, state-of-the-art algorithms are susceptible to bank conflicts. We find that for certain inputs that cause these algorithms to incur large numbers of bank conflicts, MMS can achieve up to a 37.6% speedup over its fastest competitor. Overall, even though its current implementation is not fully optimized, due to its efficient use of the memory hierarchy, MMS outperforms the fastest comparison-based sorting implementations available to date.
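
    Editorial note: to ground the terminology, here is a minimal sequential multiway mergesort in Python. It illustrates only the k-way merge primitive; it does not reproduce the paper's GPU partitioning technique or its bank-conflict-free memory layout, and all function names are illustrative.

    import heapq

    def multiway_merge(runs):
        # Sequential k-way merge of sorted runs using a min-heap keyed by (value, run, index).
        heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
        heapq.heapify(heap)
        out = []
        while heap:
            value, i, j = heapq.heappop(heap)
            out.append(value)
            if j + 1 < len(runs[i]):
                heapq.heappush(heap, (runs[i][j + 1], i, j + 1))
        return out

    def mergesort(data, k=4):
        # Multiway mergesort: split into at most k runs, sort each, then k-way merge.
        if len(data) <= 1:
            return list(data)
        step = (len(data) + k - 1) // k
        runs = [mergesort(data[i:i + step], k) for i in range(0, len(data), step)]
        return multiway_merge(runs)

    print(mergesort([5, 3, 8, 1, 9, 2, 7, 4, 6, 0]))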