Efficient Implementation of a Synchronous Parallel Push-Relabel Algorithm

Author: A. Goldberg
A.V. Goldberg
B. Hong
B.G. Chandran
B.V. Cherkassky
D.S. Hochbaum
G.W. Flake
P. Sanders
Y. Shiloach
Publication date: 23/07/2015

Motivated by the observation that FIFO-based push-relabel algorithms are able to outperform highest label-based variants on modern, large maximum flow problem instances, we introduce an efficient implementation of the algorithm that uses coarse-grained parallelism to avoid the problems of existing parallel approaches. We demonstrate good relative and absolute speedups of our algorithm on a set of large graph instances taken from real-world applications. On a modern 40-core machine, our parallel implementation outperforms existing sequential implementations by up to a factor of 12 and other parallel implementations by factors of up to 3

arXiv.org e-Print Archive

Crossref
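
As a point of reference for the abstract above, here is a minimal sequential sketch of a FIFO-based push-relabel maximum flow computation. It is not the paper's synchronous parallel implementation; the function name, the edge-list input format and the full-discharge policy are assumptions chosen for illustration.

```python
from collections import deque

def max_flow_fifo_push_relabel(n, edges, s, t):
    """edges: list of (u, v, capacity). Returns the max flow value from s to t."""
    # Residual graph: edge e and its reverse e ^ 1 are stored as a pair.
    to, cap, adj = [], [], [[] for _ in range(n)]
    for u, v, c in edges:
        adj[u].append(len(to)); to.append(v); cap.append(c)
        adj[v].append(len(to)); to.append(u); cap.append(0)

    height = [0] * n
    excess = [0] * n
    in_queue = [False] * n
    active = deque()  # FIFO queue of vertices with positive excess
    height[s] = n

    def push(u, e, amount):
        v = to[e]
        cap[e] -= amount
        cap[e ^ 1] += amount
        excess[u] -= amount
        excess[v] += amount
        if v != s and v != t and not in_queue[v]:
            in_queue[v] = True
            active.append(v)

    # Initial preflow: saturate every edge leaving the source.
    for e in adj[s]:
        if cap[e] > 0:
            push(s, e, cap[e])

    while active:
        u = active.popleft()
        in_queue[u] = False
        # Discharge u completely, relabeling whenever no admissible edge is left.
        while excess[u] > 0:
            for e in adj[u]:
                if cap[e] > 0 and height[u] == height[to[e]] + 1:
                    push(u, e, min(excess[u], cap[e]))
                    if excess[u] == 0:
                        break
            if excess[u] > 0:
                height[u] = 1 + min(height[to[e]] for e in adj[u] if cap[e] > 0)
    return excess[t]
```

A highest-label variant would replace the FIFO queue with a structure that always selects the active vertex of greatest height; the rest of the sketch would stay the same.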

Fast matrix multiplication techniques based on the Adleman-Lipton model

Author: Nayebi Aran
Publication venue: 'Academic Journals'
Publication date: 18/12/2011

On distributed memory electronic computers, the implementation and association of fast parallel matrix multiplication algorithms has yielded astounding results and insights. In this discourse, we use the tools of molecular biology to demonstrate the theoretical encoding of Strassen's fast matrix multiplication algorithm with DNA based on an n-moduli set in the residue number system, thereby demonstrating the viability of computational mathematics with DNA. As a result, a general scalable implementation of this model in the DNA computing paradigm is presented and can be generalized to the application of all fast matrix multiplication algorithms on a DNA computer. We also discuss the practical capabilities and issues of this scalable implementation. Fast methods of matrix computations with DNA are important because they also allow for the efficient implementation of other algorithms (i.e. inversion, computing determinants, and graph theory) with DNA. Comment: To appear in the International Journal of Computer Engineering Research. Minor changes made to make the preprint as similar as possible to the published version

arXiv.org e-Print Archive

Crossref
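
The residue number system mentioned in the abstract above can be illustrated in ordinary software. The sketch below is an assumption-laden toy: it uses three small, pairwise coprime moduli picked for the example, the schoolbook matrix product rather than Strassen's algorithm, and no DNA encoding at all. It only shows how each modulus gives an independent arithmetic channel whose results are recombined with the Chinese remainder theorem.

```python
from math import prod

# Hypothetical moduli set: pairwise coprime, product 1001. Results are only
# recovered correctly when every entry of the true product is below 1001.
MODULI = (7, 11, 13)

def to_rns(x, moduli=MODULI):
    """Encode a non-negative integer as its tuple of residues."""
    return tuple(x % m for m in moduli)

def from_rns(residues, moduli=MODULI):
    """Decode residues back to an integer via the Chinese remainder theorem."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # pow(..., -1, m) is the modular inverse
    return total % M

def matmul_rns(A, B, moduli=MODULI):
    """Multiply matrices channel by channel, then recombine each entry."""
    rows, inner, cols = len(A), len(B), len(B[0])
    channels = []
    for m in moduli:
        # Each channel works only with small residues and is independent of the others.
        channels.append([
            [sum((A[i][k] % m) * (B[k][j] % m) for k in range(inner)) % m
             for j in range(cols)]
            for i in range(rows)
        ])
    return [
        [from_rns([ch[i][j] for ch in channels], moduli) for j in range(cols)]
        for i in range(rows)
    ]
```

Because the channels never interact until the final recombination, they can be evaluated in parallel, which is the property that makes such a representation attractive for massively parallel substrates.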

Distributed computing methodology for training neural networks in an image-guided diagnostic application

Author: Magoulas George D.
Plagianakos V.P.
Vrahatis M.N.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2006

Distributed computing is a process through which a set of computers connected by a network is used collectively to solve a single problem. In this paper, we propose a distributed computing methodology for training neural networks for the detection of lesions in colonoscopy. Our approach is based on partitioning the training set across multiple processors using a parallel virtual machine. In this way, interconnected computers of varied architectures can be used for the distributed evaluation of the error function and gradient values, and, thus, training neural networks utilizing various learning methods. The proposed methodology has large granularity and low synchronization, and has been implemented and tested. Our results indicate that the parallel virtual machine implementation of the training algorithms developed leads to considerable speedup, especially when large network architectures and training sets are used

Birkbeck Institutional Research Online

Empirical Evaluation of the Parallel Distribution Sweeping Framework on Multicore Architectures

Author: A. Aggarwal
D. Ajwani
J. Singler
J.L. Bentley
K. Mehlhorn
S. Kang
Publication date: 01/01/2013

In this paper, we perform an empirical evaluation of the Parallel External Memory (PEM) model in the context of geometric problems. In particular, we implement the parallel distribution sweeping framework of Ajwani, Sitchinava and Zeh to solve the batched 1-dimensional stabbing max problem. While modern processors have sophisticated memory systems (multiple levels of caches, set associativity, TLB, prefetching), we empirically show that algorithms designed in simple models that focus on minimizing the I/O transfers between shared memory and a single-level cache can lead to efficient software on current multicore architectures. Our implementation exhibits significantly fewer accesses to slow DRAM and, therefore, outperforms traditional approaches based on plane sweep and two-way divide and conquer. Comment: Longer version of ESA'13 paper

arXiv.org e-Print Archive

Crossref
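
For readers unfamiliar with the batched 1-dimensional stabbing max problem named in the abstract above: given weighted intervals and a batch of query points, report for each point the largest weight among the intervals that contain it. The sketch below is a plain sequential plane sweep, the kind of traditional baseline the abstract compares against, not the parallel distribution sweeping framework itself. The input format and function name are assumptions.

```python
import heapq

def batched_stabbing_max(intervals, queries):
    """intervals: list of (left, right, weight), closed on both ends.
    Returns, for each query point in input order, the maximum weight of an
    interval containing it, or None if no interval contains it."""
    by_left = sorted(intervals)
    order = sorted(range(len(queries)), key=lambda i: queries[i])
    answers = [None] * len(queries)
    heap = []  # max-heap on weight, stored as (-weight, right)
    nxt = 0
    for qi in order:
        q = queries[qi]
        # Activate every interval that starts at or before the sweep position.
        while nxt < len(by_left) and by_left[nxt][0] <= q:
            left, right, weight = by_left[nxt]
            heapq.heappush(heap, (-weight, right))
            nxt += 1
        # Lazily discard heaviest intervals that have already ended.
        while heap and heap[0][1] < q:
            heapq.heappop(heap)
        if heap:
            answers[qi] = -heap[0][0]
    return answers
```

Expired intervals buried below the heap top are left in place; since queries are processed in increasing order, they are removed if they ever reach the top and can never become valid again.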

Techniques for Autotuning Algorithms on Heterogenous Platforms

Author: Amor Margarita
Diéguez Adrián P.
Doallo Ramón
Publication date: 01/01/2016

Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016) Timisoara, Romania. February 8-11, 2016. Current GPUs (Graphic Processing Units) can obtain high computational performance in scientific applications. Nevertheless, programmers have to use suitable parallel algorithms for these architectures and have to consider optimization techniques in the implementation in order to achieve that performance. This thesis is focused on designing and implementing parallel prefix algorithms on GPU architectures with little effort. To that end, we have developed a highly optimized library called BPLG (Tuning Butterfly Processing Library for GPUs), based on a set of building blocks that make it easy to design well-known algorithms such as FFT, tridiagonal systems solvers, scan operator, sorting or signal processing. This library is designed under a tuning methodology based on two stages, identified as GPU resource analysis and operator string manipulation. Specifically, this strategy is focused on a set of parallel prefix algorithms that can be represented according to a set of common permutations of the digits of each of its element indices [4], denoted as Index-Digit (ID) algorithms. So far, the proposed methodology has obtained very good results with respect to state-of-the-art libraries such as CUFFT, CUSPARSE, CUDPP or ModernGPU. European Cooperation in Science and Technology. COS

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo
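
The scan operator listed among the parallel prefix algorithms in the abstract above can be sketched in a few lines. The version below simulates the step-doubling (Hillis-Steele) inclusive scan on the CPU; it is not BPLG code and makes no claim about how that library organizes its building blocks. Within each step every element update is independent of the others, which is what a GPU would exploit.

```python
def inclusive_scan(values, op=lambda a, b: a + b):
    """Inclusive prefix scan in ceil(log2(n)) data-parallel steps.
    `op` must be associative."""
    out = list(values)
    step = 1
    while step < len(out):
        prev = out[:]  # snapshot so all reads of a step precede its writes
        for i in range(step, len(out)):  # these iterations are independent
            out[i] = op(prev[i - step], prev[i])
        step *= 2
    return out
```

This formulation performs more total operations than a sequential loop; work-efficient variants trade the simple structure for an up-sweep and down-sweep over a tree.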

A methodology for parallel implementation of the basic operations of digital signal processing

Author: Klimova O. V.
Publication venue: 'AIP Publishing'
Publication date: 01/01/2019

A methodology for parallel implementation of the basic operations of digital signal processing is considered. The methodology is based on the analysis and generalization of results obtained in constructing a model description of computation organization. It provides a set of formal transformations that convert a sequential computing system into a parallel adaptive processing mode. The methodology offers a formal basis for the concurrent exploration of algorithms and architectures, thus creating a basis for improving the efficiency of parallel computing. © 2019 Author(s)

Crossref

Institutional repository of Ural Federal University named after the first President of Russia B.N.Yeltsin

A Parallel General Purpose Multi-Objective Optimization Framework, with Application to Beam Dynamics

Author: Adelmann A.
Arbenz P.
Bekas C.
Curioni A.
Ineichen Y.
Kolano A.
Metzger-Kraus C.
Neveu N.
Spentzouris L.
Publication date: 23/02/2019

Particle accelerators are invaluable tools for research in the basic and applied sciences, in fields such as materials science, chemistry, the biosciences, particle physics, nuclear physics and medicine. The design, commissioning, and operation of accelerator facilities is a non-trivial task, due to the large number of control parameters and the complex interplay of several conflicting design goals. We propose to tackle this problem by means of multi-objective optimization algorithms which also facilitate a parallel deployment. In order to compute solutions in a meaningful time frame a fast and scalable software framework is required. In this paper, we present the implementation of such a general-purpose framework for simulation-based multi-objective optimization methods that allows the automatic investigation of optimal sets of machine parameters. The implementation is based on a master/slave paradigm, employing several masters that govern a set of slaves executing simulations and performing optimization tasks. Using evolutionary algorithms as the optimizer and OPAL as the forward solver, validation experiments and results of multi-objective optimization problems in the domain of beam dynamics are presented. The high charge beam line at the Argonne Wakefield Accelerator Facility was used as the beam dynamics model. The 3D beam size, transverse momentum, and energy spread were optimized

arXiv.org e-Print Archive

Directory of Open Access Journals
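
Multi-objective optimization with conflicting design goals, as described in the abstract above, yields a set of trade-off solutions rather than a single optimum. The sketch below shows the core selection step, filtering candidates down to the non-dominated (Pareto) set. It assumes every objective is to be minimized and is unrelated to the OPAL-based framework's actual code; the function names are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one
    (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

An evolutionary optimizer would apply a filter of this kind to each generation of simulated candidates; the quadratic comparison cost is negligible next to the simulations themselves.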