Search CORE

5,123 research outputs found

Parallelizing a new algorithm for the set partition problem

Author: Chi Thanh Hoang
Publication venue: 'Uniwersytetu Marii Curie-Sklodowskiej w Lublinie'
Publication date: 01/01/2010
Field of study

In this paper we propose a new approach to organizing parallel computing to find a sequence of all solutions to a problem. We split the sequence into subsequences and then execute concurrently the processes to find these subsequences. We propose a new simple algorithm for the set partition problem and apply the above technique for this algorithm

Biblioteka Nauki - repozytorium artykuÅÃ³w

University of Maria Curie-Skłodowska (UMCS): Scientific e-Journals / Uniwersytet Marii Curie-Skłodowskiej: e-czasopisma naukowe

Parallelizing Deadlock Resolution in Symbolic Synthesis of Distributed Programs

Author: A. Arora
A. Arora
A. Ebnenasir
B. Bonakdarpour
Borzoo Bonakdarpour
E. A. Emerson
F. Abujarad
F. Somenzi
Fuad Abujarad
J. Ezekiel
J. Ezekiel
J. Ezekiel
Jaco van de Pol
K. Milvang-Jensen
L. Lamport
Lubos Brim
Maurice Herlihy
O. Grumberg
O. Grumberg
S. S. Kulkarni
S. S. Kulkarni
Sandeep S. Kulkarni
T. Stornetta
Publication venue: 'Open Publishing Association'
Publication date: 01/12/2009
Field of study

Previous work has shown that there are two major complexity barriers in the synthesis of fault-tolerant distributed programs: (1) generation of fault-span, the set of states reachable in the presence of faults, and (2) resolving deadlock states, from where the program has no outgoing transitions. Of these, the former closely resembles with model checking and, hence, techniques for efficient verification are directly applicable to it. Hence, we focus on expediting the latter with the use of multi-core technology. We present two approaches for parallelization by considering different design choices. The first approach is based on the computation of equivalence classes of program transitions (called group computation) that are needed due to the issue of distribution (i.e., inability of processes to atomically read and write all program variables). We show that in most cases the speedup of this approach is close to the ideal speedup and in some cases it is superlinear. The second approach uses traditional technique of partitioning deadlock states among multiple threads. However, our experiments show that the speedup for this approach is small. Consequently, our analysis demonstrates that a simple approach of parallelizing the group computation is likely to be the effective method for using multi-core computing in the context of deadlock resolution

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Streaming Graph Challenge: Stochastic Block Partition

Author: Gadepally Vijay
Hurley Michael
Jones Michael
Kao Edward
Kepner Jeremy
Mohindra Sanjeev
Monticciolo Paul
Reuther Albert
Samsi Siddharth
Smith Steven
Song William
Staheli Diane
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 25/08/2017
Field of study

An important objective for analyzing real-world graphs is to achieve scalable performance on large, streaming graphs. A challenging and relevant example is the graph partition problem. As a combinatorial problem, graph partition is NP-hard, but existing relaxation methods provide reasonable approximate solutions that can be scaled for large graphs. Competitive benchmarks and challenges have proven to be an effective means to advance state-of-the-art performance and foster community collaboration. This paper describes a graph partition challenge with a baseline partition algorithm of sub-quadratic complexity. The algorithm employs rigorous Bayesian inferential methods based on a statistical model that captures characteristics of the real-world graphs. This strong foundation enables the algorithm to address limitations of well-known graph partition approaches such as modularity maximization. This paper describes various aspects of the challenge including: (1) the data sets and streaming graph generator, (2) the baseline partition algorithm with pseudocode, (3) an argument for the correctness of parallelizing the Bayesian inference, (4) different parallel computation strategies such as node-based parallelism and matrix-based parallelism, (5) evaluation metrics for partition correctness and computational requirements, (6) preliminary timing of a Python-based demonstration code and the open source C++ code, and (7) considerations for partitioning the graph in streaming fashion. Data sets and source code for the algorithm as well as metrics, with detailed documentation are available at GraphChallenge.org.Comment: To be published in 2017 IEEE High Performance Extreme Computing Conference (HPEC

arXiv.org e-Print Archive

Crossref

Parallelizing Windowed Stream Joins in a Shared-Nothing Cluster

Author: Chakraborty Abhirup
Singh Ajit
Publication venue
Publication date: 24/07/2013
Field of study

The availability of large number of processing nodes in a parallel and distributed computing environment enables sophisticated real time processing over high speed data streams, as required by many emerging applications. Sliding window stream joins are among the most important operators in a stream processing system. In this paper, we consider the issue of parallelizing a sliding window stream join operator over a shared nothing cluster. We propose a framework, based on fixed or predefined communication pattern, to distribute the join processing loads over the shared-nothing cluster. We consider various overheads while scaling over a large number of nodes, and propose solution methodologies to cope with the issues. We implement the algorithm over a cluster using a message passing system, and present the experimental results showing the effectiveness of the join processing algorithm.Comment: 11 page

arXiv.org e-Print Archive

Crossref

Polynomial-time T-depth Optimization of Clifford+T circuits via Matroid Partitioning

Author: Amy Matthew
Maslov Dmitri
Mosca Michele
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 13/12/2013
Field of study

Most work in quantum circuit optimization has been performed in isolation from the results of quantum fault-tolerance. Here we present a polynomial-time algorithm for optimizing quantum circuits that takes the actual implementation of fault-tolerant logical gates into consideration. Our algorithm re-synthesizes quantum circuits composed of Clifford group and T gates, the latter being typically the most costly gate in fault-tolerant models, e.g., those based on the Steane or surface codes, with the purpose of minimizing both T-count and T-depth. A major feature of the algorithm is the ability to re-synthesize circuits with additional ancillae to reduce T-depth at effectively no cost. The tested benchmarks show up to 65.7% reduction in T-count and up to 87.6% reduction in T-depth without ancillae, or 99.7% reduction in T-depth using ancillae.Comment: Version 2 contains substantial improvements and extensions to the previous version. We describe a new, more robust algorithm and achieve significantly improved experimental result

arXiv.org e-Print Archive

CiteSeerX

Parallel Performance of MPI Sorting Algorithms on Dual-Core Processor Windows-Based Systems

Author: Elnashar Alaa Ismail
Publication venue: 'Academy and Industry Research Collaboration Center (AIRCC)'
Publication date: 30/05/2011
Field of study

Message Passing Interface (MPI) is widely used to implement parallel programs. Although Windowsbased architectures provide the facilities of parallel execution and multi-threading, little attention has been focused on using MPI on these platforms. In this paper we use the dual core Window-based platform to study the effect of parallel processes number and also the number of cores on the performance of three MPI parallel implementations for some sorting algorithms

arXiv.org e-Print Archive

CiteSeerX

Crossref

Parallel symbolic state-space exploration is difficult, but what is the alternative?

State-space exploration is an essential step in many modeling and analysis problems. Its goal is to find the states reachable from the initial state of a discrete-state model described. The state space can used to answer important questions, e.g., "Is there a dead state?" and "Can N become negative?", or as a starting point for sophisticated investigations expressed in temporal logic. Unfortunately, the state space is often so large that ordinary explicit data structures and sequential algorithms cannot cope, prompting the exploration of (1) parallel approaches using multiple processors, from simple workstation networks to shared-memory supercomputers, to satisfy large memory and runtime requirements and (2) symbolic approaches using decision diagrams to encode the large structured sets and relations manipulated during state-space generation. Both approaches have merits and limitations. Parallel explicit state-space generation is challenging, but almost linear speedup can be achieved; however, the analysis is ultimately limited by the memory and processors available. Symbolic methods are a heuristic that can efficiently encode many, but not all, functions over a structured and exponentially large domain; here the pitfalls are subtler: their performance varies widely depending on the class of decision diagram chosen, the state variable order, and obscure algorithmic parameters. As symbolic approaches are often much more efficient than explicit ones for many practical models, we argue for the need to parallelize symbolic state-space generation algorithms, so that we can realize the advantage of both approaches. This is a challenging endeavor, as the most efficient symbolic algorithm, Saturation, is inherently sequential. We conclude by discussing challenges, efforts, and promising directions toward this goal

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals