Parallelizing a new algorithm for the set partition problem
In this paper we propose a new approach to organizing parallel computation to find the sequence of all solutions to a problem. We split the sequence into subsequences and then concurrently execute the processes that find these subsequences. We propose a new, simple algorithm for the set partition problem and apply this technique to it.
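The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of the general idea (split the sequence of all solutions into subsequences and compute them concurrently). It assumes set partitions are encoded as restricted growth strings, a standard encoding that is not necessarily the one used in the paper; the function names, the `depth` parameter, and the use of a thread pool are all illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def prefixes(depth):
    """All restricted growth strings of length `depth`, in lexicographic order."""
    result = [[0]]
    for _ in range(depth - 1):
        # A new entry may reuse any existing block or open the next new one.
        result = [p + [b] for p in result for b in range(max(p) + 2)]
    return result


def complete(prefix, n):
    """Every restricted growth string of length n that extends `prefix`."""
    out = []

    def extend(rgs, biggest):
        if len(rgs) == n:
            out.append(tuple(rgs))
            return
        for b in range(biggest + 2):
            rgs.append(b)
            extend(rgs, max(biggest, b))
            rgs.pop()

    extend(list(prefix), max(prefix))
    return out


def all_partitions(n, depth=3, workers=4):
    """Enumerate all partitions of an n-element set (n >= 1).

    Each prefix defines one contiguous subsequence of the full sequence;
    the subsequences are computed concurrently and concatenated in order.
    """
    depth = min(depth, n)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(lambda p: complete(p, n), prefixes(depth))
        return [rgs for chunk in chunks for rgs in chunk]


def to_blocks(rgs):
    """Convert a restricted growth string into explicit blocks of indices."""
    blocks = {}
    for element, block in enumerate(rgs):
        blocks.setdefault(block, []).append(element)
    return list(blocks.values())
```

Threads are used here only to keep the sketch self-contained; in CPython a process pool would be needed for real CPU parallelism, and the subsequences defined by different prefixes differ in length, so load balancing is left open.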
Parallelizing Deadlock Resolution in Symbolic Synthesis of Distributed Programs
Previous work has shown that there are two major complexity barriers in the
synthesis of fault-tolerant distributed programs: (1) generation of fault-span,
the set of states reachable in the presence of faults, and (2) resolving
deadlock states, from where the program has no outgoing transitions. Of these,
the former closely resembles model checking and, hence, techniques for
efficient verification are directly applicable to it. Hence, we focus on
expediting the latter with the use of multi-core technology.
We present two approaches for parallelization by considering different design
choices. The first approach is based on the computation of equivalence classes
of program transitions (called group computation) that are needed due to the
issue of distribution (i.e., inability of processes to atomically read and
write all program variables). We show that in most cases the speedup of this
approach is close to the ideal speedup and in some cases it is superlinear. The
second approach uses the traditional technique of partitioning deadlock states
among multiple threads. However, our experiments show that the speedup for this
approach is small. Consequently, our analysis demonstrates that a simple
approach of parallelizing the group computation is likely to be an effective
method for using multi-core computing in the context of deadlock resolution.
Streaming Graph Challenge: Stochastic Block Partition
An important objective for analyzing real-world graphs is to achieve scalable
performance on large, streaming graphs. A challenging and relevant example is
the graph partition problem. As a combinatorial problem, graph partition is
NP-hard, but existing relaxation methods provide reasonable approximate
solutions that can be scaled for large graphs. Competitive benchmarks and
challenges have proven to be an effective means to advance state-of-the-art
performance and foster community collaboration. This paper describes a graph
partition challenge with a baseline partition algorithm of sub-quadratic
complexity. The algorithm employs rigorous Bayesian inferential methods based
on a statistical model that captures characteristics of the real-world graphs.
This strong foundation enables the algorithm to address limitations of
well-known graph partition approaches such as modularity maximization. This
paper describes various aspects of the challenge including: (1) the data sets
and streaming graph generator, (2) the baseline partition algorithm with
pseudocode, (3) an argument for the correctness of parallelizing the Bayesian
inference, (4) different parallel computation strategies such as node-based
parallelism and matrix-based parallelism, (5) evaluation metrics for partition
correctness and computational requirements, (6) preliminary timing of a
Python-based demonstration code and the open source C++ code, and (7)
considerations for partitioning the graph in streaming fashion. Data sets and
source code for the algorithm as well as metrics, with detailed documentation
are available at GraphChallenge.org.
Comment: To be published in 2017 IEEE High Performance Extreme Computing
Conference (HPEC)
Parallelizing Windowed Stream Joins in a Shared-Nothing Cluster
The availability of a large number of processing nodes in a parallel and
distributed computing environment enables sophisticated real time processing
over high speed data streams, as required by many emerging applications.
Sliding window stream joins are among the most important operators in a stream
processing system. In this paper, we consider the issue of parallelizing a
sliding window stream join operator over a shared nothing cluster. We propose a
framework, based on a fixed or predefined communication pattern, to distribute
the join processing loads over the shared-nothing cluster. We consider various
overheads while scaling over a large number of nodes, and propose solution
methodologies to cope with the issues. We implement the algorithm over a
cluster using a message passing system, and present the experimental results
showing the effectiveness of the join processing algorithm.
Comment: 11 pages
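To make the operator concrete, here is a minimal single-process sketch of a sliding-window equi-join whose load is spread over several simulated nodes. It is not the framework proposed in the paper: hash routing by join key, the class and function names, the stream labels "R" and "S", and the assumption that timestamps arrive in non-decreasing order are all choices made for this illustration.

```python
from collections import deque


class WindowJoinNode:
    """One simulated node holding a time window of tuples from each stream."""

    def __init__(self, window):
        self.window = window
        self.buffers = {"R": deque(), "S": deque()}

    def insert(self, stream, ts, key, value):
        """Add a tuple and return its matches as (R value, S value) pairs."""
        # Drop tuples that have slid out of the window (timestamps are
        # assumed to arrive in non-decreasing order).
        for buf in self.buffers.values():
            while buf and buf[0][0] < ts - self.window:
                buf.popleft()
        other = "S" if stream == "R" else "R"
        matches = [
            (value, v) if stream == "R" else (v, value)
            for (_, k, v) in self.buffers[other]
            if k == key
        ]
        self.buffers[stream].append((ts, key, value))
        return matches


def run_join(events, n_nodes, window):
    """Route each (stream, ts, key, value) event to a node by hashing its key."""
    nodes = [WindowJoinNode(window) for _ in range(n_nodes)]
    results = []
    for stream, ts, key, value in events:
        node = nodes[hash(key) % n_nodes]  # equal keys always meet on one node
        results.extend(node.insert(stream, ts, key, value))
    return results


events = [
    ("R", 1, 7, "r1"),
    ("S", 2, 7, "s1"),   # joins with r1
    ("S", 5, 8, "s2"),   # different key, no match
    ("R", 20, 7, "r2"),  # r1 and s1 have expired by now
    ("S", 25, 7, "s3"),  # joins with r2
]
print(run_join(events, n_nodes=3, window=10))
```

In a real shared-nothing cluster each node would be a separate process reached through message passing, and the routing step is where the communication overheads discussed in the abstract would arise.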
Polynomial-time T-depth Optimization of Clifford+T circuits via Matroid Partitioning
Most work in quantum circuit optimization has been performed in isolation
from the results of quantum fault-tolerance. Here we present a polynomial-time
algorithm for optimizing quantum circuits that takes the actual implementation
of fault-tolerant logical gates into consideration. Our algorithm
re-synthesizes quantum circuits composed of Clifford group and T gates, the
latter being typically the most costly gate in fault-tolerant models, e.g.,
those based on the Steane or surface codes, with the purpose of minimizing both
T-count and T-depth. A major feature of the algorithm is the ability to
re-synthesize circuits with additional ancillae to reduce T-depth at
effectively no cost. The tested benchmarks show up to 65.7% reduction in
T-count and up to 87.6% reduction in T-depth without ancillae, or 99.7%
reduction in T-depth using ancillae.
Comment: Version 2 contains substantial improvements and extensions to the
previous version. We describe a new, more robust algorithm and achieve
significantly improved experimental results.
Parallel Performance of MPI Sorting Algorithms on Dual-Core Processor Windows-Based Systems
Message Passing Interface (MPI) is widely used to implement parallel
programs. Although Windows-based architectures provide the facilities of
parallel execution and multi-threading, little attention has been focused on
using MPI on these platforms. In this paper we use a dual-core Windows-based
platform to study the effect of the number of parallel processes and the number
of cores on the performance of three MPI parallel implementations of some
sorting algorithms.
Parallel symbolic state-space exploration is difficult, but what is the alternative?
State-space exploration is an essential step in many modeling and analysis
problems. Its goal is to find the states reachable from the initial state of a
discrete-state model described. The state space can be used to answer important
questions, e.g., "Is there a dead state?" and "Can N become negative?", or as a
starting point for sophisticated investigations expressed in temporal logic.
Unfortunately, the state space is often so large that ordinary explicit data
structures and sequential algorithms cannot cope, prompting the exploration of
(1) parallel approaches using multiple processors, from simple workstation
networks to shared-memory supercomputers, to satisfy large memory and runtime
requirements and (2) symbolic approaches using decision diagrams to encode the
large structured sets and relations manipulated during state-space generation.
Both approaches have merits and limitations. Parallel explicit state-space
generation is challenging, but almost linear speedup can be achieved; however,
the analysis is ultimately limited by the memory and processors available.
Symbolic methods are a heuristic that can efficiently encode many, but not all,
functions over a structured and exponentially large domain; here the pitfalls
are subtler: their performance varies widely depending on the class of decision
diagram chosen, the state variable order, and obscure algorithmic parameters.
As symbolic approaches are often much more efficient than explicit ones for
many practical models, we argue for the need to parallelize symbolic
state-space generation algorithms, so that we can realize the advantage of both
approaches. This is a challenging endeavor, as the most efficient symbolic
algorithm, Saturation, is inherently sequential. We conclude by discussing
challenges, efforts, and promising directions toward this goal.
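For orientation, the explicit, sequential baseline that the abstract contrasts with parallel and symbolic methods can be sketched as a breadth-first search. This is a generic textbook sketch, not Saturation or any algorithm from the paper; the `explore` function and the toy counter model are assumptions introduced only to illustrate the two example questions quoted above.

```python
from collections import deque


def explore(initial, successors):
    """Return (reachable states, dead states) of a discrete-state model.

    `successors` maps a state to an iterable of next states; a dead state
    is a reachable state with no outgoing transitions.
    """
    seen = {initial}
    dead = []
    frontier = deque([initial])
    while frontier:
        state = frontier.popleft()
        next_states = list(successors(state))
        if not next_states:
            dead.append(state)
        for nxt in next_states:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen, dead


def counter_successors(n):
    """Hypothetical toy model: a counter N that can only grow, up to 3."""
    return [n + 1] if n < 3 else []


reachable, dead = explore(0, counter_successors)
print("Is there a dead state?", bool(dead))
print("Can N become negative?", any(n < 0 for n in reachable))
```

The `seen` set is the explicit data structure whose size becomes the bottleneck for large models; symbolic methods replace it with a decision diagram, and parallel explicit methods distribute it across processors.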