Massively Parallel Continuous Local Search for Hybrid SAT Solving on GPUs
Although state-of-the-art (SOTA) SAT solvers based on conflict-driven clause
learning (CDCL) have achieved remarkable engineering success, their sequential
nature limits the parallelism that may be extracted for acceleration on
platforms such as the graphics processing unit (GPU). In this work, we propose
FastFourierSAT, a highly parallel hybrid SAT solver based on gradient-driven
continuous local search (CLS). This is realized by a novel parallel algorithm
inspired by the Fast Fourier Transform (FFT)-based convolution for computing
the elementary symmetric polynomials (ESPs), which is the major computational
task in previous CLS methods. The complexity of our algorithm matches the best
previous result. Furthermore, the substantial parallelism inherent in our
algorithm can leverage the GPU for acceleration, demonstrating significant
improvement over previous CLS approaches. We also propose incorporating
restart heuristics into CLS to improve search efficiency. We compare our
approach with the SOTA parallel SAT solvers on several benchmarks. Our results
show that FastFourierSAT computes the gradient 100+ times faster than previous
prototypes implemented on CPUs. Moreover, FastFourierSAT solves most instances
and demonstrates promising performance on larger instances.
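The ESP computation at the heart of this approach can be illustrated in miniature: the elementary symmetric polynomials of x_1, ..., x_n are the coefficients of the product (1 + x_1 t)(1 + x_2 t)...(1 + x_n t), so they can be obtained by repeated polynomial convolution, which is the step FastFourierSAT accelerates with FFTs on the GPU. A minimal sketch of the idea, not the authors' implementation:

```python
import numpy as np

def esp(xs):
    """Elementary symmetric polynomials e_0..e_n of xs, computed by
    divide-and-conquer convolution of the linear factors (1 + x_i t).
    FastFourierSAT performs such convolutions with FFTs in parallel;
    plain numpy.convolve keeps this sketch exact and short."""
    polys = [np.array([1.0, x]) for x in xs]  # coefficients of 1 + x_i t
    while len(polys) > 1:
        paired = []
        for i in range(0, len(polys) - 1, 2):
            # product of two factors = convolution of coefficient vectors
            paired.append(np.convolve(polys[i], polys[i + 1]))
        if len(polys) % 2:
            paired.append(polys[-1])  # odd one out carries to next round
        polys = paired
    return polys[0]  # [e_0, e_1, ..., e_n]

coeffs = esp([1.0, 2.0, 3.0])  # e_1 = 6, e_2 = 11, e_3 = 6 for {1, 2, 3}
```

Pairing the factors tree-wise, as above, is what exposes the parallelism: every convolution on a level of the tree is independent of the others.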
An event-based architecture for solving constraint satisfaction problems
Constraint satisfaction problems (CSPs) are typically solved using
conventional von Neumann computing architectures. However, these architectures
do not reflect the distributed nature of many of these problems and are thus
ill-suited to solving them. In this paper we present a hybrid analog/digital
hardware architecture specifically designed to solve such problems. We cast
CSPs as networks of stereotyped multi-stable oscillatory elements that
communicate using digital pulses, or events. The oscillatory elements are
implemented using analog non-stochastic circuits. The non-repeating phase
relations among the oscillatory elements drive the exploration of the solution
space. We show that this hardware architecture can yield state-of-the-art
performance on a number of CSPs under reasonable assumptions on the
implementation. We present measurements from a prototype electronic chip to
demonstrate that a physical implementation of the proposed architecture is
robust to practical non-idealities and to validate the proposed theory. Comment: First two authors contributed equally to this work.
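The exploration principle, phase relations among coupled elements driving the search, can be imitated in software. The sketch below is an assumption-laden software analogue, not the paper's analog circuits: phases of constrained (adjacent) nodes repel under gradient descent, and for a triangle graph the phases settle roughly 120 degrees apart, which reads off as a proper 3-coloring.

```python
import math

def phase_search(edges, n, steps=3000, eta=0.05):
    """Gradient descent on C = sum over edges of cos(theta_i - theta_j):
    constrained nodes push their phases apart, loosely mimicking the
    repelling phase relations of the oscillatory elements."""
    theta = [0.3 * k * k + 0.1 for k in range(n)]  # asymmetric start
    for _ in range(steps):
        grad = [0.0] * n
        for i, j in edges:
            s = math.sin(theta[i] - theta[j])
            grad[i] -= s  # dC/dtheta_i = -sin(theta_i - theta_j)
            grad[j] += s  # dC/dtheta_j = +sin(theta_i - theta_j)
        theta = [t - eta * g for t, g in zip(theta, grad)]
    cost = sum(math.cos(theta[i] - theta[j]) for i, j in edges)
    return theta, cost

# Triangle: the minimum cost is 3 * cos(120 deg) = -1.5, with the three
# phases equally spaced, i.e. a valid 3-coloring of the triangle.
theta, cost = phase_search([(0, 1), (1, 2), (0, 2)], 3)
```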
Scalable Parallel Numerical Constraint Solver Using Global Load Balancing
We present a scalable parallel solver for numerical constraint satisfaction
problems (NCSPs). Our parallelization scheme consists of homogeneous worker
solvers, each of which runs on an available core and communicates with others
via the global load balancing (GLB) method. The parallel solver is implemented
with X10, which provides an implementation of GLB as a library. In experiments,
several NCSPs from the literature were solved, attaining up to a 516-fold
speedup using 600 cores of the TSUBAME2.5 supercomputer. Comment: To be presented at the X10'15 Workshop.
Scalable Parallel Numerical CSP Solver
We present a parallel solver for numerical constraint satisfaction problems
(NCSPs) that scales with the number of cores. Our method runs worker solvers
on the available cores while the workers cooperate to distribute and balance
the search space. In experiments, we attained up to a 119-fold speedup using
256 cores of a parallel computer. Comment: The final publication is available at Springer.
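Both NCSP solvers above parallelize interval branch-and-prune by treating sub-boxes of the search space as units of work that can be handed to different workers. The core loop can be sketched with a work queue; in the papers, the queue contents would be balanced across cores by GLB/X10 rather than processed sequentially, and a real solver would prune with interval contractors rather than the crude sign test used here. All names below are illustrative:

```python
from collections import deque

def branch_and_prune(f, lo, hi, eps=1e-9):
    """Interval bisection for roots of f in [lo, hi]. Each queue entry is a
    sub-box of the search space; in the parallel solvers, workers would pop
    boxes from such a queue, with GLB rebalancing work across cores."""
    queue = deque([(lo, hi)])
    solutions = []
    while queue:
        a, b = queue.popleft()
        if f(a) * f(b) > 0:  # crude prune: no sign change in this box
            continue          # (a real NCSP solver uses interval contractors)
        if b - a < eps:       # box is small enough: report it as a solution
            solutions.append((a, b))
            continue
        m = (a + b) / 2.0     # branch: split the box into two sub-boxes
        queue.append((a, m))
        queue.append((m, b))
    return solutions

boxes = branch_and_prune(lambda x: x * x - 2.0, 0.0, 2.0)
```

The returned box encloses sqrt(2); the point of the parallel schemes is that the two sub-boxes produced by each split are independent and can be searched by different workers.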
Parallel local search for solving Constraint Problems on the Cell Broadband Engine (Preliminary Results)
We explore the use of the Cell Broadband Engine (Cell/BE for short) for
combinatorial optimization applications: we present a parallel version of a
constraint-based local search algorithm that has been implemented on a
multiprocessor BladeCenter machine with twin Cell/BE processors (total of 16
SPUs per blade). This algorithm was chosen because it fits very well the
Cell/BE architecture and requires neither shared memory nor communication
between processors, while retaining a compact memory footprint. We study the
performance on several large optimization benchmarks and show that this
approach achieves mostly linear speedups, and sometimes even super-linear
ones. This is possible because the parallel implementation may explore
different parts of the search space simultaneously and therefore converge
faster towards the best sub-space, and thus towards a solution. Beyond the
speedups, the resulting run times exhibit much smaller variance, which
benefits applications where a timely reply is critical.
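The communication-free multi-walk scheme described above can be sketched sequentially: independent local-search walks, each with its own seed, where the first walk to finish wins. A generic min-conflicts search on n-queens stands in for the paper's constraint-based local search, and the names below (solve_queens, conflicts) are illustrative; in the parallel version each seed would run on its own SPU.

```python
import random

def conflicts(board, col, row):
    """Queens in other columns attacking square (col, row)."""
    return sum(1 for c, r in enumerate(board)
               if c != col and (r == row or abs(r - row) == abs(c - col)))

def solve_queens(n, seed, max_steps=500):
    """One min-conflicts local-search walk; in the multi-walk parallel
    scheme, one such independent walk runs per core, no communication."""
    rng = random.Random(seed)
    board = [rng.randrange(n) for _ in range(n)]
    for _ in range(max_steps):
        bad = [c for c in range(n) if conflicts(board, c, board[c]) > 0]
        if not bad:
            return board  # no conflicting column: solved
        col = rng.choice(bad)
        # move this column's queen to a minimally conflicting row,
        # breaking ties at random
        board[col] = min(range(n),
                         key=lambda r: (conflicts(board, col, r),
                                        rng.random()))
    return None

# Independent seeded restarts stand in for the parallel walks: first
# success wins, and averaging over walks is what shrinks the variance.
solution = next(s for s in (solve_queens(8, seed) for seed in range(20)) if s)
```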