Optimization by Record Dynamics
Large dynamical changes in thermalizing glassy systems are triggered by
trajectories crossing record sized barriers, a behavior revealing the presence
of a hierarchical structure in configuration space. The observation is here
turned into a novel local search optimization algorithm dubbed Record Dynamics
Optimization, or RDO. RDO uses the Metropolis rule to accept or reject
candidate solutions depending on the value of a parameter akin to the
temperature, and minimizes the cost function of the problem at hand through
cycles where its 'temperature' is raised and subsequently decreased in order to
expediently generate record high (and low) values of the cost function. Below,
RDO is introduced and then tested by searching the ground state of the
Edwards-Anderson spin-glass model, in two and three spatial dimensions. A
popular and highly efficient optimization algorithm, Parallel Tempering (PT), is
applied to the same problem as a benchmark. RDO and PT turn out to produce
solutions of similar quality for similar numerical effort, but RDO is simpler to
program and additionally yields geometrical information on the system's
configuration space which is of interest in many applications. In particular,
the effectiveness of RDO strongly indicates the presence of the above-mentioned
hierarchically organized configuration space, with metastable regions indexed
by the cost (or energy) of the transition states connecting them.
Comment: 14 pages, 12 figures
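As a rough illustration of the cycle described above (Metropolis acceptance with a temperature that is ramped up and back down to provoke record costs), here is a minimal Python sketch. It is not the paper's code: the function and parameter names are ours, and it runs on a toy unfrustrated 1D spin chain rather than the 2D/3D Edwards-Anderson model.

```python
import math
import random

def rdo_sketch(cost, neighbor, x0, t_low=0.1, t_high=2.0,
               cycles=20, steps=300, seed=0):
    """Toy sketch of the RDO idea: Metropolis acceptance with a
    'temperature' that is cyclically raised and lowered to generate
    record-high and record-low values of the cost function."""
    rng = random.Random(seed)
    x, c = x0, cost(x0)
    best_x, best_c = x, c
    for _ in range(cycles):
        for i in range(steps):
            # triangular schedule: t_low -> t_high -> t_low in each cycle
            frac = i / (steps - 1)
            t = t_low + (t_high - t_low) * (1.0 - abs(2.0 * frac - 1.0))
            y = neighbor(x, rng)
            cy = cost(y)
            if cy <= c or rng.random() < math.exp(-(cy - c) / t):
                x, c = y, cy
                if c < best_c:
                    best_x, best_c = x, c
    return best_x, best_c

# toy problem: 1D spin chain with random +-1 couplings (not the EA model)
n = 30
rng = random.Random(1)
J = [rng.choice([-1, 1]) for _ in range(n - 1)]

def cost(s):
    return -sum(J[i] * s[i] * s[i + 1] for i in range(n - 1))

def neighbor(s, rng):
    i = rng.randrange(len(s))
    return s[:i] + (-s[i],) + s[i + 1:]

x0 = tuple(rng.choice([-1, 1]) for _ in range(n))
best, e = rdo_sketch(cost, neighbor, x0)
```

An open 1D chain is unfrustrated, so its ground-state energy is exactly -(n-1) and local search reaches it easily; the 2D/3D spin glasses treated in the paper are far harder.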
Distributed top-k aggregation queries at large
Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially in distributed settings, where the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.
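The TPUT framework mentioned above can be sketched as a three-phase sum aggregation over per-node score lists. The following Python is a hypothetical illustration with names of our own choosing, not the paper's optimized algorithm:

```python
from collections import defaultdict

def tput_sketch(node_lists, k):
    """Sketch of TPUT-style three-phase top-k sum aggregation.
    node_lists: one dict per network node, mapping item -> local score."""
    m = len(node_lists)
    # Phase 1: each node ships its local top-k; compute partial sums.
    partial = defaultdict(float)
    for scores in node_lists:
        top = sorted(scores.items(), key=lambda kv: -kv[1])[:k]
        for item, s in top:
            partial[item] += s
    # Lower bound tau on the k-th best aggregate score.
    tau = sorted(partial.values(), reverse=True)[k - 1]
    # Phase 2: fetch every item with local score >= tau / m; an item
    # below tau / m at all m nodes cannot reach tau in total.
    candidates = set(partial)
    for scores in node_lists:
        candidates |= {i for i, s in scores.items() if s >= tau / m}
    # Phase 3: compute exact sums for the candidates, return the top-k.
    exact = {i: sum(sc.get(i, 0.0) for sc in node_lists) for i in candidates}
    return sorted(exact.items(), key=lambda kv: -kv[1])[:k]

# toy example: three nodes, top-2 by total score is b (12) then c (11)
nodes = [
    {"a": 9, "b": 7, "c": 1},
    {"a": 1, "c": 8, "d": 6},
    {"b": 5, "c": 2, "d": 4},
]
result = tput_sketch(nodes, 2)
```

The optimizations in the paper (operator trees, adaptive scan depths, source sampling) work on top of this kind of phased protocol to cut network cost.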
A hierarchical time-splitting approach for solving finite-time optimal control problems
We present a hierarchical computation approach for solving finite-time
optimal control problems using operator splitting methods. The first split is
performed over the time index and leads to as many subproblems as the length of
the prediction horizon. Each subproblem is solved in parallel and further split
into three by separating the objective from the equality and inequality
constraints respectively, such that an analytic solution can be achieved for
each subproblem. The proposed solution approach leads to a nested decomposition
scheme, which is highly parallelizable. We present a numerical comparison with
standard state-of-the-art solvers, and provide analytic solutions to several
elements of the algorithm, which enhances its applicability in fast large-scale
applications.
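As a loose illustration of the split over the time index only, ignoring the dynamics coupling and the further three-way split (objective, equality constraints, inequality constraints) that the paper handles via operator splitting, fully separable per-time-step subproblems can be solved in parallel, each with an analytic solution. Everything below is a hypothetical toy:

```python
from concurrent.futures import ThreadPoolExecutor

def solve_stage(r_t, lo, hi):
    """Analytic solution of one per-time-step subproblem:
    minimize (x - r_t)^2 subject to lo <= x <= hi, i.e. clip the
    unconstrained minimizer onto the box."""
    return min(max(r_t, lo), hi)

def time_split_sketch(reference, lo=-1.0, hi=1.0):
    # one subproblem per time index in the horizon, solved in parallel
    with ThreadPoolExecutor() as ex:
        return list(ex.map(lambda r: solve_stage(r, lo, hi), reference))

traj = time_split_sketch([0.5, 2.0, -3.0, 0.0])  # -> [0.5, 1.0, -1.0, 0.0]
```

In the actual method the per-time subproblems are coupled through the system dynamics, so the splitting iterates to consensus rather than finishing in one parallel pass.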
A Tuned and Scalable Fast Multipole Method as a Preeminent Algorithm for Exascale Systems
Among the algorithms that are likely to play a major role in future exascale
computing, the fast multipole method (FMM) appears as a rising star. Our
previous work showed scaling of an FMM on GPU clusters, with problem
sizes on the order of billions of unknowns. That work led to an extremely
parallel FMM, scaling to thousands of GPUs or tens of thousands of CPUs. This
paper reports on a campaign of performance tuning and scalability studies
using multi-core CPUs, on the Kraken supercomputer. All kernels in the FMM were
parallelized using OpenMP, and a test using 10^7 particles randomly distributed
in a cube showed 78% efficiency on 8 threads. Tuning of the
particle-to-particle kernel using SIMD instructions resulted in 4x speed-up of
the overall algorithm on single-core tests with 10^3 - 10^7 particles. Parallel
scalability was studied in both strong and weak scaling. The strong scaling
test used 10^8 particles and resulted in 93% parallel efficiency on 2048
processes for the non-SIMD code and 54% for the SIMD-optimized code (which was
still 2x faster). The weak scaling test used 10^6 particles per process, and
resulted in 72% efficiency on 32,768 processes, with the largest calculation
taking about 40 seconds to evaluate more than 32 billion unknowns. This work
builds up evidence for our view that FMM is poised to play a leading role in
exascale computing, and we end the paper with a discussion of the features that
make it a particularly favorable algorithm for the emerging heterogeneous and
massively parallel architectural landscape.
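For context, the particle-to-particle (P2P) kernel tuned in the paper evaluates direct pairwise interactions between nearby particles; FMM approximates the far field and applies this direct kernel only to near-field pairs. A vectorized NumPy sketch of such a kernel (our own illustration, not the paper's SIMD code):

```python
import numpy as np

def p2p_potential(pos, q):
    """Direct P2P kernel: phi_i = sum_{j != i} q_j / r_ij.
    O(n^2) in the number of particles; FMM reduces the overall
    evaluation to roughly O(n) by treating well-separated clusters
    with multipole expansions."""
    d = pos[:, None, :] - pos[None, :, :]   # (n, n, 3) pairwise displacements
    r = np.sqrt((d ** 2).sum(axis=-1))      # (n, n) pairwise distances
    np.fill_diagonal(r, np.inf)             # exclude self-interaction
    return (q[None, :] / r).sum(axis=1)

pos = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
q = np.ones(2)
phi = p2p_potential(pos, q)  # each unit charge sees the other at distance 1
```

Because this inner loop is pure floating-point arithmetic over contiguous arrays, it is exactly the kind of kernel that benefits from the SIMD tuning the abstract describes.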