    Optimization by Record Dynamics

    Large dynamical changes in thermalizing glassy systems are triggered by trajectories crossing record-sized barriers, a behavior revealing the presence of a hierarchical structure in configuration space. The observation is here turned into a novel local search optimization algorithm dubbed Record Dynamics Optimization, or RDO. RDO uses the Metropolis rule to accept or reject candidate solutions depending on the value of a parameter akin to the temperature, and minimizes the cost function of the problem at hand through cycles where its 'temperature' is raised and subsequently decreased in order to expediently generate record high (and low) values of the cost function. Below, RDO is introduced and then tested by searching for the ground state of the Edwards-Anderson spin-glass model in two and three spatial dimensions. A popular and highly efficient optimization algorithm, Parallel Tempering (PT), is applied to the same problem as a benchmark. RDO and PT turn out to produce solutions of similar quality for similar numerical effort, but RDO is simpler to program and additionally yields geometrical information on the system's configuration space, which is of interest in many applications. In particular, the effectiveness of RDO strongly indicates the presence of the above-mentioned hierarchically organized configuration space, with metastable regions indexed by the cost (or energy) of the transition states connecting them. Comment: 14 pages, 12 figures
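    The Metropolis rule and the raise-then-lower temperature cycle mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' RDO: the record-based schedule is not described in the abstract, so the fixed two-level cycle, the ±J couplings, the periodic boundaries, and all names and parameter values (`cycled_search`, `t_low`, `t_high`, `sweeps`) are assumptions made for illustration only.

```python
import math
import random

def make_couplings(L, rng):
    # Jx[i][j] couples site (i, j) to (i, j+1); Jy[i][j] couples (i, j) to (i+1, j).
    # Assumed +/-J disorder with periodic boundaries.
    Jx = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    Jy = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    return Jx, Jy

def energy(s, Jx, Jy):
    # Edwards-Anderson cost: E = -sum over bonds of J_ab * s_a * s_b.
    L = len(s)
    e = 0
    for i in range(L):
        for j in range(L):
            e -= Jx[i][j] * s[i][j] * s[i][(j + 1) % L]
            e -= Jy[i][j] * s[i][j] * s[(i + 1) % L][j]
    return e

def flip_cost(s, Jx, Jy, i, j):
    # Energy change if spin (i, j) were flipped: 2 * s_ij * local field.
    L = len(s)
    field = (Jx[i][j] * s[i][(j + 1) % L]
             + Jx[i][(j - 1) % L] * s[i][(j - 1) % L]
             + Jy[i][j] * s[(i + 1) % L][j]
             + Jy[(i - 1) % L][j] * s[(i - 1) % L][j])
    return 2 * s[i][j] * field

def metropolis_accept(delta, T, rng):
    # Metropolis rule: always accept downhill moves, uphill with prob exp(-delta/T).
    return delta <= 0 or rng.random() < math.exp(-delta / T)

def cycled_search(L=6, cycles=20, sweeps=50, t_low=0.2, t_high=1.5, seed=0):
    # Hypothetical schedule: alternate a hot phase and a cold phase, keep the best energy.
    rng = random.Random(seed)
    Jx, Jy = make_couplings(L, rng)
    s = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    e = energy(s, Jx, Jy)
    best = e
    for _cycle in range(cycles):
        for T in (t_high, t_low):
            for _step in range(sweeps * L * L):
                i, j = rng.randrange(L), rng.randrange(L)
                d = flip_cost(s, Jx, Jy, i, j)
                if metropolis_accept(d, T, rng):
                    s[i][j] = -s[i][j]
                    e += d
                    best = min(best, e)
    # Return the recomputed energy too, as a consistency check on the running total.
    return best, e, energy(s, Jx, Jy)
```

    Calling `cycled_search()` returns the lowest energy seen, the running energy, and a freshly recomputed energy; the last two should agree if the incremental update is correct.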

    Distributed top-k aggregation queries at large

    Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially in distributed settings, where the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments with three different real-life datasets, using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.
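    For orientation, the kind of baseline these optimizations build on can be sketched as a three-phase threshold scheme in the spirit of TPUT. The abstract does not spell out the algorithm, so the following is an assumed, simplified rendering with sum aggregation and non-negative scores; the function name `tput_topk` and the in-memory "nodes" (plain dictionaries) are hypothetical, and none of the paper's operator trees, adaptive scan depths, or sampling are modelled.

```python
from collections import defaultdict

def kth_largest(values, k):
    # k-th largest value, or 0 when fewer than k values are known.
    ordered = sorted(values, reverse=True)
    return ordered[k - 1] if len(ordered) >= k else 0

def tput_topk(nodes, k):
    """nodes: list of dicts mapping item -> non-negative local score.
    Returns the k items with the largest summed score as (item, total) pairs."""
    m = len(nodes)
    seen = defaultdict(dict)  # item -> {node index: score reported so far}

    # Phase 1: every node reports its local top-k items.
    for n, local in enumerate(nodes):
        top_local = sorted(local.items(), key=lambda kv: -kv[1])[:k]
        for item, score in top_local:
            seen[item][n] = score
    tau1 = kth_largest([sum(v.values()) for v in seen.values()], k)
    threshold = tau1 / m  # uniform per-node threshold

    # Phase 2: every node reports all items scoring at least the threshold.
    for n, local in enumerate(nodes):
        for item, score in local.items():
            if score >= threshold:
                seen[item][n] = score
    tau2 = kth_largest([sum(v.values()) for v in seen.values()], k)

    # Prune: an unreported score is below the threshold, which bounds the total.
    candidates = [item for item, v in seen.items()
                  if sum(v.values()) + threshold * (m - len(v)) >= tau2]

    # Phase 3: fetch exact totals for the surviving candidates only.
    totals = {item: sum(local.get(item, 0) for local in nodes)
              for item in candidates}
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

nodes = [
    {"a": 9, "b": 7, "c": 1},
    {"a": 1, "b": 8, "d": 6},
    {"c": 5, "d": 5, "a": 2},
]
print(tput_topk(nodes, 2))  # [('b', 15), ('a', 12)]
```

    The pruning step is where communication is saved in a real deployment: only candidates whose upper bound reaches the current k-th partial sum need a final round trip.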

    A hierarchical time-splitting approach for solving finite-time optimal control problems

    We present a hierarchical computation approach for solving finite-time optimal control problems using operator splitting methods. The first split is performed over the time index and leads to as many subproblems as the length of the prediction horizon. Each subproblem is solved in parallel and further split into three by separating the objective from the equality and inequality constraints, respectively, such that an analytic solution can be achieved for each subproblem. The proposed solution approach leads to a nested decomposition scheme, which is highly parallelizable. We present a numerical comparison with standard state-of-the-art solvers, and provide analytic solutions to several elements of the algorithm, which enhances its applicability in fast large-scale applications.

    A Tuned and Scalable Fast Multipole Method as a Preeminent Algorithm for Exascale Systems

    Among the algorithms that are likely to play a major role in future exascale computing, the fast multipole method (FMM) appears as a rising star. Our recent work showed scaling of an FMM on GPU clusters, with problem sizes on the order of billions of unknowns. That work led to an extremely parallel FMM, scaling to thousands of GPUs or tens of thousands of CPUs. This paper reports on a campaign of performance tuning and scalability studies using multi-core CPUs on the Kraken supercomputer. All kernels in the FMM were parallelized using OpenMP, and a test using 10^7 particles randomly distributed in a cube showed 78% efficiency on 8 threads. Tuning of the particle-to-particle kernel using SIMD instructions resulted in a 4x speed-up of the overall algorithm on single-core tests with 10^3 - 10^7 particles. Parallel scalability was studied in both strong and weak scaling. The strong scaling test used 10^8 particles and resulted in 93% parallel efficiency on 2048 processes for the non-SIMD code and 54% for the SIMD-optimized code (which was still 2x faster). The weak scaling test used 10^6 particles per process, and resulted in 72% efficiency on 32,768 processes, with the largest calculation taking about 40 seconds to evaluate more than 32 billion unknowns. This work builds up evidence for our view that FMM is poised to play a leading role in exascale computing, and we end the paper with a discussion of the features that make it a particularly favorable algorithm for the emerging heterogeneous and massively parallel architectural landscape.
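    The scaling figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes the usual definition of strong-scaling efficiency (speedup divided by process count), that each efficiency is measured against that code's own single-process run, and that the 4x single-core SIMD gain carries over to the 10^8-particle case; none of these assumptions is stated in the abstract.

```python
def speedup(efficiency, processes):
    # Strong scaling: efficiency = speedup / processes, so speedup = efficiency * processes.
    return efficiency * processes

procs = 2048
speedup_plain = speedup(0.93, procs)  # non-SIMD code, relative to its own 1-process run
speedup_simd = speedup(0.54, procs)   # SIMD code, relative to its own 1-process run

# Assumed: SIMD code is 4x faster on one core (taken from the single-core tests).
simd_single_core_gain = 4.0
# Ratio of wall-clock times at 2048 processes (non-SIMD time / SIMD time).
relative_gain = simd_single_core_gain * speedup_simd / speedup_plain

# Weak scaling: 10^6 particles per process on 32,768 processes.
total_particles = 32_768 * 10**6

print(f"non-SIMD speedup: {speedup_plain:.0f}x, SIMD speedup: {speedup_simd:.0f}x")
print(f"SIMD vs non-SIMD at {procs} processes: {relative_gain:.2f}x faster")
print(f"weak-scaling problem size: {total_particles:,} particles")
```

    Under these assumptions the SIMD code comes out roughly 2.3 times faster at 2048 processes, in line with the "still 2x faster" remark, and the weak-scaling run covers about 32.8 billion particles, consistent with "more than 32 billion unknowns".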