
    Reproducibility, accuracy and performance of the Feltor code and library on parallel computer architectures

    Feltor is a modular and free scientific software package. It allows developing platform-independent code that runs on a variety of parallel computer architectures ranging from laptop CPUs to multi-GPU distributed memory systems. Feltor consists of both a numerical library and a collection of application codes built on top of the library. Its main targets are two- and three-dimensional drift- and gyro-fluid simulations with discontinuous Galerkin methods as the main numerical discretization technique. We observe that numerical simulations of a recently developed gyro-fluid model produce non-deterministic results in parallel computations. First, we show how we restore accuracy and bitwise reproducibility algorithmically and programmatically. In particular, we adopt an implementation of the exactly rounded dot product based on long accumulators, which avoids accuracy losses especially in parallel applications. However, reproducibility and accuracy alone fail to indicate correct simulation behaviour. In fact, in the physical model slightly different initial conditions lead to vastly different end states. This behaviour translates to its numerical representation. Pointwise convergence, even in principle, becomes impossible for long simulation times. In the second part, we explore important performance tuning considerations. We identify latency and memory bandwidth as the main performance indicators of our routines. Based on these, we propose a parallel performance model that predicts the execution time of algorithms implemented in Feltor and test our model on a selection of parallel hardware architectures. We are able to predict the execution time with a relative error of less than 25% for problem sizes between 0.1 and 1000 MB. Finally, we find that the product of latency and bandwidth gives a minimum array size per compute node to achieve a scaling efficiency above 50% (both strong and weak).
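
    The reproducibility problem described in this abstract can be illustrated with a minimal sketch. It is not Feltor's implementation: instead of long accumulators it uses Python's exact rational arithmetic (fractions.Fraction) as a stand-in, and the function names naive_dot and exact_dot are hypothetical. The point is only that a floating-point dot product depends on summation order, as it would across different parallel reductions, whereas a sum that is accumulated exactly and rounded once does not.

```python
from fractions import Fraction


def naive_dot(xs, ys):
    """Left-to-right floating-point dot product; every += rounds."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc


def exact_dot(xs, ys):
    """Accumulate exactly in rational arithmetic and round once at the end.

    Assumption: Fraction stands in for the long accumulator named in the
    abstract; it gives the same exactly rounded result, only far more slowly.
    """
    total = Fraction(0)
    for x, y in zip(xs, ys):
        total += Fraction(x) * Fraction(y)
    return float(total)


# The same four products, summed in two different orders.
xs = [1e17, 1.0, -1e17, 1.0]
ys = [1.0, 1.0, 1.0, 1.0]
order = [0, 2, 1, 3]
xs_perm = [xs[i] for i in order]
ys_perm = [ys[i] for i in order]

print(naive_dot(xs, ys), naive_dot(xs_perm, ys_perm))  # 1.0 2.0
print(exact_dot(xs, ys), exact_dot(xs_perm, ys_perm))  # 2.0 2.0
```

    In the first ordering the 1.0 added to 1e17 is lost to rounding, so the naive result changes with the order of the terms; the exactly accumulated result is 2.0 either way.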

    Fast Quantum Modular Exponentiation

    We present a detailed analysis of the impact on modular exponentiation of architectural features and possible concurrent gate execution. Various arithmetic algorithms are evaluated for execution time, potential concurrency, and space tradeoffs. We find that, to exponentiate an n-bit number, for storage space 100n (twenty times the minimum 5n), we can execute modular exponentiation two hundred to seven hundred times faster than optimized versions of the basic algorithms, depending on architecture, for n=128. Addition on a neighbor-only architecture is limited to O(n) time when non-neighbor architectures can reach O(log n), demonstrating that physical characteristics of a computing device have an important impact on both real-world running time and asymptotic behavior. Our results will help guide experimental implementations of quantum algorithms and devices. Comment: to appear in PRA 71(5); RevTeX, 12 pages, 12 figures; v2 revision is substantial, with new algorithmic variants, much shorter and clearer text, and revised equation formatting.

    Efficient computation of approximate pure Nash equilibria in congestion games

    Congestion games constitute an important class of games in which computing an exact or even approximate pure Nash equilibrium is in general PLS-complete. We present a surprisingly simple polynomial-time algorithm that computes O(1)-approximate Nash equilibria in these games. In particular, for congestion games with linear latency functions, our algorithm computes (2+ϵ)-approximate pure Nash equilibria in time polynomial in the number of players, the number of resources and 1/ϵ. It also applies to games with polynomial latency functions with constant maximum degree d; there, the approximation guarantee is d^{O(d)}. The algorithm essentially identifies a polynomially long sequence of best-response moves that lead to an approximate equilibrium; the existence of such short sequences is interesting in itself. These are the first positive algorithmic results for approximate equilibria in non-symmetric congestion games. We strengthen them further by proving that, for congestion games that deviate from our mild assumptions, computing ρ-approximate equilibria is PLS-complete for any polynomial-time computable ρ.
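
    To make the notions of best-response move and ρ-approximate equilibrium concrete, here is a minimal sketch of plain approximate best-response dynamics in a congestion game with linear latencies a*x + b. It is not the algorithm of the paper, which selects the order of moves carefully to guarantee a polynomially long sequence; the naive loop below can take much longer in general. All names (player_cost, approx_best_response_dynamics, the two-link example) are hypothetical.

```python
from collections import Counter


def player_cost(profile, i, strategy, latency):
    """Cost to player i of using `strategy` while the others keep `profile`.

    latency[r] = (a, b) encodes the linear latency a * load + b of resource r.
    """
    others = Counter(r for j, s in enumerate(profile) if j != i for r in s)
    return sum(latency[r][0] * (others[r] + 1) + latency[r][1] for r in strategy)


def approx_best_response_dynamics(strategies, latency, profile, rho,
                                  max_rounds=10_000):
    """Let players switch to a best response while they gain more than a factor rho.

    Returns a profile in which no player can reduce its cost by a factor
    larger than rho, i.e. a rho-approximate pure Nash equilibrium.
    """
    profile = list(profile)
    for _ in range(max_rounds):
        moved = False
        for i, options in enumerate(strategies):
            current = player_cost(profile, i, profile[i], latency)
            best = min(options, key=lambda s: player_cost(profile, i, s, latency))
            if current > rho * player_cost(profile, i, best, latency):
                profile[i] = best
                moved = True
        if not moved:
            return profile
    raise RuntimeError("no approximate equilibrium reached within max_rounds")


# Three players choose between two parallel links, each with latency x.
latency = {"top": (1, 0), "bottom": (1, 0)}
strategies = [[("top",), ("bottom",)]] * 3
start = [("top",), ("top",), ("top",)]
result = approx_best_response_dynamics(strategies, latency, start, rho=1.5)
print(result)
```

    Starting with all three players on the top link, the first player moves to the bottom link (cost 3 versus 1), after which nobody can improve by more than a factor of 1.5 and the loop stops.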

    Concurrent Geometric Multicasting

    We present MCFR, a multicasting concurrent face routing algorithm that uses geometric routing to deliver a message from source to multiple targets. We describe the algorithm's operation, prove it correct, estimate its performance bounds and evaluate its performance using simulation. Our estimate shows that MCFR is the first geometric multicast routing algorithm whose message delivery latency is independent of network size and only proportional to the distance between the source and the targets. Our simulation indicates that MCFR has significantly better reliability than existing algorithms.

    A Novel SAT-Based Approach to the Task Graph Cost-Optimal Scheduling Problem

    The Task Graph Cost-Optimal Scheduling Problem consists in scheduling a certain number of interdependent tasks onto a set of heterogeneous processors (characterized by idle and running rates per time unit), minimizing the cost of the entire process. This paper provides a novel formulation for this scheduling puzzle, in which an optimal solution is computed through a sequence of Binate Covering Problems, hinged within a Bounded Model Checking paradigm. In this approach, each covering instance, providing a min-cost trace for a given schedule depth, can be solved with several strategies, resorting to Minimum-Cost Satisfiability solvers or Pseudo-Boolean Optimization tools. Unfortunately, all direct resolution methods show very low efficiency and scalability. As a consequence, we introduce a specialized method to solve the same sequence of problems, based on a traditional all-solution SAT solver. This approach follows the "circuit cofactoring" strategy, as it exploits a powerful technique to capture a large set of solutions for any new SAT counter-example. The overall method is completed with a branch-and-bound heuristic which evaluates lower and upper bounds of the schedule length, to reduce the state space that has to be visited. Our results show that the proposed strategy significantly improves the blind binate covering schema, and it outperforms general-purpose state-of-the-art tools.

    Heuristics for the traveling repairman problem with profits

    In the traveling repairman problem with profits, a repairman (also known as the server) visits a subset of nodes in order to collect time-dependent profits. The objective consists of maximizing the total collected revenue. We restrict our study to the case of a single server with nodes located in the Euclidean plane. We investigate properties of this problem, and we derive a mathematical model assuming that the number of visited nodes is known in advance. We describe a tabu search algorithm with multiple neighborhoods, and we test its performance by running it on instances based on TSPLIB. We conclude that the tabu search algorithm finds good-quality solutions fast, even for large instances.

    Asymmetric Traveling Salesman Path and Directed Latency Problems

    We study integrality gaps and approximability of two closely related problems on directed graphs. Given a set V of n nodes in an underlying asymmetric metric and two specified nodes s and t, both problems ask to find an s-t path visiting all other nodes. In the asymmetric traveling salesman path problem (ATSPP), the objective is to minimize the total cost of this path. In the directed latency problem, the objective is to minimize the sum of distances on this path from s to each node. Both of these problems are NP-hard. The best known approximation algorithms for ATSPP had ratio O(log n) until the very recent result that improves it to O(log n/ log log n). However, only a bound of O(sqrt(n)) for the integrality gap of its linear programming relaxation has been known. For directed latency, the best previously known approximation algorithm has a guarantee of O(n^(1/2+eps)), for any constant eps > 0. We present a new algorithm for ATSPP that has an approximation ratio of O(log n), but whose analysis also bounds the integrality gap of the standard LP relaxation of ATSPP by the same factor. This solves an open problem posed by Chekuri and Pal [2007]. We then pursue a deeper study of this linear program and its variations, which leads to an algorithm for the k-person ATSPP (where k s-t paths of minimum total length are sought) and an O(log n)-approximation for the directed latency problem.
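
    The two objectives defined in this abstract differ only in how the same s-t path is scored, which a small brute-force sketch can show. It is not one of the paper's approximation algorithms; it enumerates all paths, which is feasible only for tiny instances. The 4-node distance matrix is a made-up example chosen to satisfy the triangle inequality, and the function names are hypothetical.

```python
from itertools import permutations


def path_cost(dist, path):
    """ATSPP objective: total length of the path."""
    return sum(dist[u][v] for u, v in zip(path, path[1:]))


def path_latency(dist, path):
    """Directed latency objective: sum over nodes of their distance from s along the path."""
    elapsed = 0
    total = 0
    for u, v in zip(path, path[1:]):
        elapsed += dist[u][v]  # arrival time at v
        total += elapsed
    return total


def best_path(dist, s, t, objective):
    """Brute force over all s-t paths that visit every other node exactly once."""
    inner = [v for v in range(len(dist)) if v not in (s, t)]
    candidates = ([s, *mid, t] for mid in permutations(inner))
    return min(candidates, key=lambda p: objective(dist, p))


# Hypothetical asymmetric metric on nodes 0..3 (dist[u][v] != dist[v][u] in general).
dist = [
    [0, 3, 1, 2],
    [4, 0, 1, 2],
    [4, 3, 0, 1],
    [4, 4, 4, 0],
]

cheapest = best_path(dist, 0, 3, path_cost)
lowest_latency = best_path(dist, 0, 3, path_latency)
print(cheapest, path_cost(dist, cheapest), path_latency(dist, cheapest))
print(lowest_latency, path_cost(dist, lowest_latency), path_latency(dist, lowest_latency))
```

    On this instance the cheapest path is 0-1-2-3 (cost 5, latency 12), while the path 0-2-1-3 has higher cost 6 but lower latency 11, so the two objectives select different paths.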