Reproducibility, accuracy and performance of the Feltor code and library on parallel computer architectures
Feltor is a modular and free scientific software package. It allows
developing platform independent code that runs on a variety of parallel
computer architectures ranging from laptop CPUs to multi-GPU distributed memory
systems. Feltor consists of both a numerical library and a collection of
application codes built on top of the library. Its main targets are two- and
three-dimensional drift- and gyro-fluid simulations with discontinuous Galerkin
methods as the main numerical discretization technique. We observe that
numerical simulations of a recently developed gyro-fluid model produce
non-deterministic results in parallel computations. First, we show how we
restore accuracy and bitwise reproducibility algorithmically and
programmatically. In particular, we adopt an implementation of the exactly
rounded dot product based on long accumulators, which avoids accuracy losses
especially in parallel applications. However, reproducibility and accuracy
alone fail to indicate correct simulation behaviour. In fact, in the physical
model slightly different initial conditions lead to vastly different end
states. This behaviour translates to its numerical representation. Pointwise
convergence, even in principle, becomes impossible for long simulation times.
In a second part, we explore important performance tuning considerations. We
identify latency and memory bandwidth as the main performance indicators of our
routines. Based on these, we propose a parallel performance model that predicts
the execution time of algorithms implemented in Feltor and test our model on a
selection of parallel hardware architectures. We are able to predict the
execution time with a relative error of less than 25% for problem sizes between
0.1 and 1000 MB. Finally, we find that the product of latency and bandwidth
gives a minimum array size per compute node to achieve a scaling efficiency
above 50% (both strong and weak).
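The closing claim about latency and bandwidth can be illustrated with a minimal sketch. It assumes the simplest form such a model can take, t(n) = T_lat + n/B, and uses the bandwidth-bound fraction of that time as a stand-in for scaling efficiency. The function names and hardware numbers are hypothetical and are not taken from Feltor.

```python
def predicted_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Assumed model: a fixed latency plus the time to stream the data."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def efficiency(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Fraction of the predicted time spent moving data (proxy for efficiency)."""
    transfer = size_bytes / bandwidth_bytes_per_s
    return transfer / predicted_time(size_bytes, latency_s, bandwidth_bytes_per_s)


def min_size_for_half_efficiency(latency_s, bandwidth_bytes_per_s):
    """Array size at which latency and transfer time are equal."""
    return latency_s * bandwidth_bytes_per_s


# Hypothetical device: 10 microseconds latency, 100 GB/s memory bandwidth.
latency = 10e-6
bandwidth = 100e9
n_min = min_size_for_half_efficiency(latency, bandwidth)

for size in (0.1 * n_min, n_min, 10 * n_min):
    print(f"{size:12.0f} bytes -> efficiency {efficiency(size, latency, bandwidth):.2f}")
```

Under this assumed model, arrays smaller than the latency-bandwidth product are dominated by latency, which is why the product acts as a lower bound on a useful problem size per node.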
Fast Quantum Modular Exponentiation
We present a detailed analysis of the impact on modular exponentiation of
architectural features and possible concurrent gate execution. Various
arithmetic algorithms are evaluated for execution time, potential concurrency,
and space tradeoffs. We find that, to exponentiate an n-bit number, for storage
space 100n (twenty times the minimum 5n), we can execute modular exponentiation
two hundred to seven hundred times faster than optimized versions of the basic
algorithms, depending on architecture, for n=128. Addition on a neighbor-only
architecture is limited to O(n) time, whereas non-neighbor architectures can reach
O(log n), demonstrating that physical characteristics of a computing device
have an important impact on both real-world running time and asymptotic
behavior. Our results will help guide experimental implementations of quantum
algorithms and devices.
Comment: to appear in PRA 71(5); RevTeX, 12 pages, 12 figures; v2 revision is
substantial, with new algorithmic variants, much shorter and clearer text,
and revised equation formatting
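For orientation, the operation being accelerated is modular exponentiation, which is classically computed by repeated squaring. The sketch below is a classical Python analogue only, not one of the quantum circuits analysed in the paper; `mod_exp` is a hypothetical name.

```python
def mod_exp(base, exponent, modulus):
    """Compute base**exponent mod modulus by square-and-multiply."""
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    result = 1 % modulus          # handles modulus == 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:          # current exponent bit is set: multiply
            result = (result * base) % modulus
        base = (base * base) % modulus   # square for the next bit
        exponent >>= 1
    return result


print(mod_exp(2, 10, 1000))
```

The loop performs one squaring per exponent bit, so an n-bit exponent needs O(n) modular multiplications; the speed of each multiplication, and of the additions inside it, is where the choice of adder matters.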
Efficient computation of approximate pure Nash equilibria in congestion games
Congestion games constitute an important class of games in which computing an
exact or even approximate pure Nash equilibrium is in general {\sf
PLS}-complete. We present a surprisingly simple polynomial-time algorithm that
computes O(1)-approximate Nash equilibria in these games. In particular, for
congestion games with linear latency functions, our algorithm computes
$(2+\epsilon)$-approximate pure Nash equilibria in time polynomial in the
number of players, the number of resources and $1/\epsilon$. It also applies to
games with polynomial latency functions with constant maximum degree $d$;
there, the approximation guarantee is $d^{O(d)}$. The algorithm essentially
identifies a polynomially long sequence of best-response moves that lead to an
approximate equilibrium; the existence of such short sequences is interesting
in itself. These are the first positive algorithmic results for approximate
equilibria in non-symmetric congestion games. We strengthen them further by
proving that, for congestion games that deviate from our mild assumptions,
computing $\rho$-approximate equilibria is {\sf PLS}-complete for any
polynomial-time computable $\rho$.
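To make the notion of an approximate equilibrium concrete, the following is a minimal sketch of plain approximate best-response dynamics in a congestion game with linear latencies a*x + b. It is not the algorithm of the paper and carries no polynomial running-time guarantee; all names, the instance, and the factor `alpha` are hypothetical.

```python
def player_cost(profile, strategies, coeffs, i, alt=None):
    """Cost of player i, optionally after deviating to strategy index alt."""
    chosen = list(profile)
    if alt is not None:
        chosen[i] = alt
    load = {}
    for p, s in enumerate(chosen):
        for r in strategies[p][s]:
            load[r] = load.get(r, 0) + 1
    # Linear latency: coeffs[r] = (a, b) gives a * load + b on resource r.
    return sum(coeffs[r][0] * load[r] + coeffs[r][1]
               for r in strategies[i][chosen[i]])


def approx_best_response_dynamics(strategies, coeffs, alpha):
    """Let players move only if the move improves their cost by a factor > alpha."""
    profile = [0] * len(strategies)
    moved = True
    while moved:
        moved = False
        for i in range(len(strategies)):
            current = player_cost(profile, strategies, coeffs, i)
            best = min(range(len(strategies[i])),
                       key=lambda s: player_cost(profile, strategies, coeffs, i, s))
            if current > alpha * player_cost(profile, strategies, coeffs, i, best):
                profile[i] = best
                moved = True
    return profile


def is_approx_equilibrium(profile, strategies, coeffs, alpha):
    """True if no player can cut their cost by more than a factor alpha."""
    return all(
        player_cost(profile, strategies, coeffs, i)
        <= alpha * player_cost(profile, strategies, coeffs, i, s)
        for i in range(len(strategies))
        for s in range(len(strategies[i]))
    )


# Hypothetical instance: three players choose between two identical links.
strategies = [[("a",), ("b",)] for _ in range(3)]
coeffs = {"a": (1, 0), "b": (1, 0)}
profile = approx_best_response_dynamics(strategies, coeffs, alpha=1.5)
print(profile, is_approx_equilibrium(profile, strategies, coeffs, 1.5))
```

With integer coefficients and alpha of at least 1, every accepted move strictly lowers the mover's cost and hence Rosenthal's potential, so the loop terminates. How many moves it needs is exactly the question the abstract addresses.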
Concurrent Geometric Multicasting
We present MCFR, a multicasting concurrent face routing algorithm that uses
geometric routing to deliver a message from source to multiple targets. We
describe the algorithm's operation, prove it correct, estimate its performance
bounds and evaluate its performance using simulation. Our estimate shows that
MCFR is the first geometric multicast routing algorithm whose message delivery
latency is independent of network size and only proportional to the distance
between the source and the targets. Our simulation indicates that MCFR has
significantly better reliability than existing algorithms.
A Novel SAT-Based Approach to the Task Graph Cost-Optimal Scheduling Problem
The Task Graph Cost-Optimal Scheduling Problem consists in scheduling a certain number of interdependent tasks onto a set of heterogeneous processors (characterized by idle and running rates per time unit), minimizing the cost of the entire process. This paper provides a novel formulation for this scheduling puzzle, in which an optimal solution is computed through a sequence of Binate Covering Problems, hinged within a Bounded Model Checking paradigm. In this approach, each covering instance, providing a min-cost trace for a given schedule depth, can be solved with several strategies, resorting to Minimum-Cost Satisfiability solvers or Pseudo-Boolean Optimization tools. Unfortunately, all direct resolution methods show very low efficiency and scalability. As a consequence, we introduce a specialized method to solve the same sequence of problems, based on a traditional all-solution SAT solver. This approach follows the "circuit cofactoring" strategy, as it exploits a powerful technique to capture a large set of solutions for any new SAT counter-example. The overall method is completed with a branch-and-bound heuristic which evaluates lower and upper bounds of the schedule length, to reduce the state space that has to be visited. Our results show that the proposed strategy significantly improves on the blind binate covering schema and outperforms general-purpose state-of-the-art tools.
Heuristics for the traveling repairman problem with profits
In the traveling repairman problem with profits, a repairman (also known as the server) visits a subset of nodes in order to collect time-dependent profits. The objective consists of maximizing the total collected revenue. We restrict our study to the case of a single server with nodes located in the Euclidean plane. We investigate properties of this problem, and we derive a mathematical model assuming that the number of visited nodes is known in advance. We describe a tabu search algorithm with multiple neighborhoods, and we test its performance by running it on instances based on TSPLIB. We conclude that the tabu search algorithm finds good-quality solutions fast, even for large instances.
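The objective can be made concrete with a small evaluation routine. The abstract does not state how profits depend on time, so this sketch assumes that the revenue collected at a node is its profit minus the arrival time, with unit travel speed from a depot. That profit rule, the depot, and all names are assumptions for illustration.

```python
import math


def total_revenue(depot, route, profit):
    """Sum of (profit - arrival time) over the visited nodes, in visiting order."""
    elapsed = 0.0
    here = depot
    revenue = 0.0
    for node in route:
        elapsed += math.dist(here, node)   # Euclidean travel time at unit speed
        revenue += profit[node] - elapsed  # assumed time-dependent profit
        here = node
    return revenue


# Hypothetical instance with two customers.
depot = (0.0, 0.0)
profit = {(3.0, 4.0): 10.0, (3.0, 0.0): 10.0}

print(total_revenue(depot, [(3.0, 4.0), (3.0, 0.0)], profit))
print(total_revenue(depot, [(3.0, 0.0), (3.0, 4.0)], profit))
```

Because every later arrival time includes all earlier legs, both the visiting order and the choice of which nodes to skip change the revenue, which is what a neighborhood search such as tabu search explores.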
Asymmetric Traveling Salesman Path and Directed Latency Problems
We study integrality gaps and approximability of two closely related problems
on directed graphs. Given a set V of n nodes in an underlying asymmetric metric
and two specified nodes s and t, both problems ask to find an s-t path visiting
all other nodes. In the asymmetric traveling salesman path problem (ATSPP), the
objective is to minimize the total cost of this path. In the directed latency
problem, the objective is to minimize the sum of distances on this path from s
to each node. Both of these problems are NP-hard. The best known approximation
algorithms for ATSPP had ratio O(log n) until the very recent result that
improves it to O(log n/ log log n). However, only a bound of O(sqrt(n)) for the
integrality gap of its linear programming relaxation has been known. For
directed latency, the best previously known approximation algorithm has a
guarantee of O(n^(1/2+eps)), for any constant eps > 0. We present a new
algorithm for ATSPP that has an approximation ratio of O(log n),
but whose analysis also bounds the integrality gap of the standard LP
relaxation of ATSPP by the same factor. This solves an open problem posed by
Chekuri and Pal [2007]. We then pursue a deeper study of this linear program
and its variations, which leads to an algorithm for the k-person ATSPP (where k
s-t paths of minimum total length are sought) and an O(log n)-approximation for
the directed latency problem.
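The two objectives defined above differ only in how the edge costs along the s-t path are aggregated. A minimal sketch follows, using a hypothetical asymmetric metric (distances around a directed ring) and hypothetical function names. It evaluates a given path and does not implement either approximation algorithm.

```python
def path_cost(cost, path):
    """ATSPP objective: total cost of the edges along the path."""
    return sum(cost[u][v] for u, v in zip(path, path[1:]))


def path_latency(cost, path):
    """Directed latency objective: sum over nodes of their distance from s along the path."""
    total = 0
    elapsed = 0
    for u, v in zip(path, path[1:]):
        elapsed += cost[u][v]   # distance from s to v along the path
        total += elapsed
    return total


# Hypothetical asymmetric metric: shortest-path distances on a directed 4-cycle.
n = 4
cost = [[(v - u) % n for v in range(n)] for u in range(n)]

for path in ([0, 1, 2, 3], [0, 2, 1, 3]):
    print(path, path_cost(cost, path), path_latency(cost, path))
```

In the latency objective an edge traversed early is counted once for every node visited after it, so a path that is good for one objective need not be good for the other.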