Indexed dependence metadata and its applications in software performance optimisation
To achieve continued performance improvements, modern microprocessor design is tending to concentrate
an increasing proportion of hardware on computation units with less automatic management
of data movement and extraction of parallelism. As a result, architectures increasingly include multiple
computation cores and complicated, software-managed memory hierarchies. Compilers have
difficulty characterizing the behaviour of a kernel in a general enough manner to enable automatic
generation of efficient code in any but the most straightforward of cases.
We propose the concept of indexed dependence metadata to improve application development and
mapping onto such architectures. The metadata represent both the iteration space of a kernel and the
mapping of that iteration space from a given index to the set of data elements that iteration might
use: thus the dependence metadata is indexed by the kernel’s iteration space. This explicit mapping
allows the compiler or runtime to optimise the program more efficiently, and improves the program
structure for the developer. We argue that this form of explicit interface specification reduces the need
for premature, architecture-specific optimisation. It improves program portability, supports intercomponent
optimisation and enables generation of efficient data movement code.
We offer the following contributions: an introduction to the concept of indexed dependence metadata
as a generalisation of stream programming, a demonstration of its advantages in a component
programming system, the decoupled access/execute model for C++ programs, and how indexed dependence
metadata might be used to improve the programming model for GPU-based designs. Our
experimental results with prototype implementations show that indexed dependence metadata supports
automatic synthesis of double-buffered data movement for the Cell processor and enables aggressive
loop fusion optimisations in image processing, linear algebra and multigrid application case
studies.
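A minimal sketch of the idea, under stated assumptions: the abstract gives no API, so `KernelMetadata`, `stencil_reads` and `run_with_staging` are hypothetical names, and a 3-point stencil stands in for a real kernel. The metadata pairs an iteration space with a function from each index to the data elements that iteration might read, so a runtime can stage exactly those elements before the kernel body runs.

```python
from typing import Callable, Dict, Iterable, List, Set


class KernelMetadata:
    """Hypothetical container: an iteration space plus, for each index,
    the set of data elements that iteration might read."""

    def __init__(self, iteration_space: Iterable[int],
                 reads: Callable[[int], Set[int]]):
        self.iteration_space = list(iteration_space)
        self.reads = reads


def stencil_reads(i: int) -> Set[int]:
    # A 3-point stencil at index i reads its two neighbours and itself.
    return {i - 1, i, i + 1}


def run_with_staging(data: List[float], meta: KernelMetadata,
                     kernel: Callable[[int, Dict[int, float]], float]) -> List[float]:
    """Run the kernel once per iteration, copying in only the elements
    that the metadata names (a stand-in for generated data movement)."""
    out = []
    for i in meta.iteration_space:
        local = {j: data[j] for j in meta.reads(i)}  # explicit staging step
        out.append(kernel(i, local))
    return out


data = [1.0, 2.0, 4.0, 8.0, 16.0]
meta = KernelMetadata(range(1, len(data) - 1), stencil_reads)
result = run_with_staging(
    data, meta, lambda i, loc: (loc[i - 1] + loc[i] + loc[i + 1]) / 3.0)
```

Because the kernel only ever sees `local`, the staging step could in principle be overlapped with the previous iteration's computation (double buffering) without touching the kernel body; this sketch does not attempt that.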
Restricted Strip Covering and the Sensor Cover Problem
Given a set of objects with durations (jobs) that cover a base region, can we
schedule the jobs to maximize the duration the original region remains covered?
We call this problem the sensor cover problem. This problem arises in the
context of covering a region with sensors. For example, suppose you wish to
monitor activity along a fence by sensors placed at various fixed locations.
Each sensor has a range and limited battery life. The problem is to schedule
when to turn on the sensors so that the fence is fully monitored for as long as
possible. This one dimensional problem involves intervals on the real line.
Associating a duration to each yields a set of rectangles in space and time,
each specified by a pair of fixed horizontal endpoints and a height. The
objective is to assign a position to each rectangle to maximize the height at
which the spanning interval is fully covered. We call this one dimensional
problem restricted strip covering. If we replace the covering constraint by a
packing constraint, the problem is identical to dynamic storage allocation, a
scheduling problem that is a restricted case of the strip packing problem. We
show that the restricted strip covering problem is NP-hard and present an O(log
log n)-approximation algorithm. We present better approximations or exact
algorithms for some special cases. For the uniform-duration case of restricted
strip covering we give a polynomial-time, exact algorithm but prove that the
uniform-duration case for higher-dimensional regions is NP-hard. Finally, we
consider regions that are arbitrary sets, and we present an O(log
n)-approximation algorithm. Comment: 14 pages, 6 figures
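To make the objective concrete, the following sketch evaluates a given schedule for the one-dimensional problem; it is not the approximation algorithm from the paper. The function names and the tuple layout `(left, right, duration, start)` are assumptions made for illustration. The value returned is the height up to which the whole base interval stays covered.

```python
def covered_until(times):
    """Given (start, end) time intervals, return the first time >= 0
    that is not covered by their union."""
    t = 0.0
    for s, e in sorted(times):
        if s > t:      # gap before the next interval begins
            break
        t = max(t, e)
    return t


def cover_height(base, jobs):
    """base: (a, b) interval to monitor.
    jobs: list of (left, right, duration, start) rectangles.
    Returns the time until which every point of base is covered."""
    a, b = base
    # Coverage can only change at sensor endpoints, so it is enough to
    # test one point inside each elementary segment between endpoints.
    cuts = sorted({a, b} | {x for l, r, _, _ in jobs for x in (l, r) if a < x < b})
    height = float("inf")
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        times = [(s, s + d) for l, r, d, s in jobs if l <= mid <= r]
        height = min(height, covered_until(times))
    return height


fence = (0, 10)
# Two short-range sensors with battery life 2, one long-range with life 1.
simultaneous = [(0, 6, 2, 0), (4, 10, 2, 0), (0, 10, 1, 0)]
staggered = [(0, 6, 2, 0), (4, 10, 2, 0), (0, 10, 1, 2)]
```

With these assumed inputs, switching every sensor on at time 0 covers the fence until time 2, whereas delaying the long-range sensor until time 2 extends coverage to time 3.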
Nested-Loops Tiling for Parallelization and Locality Optimization
Data locality improvement and nested-loop parallelization are two complementary yet competing approaches to optimizing loop nests, which account for a large portion of computation time in scientific and engineering programs. While effective methods exist for each, prior studies have paid less attention to addressing the two simultaneously. This paper proposes a unified approach that integrates these two techniques to obtain a locality-conscious loop transformation that partitions the loop iteration space into outer parallel tiled loops. The approach uses the polyhedral model to find a multidimensional affine schedule, a transformation that yields the largest groups of tilable loops with as much coarse-grain parallelism as possible. Furthermore, tiles are scheduled on processor cores to maximize data reuse: tiles that share a large volume of data are run consecutively on the same core, or at around the same time on different cores with a shared cache.
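As an illustration of what a tiled loop nest looks like, here is a hand-tiled matrix multiplication. It is only a sketch: the tiling is written by hand rather than derived from a polyhedral affine schedule, and `matmul_tiled`, `matmul_naive` and the tile size are assumed names and values.

```python
def matmul_naive(A, B):
    """Reference triple loop, used only to check the tiled version."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def matmul_tiled(A, B, tile=2):
    """Same computation with the iteration space cut into tile-sized blocks."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    # Distinct (ii, jj) tiles write disjoint blocks of C, so the two
    # outer tile loops could run in parallel.
    for ii in range(0, n, tile):
        for jj in range(0, p, tile):
            # The kk loop carries the reduction and stays sequential.
            for kk in range(0, m, tile):
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, p)):
                        acc = C[i][j]
                        for k in range(kk, min(kk + tile, m)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
C = matmul_tiled(A, B, tile=2)
```

Within one tile the inner loops reuse a small block of `A` and `B`, which is the locality benefit; the choice of which tiles run on which core, as described above, is outside the scope of this sketch.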
Mapping constrained optimization problems to quantum annealing with application to fault diagnosis
Current quantum annealing (QA) hardware suffers from practical limitations
such as finite temperature, sparse connectivity, small qubit numbers, and
control error. We propose new algorithms for mapping boolean constraint
satisfaction problems (CSPs) onto QA hardware mitigating these limitations. In
particular we develop a new embedding algorithm for mapping a CSP onto a
hardware Ising model with a fixed sparse set of interactions, and propose two
new decomposition algorithms for solving problems too large to map directly
into hardware.
The mapping technique is locally-structured, as hardware compatible Ising
models are generated for each problem constraint, and variables appearing in
different constraints are chained together using ferromagnetic couplings. In
contrast, global embedding techniques generate a hardware independent Ising
model for all the constraints, and then use a minor-embedding algorithm to
generate a hardware compatible Ising model. We give an example of a class of
CSPs for which the scaling performance of D-Wave's QA hardware using the local
mapping technique is significantly better than global embedding.
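A toy version of the locally-structured idea can be checked by brute force. This is a sketch under assumptions, not the paper's embedding algorithm: it works in 0/1 (QUBO) variables rather than on a hardware graph, uses the standard quadratic penalty for an AND gate, and the function names are hypothetical. Each constraint gets its own penalty model, and two copies of a shared variable are tied together by an equality penalty, which corresponds to a ferromagnetic coupling in spin variables.

```python
from itertools import product


def and_penalty(x, y, z):
    """Quadratic penalty that is 0 when z == (x AND y) and >= 1 otherwise."""
    return x * y - 2 * (x + y) * z + 3 * z


def chain_penalty(a, b, strength=2):
    """Penalises a != b. For bits, a + b - 2ab equals (1 - s_a * s_b) / 2
    in spins s = 2*bit - 1, i.e. a ferromagnetic coupling."""
    return strength * (a + b - 2 * a * b)


def circuit_energy(x, y, z1, z2, v, w):
    """Two AND gates, w = (x AND y) AND v. The intermediate wire appears
    as z1 in the first gate and z2 in the second, chained together."""
    return and_penalty(x, y, z1) + and_penalty(z2, v, w) + chain_penalty(z1, z2)


def ground_states(energy, n):
    """Exhaustively find the minimum energy and all assignments reaching it."""
    states = list(product((0, 1), repeat=n))
    best = min(energy(*s) for s in states)
    return best, [s for s in states if energy(*s) == best]


best, states = ground_states(circuit_energy, 6)
```

Every minimum-energy assignment found this way has `z1 == z2` and satisfies both gates, so the chained local models encode the same solutions as the original circuit.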
We validate the approach by applying D-Wave's hardware to circuit-based
fault-diagnosis. For circuits that embed directly, we find that the hardware is
typically able to find all solutions from a min-fault diagnosis set of size N
using 1000N samples, using an annealing rate that is 25 times faster than a
leading SAT-based sampling method. Further, we apply decomposition algorithms
to find min-cardinality faults for circuits that are up to 5 times larger than
can be solved directly on current hardware. Comment: 22 pages, 4 figures
09061 Abstracts Collection -- Combinatorial Scientific Computing
From 01.02.2009 to 06.02.2009, the Dagstuhl Seminar 09061 "Combinatorial Scientific Computing" was held in Schloss Dagstuhl -- Leibniz Center for Informatics.
During the seminar, several participants presented their current
research, and ongoing work and open problems were discussed. Abstracts of
the presentations given during the seminar as well as abstracts of
seminar results and ideas are put together in this paper. The first section
describes the seminar topics and goals in general.
Links to extended abstracts or full papers are provided, if available.
Guarding and Searching Polyhedra
Guarding and searching problems have been of fundamental interest since the early years of Computational Geometry. Both are well-developed areas of research and have been thoroughly studied in planar polygonal settings.
In this thesis we tackle the Art Gallery Problem and the Searchlight Scheduling Problem in 3-dimensional polyhedral environments, putting special emphasis on edge guards and orthogonal polyhedra.
We solve the Art Gallery Problem with reflex edge guards in orthogonal polyhedra having reflex edges in just two directions: generalizing a classic theorem by O'Rourke, we prove that r/2 + 1 reflex edge guards are sufficient and occasionally necessary, where r is the number of reflex edges. We also show how to compute guard locations in O(n log n) time.
Then we investigate the Art Gallery Problem with mutually parallel edge guards in orthogonal polyhedra with e edges, showing that 11e/72 edge guards are always sufficient and can be found in linear time, improving upon the previous state of the art, which was e/6. We also give tight inequalities relating e with the number of reflex edges r, obtaining an upper bound on the guard number of 7r/12 + 1.
We further study the Art Gallery Problem with edge guards in polyhedra having faces oriented in just four directions, obtaining a lower bound of e/6 - 1 edge guards and an upper bound of (e+r)/6 edge guards.
All the previously mentioned results hold for polyhedra of any genus. Additionally, several guard types and guarding modes are discussed, namely open and closed edge guards, and orthogonal and non-orthogonal guarding.
Next, we model the Searchlight Scheduling Problem, the problem of searching a given polyhedron by suitably turning some half-planes around their axes, in order to catch an evasive intruder. After discussing several generalizations of classic theorems, we study the problem of efficiently placing guards in a given polyhedron, in order to make it searchable. For general polyhedra, we give an upper bound of r^2 on the number of guards, which reduces to r for orthogonal polyhedra.
Then we prove that it is strongly NP-hard to decide if a given polyhedron is entirely searchable by a given set of guards. We further prove that, even under the assumption that an orthogonal polyhedron is searchable, approximating the minimum search time within a small-enough constant factor of the optimum is still strongly NP-hard.
Finally, we show that deciding if a specific region of an orthogonal polyhedron is searchable is strongly PSPACE-hard. By further improving our construction, we show that the same problem is strongly PSPACE-complete even for planar orthogonal polygons. Our last results are especially meaningful because no similar hardness theorems for 2-dimensional scenarios were previously known.
Stability of Service under Time-of-Use Pricing
We consider "time-of-use" pricing as a technique for matching supply and
demand of temporal resources with the goal of maximizing social welfare.
Relevant examples include energy, computing resources on a cloud computing
platform, and charging stations for electric vehicles, among many others. A
client/job in this setting has a window of time during which he needs service,
and a particular value for obtaining it. We assume a stochastic model for
demand, where each job materializes with some probability via an independent
Bernoulli trial. Given a per-time-unit pricing of resources, any realized job
will first try to get served by the cheapest available resource in its window
and, failing that, will try to find service at the next cheapest available
resource, and so on. Thus, the natural stochastic fluctuations in demand have
the potential to lead to cascading overload events. Our main result shows that
setting prices so as to optimally handle the expected demand works well:
with high probability, when the actual demand is instantiated, the system is
stable and the expected value of the jobs served is very close to that of the
optimal offline algorithm. Comment: To appear in STOC'1
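The client behaviour described in this abstract can be sketched as a small simulation. Several details are assumptions, not taken from the paper: jobs arrive in list order, every time slot has the same unit `capacity`, a job never pays more than its value, and the names `realize` and `serve` are hypothetical. The pricing rule itself is not implemented; prices are given as input.

```python
import random


def realize(jobs, rng):
    """Each job materializes independently with its own probability p."""
    return [job for job in jobs if rng.random() < job["p"]]


def serve(realized, prices, capacity):
    """Each realized job tries the cheapest slot in its window first,
    then the next cheapest, and so on. Returns the total value served
    and the load placed on each slot."""
    load = [0] * len(prices)
    served_value = 0.0
    for job in realized:
        window = range(job["start"], job["end"] + 1)
        for t in sorted(window, key=lambda t: (prices[t], t)):
            if prices[t] > job["value"]:
                break  # all remaining slots cost at least this much
            if load[t] < capacity:
                load[t] += 1
                served_value += job["value"]
                break
    return served_value, load


prices = [3, 1, 2]
# Three flexible jobs that can use any slot, then one job tied to slot 0.
jobs = [{"start": 0, "end": 2, "value": 5.0, "p": 1.0} for _ in range(3)]
jobs.append({"start": 0, "end": 0, "value": 5.0, "p": 1.0})
served_value, load = serve(realize(jobs, random.Random(0)), prices, capacity=1)
```

In this assumed instance the flexible jobs spill from the cheapest slot into the more expensive ones, and the last job finds its only slot already taken. That is a small example of the overflow effect the abstract calls a cascading overload.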