1,284 research outputs found

    Indexed dependence metadata and its applications in software performance optimisation

    To achieve continued performance improvements, modern microprocessor design is tending to concentrate an increasing proportion of hardware on computation units with less automatic management of data movement and extraction of parallelism. As a result, architectures increasingly include multiple computation cores and complicated, software-managed memory hierarchies. Compilers have difficulty characterizing the behaviour of a kernel in a general enough manner to enable automatic generation of efficient code in any but the most straightforward of cases. We propose the concept of indexed dependence metadata to improve application development and mapping onto such architectures. The metadata represent both the iteration space of a kernel and the mapping of that iteration space from a given index to the set of data elements that iteration might use: thus the dependence metadata is indexed by the kernel’s iteration space. This explicit mapping allows the compiler or runtime to optimise the program more efficiently, and improves the program structure for the developer. We argue that this form of explicit interface specification reduces the need for premature, architecture-specific optimisation. It improves program portability, supports intercomponent optimisation and enables generation of efficient data movement code. We offer the following contributions: an introduction to the concept of indexed dependence metadata as a generalisation of stream programming, a demonstration of its advantages in a component programming system, the decoupled access/execute model for C++ programs, and how indexed dependence metadata might be used to improve the programming model for GPU-based designs. 
Our experimental results with prototype implementations show that indexed dependence metadata supports automatic synthesis of double-buffered data movement for the Cell processor and enables aggressive loop fusion optimisations in image processing, linear algebra and multigrid application case studies.

    Restricted Strip Covering and the Sensor Cover Problem

    Given a set of objects with durations (jobs) that cover a base region, can we schedule the jobs to maximize the duration for which the original region remains covered? We call this problem the sensor cover problem. It arises in the context of covering a region with sensors. For example, suppose you wish to monitor activity along a fence using sensors placed at various fixed locations. Each sensor has a range and a limited battery life. The problem is to schedule when to turn on the sensors so that the fence is fully monitored for as long as possible. This one-dimensional problem involves intervals on the real line. Associating a duration with each interval yields a set of rectangles in space and time, each specified by a pair of fixed horizontal endpoints and a height. The objective is to assign a position to each rectangle so as to maximize the height up to which the spanning interval is fully covered. We call this one-dimensional problem restricted strip covering. If we replace the covering constraint by a packing constraint, the problem is identical to dynamic storage allocation, a scheduling problem that is a restricted case of the strip packing problem. We show that the restricted strip covering problem is NP-hard and present an O(log log n)-approximation algorithm. We present better approximations or exact algorithms for some special cases. For the uniform-duration case of restricted strip covering we give a polynomial-time exact algorithm, but we prove that the uniform-duration case for higher-dimensional regions is NP-hard. Finally, we consider regions that are arbitrary sets, and we present an O(log n)-approximation algorithm. Comment: 14 pages, 6 figures

    Parallel local search

    Nested-Loops Tiling for Parallelization and Locality Optimization

    Data locality improvement and nested-loop parallelization are two complementary yet competing approaches to optimizing the loop nests that account for a large portion of computation time in scientific and engineering programs. While effective methods exist for each, prior studies have paid less attention to addressing the two simultaneously. This paper proposes a unified approach that integrates both techniques to obtain a locality-conscious loop transformation that partitions the loop iteration space into outer parallel tiled loops. The approach uses the polyhedral model to derive a multidimensional affine schedule, a transformation that yields the largest possible groups of tilable loops with maximum coarse-grain parallelism. Furthermore, tiles are scheduled on processor cores to maximize data reuse: tiles that share a high volume of data are scheduled either consecutively on the same core or at around the same time on different cores with a shared cache.
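
To make the idea of "outer parallel tiled loops" concrete, here is a minimal hand-written sketch of loop tiling for matrix multiplication. It is not the paper's polyhedral method, which derives such transformations automatically; the function names `matmul_tiled` and `matmul_naive` and the `tile` parameter are assumptions chosen for illustration.

```python
def matmul_naive(a, b):
    """Reference triple loop: c[i][j] = sum over k of a[i][k] * b[k][j]."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def matmul_tiled(a, b, tile=2):
    """Same computation with the iteration space cut into tile x tile blocks.

    The two outermost loops (ii, jj) enumerate output tiles. Each output
    tile writes a disjoint block of c, so those tiles could in principle
    run in parallel. The kk loop carries the reduction and stays sequential.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for ii in range(0, n, tile):              # tile loop over rows of c
        for jj in range(0, p, tile):          # tile loop over columns of c
            for kk in range(0, m, tile):      # tile loop over the reduction
                # Intra-tile loops: min() handles partial tiles at the edges.
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, p)):
                        acc = c[i][j]
                        for k in range(kk, min(kk + tile, m)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c
```

Within one tile the same small blocks of `a` and `b` are reused many times, which is the locality benefit; the independence of the `(ii, jj)` tiles is the coarse-grain parallelism.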

    Mapping constrained optimization problems to quantum annealing with application to fault diagnosis

    Current quantum annealing (QA) hardware suffers from practical limitations such as finite temperature, sparse connectivity, small qubit numbers, and control error. We propose new algorithms for mapping Boolean constraint satisfaction problems (CSPs) onto QA hardware that mitigate these limitations. In particular, we develop a new embedding algorithm for mapping a CSP onto a hardware Ising model with a fixed sparse set of interactions, and propose two new decomposition algorithms for solving problems too large to map directly into hardware. The mapping technique is locally structured: hardware-compatible Ising models are generated for each problem constraint, and variables appearing in different constraints are chained together using ferromagnetic couplings. In contrast, global embedding techniques generate a hardware-independent Ising model for all the constraints, and then use a minor-embedding algorithm to generate a hardware-compatible Ising model. We give an example of a class of CSPs for which the scaling performance of D-Wave's QA hardware using the local mapping technique is significantly better than global embedding. We validate the approach by applying D-Wave's hardware to circuit-based fault diagnosis. For circuits that embed directly, we find that the hardware is typically able to find all solutions from a min-fault diagnosis set of size N using 1000N samples, using an annealing rate that is 25 times faster than a leading SAT-based sampling method. Further, we apply decomposition algorithms to find min-cardinality faults for circuits that are up to 5 times larger than can be solved directly on current hardware. Comment: 22 pages, 4 figures
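
As a small illustration of what "an Ising model for a problem constraint" can look like, the sketch below checks a standard textbook penalty function for a single AND gate, written in 0/1 (QUBO) variables. This is not the paper's embedding algorithm and it ignores hardware connectivity; the names `and_penalty` and `ground_states` are hypothetical.

```python
from itertools import product


def and_penalty(x, y, z):
    """Quadratic penalty that is 0 exactly when z == (x AND y), else >= 1.

    Substituting x = (1 + s) / 2 for each variable would turn this 0/1
    form into an Ising model over spins s in {-1, +1}.
    """
    return x * y - 2 * (x + y) * z + 3 * z


def ground_states(penalty):
    """Brute-force every assignment and return those with minimum energy."""
    energies = {bits: penalty(*bits) for bits in product((0, 1), repeat=3)}
    lowest = min(energies.values())
    return sorted(bits for bits, e in energies.items() if e == lowest)


# The minimum-energy assignments are exactly the rows of the AND truth table.
print(ground_states(and_penalty))
# prints [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
```

An annealer that finds low-energy states of a sum of such penalties is, in effect, searching for assignments that satisfy all the gate constraints at once.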

    09061 Abstracts Collection -- Combinatorial Scientific Computing

    From 01.02.2009 to 06.02.2009, the Dagstuhl Seminar 09061 "Combinatorial Scientific Computing" was held in Schloss Dagstuhl -- Leibniz Center for Informatics. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.

    Guarding and Searching Polyhedra

    Guarding and searching problems have been of fundamental interest since the early years of Computational Geometry. Both are well-developed areas of research and have been thoroughly studied in planar polygonal settings. In this thesis we tackle the Art Gallery Problem and the Searchlight Scheduling Problem in 3-dimensional polyhedral environments, putting special emphasis on edge guards and orthogonal polyhedra. We solve the Art Gallery Problem with reflex edge guards in orthogonal polyhedra having reflex edges in just two directions: generalizing a classic theorem by O'Rourke, we prove that r/2 + 1 reflex edge guards are sufficient and occasionally necessary, where r is the number of reflex edges. We also show how to compute guard locations in O(n log n) time. Then we investigate the Art Gallery Problem with mutually parallel edge guards in orthogonal polyhedra with e edges, showing that 11e/72 edge guards are always sufficient and can be found in linear time, improving upon the previous state of the art, which was e/6. We also give tight inequalities relating e with the number of reflex edges r, obtaining an upper bound on the guard number of 7r/12 + 1. We further study the Art Gallery Problem with edge guards in polyhedra having faces oriented in just four directions, obtaining a lower bound of e/6 - 1 edge guards and an upper bound of (e+r)/6 edge guards. All the previously mentioned results hold for polyhedra of any genus. Additionally, several guard types and guarding modes are discussed, namely open and closed edge guards, and orthogonal and non-orthogonal guarding. Next, we model the Searchlight Scheduling Problem, the problem of searching a given polyhedron by suitably turning some half-planes around their axes, in order to catch an evasive intruder. After discussing several generalizations of classic theorems, we study the problem of efficiently placing guards in a given polyhedron, in order to make it searchable. 
For general polyhedra, we give an upper bound of r^2 on the number of guards, which reduces to r for orthogonal polyhedra. Then we prove that it is strongly NP-hard to decide if a given polyhedron is entirely searchable by a given set of guards. We further prove that, even under the assumption that an orthogonal polyhedron is searchable, approximating the minimum search time within a small-enough constant factor to the optimum is still strongly NP-hard. Finally, we show that deciding if a specific region of an orthogonal polyhedron is searchable is strongly PSPACE-hard. By further improving our construction, we show that the same problem is strongly PSPACE-complete even for planar orthogonal polygons. Our last results are especially meaningful because no similar hardness theorems for 2-dimensional scenarios were previously known.

    Stability of Service under Time-of-Use Pricing

    We consider "time-of-use" pricing as a technique for matching supply and demand of temporal resources with the goal of maximizing social welfare. Relevant examples include energy, computing resources on a cloud computing platform, and charging stations for electric vehicles, among many others. A client/job in this setting has a window of time during which it needs service, and a particular value for obtaining it. We assume a stochastic model for demand, where each job materializes with some probability via an independent Bernoulli trial. Given a per-time-unit pricing of resources, any realized job will first try to get served by the cheapest available resource in its window and, failing that, will try to find service at the next cheapest available resource, and so on. Thus, the natural stochastic fluctuations in demand have the potential to lead to cascading overload events. Our main result shows that setting prices so as to optimally handle the expected demand works well: with high probability, when the actual demand is instantiated, the system is stable and the expected value of the jobs served is very close to that of the optimal offline algorithm. Comment: To appear in STOC'1
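
The demand model described in this abstract can be sketched as a small simulation. This is only an illustration under stated assumptions, not the paper's model in full: each time slot has the same integer `capacity`, jobs arrive in list order, a job never pays a price above its value, and the name `simulate` and the job tuple layout are hypothetical.

```python
import random


def simulate(prices, capacity, jobs, rng):
    """Serve jobs greedily at the cheapest available slot in their window.

    prices   -- price per time slot, indexed by slot number
    capacity -- assumed identical number of jobs each slot can serve
    jobs     -- list of (start, end, value, prob); the window is [start, end)
    rng      -- random.Random instance driving the Bernoulli trials
    Returns (total value of served jobs, load per slot).
    """
    load = [0] * len(prices)
    welfare = 0.0
    for start, end, value, prob in jobs:
        if rng.random() >= prob:
            continue  # Bernoulli trial failed: the job does not materialize
        # Try slots in the window from cheapest to most expensive.
        for t in sorted(range(start, end), key=lambda s: prices[s]):
            if prices[t] > value:
                break  # assumption: a job refuses prices above its value
            if load[t] < capacity:
                load[t] += 1
                welfare += value
                break  # served; a full slot pushes the job to the next one
    return welfare, load


# Three certain jobs, one unit of capacity per slot. The second job spills
# from slot 0 to slot 1, and the third finds its whole window full.
jobs = [(0, 3, 5, 1.0), (0, 3, 5, 1.0), (0, 2, 5, 1.0)]
print(simulate([1, 2, 3], 1, jobs, random.Random(0)))
# prints (10.0, [1, 1, 0])
```

The spill from one slot to the next is the mechanism behind the cascading overload events mentioned in the abstract: a job displaced from its cheapest slot consumes capacity that later jobs were counting on.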