9,257 research outputs found

    Polly's Polyhedral Scheduling in the Presence of Reductions

    Full text link
    The polyhedral model provides a powerful mathematical abstraction to enable effective optimization of loop nests with respect to a given optimization goal, e.g., exploiting parallelism. Unexploited reduction properties are a frequent reason for polyhedral optimizers to assume parallelism prohibiting dependences. To our knowledge, no polyhedral loop optimizer available in any production compiler provides support for reductions. In this paper, we show that leveraging the parallelism of reductions can lead to a significant performance increase. We give a precise, dependence based, definition of reductions and discuss ways to extend polyhedral optimization to exploit the associativity and commutativity of reduction computations. We have implemented a reduction-enabled scheduling approach in the Polly polyhedral optimizer and evaluate it on the standard Polybench 3.2 benchmark suite. We were able to detect and model all 52 arithmetic reductions and achieve speedups up to 2.21Ă—\times on a quad core machine by exploiting the multidimensional reduction in the BiCG benchmark.Comment: Presented at the IMPACT15 worksho

    Tupleware: Redefining Modern Analytics

    Full text link
    There is a fundamental discrepancy between the targeted and actual users of current analytics frameworks. Most systems are designed for the data and infrastructure of the Googles and Facebooks of the world---petabytes of data distributed across large cloud deployments consisting of thousands of cheap commodity machines. Yet, the vast majority of users operate clusters ranging from a few to a few dozen nodes, analyze relatively small datasets of up to a few terabytes, and perform primarily compute-intensive operations. Targeting these users fundamentally changes the way we should build analytics systems. This paper describes the design of Tupleware, a new system specifically aimed at the challenges faced by the typical user. Tupleware's architecture brings together ideas from the database, compiler, and programming languages communities to create a powerful end-to-end solution for data analysis. We propose novel techniques that consider the data, computations, and hardware together to achieve maximum performance on a case-by-case basis. Our experimental evaluation quantifies the impact of our novel techniques and shows orders of magnitude performance improvement over alternative systems

    Implementing O(N) N-Body Algorithms Efficiently in Data-Parallel Languages

    Get PDF

    Decomposition, Reformulation, and Diving in University Course Timetabling

    Full text link
    In many real-life optimisation problems, there are multiple interacting components in a solution. For example, different components might specify assignments to different kinds of resource. Often, each component is associated with different sets of soft constraints, and so with different measures of soft constraint violation. The goal is then to minimise a linear combination of such measures. This paper studies an approach to such problems, which can be thought of as multiphase exploitation of multiple objective-/value-restricted submodels. In this approach, only one computationally difficult component of a problem and the associated subset of objectives is considered at first. This produces partial solutions, which define interesting neighbourhoods in the search space of the complete problem. Often, it is possible to pick the initial component so that variable aggregation can be performed at the first stage, and the neighbourhoods to be explored next are guaranteed to contain feasible solutions. Using integer programming, it is then easy to implement heuristics producing solutions with bounds on their quality. Our study is performed on a university course timetabling problem used in the 2007 International Timetabling Competition, also known as the Udine Course Timetabling Problem. In the proposed heuristic, an objective-restricted neighbourhood generator produces assignments of periods to events, with decreasing numbers of violations of two period-related soft constraints. Those are relaxed into assignments of events to days, which define neighbourhoods that are easier to search with respect to all four soft constraints. Integer programming formulations for all subproblems are given and evaluated using ILOG CPLEX 11. The wider applicability of this approach is analysed and discussed.Comment: 45 pages, 7 figures. Improved typesetting of figures and table

    Optimal Compilation of HPF Remappings

    No full text
    International audienceApplications with varying array access patterns require to dynamically change array mappings on distributed-memory parallel machines. HPF (High Performance Fortran) provides such remappings, on data that can be replicated, explicitly through therealign andredistribute directives and implicitly at procedure calls and returns. However such features are left out of the HPF subset or of the currently discussed hpf kernel for effeciency reasons. This paper presents a new compilation technique to handle hpf remappings for message-passing parallel architectures. The first phase is global and removes all useless remappings that appear naturally in procedures. The code generated by the second phase takes advantage of replications to shorten the remapping time. It is proved optimal: A minimal number of messages, containing only the required data, is sent over the network. The technique is fully implemented in HPFC, our prototype HPF compiler. Experiments were performed on a Dec Alpha farm
    • …
    corecore