Polly's Polyhedral Scheduling in the Presence of Reductions
The polyhedral model provides a powerful mathematical abstraction to enable
effective optimization of loop nests with respect to a given optimization goal,
e.g., exploiting parallelism. Unexploited reduction properties are a frequent
reason for polyhedral optimizers to assume parallelism-prohibiting dependences.
To our knowledge, no polyhedral loop optimizer available in any production
compiler provides support for reductions. In this paper, we show that
leveraging the parallelism of reductions can lead to a significant performance
increase. We give a precise, dependence-based definition of reductions and
discuss ways to extend polyhedral optimization to exploit the associativity and
commutativity of reduction computations. We have implemented a
reduction-enabled scheduling approach in the Polly polyhedral optimizer and
evaluate it on the standard Polybench 3.2 benchmark suite. We were able to
detect and model all 52 arithmetic reductions and achieve speedups up to
2.21x on a quad-core machine by exploiting the multidimensional
reduction in the BiCG benchmark.
Comment: Presented at the IMPACT15 worksho
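To make the idea of exploiting associativity and commutativity concrete, here is a minimal sketch, not taken from the paper and not Polly's implementation. It assumes a BiCG-style update `s[j] += r[i] * A[i][j]`; the function names, the chunk count, and the sample data are all hypothetical. The outer loop is split into chunks that each accumulate into a private copy of `s`, and the private copies are combined afterwards.

```python
# Illustrative sketch only: hypothetical names, not Polly's implementation.

def bicg_s_sequential(A, r):
    """s[j] = sum over i of r[i] * A[i][j], computed by one sequential loop nest."""
    n_cols = len(A[0])
    s = [0] * n_cols
    for i in range(len(A)):
        for j in range(n_cols):
            s[j] += r[i] * A[i][j]  # reduction dependence carried by the i loop
    return s


def bicg_s_privatized(A, r, n_chunks=4):
    """Same result, with the i loop split into chunks that each own a private copy of s."""
    n_rows, n_cols = len(A), len(A[0])
    step = max(1, (n_rows + n_chunks - 1) // n_chunks)  # ceiling division
    partials = []
    for start in range(0, n_rows, step):
        # Each chunk writes only to its private accumulator, so the chunks are
        # independent of each other and could be executed in parallel.
        private = [0] * n_cols
        for i in range(start, min(start + step, n_rows)):
            for j in range(n_cols):
                private[j] += r[i] * A[i][j]
        partials.append(private)
    # Combine step: valid because integer addition is associative and commutative.
    s = [0] * n_cols
    for private in partials:
        for j in range(n_cols):
            s[j] += private[j]
    return s


A = [[i * 3 + j for j in range(3)] for i in range(5)]
r = [1, 2, 3, 4, 5]
```

The sketch uses integers on purpose: with floating-point values, reordering the additions can change the rounding, so the two functions may then agree only approximately.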
Tupleware: Redefining Modern Analytics
There is a fundamental discrepancy between the targeted and actual users of
current analytics frameworks. Most systems are designed for the data and
infrastructure of the Googles and Facebooks of the world---petabytes of data
distributed across large cloud deployments consisting of thousands of cheap
commodity machines. Yet, the vast majority of users operate clusters ranging
from a few to a few dozen nodes, analyze relatively small datasets of up to a
few terabytes, and perform primarily compute-intensive operations. Targeting
these users fundamentally changes the way we should build analytics systems.
This paper describes the design of Tupleware, a new system specifically aimed
at the challenges faced by the typical user. Tupleware's architecture brings
together ideas from the database, compiler, and programming languages
communities to create a powerful end-to-end solution for data analysis. We
propose novel techniques that consider the data, computations, and hardware
together to achieve maximum performance on a case-by-case basis. Our
experimental evaluation quantifies the impact of our novel techniques and shows
orders of magnitude performance improvement over alternative systems
Decomposition, Reformulation, and Diving in University Course Timetabling
In many real-life optimisation problems, there are multiple interacting
components in a solution. For example, different components might specify
assignments to different kinds of resource. Often, each component is associated
with different sets of soft constraints, and so with different measures of soft
constraint violation. The goal is then to minimise a linear combination of such
measures. This paper studies an approach to such problems, which can be thought
of as multiphase exploitation of multiple objective-/value-restricted
submodels. In this approach, only one computationally difficult component of a
problem and the associated subset of objectives is considered at first. This
produces partial solutions, which define interesting neighbourhoods in the
search space of the complete problem. Often, it is possible to pick the initial
component so that variable aggregation can be performed at the first stage, and
the neighbourhoods to be explored next are guaranteed to contain feasible
solutions. Using integer programming, it is then easy to implement heuristics
producing solutions with bounds on their quality.
Our study is performed on a university course timetabling problem used in the
2007 International Timetabling Competition, also known as the Udine Course
Timetabling Problem. In the proposed heuristic, an objective-restricted
neighbourhood generator produces assignments of periods to events, with
decreasing numbers of violations of two period-related soft constraints. Those
are relaxed into assignments of events to days, which define neighbourhoods
that are easier to search with respect to all four soft constraints. Integer
programming formulations for all subproblems are given and evaluated using ILOG
CPLEX 11. The wider applicability of this approach is analysed and discussed.
Comment: 45 pages, 7 figures. Improved typesetting of figures and table
Optimal Compilation of HPF Remappings
Applications with varying array access patterns require array mappings to be changed dynamically on distributed-memory parallel machines. HPF (High Performance Fortran) provides such remappings, on data that can be replicated, explicitly through the realign and redistribute directives and implicitly at procedure calls and returns. However, such features are left out of the HPF subset or of the currently discussed HPF kernel for efficiency reasons. This paper presents a new compilation technique to handle HPF remappings for message-passing parallel architectures. The first phase is global and removes all useless remappings that appear naturally in procedures. The code generated by the second phase takes advantage of replications to shorten the remapping time. It is proved optimal: a minimal number of messages, containing only the required data, is sent over the network. The technique is fully implemented in HPFC, our prototype HPF compiler. Experiments were performed on a DEC Alpha farm