9,257 research outputs found

Polly's Polyhedral Scheduling in the Presence of Reductions

Author: Benaissa Zino
Doerfert Johannes
Hack Sebastian
Streit Kevin
Publication date: 01/01/2015

The polyhedral model provides a powerful mathematical abstraction to enable effective optimization of loop nests with respect to a given optimization goal, e.g., exploiting parallelism. Unexploited reduction properties are a frequent reason for polyhedral optimizers to assume parallelism-prohibiting dependences. To our knowledge, no polyhedral loop optimizer available in any production compiler provides support for reductions. In this paper, we show that leveraging the parallelism of reductions can lead to a significant performance increase. We give a precise, dependence-based definition of reductions and discuss ways to extend polyhedral optimization to exploit the associativity and commutativity of reduction computations. We have implemented a reduction-enabled scheduling approach in the Polly polyhedral optimizer and evaluate it on the standard Polybench 3.2 benchmark suite. We were able to detect and model all 52 arithmetic reductions and achieve speedups of up to 2.21× on a quad-core machine by exploiting the multidimensional reduction in the BiCG benchmark. Comment: Presented at the IMPACT15 workshop
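The idea of exploiting associativity and commutativity can be illustrated with a minimal sketch. It is not Polly's implementation and not taken from the paper: the function names, the `workers` parameter, and the loop shape (an accumulation into `s[j]` across all rows `i`) are assumptions chosen for illustration. Each hypothetical worker accumulates into a private copy, and the copies are combined afterwards, which is valid only because the reduction operator is associative and commutative.

```python
def column_reduce_sequential(A, r):
    """s[j] = sum over i of r[i] * A[i][j], in the original loop order."""
    n, m = len(A), len(A[0])
    s = [0] * m
    for i in range(n):
        for j in range(m):
            # Every i-iteration updates the same s[j]: a reduction dependence
            # that would normally forbid running the i-loop in parallel.
            s[j] += r[i] * A[i][j]
    return s


def column_reduce_privatized(A, r, workers=4):
    """Same result, but each (hypothetical) worker owns a private accumulator."""
    n, m = len(A), len(A[0])
    chunk = (n + workers - 1) // workers  # rows per worker, rounded up
    partials = []
    for w in range(workers):  # these iterations do not depend on each other
        private = [0] * m
        for i in range(w * chunk, min(n, (w + 1) * chunk)):
            for j in range(m):
                private[j] += r[i] * A[i][j]
        partials.append(private)
    # Combine step: reordering the additions is legal because integer
    # addition is associative and commutative.
    return [sum(p[j] for p in partials) for j in range(m)]
```

Integers are used so that reordering the additions gives bit-identical results; with floating-point data the combined sum may differ slightly from the sequential one.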

arXiv.org e-Print Archive

CISPA – Helmholtz-Zentrum für Informationssicherheit

Tupleware: Redefining Modern Analytics

Author: Cetintemel Ugur
Crotty Andrew
Dursun Kayhan
Galakatos Alex
Kraska Tim
Zdonik Stan
Publication date: 30/07/2014

There is a fundamental discrepancy between the targeted and actual users of current analytics frameworks. Most systems are designed for the data and infrastructure of the Googles and Facebooks of the world: petabytes of data distributed across large cloud deployments consisting of thousands of cheap commodity machines. Yet, the vast majority of users operate clusters ranging from a few to a few dozen nodes, analyze relatively small datasets of up to a few terabytes, and perform primarily compute-intensive operations. Targeting these users fundamentally changes the way we should build analytics systems. This paper describes the design of Tupleware, a new system specifically aimed at the challenges faced by the typical user. Tupleware's architecture brings together ideas from the database, compiler, and programming languages communities to create a powerful end-to-end solution for data analysis. We propose novel techniques that consider the data, computations, and hardware together to achieve maximum performance on a case-by-case basis. Our experimental evaluation quantifies the impact of our novel techniques and shows orders of magnitude performance improvement over alternative systems.

arXiv.org e-Print Archive

CiteSeerX

Implementing O(N) N-Body Algorithms Efficiently in Data-Parallel Languages

Publication venue: 'Hindawi Limited'
Publication date: 01/01/1996

Crossref

Decomposition, Reformulation, and Diving in University Course Timetabling

Author: Andrew J. Parkes
Edmund K. Burke
Hana Rudová
Jakub Mareček
Publication venue: 'Elsevier BV'
Publication date: 20/03/2009

In many real-life optimisation problems, there are multiple interacting components in a solution. For example, different components might specify assignments to different kinds of resource. Often, each component is associated with different sets of soft constraints, and so with different measures of soft constraint violation. The goal is then to minimise a linear combination of such measures. This paper studies an approach to such problems, which can be thought of as multiphase exploitation of multiple objective-/value-restricted submodels. In this approach, only one computationally difficult component of a problem and the associated subset of objectives is considered at first. This produces partial solutions, which define interesting neighbourhoods in the search space of the complete problem. Often, it is possible to pick the initial component so that variable aggregation can be performed at the first stage, and the neighbourhoods to be explored next are guaranteed to contain feasible solutions. Using integer programming, it is then easy to implement heuristics producing solutions with bounds on their quality. Our study is performed on a university course timetabling problem used in the 2007 International Timetabling Competition, also known as the Udine Course Timetabling Problem. In the proposed heuristic, an objective-restricted neighbourhood generator produces assignments of periods to events, with decreasing numbers of violations of two period-related soft constraints. Those are relaxed into assignments of events to days, which define neighbourhoods that are easier to search with respect to all four soft constraints. Integer programming formulations for all subproblems are given and evaluated using ILOG CPLEX 11. The wider applicability of this approach is analysed and discussed. Comment: 45 pages, 7 figures. Improved typesetting of figures and tables

arXiv.org e-Print Archive

CiteSeerX

Crossref

Stirling Online Research Repository (RIOXX)

Stirling Online Research Repository

Optimal Compilation of HPF Remappings

Author: Ancourt Corinne
Coelho Fabien
Publication venue: 'Elsevier BV'
Publication date: 01/10/1996

Applications with varying array access patterns require array mappings to be changed dynamically on distributed-memory parallel machines. HPF (High Performance Fortran) provides such remappings, on data that can be replicated, explicitly through the realign and redistribute directives and implicitly at procedure calls and returns. However, such features are left out of the HPF subset or of the currently discussed HPF kernel for efficiency reasons. This paper presents a new compilation technique to handle HPF remappings for message-passing parallel architectures. The first phase is global and removes all useless remappings that appear naturally in procedures. The code generated by the second phase takes advantage of replications to shorten the remapping time. It is proved optimal: a minimal number of messages, containing only the required data, is sent over the network. The technique is fully implemented in HPFC, our prototype HPF compiler. Experiments were performed on a DEC Alpha farm.
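To make "a minimal number of messages, containing only the required data" concrete, here is a small sketch of a remapping from a block distribution to a cyclic distribution of a one-dimensional array. It is an illustration under simplifying assumptions, not the HPFC algorithm: the function names are hypothetical, there is no replication (which the paper exploits), and only one dimension is handled. Elements that already reside on their target processor are not sent, and all elements moving between the same pair of processors are grouped into a single message.

```python
from collections import defaultdict


def block_owner(i, n, p):
    """Owner of element i when n elements are split into p contiguous blocks."""
    block_size = (n + p - 1) // p  # ceiling division
    return i // block_size


def cyclic_owner(i, p):
    """Owner of element i when elements are dealt out round-robin to p processors."""
    return i % p


def remap_messages(n, p):
    """Group the elements that must move by (source, destination) pair.

    Each key corresponds to one message; its value lists exactly the
    element indices that message has to carry.
    """
    messages = defaultdict(list)
    for i in range(n):
        src, dst = block_owner(i, n, p), cyclic_owner(i, p)
        if src != dst:  # elements already in place are never sent
            messages[(src, dst)].append(i)
    return dict(messages)
```

For example, `remap_messages(8, 2)` yields two messages: elements 1 and 3 travel from processor 0 to processor 1, and elements 4 and 6 travel the other way; the remaining four elements stay where they are.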

HAL-MINES ParisTech