Search CORE

2,475 research outputs found

Beyond Reuse Distance Analysis: Dynamic Analysis for Characterization of Data Locality Potential

Author: Elango Venmugil
Fauzia Naznin
Pouchet Louis-Noël
Ramanujam J.
Rastello Fabrice
Ravishankar Mahesh
Rountev Atanas
Sadayappan P.
Publication venue
Publication date: 01/12/2013
Field of study

Emerging computer architectures will feature drastically decreased flops/byte (ratio of peak processing rate to memory bandwidth) as highlighted by recent studies on Exascale architectural trends. Further, flops are getting cheaper while the energy cost of data movement is increasingly dominant. The understanding and characterization of data locality properties of computations is critical in order to guide efforts to enhance data locality. Reuse distance analysis of memory address traces is a valuable tool to perform data locality characterization of programs. A single reuse distance analysis can be used to estimate the number of cache misses in a fully associative LRU cache of any size, thereby providing estimates on the minimum bandwidth requirements at different levels of the memory hierarchy to avoid being bandwidth bound. However, such an analysis only holds for the particular execution order that produced the trace. It cannot estimate potential improvement in data locality through dependence preserving transformations that change the execution schedule of the operations in the computation. In this article, we develop a novel dynamic analysis approach to characterize the inherent locality properties of a computation and thereby assess the potential for data locality enhancement via dependence preserving transformations. The execution trace of a code is analyzed to extract a computational directed acyclic graph (CDAG) of the data dependences. The CDAG is then partitioned into convex subsets, and the convex partitioning is used to reorder the operations in the execution trace to enhance data locality. The approach enables us to go beyond reuse distance analysis of a single specific order of execution of the operations of a computation in characterization of its data locality properties. It can serve a valuable role in identifying promising code regions for manual transformation, as well as assessing the effectiveness of compiler transformations for data locality enhancement. We demonstrate the effectiveness of the approach using a number of benchmarks, including case studies where the potential shown by the analysis is exploited to achieve lower data movement costs and better performance.Comment: Transaction on Architecture and Code Optimization (2014

arXiv.org e-Print Archive

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

FBLAS: Streaming Linear Algebra on FPGA

Author: De Matteis Tiziano
Hoefler Torsten
Licht Johannes de Fine
Publication venue: 'Test accounts'
Publication date: 01/01/2020
Field of study

Spatial computing architectures pose an attractive alternative to mitigate control and data movement overheads typical of load-store architectures. In practice, these devices are rarely considered in the HPC community due to the steep learning curve, low productivity and lack of available libraries for fundamental operations. High-level synthesis (HLS) tools are facilitating hardware programming, but optimizing for these architectures requires factoring in new transformations and resources/performance trade-offs. We present FBLAS, an open-source HLS implementation of BLAS for FPGAs, that enables reusability, portability and easy integration with existing software and hardware codes. FBLAS' implementation allows scaling hardware modules to exploit on-chip resources, and module interfaces are designed to natively support streaming on-chip communications, allowing them to be composed to reduce off-chip communication. With FBLAS, we set a precedent for FPGA library design, and contribute to the toolbox of customizable hardware components necessary for HPC codes to start productively targeting reconfigurable platforms

arXiv.org e-Print Archive

Repository for Publications and Research Data

Multi-Quality Auto-Tuning by Contract Negotiation

Author: Götz Sebastian
Publication venue
Publication date: 17/07/2013
Field of study

A characteristic challenge of software development is the management of omnipresent change. Classically, this constant change is driven by customers changing their requirements. The wish to optimally leverage available resources opens another source of change: the software systems environment. Software is tailored to specific platforms (e.g., hardware architectures) resulting in many variants of the same software optimized for different environments. If the environment changes, a different variant is to be used, i.e., the system has to reconfigure to the variant optimized for the arisen situation. The automation of such adjustments is subject to the research community of self-adaptive systems. The basic principle is a control loop, as known from control theory. The system (and environment) is continuously monitored, the collected data is analyzed and decisions for or against a reconfiguration are computed and realized. Central problems in this field, which are addressed in this thesis, are the management of interdependencies between non-functional properties of the system, the handling of multiple criteria subject to decision making and the scalability. In this thesis, a novel approach to self-adaptive software--Multi-Quality Auto-Tuning (MQuAT)--is presented, which provides design and operation principles for software systems which automatically provide the best possible utility to the user while producing the least possible cost. For this purpose, a component model has been developed, enabling the software developer to design and implement self-optimizing software systems in a model-driven way. This component model allows for the specification of the structure as well as the behavior of the system and is capable of covering the runtime state of the system. The notion of quality contracts is utilized to cover the non-functional behavior and, especially, the dependencies between non-functional properties of the system. At runtime the component model covers the runtime state of the system. This runtime model is used in combination with the contracts to generate optimization problems in different formalisms (Integer Linear Programming (ILP), Pseudo-Boolean Optimization (PBO), Ant Colony Optimization (ACO) and Multi-Objective Integer Linear Programming (MOILP)). Standard solvers are applied to derive solutions to these problems, which represent reconfiguration decisions, if the identified configuration differs from the current. Each approach is empirically evaluated in terms of its scalability showing the feasibility of all approaches, except for ACO, the superiority of ILP over PBO and the limits of all approaches: 100 component types for ILP, 30 for PBO, 10 for ACO and 30 for 2-objective MOILP. In presence of more than two objective functions the MOILP approach is shown to be infeasible

Technische Universität Dresden: Qucosa

Computational methods and software for the design of inertial microfluidic flow sculpting devices

Author: Stoecklein Daniel James
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2017
Field of study

The ability to sculpt inertially flowing fluid via bluff body obstacles has enormous promise for applications in bioengineering, chemistry, and manufacturing within microfluidic devices. However, the computational difficulty inherent to full scale 3-dimensional fluid flow simulations makes designing and optimizing such systems tedious, costly, and generally tasked to computational experts with access to high performance resources. The goal of this work is to construct efficient models for the design of inertial microfluidic flow sculpting devices, and implement these models in freely available, user-friendly software for the broader microfluidics community. Two software packages were developed to accomplish this: uFlow and FlowSculpt . uFlow solves the forward problem in flow sculpting, that of predicting the net deformation from an arbitrary sequence of obstacles (pillars), and includes estimations of transverse mass diffusion and particles formed by optical lithography. FlowSculpt solves the more difficult inverse problem in flow sculpting, which is to design a flow sculpting device which produces a target flow shape. Each piece of software uses efficient, experimentally validated forward models developed within this work, which are applied to deep learning techniques to explore other routes to solving the inverse problem. The models are also highly modular, capable of incorporating new microfluidic components and flow physics to the design process. It is anticipated that the microfluidics community will integrate the tools developed here into their own research, and bring new designs, components, and applications to the inertial flow sculpting platform

Digital Repository @ Iowa State University (ISU)

Machine Learning in Compiler Optimization

Author: O'Boyle Michael
Wang Zheng
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 09/05/2018
Field of study

In the last decade, machine learning based compilation has moved from an an obscure research niche to a mainstream activity. In this article, we describe the relationship between machine learning and compiler optimisation and introduce the main concepts of features, models, training and deployment. We then provide a comprehensive survey and provide a road map for the wide variety of different research areas. We conclude with a discussion on open issues in the area and potential research directions. This paper provides both an accessible introduction to the fast moving area of machine learning based compilation and a detailed bibliography of its main achievements

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

Lancaster E-Prints