
    A Survey on Compiler Autotuning using Machine Learning

    Since the mid-1990s, researchers have been trying to use machine-learning-based approaches to solve a number of different compiler optimization problems. These techniques primarily enhance the quality of the obtained results and, more importantly, make it feasible to tackle two main compiler optimization problems: optimization selection (choosing which optimizations to apply) and phase-ordering (choosing the order in which to apply them). The compiler optimization space continues to grow due to the advancement of applications, the increasing number of compiler optimizations, and new target architectures. Generic optimization passes in compilers cannot fully leverage newly introduced optimizations and, therefore, cannot keep up with the pace of increasing options. This survey summarizes and classifies the recent advances in using machine learning for the compiler optimization field, particularly on the two major problems of (1) selecting the best optimizations and (2) the phase-ordering of optimizations. The survey highlights the approaches taken so far, the obtained results, the fine-grain classification among different approaches and, finally, the influential papers of the field.
    Comment: version 5.0 (updated September 2018), preprint version of the paper accepted at ACM CSUR 2018 (42 pages). This survey will be updated quarterly (send me your newly published papers to be added in a subsequent version). History: Received November 2016; Revised August 2017; Revised February 2018; Accepted March 2018.
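    As a concrete illustration of the phase-ordering problem the survey covers, the sketch below hill-climbs over permutations of an optimization-pass sequence. It is a minimal, hypothetical example, not taken from the survey: the pass names are LLVM-style placeholders, and measure_runtime is a runnable stand-in for what would really be "compile with this pass order, run the binary, and time it".

```python
import random

# Hypothetical pass list; a real compiler exposes many more passes.
PASSES = ["inline", "mem2reg", "gvn", "licm", "loop-unroll", "dce"]

def measure_runtime(ordering):
    """Stand-in for compiling with this pass order and timing the binary.
    A local RNG seeded by the ordering gives a stable fake cost per order,
    so the sketch runs anywhere without a compiler installed."""
    rng = random.Random(hash(tuple(ordering)))
    return 1.0 + rng.random()  # pretend seconds

def hill_climb_phase_order(passes, iters=200):
    """Simple local search: repeatedly swap two passes, keep improvements."""
    best, best_cost = list(passes), measure_runtime(passes)
    for _ in range(iters):
        cand = best[:]
        i, j = random.sample(range(len(cand)), 2)
        cand[i], cand[j] = cand[j], cand[i]
        cost = measure_runtime(cand)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost

if __name__ == "__main__":
    order, cost = hill_climb_phase_order(PASSES)
    print("best order:", order, "estimated cost:", round(cost, 3))
```

    The machine-learning approaches surveyed above replace this kind of blind search with models that predict good orders from program features, cutting the number of costly compile-and-run evaluations.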

    Constrained optimization in simulation: a novel approach.

    This paper presents a novel heuristic for constrained optimization of random computer simulation models, in which one of the simulation outputs is selected as the objective to be minimized while the other outputs need to satisfy prespecified target values. Besides the simulation outputs, the simulation inputs must meet prespecified constraints, including the constraint that the inputs be integer. The proposed heuristic combines (i) experimental design to specify the simulation input combinations, (ii) Kriging (also called spatial correlation modeling) to analyze the global simulation input/output data that result from this experimental design, and (iii) integer nonlinear programming to estimate the optimal solution from the Kriging metamodels. The heuristic is applied to an (s, S) inventory system and a realistic call-center simulation model, and compared with the popular commercial heuristic OptQuest embedded in the ARENA versions 11 and 12. These two applications show that the novel heuristic outperforms OptQuest in terms of search speed (it moves faster towards high-quality solutions) and consistency of the solution quality.
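    A minimal sketch of the three-step loop the abstract describes: a set of integer design points, Kriging metamodels fit to the simulation input/output data, and an integer search over the metamodels. Everything here is an illustrative assumption: the toy simulator, the bounds, and the service-level target, with scikit-learn's GaussianProcessRegressor standing in for a Kriging implementation.

```python
import itertools
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def simulate(x):
    """Hypothetical noisy (s, S)-style simulator: returns (cost, service)."""
    s, S = x
    cost = 0.05 * S + 3.0 / max(s, 1) + np.random.normal(0, 0.05)
    service = 1.0 - np.exp(-0.08 * s) + np.random.normal(0, 0.01)
    return cost, service

rng = np.random.default_rng(0)
# (i) experimental design: integer input combinations within box bounds
X = rng.integers(low=[1, 10], high=[20, 100], size=(30, 2))
Y = np.array([simulate(x) for x in X])

# (ii) Kriging metamodels: one GP for the objective, one for the constraint
gp_cost = GaussianProcessRegressor(normalize_y=True).fit(X, Y[:, 0])
gp_serv = GaussianProcessRegressor(normalize_y=True).fit(X, Y[:, 1])

# (iii) integer optimization over the metamodels, by brute force here:
# minimize predicted cost subject to predicted service >= 0.70 and s < S
grid = np.array([(s, S) for s, S in
                 itertools.product(range(1, 20), range(10, 100)) if s < S])
cost_hat = gp_cost.predict(grid)
feasible = gp_serv.predict(grid) >= 0.70
if feasible.any():
    best = grid[feasible][np.argmin(cost_hat[feasible])]
    print("estimated optimal (s, S):", tuple(best))
```

    A full heuristic of this kind would typically iterate the loop, simulating the estimated optimum, adding it to the design, and refitting, so the metamodels sharpen around promising integer points.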

    Combined use of processor and system-on-chip tools

    Transport-triggered architecture (TTA) processors provide an efficient middle ground for creating intellectual property (IP) components for system-on-chip (SoC) designs. Using TTAs, the design effort is greatly reduced compared to the ASIC approach, and a more economical and efficient implementation is possible than with a general-purpose processor. This Thesis examines ways to accelerate the design flow when using TTA processors in SoC designs. The proposed flows combine the use of the TTA-based Co-design Environment (TCE) tool set and the Kactus2 IP-XACT design environment. The IP-XACT standard and the Kactus2 tool make it easy to integrate and configure IP components from multiple vendors, whereas the TCE tools provide a fast and efficient path from C to VHDL. The Thesis presents three use cases for TTA: as a ready-made fixed accelerator, as a general-purpose processor, and as a tailored application-specific processor. Moreover, the management of instance-specific data in IP-XACT is discussed. For each use case, the design flow is presented in detail step by step, a case example is given, and the design time spent on each step is evaluated. The flows contain between 15 and 18 steps and use between 8 and 12 different tools from the studied tool sets. Provided that C source codes and an IP-XACT library are available, a non-HW-oriented engineer can implement an FPGA-based multiprocessor product in less than 4 hours. Based on the results, further development suggestions for the TCE tools and Kactus2 are made.

    Effective runtime resource management using linux control groups with the BarbequeRTRM framework

    The extremely advanced process technology reached by silicon manufacturing (nodes smaller than 32 nm) has led to the production of computational platforms and SoCs featuring a considerable amount of resources. While such multi- and many-core platforms show growing performance capabilities, they are more and more affected by power, thermal, and reliability issues. Moreover, the increased computational capabilities allow congested usage scenarios with workloads subject to mixed and time-varying requirements. Effective usage of the resources should take into account both the application requirements and resource availability, with an arbiter, namely a resource manager, in charge of resolving the resource contention among demanding applications. Current operating systems (OS) have only limited knowledge of application-specific behaviors and their time-varying requirements. Dedicated system interfaces to collect such inputs and forward them to the OS (e.g., its scheduler) are thus an interesting research area that aims at integrating the OS with an ad hoc resource manager. Such a component can exploit efficient low-level OS interfaces and mechanisms to extend its capabilities of controlling tasks and system resources. Because of the specific tasks and timings of a resource manager, this component can be easily and effectively developed as a user-space extension lying in between the OS and the controlled applications. This article, which focuses on multicore Linux systems, shows a portable solution for enforcing runtime resource management decisions based on the standard control groups framework. A burst and a mixed workload analysis, performed on a multicore-based NUMA platform, reports promising results in terms of both performance and power saving.
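    The enforcement mechanism described above boils down to writing scheduling decisions into the control groups filesystem. The sketch below shows the idea using the cgroup v2 interface (the article predates v2 and used the then-standard v1 controllers, so the exact file names here are an assumption); it must run as root on a Linux system with cgroup v2 mounted at /sys/fs/cgroup.

```python
import os
from pathlib import Path

CGROOT = Path("/sys/fs/cgroup")  # assumes the cgroup v2 unified hierarchy

def make_partition(name, cpus, cpu_quota_pct):
    """Create a cgroup that confines its members to `cpus` and caps their
    CPU bandwidth at `cpu_quota_pct` percent of one core per 100 ms period."""
    cg = CGROOT / name
    cg.mkdir(exist_ok=True)
    # Enable the cpu and cpuset controllers for children of the root group.
    (CGROOT / "cgroup.subtree_control").write_text("+cpu +cpuset")
    (cg / "cpuset.cpus").write_text(cpus)  # e.g. "0-1"
    period_us = 100_000
    quota_us = period_us * cpu_quota_pct // 100
    (cg / "cpu.max").write_text(f"{quota_us} {period_us}")
    return cg

def assign(cg, pid):
    """Move a running task into the partition (the resource manager's
    runtime decision); the kernel enforces the limits from then on."""
    (cg / "cgroup.procs").write_text(str(pid))

if __name__ == "__main__":
    cg = make_partition("demo_app", cpus="0-1", cpu_quota_pct=50)
    assign(cg, os.getpid())
    print("confined", os.getpid(), "to", cg)
```

    A user-space resource manager of the kind described rewrites these files as the workload mix changes, which is what makes the approach portable across Linux platforms.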

    From MARTE to Reconfigurable NoCs: A model driven design methodology

    Due to the continuous exponential rise in SoC design complexity, there is a critical need for new seamless methodologies and tools to handle the SoC co-design aspects. We address this issue and propose a novel SoC co-design methodology based on Model Driven Engineering and the MARTE (Modeling and Analysis of Real-Time and Embedded Systems) standard proposed by the Object Management Group, in order to raise the design abstraction levels. Extensions of this standard have enabled us to move from high-level specifications to execution platforms such as reconfigurable FPGAs. In this paper, we present a high-level modeling approach that targets modern Network-on-Chip systems. The overall objective is to perform system modeling at a high abstraction level, expressed in the Unified Modeling Language (UML), and afterwards to transform these high-level models into detailed, enriched lower-level models in order to automatically generate the necessary code for final FPGA synthesis.

    An Interactive System Level Simulation Environment for Systems- on-Chip

    This article presents an interactive simulation environment for high-level models intended for design space exploration of systems-on-chip. The existing open-source development environment TTool supports the MARTE-compliant UML profile DIPLODOCUS and enables the designer to create, simulate, and formally verify models. The goal is to obtain first performance estimations of the system under design while minimizing the modeling effort. The contribution outlined in this paper is an additional module providing means for controlling the simulation in real time by performing step-wise execution, saving and restoring simulation states, and animating UML models of the system. Moreover, the paper elaborates on the integration of these new features into the existing framework, which consists of a simulation engine on the one hand and a graphical user interface on the other.
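    The control features described, step-wise execution plus saving and restoring simulation states, amount to a snapshotting layer wrapped around the simulation engine. The sketch below is a minimal, hypothetical illustration in Python (TTool's actual engine is a separate native component); the toy transaction model is an assumption made for the example.

```python
import copy

class Simulator:
    """Toy transaction-level engine: one integer clock and a task queue."""
    def __init__(self, tasks):
        self.time = 0
        self.pending = list(tasks)  # (duration, name) transactions
        self.trace = []

    def step(self):
        """Execute exactly one transaction (the step-wise execution mode)."""
        if self.pending:
            dur, name = self.pending.pop(0)
            self.time += dur
            self.trace.append((self.time, name))

class SimulationController:
    """Interactive layer: step the engine, save and restore snapshots."""
    def __init__(self, sim):
        self.sim = sim
        self.snapshots = {}

    def step(self, n=1):
        for _ in range(n):
            self.sim.step()

    def save(self, label):
        self.snapshots[label] = copy.deepcopy(self.sim)  # full-state snapshot

    def restore(self, label):
        self.sim = copy.deepcopy(self.snapshots[label])

ctrl = SimulationController(Simulator([(5, "dma"), (3, "cpu"), (7, "bus")]))
ctrl.step(); ctrl.save("after_dma")
ctrl.step(2)
ctrl.restore("after_dma")            # roll back and explore another path
print(ctrl.sim.time, ctrl.sim.trace) # -> 5 [(5, 'dma')]
```

    Deep-copying the whole engine keeps the sketch simple; a production simulator would more likely serialize a compact state record instead.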

    Many-core and heterogeneous architectures: programming models and compilation toolchains

    The abstract is provided in the attachment. (Record metadata: partially open, embargoed until 2021-10-02; author: Barchi, Francesc.)

    SCALABLE TECHNIQUES FOR SCHEDULING AND MAPPING DSP APPLICATIONS ONTO EMBEDDED MULTIPROCESSOR PLATFORMS

    A variety of multiprocessor architectures has proliferated, even for off-the-shelf computing platforms. To make use of these platforms, traditional implementation frameworks focus on implementing Digital Signal Processing (DSP) applications using special platform features to achieve high performance. However, due to the fast evolution of the underlying architectures, solution redevelopment is error prone, and the reusability of existing solutions and libraries is limited. In this thesis, we facilitate an efficient migration of DSP systems to multiprocessor platforms while systematically leveraging previous investment in optimized library kernels using dataflow design frameworks. We make these library elements, which are typically tailored to specialized architectures, more amenable to extensive analysis and optimization through an efficient and systematic process. This thesis enables such migration through four basic contributions:
    1. We propose and develop a framework to explore efficient utilization of Single Instruction Multiple Data (SIMD) cores and accelerators available in heterogeneous multiprocessor platforms consisting of General Purpose Processors (GPPs) and Graphics Processing Units (GPUs). We also propose new scheduling techniques that apply extensive block processing in conjunction with task mapping and task ordering methods matched to the underlying architecture. The approach gives the developer the ability to prototype a GPU-accelerated application and explore its design space efficiently and effectively.
    2. We introduce the concept of Partial Expansion Graphs (PEGs) as an implementation model and an associated class of scheduling strategies. PEGs are designed to help realize DSP systems in terms of forms and granularities of parallelism that are well matched to the given applications and targeted platforms. PEGs also facilitate the derivation of both static and dynamic scheduling techniques, depending on the amount of variability in task execution times and other operating conditions. We show how to implement efficient PEG-based scheduling methods using real-time operating systems, and how to reuse pre-optimized libraries of DSP components within such implementations.
    3. We develop new algorithms for scheduling and mapping systems implemented using PEGs. Collectively, these algorithms operate in three steps. First, the amount of data parallelism in the application graph is tuned systematically over many iterations to exploit the available cores in the target platform. Then, a mapping algorithm based on graph analysis distributes data- and task-parallel instances over different cores while balancing the load of all processing units to make use of pipeline parallelism (a greedy sketch of this load-balancing step follows the list). Finally, we use a novel technique for performance evaluation that implements the scheduler and a customizable solution on the programmable platform, allowing accurate fitness functions to be measured and used to drive runtime adaptation of schedules.
    4. In addition to providing scheduling techniques for the mentioned applications and platforms, we show how to integrate the resulting solution into the underlying environment. This is achieved by leveraging existing libraries and applying the GPP-GPU scheduling framework to augment a popular Software Defined Radio (SDR) development environment, GNU Radio, with a dataflow foundation and a stand-alone GPU-accelerated library. We also show how to realize the PEG model on real-time operating system libraries, such as the Texas Instruments DSP/BIOS. A code generator that accepts both manually designed and automatically configured solutions completes the design flow from application model to running system.
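    The load-balancing mapping step referenced in contribution 3 can be pictured with a simple greedy list scheduler. This sketch is an illustrative assumption, not the thesis's algorithm: a longest-processing-time assignment of expanded dataflow actor instances onto a fixed number of cores, with hypothetical actor names and execution times.

```python
import heapq

def map_instances(instances, num_cores):
    """Greedy longest-processing-time mapping: place each actor instance on
    the currently least-loaded core. `instances` is [(name, exec_time), ...]."""
    cores = [(0.0, c, []) for c in range(num_cores)]  # (load, id, actors)
    heapq.heapify(cores)
    for name, t in sorted(instances, key=lambda it: -it[1]):
        load, cid, actors = heapq.heappop(cores)     # least-loaded core first
        actors.append(name)
        heapq.heappush(cores, (load + t, cid, actors))
    return sorted(cores, key=lambda c: c[1])

# Example: an FIR actor partially expanded into four data-parallel instances,
# plus source/sink tasks (execution times are hypothetical profiling results).
instances = [("src", 2), ("fir0", 5), ("fir1", 5), ("fir2", 5),
             ("fir3", 5), ("snk", 3)]
for load, cid, actors in map_instances(instances, num_cores=3):
    print(f"core {cid}: load={load} actors={actors}")
```

    In the thesis, such mapping decisions are driven by measured fitness functions at runtime; the greedy rule here only illustrates the load-balancing objective.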