36,044 research outputs found

    Automated dynamic memory data type implementation exploration and optimization

    Get PDF
    The behavior of many algorithms is heavily determined by the input data. As a result, multiple and completely different execution paths may be followed, and internal data usage and handling frequently differ as well. Static compile-time memory allocation is therefore inefficient, especially on embedded systems where memory is a scarce resource, and dynamic memory management is the only feasible alternative. Including applications with dynamic memory in embedded systems introduces new challenges compared to traditional signal processing applications. In this session, an automated framework is presented to optimize embedded applications that make extensive use of dynamic memory management. The proposed methodology automates the exploration and identification of optimal data type implementations based on power estimates, memory accesses and normalized memory usage.
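    As a rough illustration of the kind of exploration this abstract describes, the sketch below replays a hypothetical workload profile against a set of pre-characterized candidate dynamic data types and ranks them by estimated memory accesses and footprint. The candidate names, cost figures and workload are made-up assumptions for illustration, not the paper's actual models or metrics (power estimation is omitted).

```python
# Rank candidate dynamic data type implementations for one container by
# replaying a recorded workload profile against hypothetical per-operation
# access costs and per-element storage overheads (illustrative numbers only).
DDT_MODEL = {
    "dynamic_array":      {"append": 1, "random_get": 1,  "insert_middle": 50, "overhead": 0},
    "singly_linked_list": {"append": 2, "random_get": 50, "insert_middle": 25, "overhead": 4},
    "doubly_linked_list": {"append": 3, "random_get": 25, "insert_middle": 12, "overhead": 8},
}

# How often each operation occurs for this container in the profiled run.
WORKLOAD = {"append": 10_000, "random_get": 200, "insert_middle": 50}
LIVE_ELEMENTS, PAYLOAD_BYTES = 10_000, 16

def evaluate(model):
    """Return (estimated memory accesses, estimated footprint in bytes)."""
    accesses = sum(model[op] * count for op, count in WORKLOAD.items())
    footprint = LIVE_ELEMENTS * (PAYLOAD_BYTES + model["overhead"])
    return accesses, footprint

ranking = sorted((evaluate(model), name) for name, model in DDT_MODEL.items())
for (accesses, footprint), name in ranking:
    print(f"{name:20s} accesses={accesses:7d} footprint={footprint} bytes")
```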

    Optimization of Dynamic Data Structures in Multimedia Embedded Systems Using Evolutionary Computation

    Get PDF
    Embedded consumer devices are increasing their capabilities and can now implement new multimedia applications that, only a few years ago, were reserved for powerful desktops. These applications share complex and intensive dynamic memory use. Thus, dynamic memory optimizations are a requirement when porting these applications. Within these optimizations, the refinement of the Dynamically (de)allocated Data Type (or DDT) implementations is one of the most important and difficult parts of an efficient mapping onto low-power embedded devices. In this paper, we describe a new automatic optimization approach for the DDTs of object-oriented multimedia applications. It is based on an analytical pre-characterization of the possible elementary DDT blocks, and a multi-objective genetic algorithm to explore the design space and to select the best implementation according to different optimization criteria (i.e., memory accesses, memory footprint and energy consumption). Our results on real-life multimedia applications show that the best implementations of DDTs can be obtained in an automated way in a few hours, whereas designers typically require days to find a suitable implementation; this yields significant savings in exploration time with respect to other state-of-the-art heuristics-based optimization methods for this task.
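    The sketch below is a minimal multi-objective genetic algorithm in the spirit of the approach described above, not the authors' tool: each chromosome assigns one DDT block to each container variable, and non-dominated (Pareto-optimal) individuals over the three objectives are kept as parents. The DDT names, the cost table and all GA parameters are illustrative assumptions.

```python
import random

DDTS = ["array", "sll", "dll"]                       # candidate elementary DDT blocks
COST = {                                             # hypothetical (accesses, footprint, energy)
    "array": (1.0, 1.2, 0.9),
    "sll":   (1.4, 1.0, 1.1),
    "dll":   (1.6, 1.3, 1.2),
}
N_CONTAINERS, POP_SIZE, GENERATIONS = 4, 20, 30      # one gene per container variable

def objectives(chromosome):
    # Sum each of the three objectives over every container's chosen DDT.
    return tuple(sum(COST[d][i] for d in chromosome) for i in range(3))

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(population):
    scored = [(objectives(c), c) for c in population]
    return [c for s, c in scored if not any(dominates(t, s) for t, _ in scored)]

def crossover(a, b):
    cut = random.randrange(1, N_CONTAINERS)
    return a[:cut] + b[cut:]

def mutate(c, rate=0.2):
    return [random.choice(DDTS) if random.random() < rate else g for g in c]

population = [[random.choice(DDTS) for _ in range(N_CONTAINERS)]
              for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    parents = pareto_front(population)               # keep only non-dominated designs
    population = [mutate(crossover(random.choice(parents), random.choice(parents)))
                  for _ in range(POP_SIZE)]

for solution in pareto_front(population)[:5]:
    print(solution, objectives(solution))
```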

    A Survey of Symbolic Execution Techniques

    Get PDF
    Many security and software testing applications require checking whether certain properties of a program hold for any possible usage scenario. For instance, a tool for identifying software vulnerabilities may need to rule out the existence of any backdoor to bypass a program's authentication. One approach would be to test the program using different, possibly random inputs. As the backdoor may only be hit for very specific program workloads, automated exploration of the space of possible inputs is of the essence. Symbolic execution provides an elegant solution to the problem, by systematically exploring many possible execution paths at the same time without necessarily requiring concrete inputs. Rather than operating on fully specified input values, the technique abstractly represents them as symbols, resorting to constraint solvers to construct actual instances that would cause property violations. Symbolic execution has been incubated in dozens of tools developed over the last four decades, leading to major practical breakthroughs in a number of prominent software reliability applications. The goal of this survey is to provide an overview of the main ideas, challenges, and solutions developed in the area, distilling them for a broad audience. This survey has been accepted for publication at ACM Computing Surveys; this is the authors' pre-print copy. If you are considering citing this survey, we would appreciate it if you could use the BibTeX entry available at http://goo.gl/Hf5Fvc
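    The core idea can be illustrated with a small sketch: instead of running the program on concrete values, the inputs become symbols, the guarding branch condition becomes a path constraint, and a constraint solver produces a concrete input that drives execution down that path. The example below uses the z3 solver's Python bindings (pip install z3-solver); the "program under test" and its backdoor condition are made-up for illustration.

```python
from z3 import Int, Solver, And, sat

def program(x, y):
    if x * 3 + y == 42 and y > 10:      # hidden "backdoor" branch
        return "backdoor reached"
    return "normal path"

# Represent the inputs as symbols and collect the branch condition as a
# path constraint instead of executing with concrete values.
x, y = Int("x"), Int("y")
path_constraint = And(x * 3 + y == 42, y > 10)

s = Solver()
s.add(path_constraint)
if s.check() == sat:                     # the path is feasible
    m = s.model()
    cx, cy = m[x].as_long(), m[y].as_long()
    print("concrete input reaching the branch:", cx, cy)
    print(program(cx, cy))               # -> "backdoor reached"
```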

    AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture

    Full text link
    CPU-FPGA heterogeneous architectures are attracting ever-increasing attention in an attempt to advance computational capabilities and energy efficiency in today's datacenters. These architectures provide programmers with the ability to reprogram the FPGAs for flexible acceleration of many workloads. Nonetheless, this advantage is often overshadowed by the poor programmability of FPGAs, whose programming is conventionally an RTL design practice. Although recent advances in high-level synthesis (HLS) significantly improve FPGA programmability, programmers still face the challenge of identifying the optimal design configuration in a tremendous design space. This paper aims to address this challenge and pave the path from software programs towards high-quality FPGA accelerators. Specifically, we first propose the composable, parallel and pipeline (CPP) microarchitecture as a template of accelerator designs. Such a well-defined template is able to support efficient accelerator designs for a broad class of computation kernels, and more importantly, drastically reduce the design space. Also, we introduce an analytical model to capture the performance and resource trade-offs among different design configurations of the CPP microarchitecture, which lays the foundation for fast design space exploration. On top of the CPP microarchitecture and its analytical model, we develop the AutoAccel framework to make the entire accelerator generation automated. AutoAccel accepts a software program as an input and performs a series of code transformations based on the result of the analytical-model-based design space exploration to construct the desired CPP microarchitecture. Our experiments show that the AutoAccel-generated accelerators outperform their corresponding software implementations by an average of 72x for a broad class of computation kernels.
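    Analytical-model-based design space exploration of the kind described above can be sketched as a simple enumerate-estimate-prune loop: every configuration of the microarchitectural knobs is scored with closed-form performance and resource estimates, infeasible points are discarded, and the fastest feasible point is kept. The knobs, cost formulas and resource budgets below are illustrative assumptions, not AutoAccel's actual model.

```python
from itertools import product

N_ELEMS = 1_000_000                                       # workload size
DSP_BUDGET, BRAM_KB_BUDGET = 512, 2_048                   # hypothetical device limits

def est_cycles(parallel_factor, tile_elems):
    compute  = N_ELEMS // parallel_factor                 # PEs work in parallel
    transfer = 2 * (N_ELEMS // tile_elems) * 100          # hypothetical burst cost
    return max(compute, transfer)                         # load/compute/store overlap

def est_resources(parallel_factor, tile_elems):
    return {"dsp": 4 * parallel_factor,                   # hypothetical per-PE cost
            "bram_kb": 3 * tile_elems * 4 // 1024}        # double-buffered float tiles

best = None
for par, tile in product([1, 2, 4, 8, 16, 32, 64], [1024, 4096, 16384, 65536]):
    res = est_resources(par, tile)
    if res["dsp"] > DSP_BUDGET or res["bram_kb"] > BRAM_KB_BUDGET:
        continue                                          # prune infeasible designs
    cycles = est_cycles(par, tile)
    if best is None or cycles < best["cycles"]:
        best = {"cycles": cycles, "parallel": par, "tile": tile, **res}

print("selected configuration:", best)
```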

    An empirical investigation into branch coverage for C programs using CUTE and AUSTIN

    Get PDF
    Automated test data generation has remained a topic of considerable interest for several decades because it lies at the heart of attempts to automate the process of software testing. This paper reports the results of an empirical study using the dynamic symbolic-execution tool CUTE and the search-based tool AUSTIN on five non-trivial open-source applications. The aim is to provide practitioners with an assessment of what can be achieved by existing techniques with little or no specialist knowledge, and to provide researchers with baseline data against which to measure subsequent work. To achieve this, each tool is applied 'as is', with neither additional tuning nor supporting harnesses and with no adjustments applied to the subject programs under test. The mere fact that these tools can be applied 'out of the box' in this manner reflects the growing maturity of automated test data generation. However, as might be expected, the study reveals opportunities for improvement and suggests ways to hybridize these two approaches, which have hitherto been developed entirely independently. (C) 2010 Elsevier Inc. All rights reserved.
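    For readers unfamiliar with the search-based side of this comparison, the sketch below shows test data generation in the style of AUSTIN's alternating variable method (not AUSTIN itself): a branch-distance fitness measures how far an input is from taking the target branch, and a local search varies one input variable at a time to minimize it. The function under test and the distance formula are illustrative.

```python
def under_test(a, b):
    if a == 2 * b + 7:                  # target branch we want to cover
        return "target"
    return "other"

def branch_distance(a, b):
    # Standard branch distance for an equality predicate: |lhs - rhs|.
    return abs(a - (2 * b + 7))

def alternating_variable_search(inputs, max_iters=10_000):
    inputs = list(inputs)
    for _ in range(max_iters):
        if branch_distance(*inputs) == 0:
            return inputs               # branch covered
        improved = False
        for i in range(len(inputs)):    # vary one variable at a time
            for step in (+1, -1):
                trial = list(inputs)
                trial[i] += step
                if branch_distance(*trial) < branch_distance(*inputs):
                    inputs, improved = trial, True
                    break
            if improved:
                break
        if not improved:                # local optimum: perturb and continue
            inputs = [x + 17 for x in inputs]
    return inputs

test_input = alternating_variable_search([0, 0])
print(test_input, under_test(*test_input))   # e.g. [7, 0] -> "target"
```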

    PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation

    Full text link
    High-performance computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and efficiency in important large-scale applications of computational science. However, exploiting this potential can be challenging, as one must adapt to the specialized and rapidly evolving computing environment currently exhibited by GPUs. One way of addressing this challenge is to embrace better techniques and to develop tools tailored to these needs. This article presents one simple technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL, two open-source toolkits that support this technique. In introducing PyCUDA and PyOpenCL, this article proposes the combination of a dynamic, high-level scripting language with the massive performance of a GPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems. The concept of RTCG is simple and easily implemented using existing, robust infrastructure. Nonetheless, it is powerful enough to support (and encourage) the creation of custom application-specific tools by its users. The premise of the paper is illustrated by a wide range of examples where the technique has been applied with considerable success. Comment: Submitted to Parallel Computing, Elsevier.
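    The run-time code generation workflow can be illustrated with the classic element-wise kernel pattern popularized by the PyCUDA documentation: the kernel source is assembled as a Python string at run time, compiled on the fly, and launched from the scripting layer. The snippet below is a minimal sketch requiring a CUDA-capable GPU and an installed PyCUDA; the kernel, array size and block dimensions are arbitrary example choices.

```python
import numpy as np
import pycuda.autoinit                  # creates a context on the default device
import pycuda.driver as drv
from pycuda.compiler import SourceModule

n = 400
# The kernel source is an ordinary Python string, so it can be specialised at
# run time (constants substituted, loops unrolled, ...) before compilation.
kernel_src = """
__global__ void multiply_them(float *dest, float *a, float *b)
{
    const int i = threadIdx.x;
    dest[i] = a[i] * b[i];
}
"""
mod = SourceModule(kernel_src)          # compiled by nvcc at run time
multiply_them = mod.get_function("multiply_them")

a = np.random.randn(n).astype(np.float32)
b = np.random.randn(n).astype(np.float32)
dest = np.zeros_like(a)

multiply_them(drv.Out(dest), drv.In(a), drv.In(b), block=(n, 1, 1), grid=(1, 1))
print(np.allclose(dest, a * b))         # True if the generated kernel ran correctly
```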