47 research outputs found

    Offline compression for on-chip RAM

    Get PDF
    ManuscriptWe present offline RAM compression, an automated source-to-source transformation that reduces a program's data size. Statically allocated scalars, pointers, structures, and arrays are encoded and packed based on the results of a whole-program analysis in the value set and pointer set domains. We target embedded software written in C that relies heavily on static memory allocation and runs on Harvard-architecture microcontrollers supporting just a few KB of on-chip RAM. On a collection of embedded applications for AVR microcontrollers, our transformation reduces RAM usage by an average of 12%, in addition to a 10% reduction through a dead-data elimination pass that is also driven by our whole-program analysis, for a total RAM savings of 22%. We also developed a technique for giving developers access to a flexible spectrum of tradeoffs between RAM consumption, ROM consumption, and CPU efficiency. This technique is based on a model for estimating the cost/benefit ratio of compressing each variable and then selectively compressing only those variables that present a good value proposition in terms of the desired tradeoffs

    Design and optimisation of scientific programs in a categorical language

    Get PDF
    This thesis presents an investigation into the use of advanced computer languages for scientific computing, an examination of performance issues that arise from using such languages for such a task, and a step toward achieving portable performance from compilers by attacking these problems in a way that compensates for the complexity of and differences between modern computer architectures. The language employed is Aldor, a functional language from computer algebra, and the scientific computing area is a subset of the family of iterative linear equation solvers applied to sparse systems. The linear equation solvers that are considered have much common structure, and this is factored out and represented explicitly in the lan-guage as a framework, by means of categories and domains. The flexibility introduced by decomposing the algorithms and the objects they act on into separate modules has a strong performance impact due to its negative effect on temporal locality. This necessi-tates breaking the barriers between modules to perform cross-component optimisation. In this instance the task reduces to one of collective loop fusion and array contrac

    Language and compiler support for dyanmic code generation

    Get PDF
    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999.Includes bibliographical references (p. 131-135).by Massimiliano A. Poletto.Ph.D

    View-based abstraction : enhancing maintainability and modularity in the presence of implementation dependencies

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1997.Includes bibliographical references (p. 173-177).by Luis H. Rodriguez, Jr.Ph.D

    Autotuning for Automatic Parallelization on Heterogeneous Systems

    Get PDF

    Helium: a transparent inter-kernel optimizer for OpenCL

    Get PDF

    Software Model Checking with Uninterpreted Functions

    Full text link
    Software model checkers attempt to algorithmically synthesize an inductive proof that a piece of software is safe. Such proofs are composed of complex logical assertions about program variables and control structures, and are computationally expensive to produce. Our unifying motivation is to increase the efficiency of verifying software control behavior despite its dependency on data. Control properties include important topics such as mutual exclusion, safe privilege elevation, and proper usage of networking and other APIs. These concerns motivate our techniques and evaluations. Our approach integrates an efficient abstraction procedure based on the logic of equality with uninterpreted functions (EUF) into the core of a modern model checker. Our checker, called euforia, targets control properties by treating a program's data operations and relations as uninterpreted functions and predicates, respectively. This reduces the cost of building inductive proofs, especially for verifying control relationships in the presence of complex but irrelevant data processing. We show that our method is sound and terminates. We provide a ground-up implementation and evaluate the abstraction on a variety of software verification benchmarks. We show how to extend this abstraction to memory-manipulating programs. By judicious abstraction of array operations to EUF, we show that we can directly reason about array reads and adaptively learn lemmas about array writes leading to significant performance improvements over existing approaches. We show that our abstraction of array operations completely eliminates much of the array theory reasoning otherwise required. We report on experiments with and without abstraction and compare our checker to the state of the art. Programs with procedures pose unique difficulties and opportunities. We show how to retrofit a model checker not supporting procedures so that it supports modular analysis of programs with non-recursive procedures. This technique applies to euforia as well as other logic-based algorithms. We show that this technique enables logical assertions about procedure bodies to be reused at different call sites. We report on experiments on software benchmarks compared to the alternative of inlining all procedures.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/168092/1/dlbueno_1.pd

    Improving systems software security through program analysis and instrumentation

    Get PDF
    Security and reliability bugs are prevalent in systems software. Systems code is often written in low-level languages like C/C++, which offer many benefits but also delegate memory management and type safety to programmers. This invites bugs that cause crashes or can be exploited by attackers to take control of the program. This thesis presents techniques to detect and fix security and reliability issues in systems software without burdening the software developers. First, we present code-pointer integrity (CPI), a technique that combines static analysis with compile-time instrumentation to guarantee the integrity of all code pointers in a program and thereby prevent all control-flow hijack attacks. We also present code-pointer separation (CPS), a relaxation of CPI with better performance properties. CPI and CPS offer substantially better security-to-overhead ratios than the state of the art in control flow hijack defense mechanisms, they are practical (we protect a complete FreeBSD system and over 100 packages like apache and postgresql), effective (prevent all attacks in the RIPE benchmark), and efficient: on SPEC CPU2006, CPS averages 1.2% overhead for C and 1.9% for C/C++, while CPIâs overhead is 2.9% for C and 8.4% for C/C++. Second, we present DDT, a tool for testing closed-source device drivers to automatically find bugs like memory errors or race conditions. DDT showcases a combination of a form of program analysis called selective symbolic execution with virtualization to thoroughly exercise tested drivers and produce detailed, executable traces for every path that leads to a failure. We applied DDT to several closed-source Microsoft-certified Windows device drivers and discovered 14 serious new bugs that can cause crashes or compromise security of the entire system. Third, we present a technique for increasing the scalability of symbolic execution by merging states obtained on different execution paths. State merging reduces the number of states to analyze, but the merged states can be more complex and harder to analyze than their individual components. We introduce query count estimation, a technique to reason about the analysis time of merged states and decide which states to merge in order to achieve optimal net performance of symbolic execution. We also introduce dynamic state merging, a technique for merging states that interacts favorably with search strategies employed by practical bug finding tools, such as DDT and KLEE. Experiments on the 96 GNU Coreutils show that our approach consistently achieves several orders of magnitude speedup over previously published results

    AUTOMATING DATA-LAYOUT DECISIONS IN DOMAIN-SPECIFIC LANGUAGES

    Get PDF
    A long-standing challenge in High-Performance Computing (HPC) is the simultaneous achievement of programmer productivity and hardware computational efficiency. The challenge has been exacerbated by the onset of multi- and many-core CPUs and accelerators. Only a few expert programmers have been able to hand-code domain-specific data transformations and vectorization schemes needed to extract the best possible performance on such architectures. In this research, we examined the possibility of automating these methods by developing a Domain-Specific Language (DSL) framework. Our DSL approach extends C++14 by embedding into it a high-level data-parallel array language, and by using a domain-specific compiler to compile to hybrid-parallel code. We also implemented an array index-space transformation algebra within this high-level array language to manipulate array data-layouts and data-distributions. The compiler introduces a novel method for SIMD auto-vectorization based on array data-layouts. Our new auto-vectorization technique is shown to outperform the default auto-vectorization strategy by up to 40% for stencil computations. The compiler also automates distributed data movement with overlapping of local compute with remote data movement using polyhedral integer set analysis. Along with these main innovations, we developed a new technique using C++ template metaprogramming for developing embedded DSLs using C++. We also proposed a domain-specific compiler intermediate representation that simplifies data flow analysis of abstract DSL constructs. We evaluated our framework by constructing a DSL for the HPC grand-challenge domain of lattice quantum chromodynamics. Our DSL yielded performance gains of up to twice the flop rate over existing production C code for selected kernels. This gain in performance was obtained while using less than one-tenth the lines of code. The performance of this DSL was also competitive with the best hand-optimized and hand-vectorized code, and is an order of magnitude better than existing production DSLs.Doctor of Philosoph
    corecore