
    Language and compiler support for stream programs

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 153-166).
    Stream programs represent an important class of high-performance computations. Defined by their regular processing of sequences of data, stream programs appear most commonly in the context of audio, video, and digital signal processing, though also in networking, encryption, and other areas. Stream programs can be naturally represented as a graph of independent actors that communicate explicitly over data channels. In this work we focus on programs where the input and output rates of actors are known at compile time, enabling aggressive transformations by the compiler; this model is known as synchronous dataflow. We develop a new programming language, StreamIt, that empowers both programmers and compiler writers to leverage the unique properties of the streaming domain. StreamIt offers several new abstractions, including hierarchical single-input single-output streams, composable primitives for data reordering, and a mechanism called teleport messaging that enables precise event handling in a distributed environment. We demonstrate the feasibility of developing applications in StreamIt via a detailed characterization of our 34,000-line benchmark suite, which spans from MPEG-2 encoding/decoding to GMTI radar processing. We also present a novel dynamic analysis for migrating legacy C programs into a streaming representation. The central premise of stream programming is that it enables the compiler to perform powerful optimizations. We support this premise by presenting a suite of new transformations. 
We describe the first translation of stream programs into the compressed domain, enabling programs written for uncompressed data formats to automatically operate directly on compressed data formats (based on LZ77). This technique offers a median speedup of 15x on common video editing operations. We also review other optimizations developed in the StreamIt group, including automatic parallelization (offering an 11x mean speedup on the 16-core Raw machine), optimization of linear computations (offering a 5.5x average speedup on a Pentium 4), and cache-aware scheduling (offering a 3.5x mean speedup on a StrongARM 1100). While these transformations are beyond the reach of compilers for traditional languages such as C, they become tractable given the abundant parallelism and regular communication patterns exposed by the stream programming model.
    by William Thies. Ph.D.
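The abstract above rests on the synchronous dataflow property that every actor's input and output rates are fixed at compile time. As a minimal sketch of why that matters (this is not StreamIt code, and the function name and rate values are assumptions made for illustration), the following Python computes how many times each actor in a simple pipeline must fire so that every channel ends a steady-state iteration with as many tokens as it started with:

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(rates):
    """Steady-state firing counts for a pipeline of actors.

    rates[i] = (push, pop): actor i pushes `push` tokens per firing onto the
    channel to actor i+1, which pops `pop` tokens per firing.
    (Hypothetical helper; the rates are assumed to be compile-time constants.)
    """
    # Balance equation per channel: reps[i] * push == reps[i + 1] * pop.
    reps = [Fraction(1)]
    for push, pop in rates:
        reps.append(reps[-1] * push / pop)
    # Scale the rational solution to the smallest all-integer solution.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in reps), 1)
    return [int(r * scale) for r in reps]


# Three actors A -> B -> C: A pushes 2 and B pops 3; B pushes 1 and C pops 2.
print(repetition_vector([(2, 3), (1, 2)]))
```

With these assumed rates, A fires three times (six tokens produced), B fires twice (six consumed, two produced), and C fires once (two consumed), so a compiler can fix the whole schedule before the program runs.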

    A computing origami: Optimized code generation for emerging parallel platforms

    Ph.D., Doctor of Philosophy

    Program Analysis Based Approaches to Ensure Security and Safety of Emerging Software Platforms

    Our smartphones, homes, hospitals, and automobiles are being enhanced with software that provides an unprecedentedly rich set of functionalities, which has created an enormous market for the development of software that runs on almost every personal computing device in a person's daily life, including security- and safety-critical ones. However, the software development support provided by the emerging platforms also raises security risks by allowing untrusted third-party code, which can potentially be buggy, vulnerable, or even malicious, to control the user's device. Moreover, as Internet-of-Things (IoT) technology is gaining vast adoption across a wide range of industries and is penetrating every aspect of people's lives, safety risks brought by the open software development support of the emerging IoT platform (e.g., smart home) could pose a more severe threat to the well-being of customers than what security vulnerabilities in mobile apps have done to a cell phone user. To address this challenge to software security in emerging domains, my dissertation focuses on the flaws, vulnerabilities, and malice in the software developed for platforms in these domains. Specifically, we demonstrate that systematic program analyses of software (1) lead to an understanding of design and implementation flaws across different platforms that can be leveraged in miscellaneous attacks or cause safety problems; and (2) lead to the development of security mechanisms that limit the potential for these threats. We contribute static and dynamic program analysis techniques for three modern platforms in emerging domains -- smartphone, smart home, and autonomous vehicle. 
Our app analysis reveals various vulnerabilities and design flaws on these platforms, and we propose (1) a static analysis tool, OPAnalyzer, to automate the discovery of problems by searching for vulnerable code patterns; (2) a dynamic testing tool, AutoFuzzer, to efficiently produce and capture domain-specific issues that were previously undefined; and (3) a new access control mechanism, ContexIoT, to strengthen the platform's immunity to the vulnerability and malice in third-party software. Concretely, we first study a vulnerability family caused by open ports on mobile devices, which allow remote exploitation due to insufficient protection. We devise a tool called OPAnalyzer to perform the first systematic study of open port usage and its security implications on the mobile platform; the tool effectively identifies and characterizes vulnerable open port usage at scale in popular Android apps. We further identify the lack of context-based access control as a main enabler for such attacks, and begin to seek defense solutions to strengthen system security. We study the popular smart home platform, and find the existing access control mechanisms to be coarse-grained, insufficient, and undemanding. Taking lessons from previous permission systems, we propose the ContexIoT approach, a context-based permission system for IoT platforms that support third-party app development, which protects the user from vulnerability and malice in these apps through fine-grained identification of context. Finally, we design a dynamic fuzzing tool, AutoFuzzer, for testing self-driving functionalities, which demand very high code quality, using an improved testing practice that combines state-of-the-art fuzzing techniques with vehicular domain knowledge, and discover problems that lead to crashes in safety-critical software on the emerging autonomous vehicle platform.
    PHD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    https://deepblue.lib.umich.edu/bitstream/2027.42/145845/1/jackjia_1.pd

    Semantic-Preserving Transformations for Stream Program Orchestration on Multicore Architectures

    Because the demand for high performance in big data processing and distributed computing is increasing, the stream programming paradigm has been revisited for its abundance of parallelism by virtue of independent actors that communicate via data channels. The synchronous data-flow (SDF) programming model is frequently adopted in stream programming languages because it makes it convenient to express stream programs as a set of nodes connected by data channels. The static data rates of the SDF programming model enable program transformations that greatly improve the performance of SDF programs on multicore architectures. The major application domains for SDF programs are digital signal processing, audio, video, graphics kernels, networking, and security. This thesis makes the following three contributions that improve the performance of SDF programs: First, a new intermediate representation (IR) called LaminarIR is introduced. LaminarIR replaces FIFO queues with direct memory accesses to reduce the data communication overhead and makes explicit the data dependencies between producer and consumer nodes. We provide transformations and their formal semantics to convert conventional, FIFO-queue based program representations to LaminarIR. Second, a compiler framework is presented that performs sound and semantics-preserving program transformations from FIFO semantics to LaminarIR. We employ static program analysis to resolve token positions in FIFO queues and replace them with direct memory accesses. Third, a communication-cost-aware program orchestration method is proposed to establish a foundation for LaminarIR parallelization on multicore architectures. The LaminarIR framework, which consists of the aforementioned contributions together with the benchmarks that we used in the experimental evaluation, has been open-sourced to encourage further research on improving the performance of stream programming languages.
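The idea of resolving token positions statically can be illustrated with a toy example. The sketch below is not LaminarIR or its compiler; the rates, the `produce` function, and the summing consumer are all assumptions chosen for illustration. It runs one steady-state iteration of a two-actor pipeline twice, once through a run-time FIFO queue and once through a flat buffer whose indices follow directly from the static rates, and both versions yield the same results:

```python
from collections import deque

PUSH, POP = 2, 3                    # assumed static rates per firing
PROD_FIRINGS, CONS_FIRINGS = 3, 2   # one steady-state iteration: 3*2 == 2*3


def produce(i):
    """Hypothetical producer: firing i emits PUSH tokens."""
    return [10 * i + k for k in range(PUSH)]


def fifo_version():
    """Consumer sums POP tokens per firing, taken from a run-time FIFO."""
    queue = deque()
    for i in range(PROD_FIRINGS):
        queue.extend(produce(i))
    return [sum(queue.popleft() for _ in range(POP))
            for _ in range(CONS_FIRINGS)]


def direct_version():
    """Same computation; every token position is a fixed buffer index."""
    buf = [0] * (PUSH * PROD_FIRINGS)
    for i in range(PROD_FIRINGS):
        for k, token in enumerate(produce(i)):
            buf[PUSH * i + k] = token   # index known from the static rates
    return [sum(buf[POP * j + k] for k in range(POP))
            for j in range(CONS_FIRINGS)]


print(fifo_version(), direct_version())
```

Because the rates never change, the index arithmetic in `direct_version` could be evaluated entirely at compile time, which is the property that lets queue operations be replaced by plain memory accesses.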

    Streamroller: A Unified Compilation and Synthesis System for Streaming Applications

    The growing complexity of applications has increased the need for higher processing power. In the embedded domain, the convergence of audio, video, and networking on a handheld device has prompted the need for low-cost, low-power, and high-performance implementations of these applications in the form of custom hardware. In a more mainstream domain like gaming consoles, the move towards more realism in physics simulations and graphics has forced the industry towards multicore systems. Many of the applications in these domains are streaming in nature. The key challenge is to get efficient implementations of custom hardware from these applications and map these applications efficiently onto multicore architectures. This dissertation presents a unified methodology, referred to as Streamroller, that can be applied to the problem of scheduling stream programs to multicore architectures and to the problem of automatic synthesis of custom hardware for stream applications. Firstly, a method called stream-graph modulo scheduling is presented, which maps stream programs effectively onto a multicore architecture. Many aspects of a real system, like limited memory and explicit DMAs, are modeled in the scheduler. The scheduler is evaluated for a set of stream programs on IBM's Cell processor. Secondly, an automated high-level synthesis system for creating custom hardware for stream applications is presented. The template for the custom hardware is a pipeline of accelerators. The synthesis involves designing loop accelerators for individual kernels, instantiating buffers to store data passed between kernels, and linking these building blocks to form a pipeline. A unique aspect of this system is the use of multifunction accelerators, which improves cost by efficiently sharing hardware between multiple kernels. Finally, a method to improve the integer linear program formulations used in the schedulers that exploits symmetry in the solution space is presented. 
Symmetry-breaking constraints are added to the formulation, and the performance of the solver is evaluated.
    Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/61662/1/kvman_1.pd

    Approximate Computing for Energy Efficiency


    Improving Programming Support for Hardware Accelerators Through Automata Processing Abstractions

    The adoption of hardware accelerators, such as Field-Programmable Gate Arrays, into general-purpose computation pipelines continues to rise, driven by recent trends in data collection and analysis as well as pressure from challenging physical design constraints in hardware. The architectural designs of many of these accelerators stand in stark contrast to the traditional von Neumann model of CPUs. Consequently, existing programming languages, maintenance tools, and techniques are not directly applicable to these devices, meaning that additional architectural knowledge is required for effective programming and configuration. Current programming models and techniques are akin to assembly-level programming on a CPU, thus placing significant burden on developers tasked with using these architectures. Because programming is currently performed at such low levels of abstraction, the software development process is tedious and challenging and hinders the adoption of hardware accelerators. This dissertation explores the thesis that theoretical finite automata provide a suitable abstraction for bridging the gap between high-level programming models and maintenance tools familiar to developers and the low-level hardware representations that enable high-performance execution on hardware accelerators. We adopt a principled hardware/software co-design methodology to develop a programming model providing the key properties that we observe are necessary for success, namely performance and scalability, ease of use, expressive power, and legacy support. First, we develop a framework that allows developers to port existing, legacy code to run on hardware accelerators by leveraging automata learning algorithms in a novel composition with software verification, string solvers, and high-performance automata architectures. 
Next, we design a domain-specific programming language to aid programmers writing pattern-searching algorithms and develop compilation algorithms to produce finite automata, which support efficient execution on a wide variety of processing architectures. Then, we develop an interactive debugger for our new language, which allows developers to accurately identify the locations of bugs in software while maintaining support for high-throughput data processing. Finally, we develop two new automata-derived accelerator architectures to support additional applications, including the detection of security attacks and the parsing of recursive and tree-structured data. Using empirical studies, logical reasoning, and statistical analyses, we demonstrate that our prototype artifacts scale to real-world applications, maintain manageable overheads, and support developers' use of hardware accelerators. Collectively, the research efforts detailed in this dissertation help ease the adoption and use of hardware accelerators for data analysis applications, while supporting high-performance computation.
    PHD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    https://deepblue.lib.umich.edu/bitstream/2027.42/155224/1/angstadt_1.pd