8 research outputs found

    Design methodology for embedded computer vision systems

    Get PDF
    Computer vision has emerged as one of the most popular domains of embedded appli¬cations. Though various new powerful embedded platforms to support such applica¬tions have emerged in recent years, there is a distinct lack of efficient domain-specific synthesis techniques for optimized implementation of such systems. In this thesis, four different aspects that contribute to efficient design and synthesis of such systems are explored: (1) Graph Transformations: Dataflow modeling is widely used in digital signal processing (DSP) systems. However, support for dynamic behavior in such systems exists mainly at the modeling level and there is a lack of optimized synthesis tech¬niques for these models. New transformation techniques for efficient system-on-chip (SoC) design methods are proposed and implemented for cyclo-static dataflow and its parameterized version (parameterized cyclo-static dataflow) -- two powerful models that allow dynamic reconfigurability and phased behavior in DSP systems. (2) Design Space Exploration: The broad range of target platforms along with the complexity of applications provides a vast design space, calling for efficient tools to explore this space and produce effective design choices. A novel architectural level design methodology based on a formalism called multirate synchronization graphs is presented along with methods for performance evaluation. (3) Multiprocessor Communication Interface: Efficient code synthesis for emerg¬ing new parallel architectures is an important and sparsely-explored problem. A widely-encountered problem in this regard is efficient communication between pro¬cessors running different sub-systems. A widely used tool in the domain of general-purpose multiprocessor clusters is MPI (Message Passing Interface). However, this does not scale well for embedded DSP systems. A new, powerful and highly optimized communication interface for multiprocessor signal processing systems is presented in this work that is based on the integration of relevant properties of MPI with dataflow semantics. (4) Parameterized Design Framework for Particle Filters: Particle filter systems constitute an important class of applications used in a wide number of fields. An effi¬cient design and implementation framework for such systems has been implemented based on the observation that a large number of such applications exhibit similar prop¬erties. The key properties of such applications are identified and parameterized appro¬priately to realize different systems that represent useful trade-off points in the space of possible implementations

    Temporal Analysis of Static Priority Preemptive Scheduled Cyclic Streaming Applications using CSDF Models

    Get PDF
    Real-time streaming applications with cyclic data dependencies that are executed on multiprocessor systems with processor sharing usually require a temporal analysis to give guarantees on their temporal behavior at design time. Current accurate analysis techniques for cyclic applications that are scheduled with Static Priority Preemptive (SPP) schedulers are however limited to the analysis of applications that can be expressed with Homogeneous Synchronous Dataflow (HSDF) models, i.e. in which all tasks operate at a single rate. Moreover, it is required that both input and output buffers synchronize atomically at the beginnings and finishes of task executions, which is difficult to realize on many existing hardware platforms.\ud \ud This paper presents a temporal analysis approach for cyclic real-time streaming applications executed on multiprocessor systems with processor sharing and SPP scheduling that can be expressed using Cyclo-Static Dataflow (CSDF) models. This allows to model tasks with multiple phases and changing rates and furthermore resolves the problematic restriction that buffer synchronization must occur atomically at the boundaries of task executions. For that purpose a joint interference characterization over multiple phases is introduced, which realizes a significant accuracy improvement compared to an isolated consideration of interference.\ud \ud Applicability, efficiency and accuracy of the presented approach are evaluated in a case study using a WLAN 802.11p transceiver application. Thereby different use-cases of CSDF modeling are discussed, including a CSDF model relaxing the requirement of atomic synchronization

    Language and compiler support for stream programs

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Includes bibliographical references (p. 153-166).Stream programs represent an important class of high-performance computations. Defined by their regular processing of sequences of data, stream programs appear most commonly in the context of audio, video, and digital signal processing, though also in networking, encryption, and other areas. Stream programs can be naturally represented as a graph of independent actors that communicate explicitly over data channels. In this work we focus on programs where the input and output rates of actors are known at compile time, enabling aggressive transformations by the compiler; this model is known as synchronous dataflow. We develop a new programming language, StreamIt, that empowers both programmers and compiler writers to leverage the unique properties of the streaming domain. StreamIt offers several new abstractions, including hierarchical single-input single-output streams, composable primitives for data reordering, and a mechanism called teleport messaging that enables precise event handling in a distributed environment. We demonstrate the feasibility of developing applications in StreamIt via a detailed characterization of our 34,000-line benchmark suite, which spans from MPEG-2 encoding/decoding to GMTI radar processing. We also present a novel dynamic analysis for migrating legacy C programs into a streaming representation. The central premise of stream programming is that it enables the compiler to perform powerful optimizations. We support this premise by presenting a suite of new transformations. We describe the first translation of stream programs into the compressed domain, enabling programs written for uncompressed data formats to automatically operate directly on compressed data formats (based on LZ77). This technique offers a median speedup of 15x on common video editing operations.(cont.) We also review other optimizations developed in the StreamIt group, including automatic parallelization (offering an 11x mean speedup on the 16-core Raw machine), optimization of linear computations (offering a 5.5x average speedup on a Pentium 4), and cache-aware scheduling (offering a 3.5x mean speedup on a StrongARM 1100). While these transformations are beyond the reach of compilers for traditional languages such as C, they become tractable given the abundant parallelism and regular communication patterns exposed by the stream programming model.by William Thies.Ph.D

    Compiler techniques for scalable performance of stream programs on multicore architectures

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010.Cataloged from PDF version of thesis.Includes bibliographical references (p. 211-222).Given the ubiquity of multicore processors, there is an acute need to enable the development of scalable parallel applications without unduly burdening programmers. Currently, programmers are asked not only to explicitly expose parallelism but also concern themselves with issues of granularity, load-balancing, synchronization, and communication. This thesis demonstrates that when algorithmic parallelism is expressed in the form of a stream program, a compiler can effectively and automatically manage the parallelism. Our compiler assumes responsibility for low-level architectural details, transforming implicit algorithmic parallelism into a mapping that achieves scalable parallel performance for a given multicore target. Stream programming is characterized by regular processing of sequences of data, and it is a natural expression of algorithms in the areas of audio, video, digital signal processing, networking, and encryption. Streaming computation is represented as a graph of independent computation nodes that communicate explicitly over data channels. Our techniques operate on contiguous regions of the stream graph where the input and output rates of the nodes are statically determinable. Within a static region, the compiler first automatically adjusts the granularity and then exploits data, task, and pipeline parallelism in a holistic fashion. We introduce techniques that data-parallelize nodes that operate on overlapping sliding windows of their input, translating serializing state into minimal and parametrized inter-core communication. Finally, for nodes that cannot be data-parallelized due to state, we are the first to automatically apply software-pipelining techniques at a coarse granularity to exploit pipeline parallelism between stateful nodes. Our framework is evaluated in the context of the StreamIt programming language. StreamIt is a high-level stream programming language that has been shown to improve programmer productivity in implementing streaming algorithms. We employ the StreamIt Core benchmark suite of 12 real-world applications to demonstrate the effectiveness of our techniques for varying multicore architectures. For a 16-core distributed memory multicore, we achieve a 14.9x mean speedup. For benchmarks that include sliding-window computation, our sliding-window data-parallelization techniques are required to enable scalable performance for a 16-core SMP multicore (14x mean speedup) and a 64-core distributed shared memory multicore (52x mean speedup).by Michael I. Gordon.Ph.D

    An Abstraction-Refinement Theory for the Analysis and Design of Concurrent Real-Time Systems

    Get PDF
    Concurrent real-time systems with shared resources belong to the class of safety-critical systems for which it is required to determine both temporally and functionally conservative guarantees. However, the growing complexity of real-time systems makes it more and more challenging to apply standard techniques for their analysis. Especially the presence of both cyclic data dependencies and cyclic resource dependencies makes many related analysis approaches inapplicable. The usage of Static Priority Preemptive (SPP) scheduling further impedes the employment of many "classical" analysis techniques. To address this growing complexity and to be able to give guarantees nevertheless we present an abstraction-refinement theory for real-time systems. We introduce a timed component model that is defined in such a generic way that both real-time system implementations and any kinds of analysis models for such applications can be expressed therein. Thereafter, we devise three different abstraction-refinement theories for the timed component model, exclusion, inclusion and bounding. Exclusion can be used to remove unconsidered corner cases, inclusion allows for the substitution of uncertainty with non-determinism, while bounding permits to replace non-determinism with determinism. The latter enables the creation of efficiently analyzable models that can be used to give temporal or functional guarantees on non-deterministic and non-monotone implementations. We use such abstractions to construct analysis models from concurrent real-time systems with shared resources and SPP scheduling. On these models we apply various analysis techniques, with the goal to increase analysis accuracy. Our first accuracy improvement is achieved by combining the rather coarse state-of-the-art period-and-jitter interference characterization with an explicit consideration of cyclic data dependencies. The interference-limiting effect of such cycles can be exploited even more with an "iterative buffer sizing". Next we replace period-and-jitter with execution intervals, resulting in an even higher accuracy. In our last approach we increase both accuracy and applicability by enabling the support of real-time systems with tasks consisting of multiple phases and operating at different rates. With a modification of this approach we further enable the analysis of applications with multiple shared resources. Finally, we also present the so-called HAPI simulator that is capable of simulating any kinds of concurrent real-time systems with shared resources

    Resynchronization of Cyclo-Static Dataflow graphs

    No full text

    GSI Scientific Report 2006 [GSI Report 2007-1]

    Get PDF
    corecore