71 research outputs found

    SCALABLE TECHNIQUES FOR SCHEDULING AND MAPPING DSP APPLICATIONS ONTO EMBEDDED MULTIPROCESSOR PLATFORMS

    Get PDF
    A variety of multiprocessor architectures has proliferated even for off-the-shelf computing platforms. To make use of these platforms, traditional implementation frameworks focus on implementing Digital Signal Processing (DSP) applications using special platform features to achieve high performance. However, due to the fast evolution of the underlying architectures, solution redevelopment is error prone and re-usability of existing solutions and libraries is limited. In this thesis, we facilitate an efficient migration of DSP systems to multiprocessor platforms while systematically leveraging previous investment in optimized library kernels using dataflow design frameworks. We make these library elements, which are typically tailored to specialized architectures, more amenable to extensive analysis and optimization using an efficient and systematic process. In this thesis we provide techniques to allow such migration through four basic contributions: 1. We propose and develop a framework to explore efficient utilization of Single Instruction Multiple Data (SIMD) cores and accelerators available in heterogeneous multiprocessor platforms consisting of General Purpose Processors (GPPs) and Graphics Processing Units (GPUs). We also propose new scheduling techniques by applying extensive block processing in conjunction with appropriate task mapping and task ordering methods that match efficiently with the underlying architecture. The approach gives the developer the ability to prototype a GPU-accelerated application and explore its design space efficiently and effectively. 2. We introduce the concept of Partial Expansion Graphs (PEGs) as an implementation model and associated class of scheduling strategies. PEGs are designed to help realize DSP systems in terms of forms and granularities of parallelism that are well matched to the given applications and targeted platforms. PEGs also facilitate derivation of both static and dynamic scheduling techniques, depending on the amount of variability in task execution times and other operating conditions. We show how to implement efficient PEG-based scheduling methods using real time operating systems, and to re-use pre-optimized libraries of DSP components within such implementations. 3. We develop new algorithms for scheduling and mapping systems implemented using PEGs. Collectively, these algorithms operate in three steps. First, the amount of data parallelism in the application graph is tuned systematically over many iterations to profit from the available cores in the target platform. Then a mapping algorithm that uses graph analysis is developed to distribute data and task parallel instances over different cores while trying to balance the load of all processing units to make use of pipeline parallelism. Finally, we use a novel technique for performance evaluation by implementing the scheduler and a customizable solution on the programmable platform. This allows accurate fitness functions to be measured and used to drive runtime adaptation of schedules. 4. In addition to providing scheduling techniques for the mentioned applications and platforms, we also show how to integrate the resulting solution in the underlying environment. This is achieved by leveraging existing libraries and applying the GPP-GPU scheduling framework to augment a popular existing Software Defined Radio (SDR) development environment -- GNU Radio -- with a dataflow foundation and a stand-alone GPU-accelerated library. We also show how to realize the PEG model on real time operating system libraries, such as the Texas Instruments DSP/BIOS. A code generator that accepts a manual system designer solution as well as automatically configured solutions is provided to complete the design flow starting from application model to running system

    Exploiting parallelism within multidimensional multirate digital signal processing systems

    Get PDF
    The intense requirements for high processing rates of multidimensional Digital Signal Processing systems in practical applications justify the Application Specific Integrated Circuits designs and parallel processing implementations. In this dissertation, we propose novel theories, methodologies and architectures in designing high-performance VLSI implementations for general multidimensional multirate Digital Signal Processing systems by exploiting the parallelism within those applications. To systematically exploit the parallelism within the multidimensional multirate DSP algorithms, we develop novel transformations including (1) nonlinear I/O data space transforms, (2) intercalation transforms, and (3) multidimensional multirate unfolding transforms. These transformations are applied to the algorithms leading to systematic methodologies in high-performance architectural designs. With the novel design methodologies, we develop several architectures with parallel and distributed processing features for implementing multidimensional multirate applications. Experimental results have shown that those architectures are much more efficient in terms of execution time and/or hardware cost compared with existing hardware implementations

    Algorithms and architectures for the multirate additive synthesis of musical tones

    Get PDF
    In classical Additive Synthesis (AS), the output signal is the sum of a large number of independently controllable sinusoidal partials. The advantages of AS for music synthesis are well known as is the high computational cost. This thesis is concerned with the computational optimisation of AS by multirate DSP techniques. In note-based music synthesis, the expected bounds of the frequency trajectory of each partial in a finite lifecycle tone determine critical time-invariant partial-specific sample rates which are lower than the conventional rate (in excess of 40kHz) resulting in computational savings. Scheduling and interpolation (to suppress quantisation noise) for many sample rates is required, leading to the concept of Multirate Additive Synthesis (MAS) where these overheads are minimised by synthesis filterbanks which quantise the set of available sample rates. Alternative AS optimisations are also appraised. It is shown that a hierarchical interpretation of the QMF filterbank preserves AS generality and permits efficient context-specific adaptation of computation to required note dynamics. Practical QMF implementation and the modifications necessary for MAS are discussed. QMF transition widths can be logically excluded from the MAS paradigm, at a cost. Therefore a novel filterbank is evaluated where transition widths are physically excluded. Benchmarking of a hypothetical orchestral synthesis application provides a tentative quantitative analysis of the performance improvement of MAS over AS. The mapping of MAS into VLSI is opened by a review of sine computation techniques. Then the functional specification and high-level design of a conceptual MAS Coprocessor (MASC) is developed which functions with high autonomy in a loosely-coupled master- slave configuration with a Host CPU which executes filterbanks in software. Standard hardware optimisation techniques are used, such as pipelining, based upon the principle of an application-specific memory hierarchy which maximises MASC throughput

    Exploiting parallelism within multidimensional multirate digital signal processing systems

    Get PDF
    The intense requirements for high processing rates of multidimensional Digital Signal Processing systems in practical applications justify the Application Specific Integrated Circuits designs and parallel processing implementations. In this dissertation, we propose novel theories, methodologies and architectures in designing high-performance VLSI implementations for general multidimensional multirate Digital Signal Processing systems by exploiting the parallelism within those applications. To systematically exploit the parallelism within the multidimensional multirate DSP algorithms, we develop novel transformations including (1) nonlinear I/O data space transforms, (2) intercalation transforms, and (3) multidimensional multirate unfolding transforms. These transformations are applied to the algorithms leading to systematic methodologies in high-performance architectural designs. With the novel design methodologies, we develop several architectures with parallel and distributed processing features for implementing multidimensional multirate applications. Experimental results have shown that those architectures are much more efficient in terms of execution time and/or hardware cost compared with existing hardware implementations

    Efficient on-chip communications for data-flow IPs

    Get PDF
    International audienceWe explain a systematic way of interfacing data-flow hardware accelerators (IP) for their integration in a system on chip. We abstract the communication behaviour of the data flow IP so as to provide basis for an interface generator. We also explain which parameter this interface generator has to take into account. We validate our interface mechanism by a cycle accurate bit accurate simulation of a SoC integrating a data-flow ip

    Modeling and Mapping of Optimized Schedules for Embedded Signal Processing Systems

    Get PDF
    The demand for Digital Signal Processing (DSP) in embedded systems has been increasing rapidly due to the proliferation of multimedia- and communication-intensive devices such as pervasive tablets and smart phones. Efficient implementation of embedded DSP systems requires integration of diverse hardware and software components, as well as dynamic workload distribution across heterogeneous computational resources. The former implies increased complexity of application modeling and analysis, but also brings enhanced potential for achieving improved energy consumption, cost or performance. The latter results from the increased use of dynamic behavior in embedded DSP applications. Furthermore, parallel programming is highly relevant in many embedded DSP areas due to the development and use of Multiprocessor System-On-Chip (MPSoC) technology. The need for efficient cooperation among different devices supporting diverse parallel embedded computations motivates high-level modeling that expresses dynamic signal processing behaviors and supports efficient task scheduling and hardware mapping. Starting with dynamic modeling, this thesis develops a systematic design methodology that supports functional simulation and hardware mapping of dynamic reconfiguration based on Parameterized Synchronous Dataflow (PSDF) graphs. By building on the DIF (Dataflow Interchange Format), which is a design language and associated software package for developing and experimenting with dataflow-based design techniques for signal processing systems, we have developed a novel tool for functional simulation of PSDF specifications. This simulation tool allows designers to model applications in PSDF and simulate their functionality, including use of the dynamic parameter reconfiguration capabilities offered by PSDF. With the help of this simulation tool, our design methodology helps to map PSDF specifications into efficient implementations on field programmable gate arrays (FPGAs). Furthermore, valid schedules can be derived from the PSDF models at runtime to adapt hardware configurations based on changing data characteristics or operational requirements. Under certain conditions, efficient quasi-static schedules can be applied to reduce overhead and enhance predictability in the scheduling process. Motivated by the fact that scheduling is critical to performance and to efficient use of dynamic reconfiguration, we have focused on a methodology for schedule design, which complements the emphasis on automated schedule construction in the existing literature on dataflow-based design and implementation. In particular, we have proposed a dataflow-based schedule design framework called the dataflow schedule graph (DSG), which provides a graphical framework for schedule construction based on dataflow semantics, and can also be used as an intermediate representation target for automated schedule generation. Our approach to applying the DSG in this thesis emphasizes schedule construction as a design process rather than an outcome of the synthesis process. Our approach employs dataflow graphs for representing both application models and schedules that are derived from them. By providing a dataflow-integrated framework for unambiguously representing, analyzing, manipulating, and interchanging schedules, the DSG facilitates effective codesign of dataflow-based application models and schedules for execution of these models. As multicore processors are deployed in an increasing variety of embedded image processing systems, effective utilization of resources such as multiprocessor systemon-chip (MPSoC) devices, and effective handling of implementation concerns such as memory management and I/O become critical to developing efficient embedded implementations. However, the diversity and complexity of applications and architectures in embedded image processing systems make the mapping of applications onto MPSoCs difficult. We help to address this challenge through a structured design methodology that is built upon the DSG modeling framework. We refer to this methodology as the DEIPS methodology (DSG-based design and implementation of Embedded Image Processing Systems). The DEIPS methodology provides a unified framework for joint consideration of DSG structures and the application graphs from which they are derived, which allows designers to integrate considerations of parallelization and resource constraints together with the application modeling process. We demonstrate the DEIPS methodology through cases studies on practical embedded image processing systems
    • …
    corecore