68 research outputs found

    SCALABLE TECHNIQUES FOR SCHEDULING AND MAPPING DSP APPLICATIONS ONTO EMBEDDED MULTIPROCESSOR PLATFORMS

    Get PDF
    A variety of multiprocessor architectures has proliferated even for off-the-shelf computing platforms. To make use of these platforms, traditional implementation frameworks focus on implementing Digital Signal Processing (DSP) applications using special platform features to achieve high performance. However, due to the fast evolution of the underlying architectures, solution redevelopment is error prone and re-usability of existing solutions and libraries is limited. In this thesis, we facilitate an efficient migration of DSP systems to multiprocessor platforms while systematically leveraging previous investment in optimized library kernels using dataflow design frameworks. We make these library elements, which are typically tailored to specialized architectures, more amenable to extensive analysis and optimization using an efficient and systematic process. In this thesis we provide techniques to allow such migration through four basic contributions: 1. We propose and develop a framework to explore efficient utilization of Single Instruction Multiple Data (SIMD) cores and accelerators available in heterogeneous multiprocessor platforms consisting of General Purpose Processors (GPPs) and Graphics Processing Units (GPUs). We also propose new scheduling techniques by applying extensive block processing in conjunction with appropriate task mapping and task ordering methods that match efficiently with the underlying architecture. The approach gives the developer the ability to prototype a GPU-accelerated application and explore its design space efficiently and effectively. 2. We introduce the concept of Partial Expansion Graphs (PEGs) as an implementation model and associated class of scheduling strategies. PEGs are designed to help realize DSP systems in terms of forms and granularities of parallelism that are well matched to the given applications and targeted platforms. PEGs also facilitate derivation of both static and dynamic scheduling techniques, depending on the amount of variability in task execution times and other operating conditions. We show how to implement efficient PEG-based scheduling methods using real time operating systems, and to re-use pre-optimized libraries of DSP components within such implementations. 3. We develop new algorithms for scheduling and mapping systems implemented using PEGs. Collectively, these algorithms operate in three steps. First, the amount of data parallelism in the application graph is tuned systematically over many iterations to profit from the available cores in the target platform. Then a mapping algorithm that uses graph analysis is developed to distribute data and task parallel instances over different cores while trying to balance the load of all processing units to make use of pipeline parallelism. Finally, we use a novel technique for performance evaluation by implementing the scheduler and a customizable solution on the programmable platform. This allows accurate fitness functions to be measured and used to drive runtime adaptation of schedules. 4. In addition to providing scheduling techniques for the mentioned applications and platforms, we also show how to integrate the resulting solution in the underlying environment. This is achieved by leveraging existing libraries and applying the GPP-GPU scheduling framework to augment a popular existing Software Defined Radio (SDR) development environment -- GNU Radio -- with a dataflow foundation and a stand-alone GPU-accelerated library. We also show how to realize the PEG model on real time operating system libraries, such as the Texas Instruments DSP/BIOS. A code generator that accepts a manual system designer solution as well as automatically configured solutions is provided to complete the design flow starting from application model to running system

    Dataflow-based Design and Implementation of Image Processing Applications

    Get PDF
    Dataflow is a well known computational model and is widely used for expressing the functionality of digital signal processing (DSP) applications, such as audio and video data stream processing, digital communications, and image processing. These applications usually require real-time processing capabilities and have critical performance constraints. Dataflow provides a formal mechanism for describing specifications of DSP applications, imposes minimal data-dependency constraints in specifications, and is effective in exposing and exploiting task or data level parallelism for achieving high performance implementations. To demonstrate dataflow-based design methods in a manner that is concrete and easily adapted to different platforms and back-end design tools, we present in this report a number of case studies based on the lightweight dataflow (LWDF) programming methodology. LWDF is designed as a "minimalistic" approach for integrating coarse grain dataflow programming structures into arbitrary simulation- or platform-oriented languages, such as C, C++, CUDA, MATLAB, SystemC, Verilog, and VHDL. In particular, LWDF requires minimal dependence on specialized tools or libraries. This feature --- together with the rigorous adherence to dataflow principles throughout the LWDF design framework --- allows designers to integrate and experiment with dataflow modeling approaches relatively quickly and flexibly into existing design methodologies and processes

    Modeling and Software Synthesis for Multiprocessor Implementation of Wireless Communication Systems

    Get PDF
    In recent years, the complexity of designing embedded signal processing systems for wireless communications has increased significantly based on the need to support increasing levels of operational flexibility and adaptivity, while also supporting increasing data rates and bandwidths. These trends pose important design and implementation challenges to meet the required demands on communication system performance, real-time operation, energy efficiency, and reconfigurability. Dataflow models of computation provide a useful framework that can be built upon to address these challenges. Dataflow models provide high-level abstractions for specifying, analyzing and implementing a wide range of embedded signal processing applications. They allow designers to specify an application using high-level, platform-independent representations, and synthesize optimized embedded software that is targeted to specific types of hardware resources and design constraints. The growing complexity of wireless communication systems, as motivated above, along with the complexity of system-on-chip platforms for embedded signal processing result in new problems that must be addressed in developing effective dataflow-based design methodologies. First, significant improvements to dataflow-based models and methods are needed to effectively utilize heterogeneous computing platforms and multiple forms of parallelism under stringent constraints on real-time performance and energy consumption. Second, effective modeling and analysis methods for handling dynamic parameters within dataflow graph components are needed for reliable and efficient management of system-level adaptivity and reconfiguration. In this thesis, we address these problems by developing an integrated framework that exploits pipeline, data and task-level parallelism in dataflow models under memory constraints, and proposing novel dataflow modeling concepts and performance optimization techniques for design and implementation of dynamically parameterized communication systems. The main contributions of the thesis are summarized as follows: (1) Software synthesis framework for heterogeneous signal processing platforms. We have developed an integrated dataflow-based design framework called DIF-GPU, which provides a toolset for specification, optimization and software synthesis of embedded software targeted to heterogeneous CPU-GPU platforms. DIF-GPU incorporates novel models and methods in the dataflow interchange format (DIF) that are geared toward design optimization of signal processing systems on heterogeneous architectures composed of multicore CPUs and GPUs. DIF-GPU helps to free developers from low-level, platform-specific fine-tuning, and allows them to focus on higher-level aspects of communication system design. (2) Vectorization in DIF-GPU. In the context of dataflow models for embedded signal processing, vectorization is an important transformation for exploiting data parallelism. We have developed new techniques for integrated dataflow graph vectorization and scheduling on heterogeneous platforms. These techniques are developed in the DIF-GPU framework to provide optimized vectorization and scheduling capabilities for hybrid CPU-GPU platforms under memory constraints. For the targeted class of platforms, these techniques are shown to provide significantly better processing throughput compared to previous methods for a given memory constraint. We demonstrate our integrated vectorization and scheduling techniques by applying them to an Orthogonal Frequency Division Multiplexing (OFDM) receiver system. (3) Modeling parameterized, dynamic dataflow behavior. We introduce a novel modeling method, called parameterized sets of modes (PSMs), that enables efficient representation and analysis of adaptive and dynamically reconfigurable signal processing functionality. PSMs can be viewed as high-level abstractions that model parameterized functionality involving groups of related regimes of operation ("modes") for dynamic dataflow models. We develop formal foundations for PSM-based modeling, and demonstrate the utility of this form of modeling by using it to develop efficient methods for scheduling dynamically parameterized dataflow graphs on different types of relevant platforms

    Reconfigurable Video Coding on multicore : an overview of its main objectives

    Get PDF
    International audienceThe current monolithic and lengthy scheme behind the standardization and the design of new video coding standards is becoming inappropriate to satisfy the dynamism and changing needs of the video coding community. Such scheme and specification formalism does not allow the clear commonalities between the different codecs to be shown, at the level of the specification nor at the level of the implementation. Such a problem is one of the main reasons for the typically long interval elapsing between the time a new idea is validated until it is implemented in consumer products as part of a worldwide standard. The analysis of this problem originated a new standard initiative within the International Organization for Standardization (ISO)/ International Electrotechnical Commission (IEC) Moving Pictures Experts Group (MPEG) committee, namely Reconfigurable Video Coding (RVC). The main idea is to develop a video coding standard that overcomes many shortcomings of the current standardization and specification process by updating and progressively incrementing a modular library of components. As the name implies, flexibility and reconfigurability are new attractive features of the RVC standard. Besides allowing for the definition of new codec algorithms, such features, as well as the dataflow-based specification formalism, open the way to define video coding standards that expressly target implementations on platforms with multiple cores. This article provides an overview of the main objectives of the new RVC standard, with an emphasis on the features that enable efficient implementation on platforms with multiple cores. A brief introduction to the methodologies that efficiently map RVC codec specifications to multicore platforms is accompanied with an example of the possible breakthroughs that are expected to occur in the design and deployment of multimedia services on multicore platforms

    MULTI-SCALE SCHEDULING TECHNIQUES FOR SIGNAL PROCESSING SYSTEMS

    Get PDF
    A variety of hardware platforms for signal processing has emerged, from distributed systems such as Wireless Sensor Networks (WSNs) to parallel systems such as Multicore Programmable Digital Signal Processors (PDSPs), Multicore General Purpose Processors (GPPs), and Graphics Processing Units (GPUs) to heterogeneous combinations of parallel and distributed devices. When a signal processing application is implemented on one of those platforms, the performance critically depends on the scheduling techniques, which in general allocate computation and communication resources for competing processing tasks in the application to optimize performance metrics such as power consumption, throughput, latency, and accuracy. Signal processing systems implemented on such platforms typically involve multiple levels of processing and communication hierarchy, such as network-level, chip-level, and processor-level in a structural context, and application-level, subsystem-level, component-level, and operation- or instruction-level in a behavioral context. In this thesis, we target scheduling issues that carefully address and integrate scheduling considerations at different levels of these structural and behavioral hierarchies. The core contributions of the thesis include the following. Considering both the network-level and chip-level, we have proposed an adaptive scheduling algorithm for wireless sensor networks (WSNs) designed for event detection. Our algorithm exploits discrepancies among the detection accuracy of individual sensors, which are derived from a collaborative training process, to allow each sensor to operate in a more energy efficient manner while the network satisfies given constraints on overall detection accuracy. Considering the chip-level and processor-level, we incorporated both temperature and process variations to develop new scheduling methods for throughput maximization on multicore processors. In particular, we studied how to process a large number of threads with high speed and without violating a given maximum temperature constraint. We targeted our methods to multicore processors in which the cores may operate at different frequencies and different levels of leakage. We develop speed selection and thread assignment schedulers based on the notion of a core's steady state temperature. Considering the application-level, component-level and operation-level, we developed a new dataflow based design flow within the targeted dataflow interchange format (TDIF) design tool. Our new multiprocessor system-on-chip (MPSoC)-oriented design flow, called TDIF-PPG, is geared towards analysis and mapping of embedded DSP applications on MPSoCs. An important feature of TDIF-PPG is its capability to integrate graph level parallelism and actor level parallelism into the application mapping process. Here, graph level parallelism is exposed by the dataflow graph application representation in TDIF, and actor level parallelism is modeled by a novel model for multiprocessor dataflow graph implementation that we call the Parallel Processing Group (PPG) model. Building on the contribution above, we formulated a new type of parallel task scheduling problem called Parallel Actor Scheduling (PAS) for chip-level MPSoC mapping of DSP systems that are represented as synchronous dataflow (SDF) graphs. In contrast to traditional SDF-based scheduling techniques, which focus on exploiting graph level (inter-actor) parallelism, the PAS problem targets the integrated exploitation of both intra- and inter-actor parallelism for platforms in which individual actors can be parallelized across multiple processing units. We address a special case of the PAS problem in which all of the actors in the DSP application or subsystem being optimized can be parallelized. For this special case, we develop and experimentally evaluate a two-phase scheduling framework with three work flows --- particle swarm optimization with a mixed integer programming formulation, particle swarm optimization with a simulated annealing engine, and particle swarm optimization with a fast heuristic based on list scheduling. Then, we extend our scheduling framework to support general PAS problem which considers the actors cannot be parallelized

    Integrated Software Synthesis for Signal Processing Applications

    Get PDF
    Signal processing applications usually encounter multi-dimensional real-time performance requirements and restrictions on resources, which makes software implementation complex. Although major advances have been made in embedded processor technology for this application domain -- in particular, in technology for programmable digital signal processors -- traditional compiler techniques applied to such platforms do not generate machine code of desired quality. As a result, low-level, human-driven fine tuning of software implementations is needed, and we are therefore in need of more effective strategies for software implementation for signal processing applications. In this thesis, a number of important memory and performance optimization problems are addressed for translating high-level representations of signal processing applications into embedded software implementations. This investigation centers around signal processing-oriented dataflow models of computation. This form of dataflow provides a coarse grained modeling approach that is well-suited to the signal processing domain and is increasingly supported by commercial and research-oriented tools for design and implementation of signal processing systems. Well-developed dataflow models of signal processing systems expose high-level application structure that can be used by designers and design tools to guide optimization of hardware and software implementations. This thesis advances the suite of techniques available for optimization of software implementations that are derived from the application structure exposed from dataflow representations. In addition, the specialized architecture of programmable digital signal processors is considered jointly with dataflow-based analysis to streamline the optimization process for this important family of embedded processors. The specialized features of programmable digital signal processors that are addressed in this thesis include parallel memory banks to facilitate data parallelism, and signal-processing-oriented addressing modes and address register management capabilities. The problems addressed in this thesis involve several inter-related features, and therefore an integrated approach is required to solve them effectively. This thesis proposes such an integrated approach, and develops the approach through formal problem formulations, in-depth theoretical analysis, and extensive experimentation

    A Framework for Fixed Priority Periodic Scheduling Synthesis from Synchronous Data-flow Graphs

    Get PDF
    International audienceUnlabelled - Medicinally active compounds in the flavonoid class of phytochemicals are being studied for antiviral action against various DNA and RNA viruses. Quercetin is a flavonoid present in a wide range of foods, including fruits and vegetables. It is said to be efficient against a wide range of viruses. This research investigated the usefulness of Quercetin against Hepatitis C virus, Dengue type 2 virus, Ebola virus, and Influenza A using computational models. A molecular docking study using the online tool PockDrug was accomplished to identify the best binding sites between Quercetin and PubChem-based receptors. Network-pharmacological assay to opt to verify function-specific gene-compound interactions using STITCH, STRING, GSEA, Cytoscape plugin cytoHubba. Quercetin explored tremendous binding affinity against NS5A protein for HCV with a docking score of - 6.268 kcal/mol, NS5 for DENV-2 with a docking score of - 5.393 kcal/mol, VP35 protein for EBOV with a docking score of - 4.524 kcal/mol, and NP protein for IAV with a docking score of - 6.954 kcal/mol. In the network-pharmacology study, out of 39 hub genes, 38 genes have been found to interact with Quercetin and the top interconnected nodes in the protein-protein network were (based on the degree of interaction with other nodes) and Negative binding energies were noticed in Quercetin-receptor interaction. Results demonstrate that Quercetin could be a potential antiviral agent against these viral diseases with further study in models. Supplementary information - The online version contains supplementary material available at 10.1007/s40203-022-00132-2

    Shared Memory Implementations of Synchronous Dataflow Specifications Using Lifetime Analysis Techniques

    Get PDF
    There has been a proliferation of block-diagram environments for specifying and prototyping DSP systems. These include tools from academia like Ptolemy [6], and commercial tools like SPW from Cadence Design Systems, and Cossap from Synopsys. The block diagram languages used in these environments are usually based on dataflow semantics because various subsets of dataflow have proven to be good matches for expressing and modeling signal processing systems. In particular, synchronous dataflow (SDF)[14] has been found to be a particularly good match for expressing multirate signal processing systems [5]. One of the key problems that arises during synthesis from an SDF specification is scheduling. Past work on scheduling [3] from SDF has focused on optimization of program memory and buffer memory. However, in [3], no attempt was made for overlaying or sharing buffers. In this paper, we formally tackle the problem of generating optimally compact schedules for SDF graphs, that also attempt to minimize buffering mem- ory under the assumption that buffers will be shared. This will result in schedules whose data memory usage is drastically lower than methods in the past have achieved. The method we use is that of lifetime analysis; we develop a model for buffer lifetimes in SDF graphs, and develop scheduling algorithms that attempt to generate schedules that minimize the maximum number of live tokens under the particular buffer lifetime model. We develop several efficient algorithms for extracting the relevant lifetimes from the SDF schedule. We then use the firstfit heuristic for packing arrays efficiently into memory. We report extensive experimental results on applying these techniques to several practical SDF systems, and show improvements that average 50% over previous techniques, with some systems exhibiting upto an 83% improvement over previous techniques. Also cross-referenced as UMIACS-TR-99-3
    • …
    corecore