    Scalable Techniques for Scheduling and Mapping DSP Applications onto Embedded Multiprocessor Platforms

    A variety of multiprocessor architectures have proliferated, even among off-the-shelf computing platforms. To make use of these platforms, traditional implementation frameworks focus on implementing Digital Signal Processing (DSP) applications using specialized platform features to achieve high performance. However, because the underlying architectures evolve rapidly, solution redevelopment is error prone, and re-use of existing solutions and libraries is limited. In this thesis, we facilitate efficient migration of DSP systems to multiprocessor platforms while systematically leveraging previous investment in optimized library kernels through dataflow design frameworks. We make these library elements, which are typically tailored to specialized architectures, more amenable to extensive analysis and optimization through an efficient and systematic process. The thesis provides techniques to enable such migration through four basic contributions:
    1. We propose and develop a framework for exploring efficient utilization of Single Instruction Multiple Data (SIMD) cores and accelerators in heterogeneous multiprocessor platforms consisting of General Purpose Processors (GPPs) and Graphics Processing Units (GPUs). We also propose new scheduling techniques that apply extensive block processing in conjunction with task mapping and task ordering methods matched to the underlying architecture. The approach gives developers the ability to prototype a GPU-accelerated application and explore its design space efficiently and effectively.
    2. We introduce the concept of Partial Expansion Graphs (PEGs) as an implementation model and an associated class of scheduling strategies. PEGs are designed to help realize DSP systems in forms and granularities of parallelism that are well matched to the given applications and targeted platforms. PEGs also facilitate derivation of both static and dynamic scheduling techniques, depending on the amount of variability in task execution times and other operating conditions. We show how to implement efficient PEG-based scheduling methods using real-time operating systems, and how to re-use pre-optimized libraries of DSP components within such implementations.
    3. We develop new algorithms for scheduling and mapping systems implemented using PEGs. Collectively, these algorithms operate in three steps. First, the amount of data parallelism in the application graph is tuned systematically over many iterations to exploit the available cores of the target platform. Second, a mapping algorithm based on graph analysis distributes data- and task-parallel instances across cores while balancing the load of all processing units to exploit pipeline parallelism. Finally, a novel performance evaluation technique implements the scheduler and a customizable solution directly on the programmable platform, allowing accurate fitness functions to be measured and used to drive runtime adaptation of schedules.
    4. Beyond providing scheduling techniques for the targeted applications and platforms, we show how to integrate the resulting solutions into their surrounding environments. We leverage existing libraries and apply the GPP-GPU scheduling framework to augment a popular Software Defined Radio (SDR) development environment, GNU Radio, with a dataflow foundation and a stand-alone GPU-accelerated library. We also show how to realize the PEG model on real-time operating system libraries, such as the Texas Instruments DSP/BIOS. A code generator that accepts both manually designed and automatically configured solutions completes the design flow from application model to running system.
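The mapping step described in the third contribution, distributing parallel instances over cores while balancing the load of all processing units, can be sketched with a simple greedy longest-processing-time heuristic. This is an illustration of the general idea, not the thesis's actual algorithm; the task names and execution times below are hypothetical.

```python
import heapq

def map_instances(exec_times, num_cores):
    """Greedy longest-processing-time mapping: place each task instance
    on the currently least-loaded core, longest instances first.
    Returns (assignment, per-core loads)."""
    # Min-heap of (current load, core id) so the lightest core pops first.
    heap = [(0.0, c) for c in range(num_cores)]
    heapq.heapify(heap)
    assignment = {}
    for task, t in sorted(exec_times.items(), key=lambda kv: -kv[1]):
        load, core = heapq.heappop(heap)
        assignment[task] = core
        heapq.heappush(heap, (load + t, core))
    loads = [0.0] * num_cores
    for task, core in assignment.items():
        loads[core] += exec_times[task]
    return assignment, loads

# Hypothetical actor instances with measured execution times (ms).
assignment, loads = map_instances({"fft": 5, "fir0": 4, "fir1": 3, "mix": 2}, 2)
```

A production scheduler would additionally account for inter-core communication costs and dataflow precedence constraints, which this sketch ignores.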

    A guide to benchmarking COVID‐19 performance data

    If the COVID‐19 pandemic has already taught us anything, it is that policymakers, experts and public managers need to be capable of interpreting comparative data on their government's performance in a meaningful way. Simultaneously, they are confronted with different data sources (and measurements) surrounding COVID‐19 without necessarily having the tools to assess these sources strategically. Given the speed with which decisions are required and the diversity of data sources, it can be challenging for any policymaker, expert or public manager to make sense of the impact of COVID‐19, especially from a comparative perspective. Starting from the question "How can we benchmark COVID‐19 performance data across countries?", this article presents important indicators and measurements, together with their strengths and weaknesses, and concludes with practical recommendations. These include a focus on measurement equivalence, systems thinking, spatial and temporal thinking, multilevel governance and multimethod designs.
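The measurement-equivalence point can be made concrete with per-capita normalization, one of the simplest adjustments needed before cross-country comparison. The figures below are entirely hypothetical, chosen only to show how raw counts and per-capita rates can rank countries differently.

```python
# Hypothetical case counts and populations for two fictional countries.
cases = {"A": 120_000, "B": 30_000}
population = {"A": 60_000_000, "B": 5_000_000}

# Normalize to cases per 100,000 inhabitants.
per_100k = {c: cases[c] / population[c] * 100_000 for c in cases}

# Raw counts rank A far above B, but the per-capita rates reverse the ranking:
# A has 200 cases per 100k, B has 600.
```

Comparable adjustments apply to testing intensity, reporting lags and case definitions, which is why the article stresses equivalence before benchmarking.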

    DeepM&Mnet for hypersonics: Predicting the coupled flow and finite-rate chemistry behind a normal shock using neural-network approximation of operators

    In high-speed flow past a normal shock, the fluid temperature rises rapidly, triggering downstream chemical dissociation reactions. The chemical changes lead to appreciable changes in fluid properties, and the coupled multiphysics and resulting multiscale dynamics are challenging to resolve numerically. Conventional computational fluid dynamics (CFD) incurs excessive computing cost. Here, we propose a new, efficient approach, assuming that some sparse measurements of the state variables are available and can be seamlessly integrated into the simulation algorithm. We employ a special neural network for approximating nonlinear operators, the DeepONet, which is used to predict each individual field separately, given inputs from the rest of the fields of the coupled multiphysics system. We demonstrate the effectiveness of DeepONet by predicting five species in the non-equilibrium chemistry downstream of a normal shock at high Mach numbers, as well as the velocity and temperature fields. We show that, once trained, DeepONets can be over five orders of magnitude faster than the CFD solver employed to generate the training data, and yield good accuracy for unseen Mach numbers within the training range. Outside this range, DeepONet can still predict accurately and quickly if a few sparse measurements are available. We then propose a composite supervised neural network, DeepM&Mnet, that uses multiple pre-trained DeepONets as building blocks together with scattered measurements to infer the set of all seven fields in the entire domain of interest. Two DeepM&Mnet architectures are tested, and we demonstrate their accuracy and capacity for efficient data assimilation. DeepM&Mnet is simple and general: it can be employed to construct complex multiphysics and multiscale models and to assimilate sparse measurements using pre-trained DeepONets in a "plug-and-play" mode. Comment: 30 pages, 17 figures.
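The DeepONet architecture referenced above can be sketched in a few lines: a branch network encodes the input function sampled at fixed sensor locations, a trunk network encodes the query coordinate, and the operator output is their inner product. This toy numpy version with untrained random weights only illustrates the forward pass; the layer sizes, sensor count and test function are assumptions, not those of the paper.

```python
import numpy as np

def mlp(x, weights):
    """Tiny fully connected network with tanh hidden activations."""
    for W, b in weights[:-1]:
        x = np.tanh(x @ W + b)
    W, b = weights[-1]
    return x @ W + b

def init_mlp(sizes, rng):
    """Random small-weight initialization for each layer."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

rng = np.random.default_rng(0)
m, p = 50, 32                        # sensor count, latent width (assumed)
branch = init_mlp([m, 64, p], rng)   # encodes the sampled input function u
trunk = init_mlp([1, 64, p], rng)    # encodes the query coordinate y

def deeponet(u_sensors, y):
    """G(u)(y) ~ <branch(u), trunk(y)>: the DeepONet operator output."""
    b = mlp(u_sensors, branch)        # shape (p,)
    t = mlp(np.atleast_2d(y), trunk)  # shape (1, p)
    return (t @ b).item()

u = np.sin(np.linspace(0.0, np.pi, m))  # input function sampled at m sensors
out = deeponet(u, np.array([0.5]))      # scalar prediction at y = 0.5
```

In the paper's setting, one such network per field is trained on CFD data, and DeepM&Mnet couples several pre-trained DeepONets with sparse measurements.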

    Using the DSPCAD Integrative Command-Line Environment: User's Guide for DICE Version 1.1

    This document provides instructions on setting up, starting up, and building DICE and its key companion packages, dicemin and dicelang. The installation process is based on a general set of conventions for software packages, which we refer to as the DICE organizational conventions. These conventions are specified in this report. They are applied in DICE, dicemin, and dicelang, as well as in other software packages developed in the Maryland DSPCAD Research Group.

    Graphics Processing Unit–Accelerated Nonrigid Registration of MR Images to CT Images During CT-Guided Percutaneous Liver Tumor Ablations

    Rationale and Objectives: Accuracy and speed are essential for intraprocedural nonrigid MR-to-CT image registration in the assessment of tumor margins during CT-guided liver tumor ablations. While both accuracy and speed can be improved by limiting the registration to a region of interest (ROI), manual contouring of the ROI prolongs the registration process substantially. To achieve accurate and fast registration without the use of an ROI, we combined a nonrigid registration technique based on volume subdivision with hardware acceleration on a graphics processing unit (GPU). We compared the registration accuracy and processing time of the GPU-accelerated volume subdivision-based nonrigid registration technique with those of the conventional nonrigid B-spline registration technique. Materials and Methods: Fourteen image data sets of preprocedural MR and intraprocedural CT images for percutaneous CT-guided liver tumor ablations were obtained. Each set of images was registered using the GPU-accelerated volume subdivision technique and the B-spline technique. Manual contouring of the ROI was used only for the B-spline technique. Registration accuracy (Dice Similarity Coefficient (DSC) and 95% Hausdorff Distance (HD)) and total processing time, including contouring of ROIs and computation, were compared using a paired Student's t-test. Results: Accuracies of the GPU-accelerated and B-spline registrations were 88.3 ± 3.7% vs 89.3 ± 4.9% (p = 0.41) for DSC and 13.1 ± 5.2 mm vs 11.4 ± 6.3 mm (p = 0.15) for HD, respectively. Total processing times of the GPU-accelerated and B-spline registration techniques were 88 ± 14 s vs 557 ± 116 s (p < 0.000000002), respectively; there was no significant difference in computation time despite the difference in the complexity of the algorithms (p = 0.71). Conclusion: The GPU-accelerated volume subdivision technique was as accurate as the B-spline technique and required significantly less processing time. The GPU-accelerated volume subdivision technique may enable the implementation of nonrigid registration into routine clinical practice.
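The two accuracy metrics reported above can be sketched directly. The numpy functions below compute the Dice Similarity Coefficient over binary masks and a 95th-percentile symmetric Hausdorff distance over point sets; this is a generic illustration of the metrics, not the study's implementation, which operated on contoured clinical volumes.

```python
import numpy as np

def dice(a, b):
    """Dice Similarity Coefficient of two binary masks:
    2|A ∩ B| / (|A| + |B|)."""
    a, b = a.astype(bool), b.astype(bool)
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def hd95(pts_a, pts_b):
    """95th-percentile symmetric Hausdorff distance between two
    point sets of shape (n, d) and (m, d)."""
    # Pairwise Euclidean distances, shape (n, m).
    d = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=-1)
    d_ab = d.min(axis=1)  # each point in A to its nearest point in B
    d_ba = d.min(axis=0)  # each point in B to its nearest point in A
    return np.percentile(np.concatenate([d_ab, d_ba]), 95)
```

Taking the 95th percentile instead of the maximum makes the Hausdorff measure robust to a few outlier boundary points, which is why it is preferred for evaluating medical image registration.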