SCALABLE TECHNIQUES FOR SCHEDULING AND MAPPING DSP APPLICATIONS ONTO EMBEDDED MULTIPROCESSOR PLATFORMS
A variety of multiprocessor architectures have proliferated, even in off-the-shelf computing platforms. To make use of these platforms, traditional implementation frameworks focus on implementing Digital Signal Processing (DSP) applications using specialized platform features to achieve high performance. However, because the underlying architectures evolve rapidly, solution redevelopment is error prone and reuse of existing solutions and libraries is limited. In this thesis, we facilitate efficient migration of DSP systems to multiprocessor platforms while systematically leveraging previous investment in optimized library kernels using dataflow design frameworks. We make these library elements, which are typically tailored to specialized architectures, more amenable to extensive analysis and optimization through an efficient and systematic process. This thesis enables such migration through four basic contributions:
1. We propose and develop a framework to explore efficient utilization of Single Instruction Multiple Data (SIMD) cores and accelerators available in heterogeneous multiprocessor platforms consisting of General Purpose Processors (GPPs) and Graphics Processing Units (GPUs). We also propose new scheduling techniques by applying extensive block processing in conjunction with appropriate task mapping and task ordering methods that match efficiently with the underlying architecture. The approach gives the developer the ability to prototype a GPU-accelerated application and explore its design space efficiently and effectively.
2. We introduce the concept of Partial Expansion Graphs (PEGs) as an implementation model and associated class of scheduling strategies. PEGs are designed to help realize DSP systems in terms of forms and granularities of parallelism that are well matched to the given applications and targeted platforms. PEGs also facilitate derivation of both static and dynamic scheduling techniques, depending on the amount of variability in task execution times and other operating conditions. We show how to implement efficient PEG-based scheduling methods using real time operating systems, and to re-use pre-optimized libraries of DSP components within such implementations.
3. We develop new algorithms for scheduling and mapping systems implemented using PEGs. Collectively, these algorithms operate in three steps. First, the amount of data parallelism in the application graph is tuned systematically over many iterations to exploit the available cores in the target platform. Then a mapping algorithm based on graph analysis distributes data- and task-parallel instances over different cores, balancing the load of all processing units to exploit pipeline parallelism. Finally, we use a novel technique for performance evaluation by implementing the scheduler and a customizable solution on the programmable platform, which allows accurate fitness functions to be measured and used to drive runtime adaptation of schedules.
4. In addition to providing scheduling techniques for the aforementioned applications and platforms, we show how to integrate the resulting solutions into the underlying environment. This is achieved by leveraging existing libraries and applying the GPP-GPU scheduling framework to augment a popular Software Defined Radio (SDR) development environment -- GNU Radio -- with a dataflow foundation and a stand-alone GPU-accelerated library. We also show how to realize the PEG model on real-time operating system libraries, such as Texas Instruments DSP/BIOS. A code generator that accepts manually designed solutions as well as automatically configured ones is provided to complete the design flow from application model to running system.
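The load-balancing mapping step described in contribution 3 can be illustrated with a simple longest-processing-time-first heuristic. This is a generic sketch rather than the thesis's actual algorithm; the task names, execution times, and core count are hypothetical:

```python
import heapq

def lpt_map(task_times, num_cores):
    """Greedy longest-processing-time-first mapping: assign each task
    (heaviest first) to the currently least-loaded core."""
    loads = [(0.0, core) for core in range(num_cores)]  # (load, core id)
    heapq.heapify(loads)
    mapping = {}
    for task, t in sorted(task_times.items(), key=lambda kv: -kv[1]):
        load, core = heapq.heappop(loads)
        mapping[task] = core
        heapq.heappush(loads, (load + t, core))
    return mapping

# Hypothetical per-actor execution times (e.g., profiled firing durations).
times = {"fft": 7.0, "fir": 5.0, "src": 2.0, "snk": 2.0, "mix": 4.0}
assignment = lpt_map(times, num_cores=2)
```

A real PEG scheduler would additionally account for inter-core communication and pipeline stages; the greedy heuristic only balances raw compute load.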
A guide to benchmarking COVID-19 performance data
If the COVID-19 pandemic has already taught us anything, it is that policymakers, experts and public managers need to be capable of interpreting comparative data on their government's performance in a meaningful way. At the same time, they are confronted with different data sources (and measurements) surrounding COVID-19 without necessarily having the tools to assess these sources strategically. Given the speed with which decisions are required and the variety of data sources, it can be challenging for any policymaker, expert or public manager to make sense of how COVID-19 has an impact, especially from a comparative perspective. Starting from the question "How can we benchmark COVID-19 performance data across countries?", this article presents important indicators and measurements, discusses their strengths and weaknesses, and concludes with practical recommendations. These include a focus on measurement equivalence, systems thinking, spatial and temporal thinking, multilevel governance and multimethod designs.
DeepM&Mnet for hypersonics: Predicting the coupled flow and finite-rate chemistry behind a normal shock using neural-network approximation of operators
In high-speed flow past a normal shock, the fluid temperature rises rapidly
triggering downstream chemical dissociation reactions. The chemical changes
lead to appreciable changes in fluid properties, and these coupled multiphysics
and the resulting multiscale dynamics are challenging to resolve numerically.
Using conventional computational fluid dynamics (CFD) requires excessive
computing cost. Here, we propose a new and efficient approach, assuming
that some sparse measurements of the state variables are available that can be
seamlessly integrated in the simulation algorithm. We employ a special neural
network for approximating nonlinear operators, the DeepONet, which is used to
predict separately each individual field, given inputs from the rest of the
fields of the coupled multiphysics system. We demonstrate the effectiveness of
DeepONet by predicting five species in the non-equilibrium chemistry downstream
of a normal shock at high Mach numbers as well as the velocity and temperature
fields. We show that upon training, DeepONets can be over five orders of
magnitude faster than the CFD solver employed to generate the training data and
yield good accuracy for unseen Mach numbers within the range of training.
Outside this range, DeepONet can still predict accurately and fast if a few
sparse measurements are available. We then propose a composite supervised
neural network, DeepM&Mnet, that uses multiple pre-trained DeepONets as
building blocks and scattered measurements to infer the set of all seven fields
in the entire domain of interest. Two DeepM&Mnet architectures are tested, and
we demonstrate the accuracy and capacity for efficient data assimilation.
DeepM&Mnet is simple and general: it can be employed to construct complex
multiphysics and multiscale models and assimilate sparse measurements using
pre-trained DeepONets in a "plug-and-play" mode. Comment: 30 pages, 17 figures.
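The core DeepONet construction, a branch network encoding samples of the input function and a trunk network encoding the query coordinate, combined by a dot product over a shared latent dimension, can be sketched as follows. This is a minimal, untrained illustration with random weights and made-up layer sizes, not the architecture used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random-weight MLP parameters (illustrative only; a real DeepONet
    would train these on simulation data)."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Feed-forward pass with tanh activations on all hidden layers."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

m, p = 50, 32                      # number of sensors, latent width
branch = mlp([m, 64, p])           # encodes the input function u(x_1..x_m)
trunk  = mlp([1, 64, p])           # encodes the query coordinate y

u = np.sin(np.linspace(0, np.pi, m))[None, :]   # one sampled input function
y = np.array([[0.3], [0.7]])                     # two query points

# G(u)(y) is approximated by sum_k branch_k(u) * trunk_k(y)
G_u_y = forward(branch, u) @ forward(trunk, y).T  # shape (1, 2)
```

In the DeepM&Mnet setting, several such pre-trained operators (one per field) are composed, with sparse measurements supplied as inputs to the remaining fields.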
Molecular diagnosis in recessive pediatric neurogenetic disease can help reduce disease recurrence in families.
Background: The causes of thousands of individually rare recessive diseases have been discovered since the adoption of next-generation sequencing (NGS). Following a molecular diagnosis in an older child in a family, parents could use this information to opt for fetal genotyping in subsequent pregnancies, which could inform decisions about elective termination of pregnancy. However, the use of NGS diagnostic sequencing in families has not been demonstrated to yield benefit in subsequent pregnancies by reducing recurrence. Here we evaluated whether genetic diagnosis in older children in families supports reduction in recurrence of recessive neurogenetic disease. Methods: Retrospective study involving families with a child with a recessive pediatric brain disease (rPBD) that underwent NGS-based molecular diagnosis. Prenatal molecular testing was offered to couples in which a molecular diagnosis was made, to help couples seeking to prevent recurrence. With this information, families made decisions about elective termination. Pregnancies that were carried to term were assessed for the health of child and mother, and compared with the historic recurrence risk of recessive disease. Results: Between 2010 and 2016, 1172 families presented with a child with a likely rPBD, 526 families received a molecular diagnosis, 91 families returned to the clinic with 101 subsequent pregnancies, and 84 opted for fetal genotyping. Sixty fetuses tested negative for the biallelic mutation; all except one spontaneous abortion were carried to term and were unaffected at follow-up. Of the 24 that genotyped positive for the biallelic mutation, 16 were electively terminated, and 8 were carried to term and showed features of disease similar to those of the older affected sibling(s). Among the 101 pregnancies, disease recurrence in living offspring deviated from the expected 25% to the observed 12% (95% CI 0.04 to 0.20, p = 0.011). Conclusions: Molecular diagnosis in an older child, coupled with prenatal fetal genotyping in subsequent pregnancies and genetic counselling, allows families to make informed decisions that reduce recurrence of recessive neurogenetic disease.
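The reported statistics can be approximately reproduced from the abstract's counts, assuming the denominator is the 67 live births with a known fetal genotype (59 unaffected plus 8 affected). This is a back-of-the-envelope sketch, not the study's actual analysis:

```python
from math import comb, sqrt

n, affected = 67, 8          # live births with fetal genotype (assumed denominator)
p_hat = affected / n         # observed recurrence, roughly 0.12

# 95% Wald confidence interval for the observed proportion
se = sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)   # close to the reported (0.04, 0.20)

# Exact one-sided binomial test against the Mendelian expectation of 25%
p_value = sum(comb(n, k) * 0.25**k * 0.75**(n - k)
              for k in range(affected + 1))
```

Both the interval and the p-value land near the abstract's figures under this assumed denominator, which supports that reading of the reported 12%.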
Using the DSPCAD Integrative Command-Line Environment: User's Guide for DICE Version 1.1
This document provides instructions on setting up, starting up, and
building DICE and its key companion packages, dicemin and dicelang. This
installation process is based on a general set of conventions, which we
refer to as the DICE organizational conventions, for software packages.
The DICE organizational conventions are specified in this report. These
conventions are applied in DICE, dicemin, and dicelang, and also to
other software packages that are developed in the Maryland DSPCAD
Research Group.
Graphics Processing Unit–Accelerated Nonrigid Registration of MR Images to CT Images During CT-Guided Percutaneous Liver Tumor Ablations
Rationale and Objectives: Accuracy and speed are essential for intraprocedural nonrigid MR-to-CT image registration in the assessment of tumor margins during CT-guided liver tumor ablations. While both accuracy and speed can be improved by limiting the registration to a region of interest (ROI), manual contouring of the ROI prolongs the registration process substantially. To achieve accurate and fast registration without the use of an ROI, we combined a nonrigid registration technique based on volume subdivision with hardware acceleration using a graphics processing unit (GPU). We compared the registration accuracy and processing time of the GPU-accelerated volume subdivision-based nonrigid registration technique to the conventional nonrigid B-spline registration technique. Materials and Methods: Fourteen image data sets of preprocedural MR and intraprocedural CT images for percutaneous CT-guided liver tumor ablations were obtained. Each set of images was registered using the GPU-accelerated volume subdivision technique and the B-spline technique. Manual contouring of the ROI was used only for the B-spline technique. Registration accuracies (Dice Similarity Coefficient (DSC) and 95% Hausdorff Distance (HD)) and total processing time, including contouring of ROIs and computation, were compared using a paired Student's t-test. Results: Accuracy of the GPU-accelerated registrations and B-spline registrations, respectively, was 88.3 ± 3.7% vs 89.3 ± 4.9% (p = 0.41) for DSC and 13.1 ± 5.2 mm vs 11.4 ± 6.3 mm (p = 0.15) for HD. Total processing time of the GPU-accelerated registration and B-spline registration techniques was 88 ± 14 s vs 557 ± 116 s (p < 0.000000002), respectively; there was no significant difference in computation time despite the difference in the complexity of the algorithms (p = 0.71). Conclusion: The GPU-accelerated volume subdivision technique was as accurate as the B-spline technique and required significantly less processing time. The GPU-accelerated volume subdivision technique may enable the implementation of nonrigid registration into routine clinical practice.
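The Dice Similarity Coefficient used above to quantify overlap between registered segmentations is straightforward to compute. A minimal sketch on toy binary masks (the masks here are hypothetical, not data from the study):

```python
import numpy as np

def dice_coefficient(mask_a, mask_b):
    """Dice Similarity Coefficient between two binary segmentation masks:
    DSC = 2 * |A intersect B| / (|A| + |B|), in [0, 1]."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: define as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

# Toy 2-D masks; real use would compare warped MR vs CT liver segmentations.
a = np.zeros((8, 8), dtype=bool); a[2:6, 2:6] = True   # 16 "voxels"
b = np.zeros((8, 8), dtype=bool); b[3:7, 3:7] = True   # 16 "voxels"
score = dice_coefficient(a, b)   # overlap is 3x3 = 9, so 18/32 = 0.5625
```

The 95% Hausdorff Distance reported alongside DSC is a complementary boundary-distance metric; DSC alone can hide large local boundary errors.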