1,307 research outputs found
Evaluation of the parallel computational capabilities of embedded platforms for critical systems
Modern critical systems need higher performance which cannot be delivered by the simple architectures used so far. Latest embedded architectures feature multi-cores and GPUs, which can be used to satisfy this need. In this thesis we parallelise relevant applications from multiple critical domains represented in the GPU4S benchmark suite, and perform a comparison of the parallel capabilities of candidate platforms for use in critical systems. In particular, we port the open source GPU4S Bench benchmarking suite in the OpenMP programming model, and we benchmark the candidate embedded heterogeneous multi-core platforms of the H2020 UP2DATE project, NVIDIA TX2, NVIDIA Xavier and Xilinx Zynq Ultrascale+, in order to drive the selection of the research platform which will be used in the next phases of the project. Our result indicate that in terms of CPU and GPU performance, the NVIDIA Xavier is the highest performing platform
Toward optimised skeletons for heterogeneous parallel architecture with performance cost model
High performance architectures are increasingly heterogeneous with shared and
distributed memory components, and accelerators like GPUs. Programming such
architectures is complicated and performance portability is a major issue as the
architectures evolve. This thesis explores the potential for algorithmic skeletons
integrating a dynamically parametrised static cost model, to deliver portable
performance for mostly regular data parallel programs on heterogeneous archi-
tectures.
The rst contribution of this thesis is to address the challenges of program-
ming heterogeneous architectures by providing two skeleton-based programming
libraries: i.e. HWSkel for heterogeneous multicore clusters and GPU-HWSkel
that enables GPUs to be exploited as general purpose multi-processor devices.
Both libraries provide heterogeneous data parallel algorithmic skeletons including
hMap, hMapAll, hReduce, hMapReduce, and hMapReduceAll.
The second contribution is the development of cost models for workload dis-
tribution. First, we construct an architectural cost model (CM1) to optimise
overall processing time for HWSkel heterogeneous skeletons on a heterogeneous
system composed of networks of arbitrary numbers of nodes, each with an ar-
bitrary number of cores sharing arbitrary amounts of memory. The cost model
characterises the components of the architecture by the number of cores, clock
speed, and crucially the size of the L2 cache. Second, we extend the HWSkel cost
model (CM1) to account for GPU performance. The extended cost model (CM2)
is used in the GPU-HWSkel library to automatically nd a good distribution
for both a single heterogeneous multicore/GPU node, and clusters of heteroge-
neous multicore/GPU nodes. Experiments are carried out on three heterogeneous
multicore clusters, four heterogeneous multicore/GPU clusters, and three single
heterogeneous multicore/GPU nodes. The results of experimental evaluations for
four data parallel benchmarks, i.e. sumEuler, Image matching, Fibonacci, and
Matrix Multiplication, show that our combined heterogeneous skeletons and cost
models can make good use of resources in heterogeneous systems. Moreover using
cores together with a GPU in the same host can deliver good performance either
on a single node or on multiple node architectures
On the Real-Time Performance, Robustness and Accuracy of Medical Image Non-Rigid Registration
Three critical issues about medical image non-rigid registration are performance, robustness and accuracy. A registration method, which is capable of responding timely with an accurate alignment, robust against the variation of the image intensity and the missing data, is desirable for its clinical use. This work addresses all three of these issues. Unacceptable execution time of Non-rigid registration (NRR) often presents a major obstacle to its routine clinical use. We present a hybrid data partitioning method to parallelize a NRR method on a cooperative architecture, which enables us to get closer to the goal: accelerating using architecture rather than designing a parallel algorithm from scratch. to further accelerate the performance for the GPU part, a GPU optimization tool is provided to automatically optimize GPU execution configuration.;Missing data and variation of the intensity are two severe challenges for the robustness of the registration method. A novel point-based NRR method is presented to resolve mapping function (deformation field) with the point correspondence missing. The novelty of this method lies in incorporating a finite element biomechanical model into an Expectation and Maximization (EM) framework to resolve the correspondence and mapping function simultaneously. This method is extended to deal with the deformation induced by tumor resection, which imposes another challenge, i.e. incomplete intra-operative MRI. The registration is formulated as a three variable (Correspondence, Deformation Field, and Resection Region) functional minimization problem and resolved by a Nested Expectation and Maximization framework. The experimental results show the effectiveness of this method in correcting the deformation in the vicinity of the tumor. to deal with the variation of the intensity, two different methods are developed depending on the specific application. For the mono-modality registration on delayed enhanced cardiac MRI and cine MRI, a hybrid registration method is designed by unifying both intensity- and feature point-based metrics into one cost function. The experiment on the moving propagation of suspicious myocardial infarction shows effectiveness of this hybrid method. For the multi-modality registration on MRI and CT, a Mutual Information (MI)-based NRR is developed by modeling the underlying deformation as a Free-Form Deformation (FFD). MI is sensitive to the variation of the intensity due to equidistant bins. We overcome this disadvantage by designing a Top-to-Down K-means clustering method to naturally group similar intensities into one bin. The experiment shows this method can increase the accuracy of the MI-based registration.;In image registration, a finite element biomechanical model is usually employed to simulate the underlying movement of the soft tissue. We develop a multi-tissue mesh generation method to build a heterogeneous biomechanical model to realistically simulate the underlying movement of the brain. We focus on the following four critical mesh properties: tissue-dependent resolution, fidelity to tissue boundaries, smoothness of mesh surfaces, and element quality. Each mesh property can be controlled on a tissue level. The experiments on comparing the homogeneous model with the heterogeneous model demonstrate the effectiveness of the heterogeneous model in improving the registration accuracy
- …