Acceleration of k-Nearest Neighbor and SRAD Algorithms Using Intel FPGA SDK for OpenCL

Author: Liu Liyuan
Publication venue: 'University of Windsor Leddy Library'
Publication date: 23/03/2018

Field Programmable Gate Arrays (FPGAs) have been widely used for accelerating machine learning algorithms. However, the high design cost and time for implementing FPGA-based accelerators using traditional HDL-based design methodologies have discouraged users from designing FPGA-based accelerators. In recent years, a new CAD tool called Intel FPGA SDK for OpenCL (IFSO) has allowed fast and efficient design of FPGA-based hardware accelerators from high-level specifications such as OpenCL. Even software engineers with basic hardware design knowledge can design FPGA-based accelerators. In this thesis, IFSO has been used to explore acceleration of the k-Nearest-Neighbour (kNN) algorithm and Speckle Reducing Anisotropic Diffusion (SRAD) simulation using FPGAs. kNN is a popular algorithm used in machine learning. Bitonic sorting and radix sorting algorithms were used in the kNN algorithm to check whether they provide any performance improvement. Acceleration of SRAD simulation was also explored. The experimental results obtained for these algorithms from FPGA-based acceleration were compared with the state-of-the-art CPU implementation. The optimized algorithms were implemented on two different FPGAs (Intel Stratix A7 and Intel Arria 10 GX). Experimental results show that the FPGA-based accelerators provided similar or better execution time (up to 80X) and better power efficiency (75% reduction in power consumption) than traditional platforms such as a workstation based on two Intel Xeon E5-2620 series processors (each with 6 cores and running at 2.4 GHz).

Scholarship at UWindsor

An Analysis of Variation Between Cores For Intel Xeon Phi Knights Corner And Xeon Phi Knights Landing

Author: Robinson Jamar
Publication venue: Clemson University Libraries
Publication date: 01/05/2017

As we move towards exascale computing, the efficiency of application performance and energy utilization must be optimized by redefining architectural features and application performance analysis. This research analyzes the performance per core of 8 applications on Intel Xeon Phi Knights Corner (KNC) and Knights Landing (KNL) to determine if performance variation within cores can lead to performance and energy improvements. Our results showed that the KNC architecture's cores vary in performance, leading to faster inner core performance as a result of memory characteristics and core utilization. They also show that cores 17, 34, and 51 on the KNL architecture perform consistently slower than other cores, with core 0 performing either faster, slower or within the average performance time of all the cores. A power performance study was then done utilizing different core configurations on the KNC. The results show that by targeting inner cores for applications that exhibit better inner core performance, a maximum energy reduction of 16.4% compared to a configuration using all cores was possible with its optimal thread configuration. This energy reduction was achieved along with a 2% reduction in the fastest execution time of the same application. Our results also show how application characteristics lead to different core variation performances on KNC and KNL Xeon Phi architectures.

Clemson University: TigerPrints

On the Virtualization of CUDA Based GPU Remoting on ARM and X86 Machines in the GVirtuS Framework

Author: Ferraro Carmine
Giunta Giulio
Hong Cheol-Ho
Laccetti Giuliano
Lapegna Marco
Montella Raffaele
Nikolopoulos Dimitrios
Palmieri Carlo
Pelliccia Valentina
Spence Ivor
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

The astonishing development of diverse hardware platforms is twofold: on one side, the challenge of exascale performance for big data processing and management; on the other side, mobile and embedded devices for data collection and human-machine interaction. This has driven a highly hierarchical evolution of programming models. GVirtuS is the general virtualization system developed in 2009 and first introduced in 2010, enabling a completely transparent layer between GPUs and VMs. This paper shows the latest achievements and developments of GVirtuS, now supporting CUDA 6.5, memory management and scheduling. Thanks to the new and improved remoting capabilities, GVirtuS now enables GPU sharing among physical and virtual machines based on x86 and ARM CPUs on local workstations, computing clusters and distributed cloud appliances.

Queen's University Belfast Research Portal

Archivio della ricerca - Università degli studi di Napoli "Parthenope"

Archivio della ricerca - Università degli studi di Napoli Federico II

Crossref

A performance focused, development friendly and model aided parallelization strategy for scientific applications

Author: Joshi Anagha S.
Publication venue: Clemson University Libraries
Publication date: 01/12/2016

The amelioration of high-performance computing platforms has provided unprecedented computing power with the evolution of multi-core CPUs, massively parallel architectures such as General Purpose Graphics Processing Units (GPGPUs) and Many Integrated Core (MIC) architectures such as Intel's Xeon Phi coprocessor. However, it is a great challenge to leverage the capabilities of such advanced supercomputing hardware, as it requires efficient and effective parallelization of scientific applications. This task is difficult mainly due to the complexity of scientific algorithms coupled with the variety of available hardware and disparate programming models. To address these challenges, this thesis presents a parallelization strategy to accelerate scientific applications that maximizes the opportunities of achieving speedup while minimizing the development effort. Parallelization is a three-step process: (1) choose a compatible combination of architecture and parallel programming language, (2) translate the base code/algorithm to a parallel language and (3) optimize and tune the application. In this research, a quantitative comparison of run time for various implementations of the k-means algorithm is used to establish that native languages (OpenMP, MPI, CUDA) perform better on their respective architectures than vendor-neutral languages such as OpenCL. A qualitative model is used to select an optimal architecture for a given application by aligning the capabilities of accelerators with the characteristics of the application. Once the optimal architecture is chosen, the corresponding native language is employed. This approach provides the best performance with reasonable accuracy (78%) of predicting a fitting combination, while eliminating the need for exploring different architectures individually. It reduces the required development effort considerably, as the application need not be rewritten in multiple languages.
The focus can be solely on optimization and tuning to achieve the best performance on available architectures with minimized investment in terms of cost and effort. To verify the prediction accuracy of the qualitative model, the OpenDwarfs benchmark suite, which implements Berkeley's dwarfs in OpenCL, is used. A dwarf is an algorithmic method that captures a pattern of computation and communication. For the purpose of this research, the focus is on 9 applications from various algorithmic domains that cover the seven dwarfs of symbolic computation, which were identified by Phillip Colella as omnipresent in scientific and engineering applications. To validate the parallelization strategy collectively, a case study is undertaken. This case study involves parallelization of the Lower Upper Decomposition for the Gaussian Elimination algorithm from the linear algebra domain, using conventional trial and error methods as well as the proposed 'Architecture First, Language Later' strategy. The development efforts incurred are contrasted for both methods. The proposed strategy is observed to reduce the development effort by an average of 50%.

Clemson University: TigerPrints

Alternative Processor within Threshold: Flexible Scheduling on Heterogeneous Systems

Author: Karia Stavan Satish
Publication venue: RIT Scholar Works
Publication date: 01/03/2017

Computing systems have become increasingly heterogeneous, contributing to higher performance and power efficiency. However, this comes at the cost of increasing the overall complexity of designing such systems. One key challenge in the design of heterogeneous systems is the efficient scheduling of computational load. To address this challenge, this paper thoroughly analyzes state-of-the-art scheduling policies and proposes a new dynamic scheduling heuristic: Alternative Processor within Threshold (APT). This heuristic uses a flexibility factor to attain efficient usage of the available hardware resources, taking advantage of the degree of heterogeneity of the system. In a GPU-CPU-FPGA system, tested on workloads with and without data dependencies, this approach improved overall execution time by 16% and 18% when compared to the second-best heuristic.

RIT Scholar Works

Radial Basis Functions: Biomedical Applications and Parallelization

Author: Liu Ke
Publication venue: UWM Digital Commons
Publication date: 01/12/2016

A radial basis function (RBF) is a real-valued function whose values depend only on the distances between an interpolation point and a set of user-specified points called centers. RBF interpolation is one of the primary methods to reconstruct functions from multi-dimensional scattered data. Its abilities to generalize to arbitrary space dimensions and to provide spectral accuracy have made it particularly popular in different application areas, including but not limited to: finding numerical solutions of partial differential equations (PDEs), image processing, computer vision and graphics, deep learning and neural networks, etc. The present thesis discusses three applications of RBF interpolation in biomedical engineering areas: (1) Calcium dynamics modeling, in which we numerically solve a set of PDEs by using meshless numerical methods and RBF-based interpolation techniques; (2) Image restoration and transformation, where an image is restored from its triangular mesh representation or transformed under translation, rotation, and scaling, etc. from its original form; (3) Porous structure design, in which RBF interpolation is used to reconstruct a 3D volume containing porous structures from a set of regularly or randomly placed points inside a user-provided surface shape. All three applications have been investigated and their effectiveness has been supported with numerous experimental results. In particular, we innovatively utilize anisotropic distance metrics to define the distance in RBF interpolation and apply them to the second and third applications, which show significant improvement in preserving image features or capturing connected porous structures over the isotropic distance-based RBF method. Besides the algorithm designs and their applications in biomedical areas, we also explore several common parallelization techniques (including OpenMP and CUDA-based GPU programming) to accelerate the performance of the present algorithms.
In particular, we analyze how parallel programming can help RBF interpolation to speed up the meshless PDE solver as well as image processing. While RBF has been widely used in various science and engineering fields, the current thesis is expected to trigger some more interest from computational scientists or students in this fast-growing area and specifically apply these techniques to biomedical problems such as the ones investigated in the present work.
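
As an illustration of the RBF interpolation idea described in this abstract, the following is a minimal sketch and not the thesis's implementation. It assumes a Gaussian kernel with a hypothetical shape parameter `eps` and ordinary isotropic Euclidean distance (not the anisotropic metrics the thesis proposes); the helper names `gaussian_rbf`, `solve`, `fit_rbf` and `evaluate_rbf` are invented for this example. The weights are chosen so that the interpolant reproduces the given values at the centers.

```python
import math

def gaussian_rbf(r, eps=1.0):
    """Gaussian kernel: its value depends only on the distance r."""
    return math.exp(-(eps * r) ** 2)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_rbf(centers, values, eps=1.0):
    """Find weights w such that sum_j w_j * phi(|c_i - c_j|) = values[i] for every center."""
    a = [[gaussian_rbf(math.dist(ci, cj), eps) for cj in centers] for ci in centers]
    return solve(a, values)

def evaluate_rbf(x, centers, weights, eps=1.0):
    """Evaluate the interpolant s(x) = sum_j w_j * phi(|x - c_j|)."""
    return sum(w * gaussian_rbf(math.dist(x, c), eps) for w, c in zip(weights, centers))

# Scattered 2D data: four centers with one sample value each (made-up numbers).
centers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [0.0, 1.0, 1.0, 2.0]
weights = fit_rbf(centers, values)
midpoint_estimate = evaluate_rbf((0.5, 0.5), centers, weights)
```

The dense solve costs O(n^3) in the number of centers, and each evaluation is a sum over all centers, which is the kind of work the abstract's OpenMP and CUDA parallelization would target.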

University of Wisconsin-Milwaukee

X-ray Micro-Tomography and Volumetric Strain Measurement in the Intervertebral Disc

Author: Disney Catherine
Publication date: 01/08/2019

The University of Manchester - Institutional Repository

Engineering Technology Reports, Volume 1: Laboratory Directed Research and Development FY00

Author: Baron A. L.
Langland R. T.
Minichino C.
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date: 03/10/2001

In FY-2000, Engineering at Lawrence Livermore National Laboratory faced significant pressures to meet critical project milestones, and immediate demands to facilitate the reassignment of employees as the National Ignition Facility (the 600-TW laser facility being designed and built at Livermore, and one of the largest R&D construction projects in the world) was in the process of re-baselining its plan while executing its technology development efforts at full speed. This drive for change occurred as an unprecedented level of management and program changes were occurring within LLNL. I am pleased to report that we met many key milestones and achieved numerous technological breakthroughs. This report summarizes our efforts to perform feasibility and reduce-to-practice studies, demonstrations, and/or techniques--as structured through our technology centers. Whether using computational engineering to predict how giant structures like suspension bridges will respond to massive earthquakes or devising a suitcase-sized microtool to detect chemical and biological agents used by terrorists, we have made solid technical progress. Five Centers focus and guide longer-term investments within Engineering, as well as impact all of LLNL. Each Center is responsible for the vitality and growth of the core technologies it represents. My goal is that each Center will be recognized on an international scale for solving compelling national problems requiring breakthrough innovation. The Centers and their leaders are as follows: Center for Complex Distributed Systems--David B. McCallen; Center for Computational Engineering--Kyran D. Mish; Center for Microtechnology--Raymond P. Mariella, Jr.; Center for Nondestructive Characterization--Harry E. Martz, Jr.; and Center for Precision Engineering--Keith Carlisle.

UNT Digital Library