Using CUDA Shared Memory in a Parallel Implementation of a Feedforward Artificial Neural Network
The paper examines how the way shared memory is used affects the performance of an artificial neural network implementation on the CUDA platform. Variants that place several windows of input data and the neuron input weights in shared memory are investigated. It is shown that, because the time spent waiting for data to load from global memory is not used efficiently, the performance of these variants does not exceed that of the basic parallelization scheme.
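The abstract contrasts a basic parallelization scheme with variants that stage windows of input data and weights in shared memory. The plain-Python sketch below is not the authors' CUDA kernel; it only illustrates the data-reuse pattern such variants aim for, under assumed choices (a sigmoid activation and the hypothetical names `dense_naive`, `dense_tiled` and `tile`). Each window of the input is copied once into a small buffer standing in for shared memory and is then reused by every output neuron, which on a GPU would correspond to the threads of one block.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense_naive(x, w, b):
    """Basic scheme: each output neuron j reads all of x directly."""
    return [sigmoid(sum(wj[i] * x[i] for i in range(len(x))) + bj)
            for wj, bj in zip(w, b)]


def dense_tiled(x, w, b, tile=4):
    """Same layer, but x is consumed in windows of `tile` values.

    Each window is copied once into `shared` (standing in for CUDA shared
    memory) and then reused by every output neuron (standing in for the
    threads of one block) before the next window is loaded.
    """
    acc = list(b)  # one accumulator per output neuron, starting at its bias
    for start in range(0, len(x), tile):
        shared = x[start:start + tile]  # the single "load" of this window
        for j, wj in enumerate(w):
            for k, xv in enumerate(shared):
                acc[j] += wj[start + k] * xv
    return [sigmoid(a) for a in acc]
```

Both functions compute the same layer up to floating-point rounding. On real hardware any gain depends on overlapping the global-memory loads with computation, which is exactly where the abstract reports the staged variants fall short.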
Central Processing Unit-Graphics Processing Unit Computing Scheme for Multi-Object Tracking in Surveillance
This research work presents a novel central processing unit-graphics processing unit (CPU-GPU) computing scheme for multiple object tracking during a surveillance operation. The scheme allows nonlinear computational jobs to be completed in minimal processing time for the tracking function. The work has two essential objectives: first, to dynamically divide the processing operations into parallel units, and second, to reduce the communication between the CPU and GPU processing units.
Acceleration of stereo-matching on multi-core CPU and GPU
This paper presents an accelerated version of a dense stereo-correspondence algorithm for two different parallelism-enabled architectures: multi-core CPU and GPU. The algorithm is part of the vision system developed for a binocular robot head in the context of the CloPeMa research project, which focuses on the design of a new clothes-folding robot with real-time and high-resolution requirements for the vision system. The performance analysis shows that the parallelised stereo-matching algorithm has been significantly accelerated, achieving 12x and 176x speed-ups for multi-core CPU and GPU respectively, compared with a non-SIMD single-thread CPU. To analyse the origin of the speed-up and gain a deeper understanding of the choice of optimal hardware, the algorithm was broken into key sub-tasks and its performance was tested on four different hardware architectures.
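The abstract does not state which matching cost the dense stereo-correspondence algorithm uses. As an assumption, the sketch below uses winner-takes-all block matching with a sum of absolute differences (SAD) over a square window on a rectified image pair; `sad_disparity`, `max_disp` and `radius` are hypothetical names. The point it illustrates is that each pixel's disparity search is independent of every other pixel's, which is what allows this kind of algorithm to be split across CPU cores or GPU threads.

```python
def sad_disparity(left, right, max_disp, radius=1):
    """Winner-takes-all disparity map using SAD block matching.

    `left` and `right` are equally sized 2-D lists of intensities from a
    rectified pair. Border pixels whose window or disparity range would
    fall outside the image are left at disparity 0.
    """
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    # Every (y, x) iteration below is independent: one thread per pixel on a GPU.
    for y in range(radius, h - radius):
        for x in range(radius + max_disp, w - radius):
            best_d, best_cost = 0, float("inf")
            for d in range(max_disp + 1):
                cost = 0
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        # Left pixel (y, x) is compared with right pixel (y, x - d)
                        cost += abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
                if cost < best_cost:
                    best_d, best_cost = d, cost
            disp[y][x] = best_d
    return disp
```

Breaking such an algorithm into sub-tasks, as the abstract describes, would typically separate cost computation, aggregation over the window and the disparity selection, since each has a different ratio of arithmetic to memory access.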
Graphics processor unit hardware acceleration of Levenberg-Marquardt artificial neural network training
This paper makes two principal contributions. The first is a description of an artificial neural network implementation on a graphics processor unit (GPU) that uses the Levenberg-Marquardt (LM) training method; no such description appears to exist in the previous research literature. The second is an initial attempt at determining when it is computationally beneficial to exploit a GPU's parallel nature in preference to the traditional implementation on a central processing unit (CPU). The paper describes the approach taken to successfully implement the LM method, discusses the advantages of this approach for GPU implementation and presents results that compare GPU and CPU performance on two test data sets.
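For reference, one Levenberg-Marquardt iteration solves (JᵀJ + μI)δ = Jᵀr for the parameter update δ, where J is the Jacobian of the model outputs with respect to the parameters, r is the vector of residuals (target minus output), and μ is a damping factor that is decreased after a successful step and increased after a failed one. In ANN training the parameters are the network weights and JᵀJ is large, which is what makes a GPU attractive. To keep the sketch small, it instead fits a hypothetical two-parameter model y = a·exp(b·x), so JᵀJ is 2×2; it illustrates the method only and is not the paper's GPU implementation. The factor-of-10 damping schedule is an assumption.

```python
import math


def model(p, x):
    a, b = p
    return a * math.exp(b * x)


def jacobian_row(p, x):
    """Partial derivatives of the model output with respect to (a, b)."""
    a, b = p
    e = math.exp(b * x)
    return (e, a * x * e)


def sse(p, xs, ys):
    return sum((y - model(p, x)) ** 2 for x, y in zip(xs, ys))


def levenberg_marquardt(p, xs, ys, mu=1e-2, iters=100):
    for _ in range(iters):
        # Accumulate J^T J (2x2) and J^T r (2-vector), with r = y - f(x; p)
        jtj = [[0.0, 0.0], [0.0, 0.0]]
        jtr = [0.0, 0.0]
        for x, y in zip(xs, ys):
            j = jacobian_row(p, x)
            r = y - model(p, x)
            for i in range(2):
                jtr[i] += j[i] * r
                for k in range(2):
                    jtj[i][k] += j[i] * j[k]
        old = sse(p, xs, ys)
        while True:
            # Solve (J^T J + mu I) delta = J^T r by Cramer's rule
            a11, a12 = jtj[0][0] + mu, jtj[0][1]
            a21, a22 = jtj[1][0], jtj[1][1] + mu
            det = a11 * a22 - a12 * a21
            delta = ((jtr[0] * a22 - a12 * jtr[1]) / det,
                     (a11 * jtr[1] - a21 * jtr[0]) / det)
            trial = (p[0] + delta[0], p[1] + delta[1])
            if sse(trial, xs, ys) < old:
                p, mu = trial, mu / 10  # accept: behave more like Gauss-Newton
                break
            mu *= 10  # reject: behave more like gradient descent
            if mu > 1e12:
                return p  # no step reduces the error any further
    return p
```

Usage: with `xs = [0.25 * i for i in range(9)]` and `ys = [2 * math.exp(0.5 * x) for x in xs]`, calling `levenberg_marquardt((1.5, 0.3), xs, ys)` should recover parameters close to (2, 0.5).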
Parallel Processing of a Data Stream by Artificial Neural Networks on the CUDA Platform
A scheme for the software implementation of a feedforward artificial neural network (ANN) on the massively parallel computing platform CUDA is proposed.
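The abstract names only the target, a feedforward ANN on CUDA applied to a data stream. A minimal sketch, assuming the stream is cut into overlapping windows that are each passed through the network independently: on a GPU each window could be assigned to its own thread block, whereas here the windows are simply processed in a loop. The names `windows`, `forward` and `process_stream`, the tanh hidden activation and the linear output layer are all assumptions.

```python
import math
from typing import Iterable, Iterator, List, Sequence, Tuple

Layer = Tuple[Sequence[Sequence[float]], Sequence[float]]  # (weights, biases)


def windows(stream: Iterable[float], size: int, step: int) -> Iterator[List[float]]:
    """Yield windows of `size` samples, advancing by `step` (step <= size)."""
    assert 0 < step <= size
    buf: List[float] = []
    for v in stream:
        buf.append(v)
        if len(buf) == size:
            yield list(buf)
            del buf[:step]  # keep the overlap for the next window


def forward(x: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Feedforward pass: tanh on hidden layers, linear output layer."""
    out = list(x)
    for n, (w, b) in enumerate(layers):
        z = [sum(wi * xi for wi, xi in zip(row, out)) + bi for row, bi in zip(w, b)]
        out = z if n == len(layers) - 1 else [math.tanh(v) for v in z]
    return out


def process_stream(stream, layers, size, step):
    # Each window is independent, so this loop is the part to parallelize.
    return [forward(win, layers) for win in windows(stream, size, step)]
```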
Portable GPU-Based Artificial Neural Networks For Data-Driven Modeling
Artificial neural networks (ANNs) are widely applied as data-driven modeling tools in hydroinformatics because they can handle implicit and nonlinear relationships between input and output data. Obtaining a reliable ANN model requires training it on the data, but training usually takes many hours for large data sets and/or large systems with many variants. This may not be a concern when an ANN is trained for offline applications, but it is of great importance when the ANN is trained or retrained for real-time and near-real-time applications, which are becoming an increasingly active research theme as hydroinformatics tools become an integral part of smart city operation systems. The author's previous research projects showed that a GPU-based ANN is more than 10X as efficient as a CPU-based ANN for constructing the meta-model (fast simulation) used as a surrogate for the physics-based model (slow simulation). This paper presents the latest development of GPU-based ANN computing kernels, implemented with OpenCL (Open Computing Language). The generalized ANN can be used as an efficient machine learning library for data-driven modeling. The performance of the implemented library has been tested on a benchmark example and compared with previous results.
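In GPU ANN code, the per-sample forward and backward passes and the summation of gradients over the data set are the natural candidates for kernels. The sketch below shows those computations in plain Python for a one-hidden-layer regression network fitted by full-batch gradient descent, the kind of model a data-driven surrogate might use. It is not the paper's OpenCL code; the network size, learning rate, tanh activation, initialization and the name `train_surrogate` are assumptions.

```python
import math
import random


def train_surrogate(xs, ys, hidden=8, lr=0.05, epochs=500, seed=0):
    """Fit y ~ f(x) with a 1-input, `hidden`-unit tanh network by full-batch
    gradient descent on the mean squared error. Returns the loss per epoch."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # input -> hidden
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # hidden -> output
    b2 = 0.0
    n = len(xs)
    losses = []
    for _ in range(epochs):
        gw1, gb1, gw2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden
        gb2 = 0.0
        loss = 0.0
        # Per-sample forward and backward passes; on a GPU these are the
        # independent work items, and the += lines are the reductions.
        for x, y in zip(xs, ys):
            h = [math.tanh(w1[k] * x + b1[k]) for k in range(hidden)]
            out = sum(w2[k] * h[k] for k in range(hidden)) + b2
            err = out - y
            loss += err * err / n
            g = 2.0 * err / n  # d(loss)/d(out)
            gb2 += g
            for k in range(hidden):
                gw2[k] += g * h[k]
                gz = g * w2[k] * (1.0 - h[k] * h[k])  # back through tanh
                gw1[k] += gz * x
                gb1[k] += gz
        losses.append(loss)
        # Gradient-descent update once the whole batch has been reduced
        for k in range(hidden):
            w1[k] -= lr * gw1[k]
            b1[k] -= lr * gb1[k]
            w2[k] -= lr * gw2[k]
        b2 -= lr * gb2
    return losses
```

For example, `train_surrogate(xs, [math.sin(math.pi * x) for x in xs])` with `xs` spread over [-1, 1] returns the loss recorded at each epoch, which should fall over training for a small enough learning rate.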
Study of Camera Spectral Reflectance Reconstruction Performance using CPU and GPU Artificial Neural Network Modelling
Reconstruction of reflectance spectra from camera RGB values is possible if the characteristics of the illumination source, optics and sensors are known. If they are not, additional information about them has to be acquired somehow. If, alongside the pictures taken, RGB values of some colour patches with known reflectance spectra are obtained under the same illumination conditions, reflectance reconstruction models can be created based on artificial neural networks (ANNs). In Matlab, multilayer feedforward networks can be trained using different algorithms. In our study we hypothesized that the scaled conjugate gradient backpropagation (BP) algorithm, when executed on a graphics processing unit (GPU), is very fast but does not match the Levenberg-Marquardt (LM) algorithm in terms of convergence and performance; LM, on the other hand, executes only on the CPU and is therefore much more time-consuming. We also presumed that a correlation exists between the two algorithms, manifested through the dependency of the MSE on the number of hidden-layer neurons, so that the faster BP algorithm could be used to narrow the search span for the LM algorithm when looking for the best ANN for reflectance reconstruction. The experiment confirmed the speed superiority of the BP algorithm but also confirmed the better convergence and accuracy of reflectance reconstruction with the LM algorithm. The correlation between the reflectance recovery results of ANNs trained with the two algorithms was confirmed, and a strong correlation was found between the 3rd-order polynomial approximations of the LM and BP algorithms' test performances, for both mean and best performance.
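The comparison described here can be sketched in two steps: fit a 3rd-order polynomial to each algorithm's test MSE as a function of the number of hidden-layer neurons, then correlate the two fitted curves. The abstract does not say which correlation coefficient was used; the sketch assumes Pearson's r and solves the least-squares normal equations directly, which is adequate for a handful of points with small neuron counts but becomes ill-conditioned for large x values. The function names are hypothetical, and no data from the study is included.

```python
import math


def polyfit3(xs, ys):
    """Least-squares cubic fit; returns coefficients [c0, c1, c2, c3]."""
    m = 4
    # Augmented normal equations (V^T V | V^T y) for the Vandermonde matrix V
    aug = [[sum(x ** (i + j) for x in xs) for j in range(m)]
           + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    # Back substitution
    coef = [0.0] * m
    for i in range(m - 1, -1, -1):
        s = sum(aug[i][j] * coef[j] for j in range(i + 1, m))
        coef[i] = (aug[i][m] - s) / aug[i][i]
    return coef


def polyval(coef, x):
    return sum(c * x ** i for i, c in enumerate(coef))


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def fitted_curve_correlation(neurons, mse_lm, mse_bp):
    """Correlate the cubic approximations of two MSE-vs-neurons curves."""
    fit_lm = polyfit3(neurons, mse_lm)
    fit_bp = polyfit3(neurons, mse_bp)
    return pearson([polyval(fit_lm, n) for n in neurons],
                   [polyval(fit_bp, n) for n in neurons])
```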