520 research outputs found

Somoclu: An Efficient Parallel Library for Self-Organizing Maps

Author: Gao Shi Chao
Lim Ik Soo
Wittek Peter
Zhao Li
Publication venue: 'Foundation for Open Access Statistic'
Publication date: 01/01/2017

Somoclu is a massively parallel tool for training self-organizing maps on large data sets written in C++. It builds on OpenMP for multicore execution, and on MPI for distributing the workload across the nodes in a cluster. It is also able to boost training by using CUDA if graphics processing units are available. A sparse kernel is included, which is useful for high-dimensional but sparse data, such as the vector spaces common in text mining workflows. Python, R and MATLAB interfaces facilitate interactive use. Apart from fast execution, memory use is highly optimized, enabling training large emergent maps even on a single computer. Comment: 26 pages, 9 figures. The code is available at https://peterwittek.github.io/somoclu
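To illustrate why a sparse kernel helps with high-dimensional but sparse data, here is a minimal sketch of a squared Euclidean distance that only visits the nonzero entries of the input. It is not Somoclu's actual kernel: the function name `sparse_sq_distance` and the dict-based sparse format are assumptions made for this example. It relies on the identity ||x - w||^2 = ||x||^2 - 2 x·w + ||w||^2.

```python
def sparse_sq_distance(x_sparse, w, w_sq_norm=None):
    """Squared Euclidean distance between a sparse input and a dense vector.

    x_sparse: dict mapping feature index -> nonzero value (hypothetical format).
    w: dense codebook vector as a list of floats.
    w_sq_norm: optional precomputed ||w||^2, reusable across many inputs.
    """
    if w_sq_norm is None:
        w_sq_norm = sum(v * v for v in w)
    # Only the nonzero entries of x contribute to ||x||^2 and to x.w
    x_sq_norm = sum(v * v for v in x_sparse.values())
    dot = sum(v * w[i] for i, v in x_sparse.items())
    return x_sq_norm - 2.0 * dot + w_sq_norm


# Dense equivalent of x is [0, 2, 0, 0, 1]
x = {1: 2.0, 4: 1.0}
w = [0.5, 1.0, 0.0, 0.0, 1.0]
print(sparse_sq_distance(x, w))
```

When ||w||^2 is cached per codebook vector, the cost per input scales with the number of nonzero features rather than with the full dimensionality.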

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Journal of Statistical Software

Bangor University Research Portal

A Multi-signal Variant for the GPU-based Parallelization of Growing Self-Organizing Networks

Author: I Buck
J García-Rodríguez
J Owens
J Stone
J. Owens
M Papakipos
M Piastra
N Amenta
R Lawrence
RW Hockney
S Marsland
S Orts
T Martinetz
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 28/03/2015

Among the many possible approaches for the parallelization of self-organizing networks, and in particular of growing self-organizing networks, perhaps the most common one is producing an optimized, parallel implementation of the standard sequential algorithms reported in the literature. In this paper we explore an alternative approach, based on a new algorithm variant specifically designed to match the features of the large-scale, fine-grained parallelism of GPUs, in which multiple input signals are processed at once. Comparative tests have been performed, using both parallel and sequential implementations of the new algorithm variant, in particular for a growing self-organizing network that reconstructs surfaces from point clouds. The experimental results show that this approach harnesses more effectively the intrinsic parallelism that self-organizing network algorithms intuitively seem to suggest, obtaining better performance even with networks of smaller size. Comment: 17 pages

arXiv.org e-Print Archive

Crossref

Archivio Istituzionale della Ricerca - Università degli Studi di Pavia

A Self Organization-Based Optical Flow Estimator with GPU Implementation

Author: Shiralkar Manish
Publication venue: Clemson University Libraries
Publication date: 01/12/2010

This work describes a parallelizable optical flow estimator that uses a modified batch version of the Self Organizing Map (SOM). This gradient-based estimator handles the ill-posedness in motion estimation via a novel combination of regression and a self organization strategy. The aperture problem is explicitly modeled using an algebraic framework that partitions motion estimates obtained from regression into two sets, one (set Hc) with high-confidence estimates and another (set Hp) with low-confidence estimates. The self organization step uses a uniquely designed pair of a training set (Q = Hc) and an initial weights set (W = Hc ∪ Hp). It is shown that with this specific choice of training and initial weights sets, the interpolation of flow vectors is achieved primarily due to the regularization property of SOM. Moreover, the computationally involved step of finding the winner unit in SOM simplifies to indexing into a 2D array, making the algorithm parallelizable and highly scalable. To preserve flow discontinuities at occlusion boundaries, we have designed an anisotropic neighborhood function for SOM that uses a novel OFCE residual-based distance measure. A multi-resolution or pyramidal approach is used to estimate large motion. As the algorithm is scalable, with a sufficient number of computing cores (for example on a GPU), the implementation of the estimator can be made real-time. With the available true motion from the Middlebury database, error metrics are computed

Clemson University: TigerPrints

XPySom: High-performance self-organizing maps

Author: Cucinotta T.
Lanciano G.
Mancini R.
Ritacco A.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020

In this paper, we introduce XPySom, a new open-source Python implementation of the well-known Self-Organizing Maps (SOM) technique. It is designed to achieve high performance on a single node, exploiting widely available Python libraries for vector processing on multi-core CPUs and GP-GPUs. We present results from an extensive experimental evaluation of XPySom in comparison to widely used open-source SOM implementations, showing that it outperforms the other available alternatives. Indeed, our experimentation carried out using the Extended MNIST open data set shows a speed-up of about 7x and 100x when compared to the best open-source multi-core implementations we could find with multi-core and GP-GPU acceleration, respectively, achieving the same accuracy levels in terms of quantization error
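The quantization error used above as the accuracy measure is commonly defined as the mean distance between each sample and the weight vector of its closest unit. The sketch below follows that common definition; it is not taken from XPySom, and the function name and list-based data layout are assumptions made for illustration.

```python
import math


def quantization_error(data, codebook):
    """Mean Euclidean distance from each sample to its nearest codebook vector.

    data: list of samples, each a list of floats.
    codebook: flat list of SOM weight vectors (the grid layout is irrelevant here).
    """
    if not data:
        raise ValueError("data must contain at least one sample")
    total = 0.0
    for x in data:
        # Distance to the best-matching unit for this sample
        total += min(math.dist(x, w) for w in codebook)
    return total / len(data)


data = [[0.0, 0.0], [1.0, 0.0], [4.0, 3.0]]
codebook = [[0.0, 0.0], [4.0, 0.0]]
print(quantization_error(data, codebook))
```

Two implementations trained on the same data can be compared on this number alone, which is what makes it a convenient check that a faster implementation has not traded away map quality.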

Archivio della ricerca della Scuola Superiore Sant'Anna

Enhancing Performance of Parallel Self-Organizing Map on Large Dataset with Dynamic Parallel and Hyper-Q

Author: Nasution Mahyuddin K.M.
Sibero Alexander F.K.
Sitompul Opim Salim
Publication venue: Talenta Publisher
Publication date: 03/08/2018

Self-Organizing Map (SOM) is an unsupervised artificial neural network algorithm. Even though this algorithm is known to be an appealing clustering method, many efforts to improve its performance are still pursued in various research works. In order to gain faster computation time, for instance, many previous research works have focused on running SOM in parallel. Utilization of the Graphics Processing Unit (GPU) as a parallel calculation engine is also continuously improved. However, total computation time in parallel SOM is still not optimal when processing large datasets. In this research, we propose a combination of Dynamic Parallel and Hyper-Q to further improve the performance of parallel SOM in terms of faster computing time. Dynamic Parallel and Hyper-Q are utilized in the process of calculating distances and searching for the best-matching unit (BMU), while updating the weights of the BMU and its neighbors is performed using Hyper-Q only. The results of this study indicate an increase in parallel SOM performance of up to two times compared to an implementation without Dynamic Parallel and Hyper-Q
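The two phases named in this abstract (distance calculation with BMU search, then the weight update of the BMU and its neighbors) can be sketched sequentially as follows. This is a plain-Python illustration of a standard online SOM step, not the authors' GPU implementation; the function names, the Gaussian neighborhood and the parameters `lr` and `radius` are assumptions.

```python
import math


def find_bmu(weights, x):
    """Return (row, col) of the unit whose weight vector is closest to x."""
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def update_neighborhood(weights, x, bmu, lr, radius):
    """Move every unit toward x in place, scaled by a Gaussian of its grid distance to the BMU."""
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
            h = math.exp(-grid_d2 / (2.0 * radius ** 2))
            for k in range(len(w)):
                w[k] += lr * h * (x[k] - w[k])


# 2x2 map of 2-dimensional weight vectors
weights = [[[0.0, 0.0], [1.0, 0.0]],
           [[0.0, 1.0], [1.0, 1.0]]]
x = [0.9, 0.1]
bmu = find_bmu(weights, x)
update_neighborhood(weights, x, bmu, lr=0.5, radius=1.0)
print(bmu, weights[bmu[0]][bmu[1]])
```

Both loops are independent across units, which is why each phase maps naturally onto one GPU thread per unit; the BMU search additionally needs a reduction over the computed distances.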

Talenta Publisher (E-Journals, Universitas Sumatera Utara)

Distributed learning of CNNs on heterogeneous CPU/GPU architectures

Author: Alexandre Luís A.
Falcao Gabriel
Marques Jose
Publication date: 07/12/2017

Convolutional Neural Networks (CNNs) have been shown to be powerful classification tools in tasks that range from check reading to medical diagnosis, reaching close to human perception, and in some cases surpassing it. However, the problems to solve are becoming larger and more complex, which translates to larger CNNs, leading to longer training times that not even the adoption of Graphics Processing Units (GPUs) could keep up with. This problem is partially solved by using more processing units and distributed training methods that are offered by several frameworks dedicated to neural network training. However, these techniques do not take full advantage of the possible parallelization offered by CNNs and the cooperative use of heterogeneous devices with different processing capabilities, clock speeds, memory size, among others. This paper presents a new method for the parallel training of CNNs that can be considered as a particular instantiation of model parallelism, where only the convolutional layer is distributed. In fact, the convolutions processed during training (forward and backward propagation included) represent from 60-90% of global processing time. The paper analyzes the influence of network size, bandwidth, batch size, number of devices, including their processing capabilities, and other parameters. Results show that this technique is capable of diminishing the training time without affecting the classification performance for both CPUs and GPUs. For the CIFAR-10 dataset, using a CNN with two convolutional layers, and 500 and 1500 kernels, respectively, the best speedups achieve 3.28× using four CPUs and 2.45× with three GPUs. Modern imaging datasets, larger and more complex than CIFAR-10, will certainly require more than 60-90% of processing time calculating convolutions, and speedups will tend to increase accordingly
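The link the abstract draws between the share of time spent in convolutions and the achievable speedup can be made concrete with Amdahl's law. The sketch below is an idealized model, not the paper's analysis: it assumes identical devices, no communication cost, and that only the convolutional fraction is distributed. The function name is hypothetical, and the model is not meant to reproduce the reported figures.

```python
def amdahl_speedup(parallel_fraction, n_devices):
    """Ideal speedup when only `parallel_fraction` of the runtime is spread over n devices."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_devices < 1:
        raise ValueError("n_devices must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_devices)


# Convolution share at the two ends of the range quoted in the abstract
for p in (0.6, 0.9):
    print(f"fraction={p}: 4 devices -> {amdahl_speedup(p, 4):.2f}x")
```

Under these assumptions the bound grows with the convolution share, which is consistent with the expectation that speedups increase as convolutions take up more of the training time.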

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

UBibliorum repositorio digital da ubi

Directory of Open Access Journals

A CUDA-powered method for the feature extraction and unsupervised analysis of medical images

Author: Cazzaniga P
Galimberti S
Mauri G
Mistri M
Nobile MS
Rundo L
Sala E
Tangherloni A
Woitek R
Publication venue: Journal of Supercomputing
Publication date: 01/01/2021

Funder: Università degli Studi di Milano - Bicocca

Image texture extraction and analysis are fundamental steps in computer vision. In particular, considering the biomedical field, quantitative imaging methods are increasingly gaining importance because they convey scientifically and clinically relevant information for prediction, prognosis, and treatment response assessment. In this context, radiomic approaches are fostering large-scale studies that can have a significant impact in the clinical practice. In this work, we present a novel method, called CHASM (Cuda, HAralick & SoM), which is accelerated on the graphics processing unit (GPU) for quantitative imaging analyses based on Haralick features and on the self-organizing map (SOM). The Haralick features extraction step relies upon the gray-level co-occurrence matrix, which is computationally burdensome on medical images characterized by a high bit depth. The downstream analyses exploit the SOM with the goal of identifying the underlying clusters of pixels in an unsupervised manner. CHASM is conceived to leverage the parallel computation capabilities of modern GPUs. Analyzing ovarian cancer computed tomography images, CHASM achieved up to ∼19.5× and ∼37× speed-up factors for the Haralick feature extraction and for the SOM execution, respectively, compared to the corresponding C++ coded sequential versions. Such computational results point out the potential of GPUs in the clinical research.
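As a small illustration of the feature extraction step described above, the sketch below builds a normalized gray-level co-occurrence matrix (GLCM) for one pixel offset and derives one Haralick feature, the contrast. It is not the CHASM code: the function names are hypothetical, and for brevity the matrix is the non-symmetric variant computed over a whole image rather than a sliding window.

```python
def glcm(image, dx=1, dy=0, levels=None):
    """Normalized co-occurrence matrix of gray levels at offset (dx, dy).

    image: 2D list of non-negative integer gray levels.
    levels: number of gray levels; defaults to max(image) + 1.
    """
    if levels is None:
        levels = max(max(row) for row in image) + 1
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Haralick contrast: sum over (i, j) of p[i][j] * (i - j)^2."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


image = [[0, 0, 1],
         [1, 2, 2],
         [0, 1, 2]]
print(contrast(glcm(image)))
```

The dense matrix has `levels * levels` entries, so a 16-bit image would need 65536 × 65536 cells per offset; this is the cost the abstract refers to when it calls the GLCM burdensome at high bit depth.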

Archivio istituzionale della Ricerca - Bocconi

Pure OAI Repository

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Apollo (Cambridge)

Real-Time Human Detection Using Deep Learning on Embedded Platforms: A Review

Author: Hernawan Ari
Rahmaniar Wahyu
Publication venue: 'Universitas Muhammadiyah Yogyakarta'
Publication date: 08/11/2021

The detection of an object such as a human is very important for image understanding in the field of computer vision. Human detection in images can provide essential information for a wide variety of applications in intelligent systems. In this paper, human detection is carried out using deep learning, which has developed rapidly and achieved extraordinary success in various object detection implementations. Recently, several embedded systems have emerged as powerful computing boards that provide high processing capabilities using the graphics processing unit (GPU). This paper aims to provide a comprehensive survey of the latest achievements in this field brought about by deep learning techniques on embedded platforms. NVIDIA Jetson was chosen as a low-power system designed to accelerate deep learning applications. This review highlights the performance of human detection models such as PedNet, multiped, SSD MobileNet V1, SSD MobileNet V2, and SSD Inception V2 on edge computing. This survey aims to provide an overview of these methods and compare their performance in accuracy and computation time for real-time applications. The experimental results show that the SSD MobileNet V2 model provides the highest accuracy with the fastest computation time compared to other models in our video datasets with several scenarios

Leading & Enlightening Journal UMY

Parallel bio-inspired methods for model optimization and pattern recognition

Author: S. G. Nashed Youssef
Publication venue: Università di Parma. Dipartimento di Ingegneria dell’Informazione.
Publication date: 01/01/2014

Nature-based computational models are usually inherently parallel. The collaborative intelligence in those models emerges from the simultaneous instruction processing by simple independent units (neurons, ants, swarm members, etc.). This dissertation investigates the benefits of such parallel models in terms of efficiency and accuracy. First, the viability of a parallel implementation of bio-inspired metaheuristics for function optimization on consumer-level graphics cards is studied in detail. Then, in an effort to expose those parallel methods to the research community, the metaheuristic implementations were abstracted and grouped in an open-source parameter/function optimization library, libCudaOptimize. The library was verified against a well-known benchmark for mathematical function minimization, and showed significant gains in both execution time and minimization accuracy. Moving further toward applications, a parallel model of the human neocortex was developed. This model is able to detect, classify, and predict patterns in time-series data in an unsupervised way. Finally, libCudaOptimize was used to find the best parameters for this neocortex model, adapting it to gesture recognition within publicly available datasets

DSpace a Parma

REAL-TIME DATA MINING FOR PROCESS OPERATIONS USING GRAPHICS PROCESSING UNIT (GPU)-BASED HIGH PERFORMANCE COMPUTING

Author: LAU MAI CHAN
Publication date: 11/08/2014

Ph.D (DOCTOR OF PHILOSOPHY)

ScholarBank@NUS