Somoclu: An Efficient Parallel Library for Self-Organizing Maps
Somoclu is a massively parallel tool for training self-organizing maps on
large data sets written in C++. It builds on OpenMP for multicore execution,
and on MPI for distributing the workload across the nodes in a cluster. It is
also able to boost training by using CUDA if graphics processing units are
available. A sparse kernel is included, which is useful for high-dimensional
but sparse data, such as the vector spaces common in text mining workflows.
Python, R and MATLAB interfaces facilitate interactive use. Apart from fast
execution, memory use is highly optimized, enabling training large emergent
maps even on a single computer.

Comment: 26 pages, 9 figures. The code is available at
https://peterwittek.github.io/somoclu
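Somoclu itself implements batch SOM training in C++/CUDA; purely as an illustration of what a self-organizing map learns, here is a minimal pure-Python sketch of the simpler online variant. The map size, decay schedules, and Gaussian neighborhood here are illustrative assumptions, not Somoclu's actual defaults or algorithm.

```python
import math
import random

def train_som(data, rows, cols, epochs=10, lr0=0.5, radius0=None):
    """Minimal online SOM trainer: for each sample, find the best-matching
    unit (BMU) and pull nearby codebook vectors toward the sample."""
    dim = len(data[0])
    radius0 = radius0 or max(rows, cols) / 2.0
    rng = random.Random(0)
    codebook = [[rng.random() for _ in range(dim)]
                for _ in range(rows * cols)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                # linearly decaying learning rate
        radius = radius0 * (1.0 - frac) + 1e-9
        for x in data:
            # Best-matching unit: the codebook vector closest to the sample.
            bmu = min(range(rows * cols),
                      key=lambda i: sum((codebook[i][d] - x[d]) ** 2
                                        for d in range(dim)))
            br, bc = divmod(bmu, cols)
            for i in range(rows * cols):
                r, c = divmod(i, cols)
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))  # Gaussian neighborhood
                for d in range(dim):
                    codebook[i][d] += lr * h * (x[d] - codebook[i][d])
    return codebook

# Two well-separated clusters should end up mapped to different units.
data = [[0.0, 0.0]] * 5 + [[1.0, 1.0]] * 5
codebook = train_som(data, rows=4, cols=4, epochs=20)
```

The shrinking neighborhood is what produces the topology-preserving "emergent map" the abstract refers to; Somoclu parallelizes the BMU search and the codebook update across cores, cluster nodes, and GPUs.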
Simulation of Recognizer P Systems by Using Manycore GPUs
Software development for cellular computing is growing, yielding new
applications. In this paper, we describe a simulator for the class of recognizer P systems
with active membranes, which exploits the massively parallel nature of the P systems
computations by using a massively parallel computer architecture, such as Compute
Unified Device Architecture (CUDA) from Nvidia, to obtain better performance in the
simulations. We illustrate it by giving a solution to the N-Queens problem as an example.

Funding: Ministerio de Educación y Ciencia TIN2006-13425; Junta de Andalucía P08-TIC-0420
Scalability Validation of Parallel Sorting Algorithms
As single-core processor performance is no longer improving significantly, the computer industry is moving towards increasing the number of cores per processor or, in the case of large-scale computers, installing more processors per machine. Applications now need to scale with this growth in parallel computing power, and software developers need to take advantage of it. Parallel sorting algorithms are basic building blocks for many such complex applications.
In this thesis, we will validate the expected execution time complexities of five state-of-the-art parallel sorting algorithms, implemented in C using MPI for parallelization, by using a scalability validation framework based on Score-P and Extra-P. For each of the parallel sorting algorithms, we will create a performance model. These models will allow us to compare their scalability behaviour to the expectations. Furthermore, we will attempt to parallelize the local sorting step of the splitter-based parallel sorting algorithms via C++11 threads, OpenMP tasks, and CUDA acceleration.
We construct the performance models, on which we base our evaluations, using uniformly randomly generated data. For most of the parallel sorting algorithms, we show that the given expectations match the created models. We discuss any remaining discrepancies in detail.
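The thesis's implementations are in C with MPI; as a rough, language-agnostic illustration of the splitter-based scheme it studies, here is a sketch of a sample sort in Python. The oversampling factor and splitter selection are simplified assumptions, not the thesis's actual parameters.

```python
import random

def sample_sort(data, p=4, oversample=8):
    """Sketch of splitter-based (sample) sort: choose p-1 splitters from a
    sorted random sample, partition the input into p buckets, sort each
    bucket locally, and concatenate the results."""
    if len(data) <= p:
        return sorted(data)
    rng = random.Random(42)
    sample = sorted(rng.sample(data, min(len(data), p * oversample)))
    # Every oversample-th sample element becomes a splitter.
    splitters = [sample[i * oversample] for i in range(1, p)]
    buckets = [[] for _ in range(p)]
    for x in data:
        # Find the first splitter greater than x (linear scan for clarity;
        # real implementations use binary search).
        b = 0
        while b < p - 1 and x >= splitters[b]:
            b += 1
        buckets[b].append(x)
    # The local sorts are independent of one another -- this is the step
    # the thesis parallelizes with C++11 threads, OpenMP tasks, or CUDA.
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result

rng = random.Random(7)
data = [rng.randint(0, 999) for _ in range(200)]
sorted_data = sample_sort(data)
```

Because the buckets are disjoint and ordered by the splitters, the final concatenation needs no merge step, which is why the local sort dominates and is the natural target for intra-node parallelization.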
Fast, Scalable, and Interactive Software for Landau-de Gennes Numerical Modeling of Nematic Topological Defects
Numerical modeling of nematic liquid crystals using the tensorial Landau-de
Gennes (LdG) theory provides detailed insights into the structure and
energetics of the enormous variety of possible topological defect
configurations that may arise when the liquid crystal is in contact with
colloidal inclusions or structured boundaries. However, these methods can be
computationally expensive, making it challenging to predict (meta)stable
configurations involving several colloidal particles, and they are often
restricted to system sizes well below the experimental scale. Here we present
an open-source software package that exploits the embarrassingly parallel
structure of the lattice discretization of the LdG approach. Our
implementation, combining CUDA/C++ and OpenMPI, allows users to accelerate
simulations using both CPU and GPU resources in either single- or multiple-core
configurations. We make use of an efficient minimization algorithm, the Fast
Inertial Relaxation Engine (FIRE) method, that is well-suited to large-scale
parallelization, requiring little additional memory or computational cost while
offering performance competitive with other commonly used methods. In
multi-core operation we are able to scale simulations up to supra-micron length
scales of experimental relevance, and in single-core operation the simulation
package includes a user-friendly GUI environment for rapid prototyping of
interfacial features and the multifarious defect states they can promote. To
demonstrate this software package, we examine in detail the competition between
curvilinear disclinations and point-like hedgehog defects as size scale,
material properties, and geometric features are varied. We also study the
effects of an interface patterned with an array of topological point-defects.

Comment: 16 pages, 6 figures, 1 youtube link. The full catastroph…
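FIRE is a published minimization algorithm; the following is a minimal sketch of its update rules applied to a toy quadratic energy rather than the Landau-de Gennes functional. The parameter values are common defaults from the FIRE literature, not necessarily those used by this package.

```python
import math

def fire_minimize(grad, x, dt=0.1, dt_max=1.0, alpha0=0.1,
                  f_inc=1.1, f_dec=0.5, f_alpha=0.99, n_min=5,
                  max_steps=10000, tol=1e-8):
    """Sketch of the Fast Inertial Relaxation Engine (FIRE): damped
    molecular dynamics in which velocities are mixed toward the force
    direction; the timestep grows while moving downhill and is cut back
    whenever the system starts moving uphill."""
    n = len(x)
    v = [0.0] * n
    alpha = alpha0
    steps_since_uphill = 0
    for _ in range(max_steps):
        f = [-g for g in grad(x)]                   # force = -gradient
        if math.sqrt(sum(fi * fi for fi in f)) < tol:
            break
        p = sum(fi * vi for fi, vi in zip(f, v))    # power P = F . v
        if p > 0:                                   # still going downhill
            steps_since_uphill += 1
            vnorm = math.sqrt(sum(vi * vi for vi in v))
            fnorm = math.sqrt(sum(fi * fi for fi in f))
            v = [(1 - alpha) * vi + alpha * vnorm * fi / fnorm
                 for vi, fi in zip(v, f)]
            if steps_since_uphill > n_min:
                dt = min(dt * f_inc, dt_max)
                alpha *= f_alpha
        else:                                       # uphill: freeze and restart
            v = [0.0] * n
            dt *= f_dec
            alpha = alpha0
            steps_since_uphill = 0
        v = [vi + dt * fi for vi, fi in zip(v, f)]  # semi-implicit Euler step
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Minimize f(x) = |x - (1, 2, 3)|^2, whose gradient is 2 * (x - target).
target = [1.0, 2.0, 3.0]
xmin = fire_minimize(lambda x: [2 * (xi - ti) for xi, ti in zip(x, target)],
                     [0.0, 0.0, 0.0])
```

The appeal for a lattice LdG solver is visible even in this sketch: FIRE needs only the current gradient and velocity, so per-site memory overhead is small and every update is local, which suits GPU parallelization.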
GPU Accelerated Particle Visualization with Splotch
Splotch is a rendering algorithm for exploration and visual discovery in
particle-based datasets coming from astronomical observations or numerical
simulations. The strengths of the approach are production of high quality
imagery and support for very large-scale datasets through an effective mix of
the OpenMP and MPI parallel programming paradigms. This article reports our
experiences in re-designing Splotch for exploiting emerging HPC architectures
nowadays increasingly populated with GPUs. A performance model is introduced
for data transfers, computations and memory access, to guide our re-factoring
of Splotch. A number of parallelization issues are discussed, in particular
relating to race conditions and workload balancing, towards achieving optimal
performances. Our implementation was accomplished by using the CUDA programming
paradigm. Our strategy is founded on novel schemes achieving optimized data
organisation and classification of particles. We deploy a reference simulation
to present performance results on acceleration gains and scalability. We
finally outline our vision for future work, including possibilities for
further optimisation and exploitation of emerging technologies.

Comment: 25 pages, 9 figures. Astronomy and Computing (2014)
Implementing P Systems Parallelism by Means of GPUs
Software development for Membrane Computing is growing, yielding new
applications. Nowadays, the efficiency of P systems simulators has become a
critical point when working with instances of large size. The newest
generation of GPUs (Graphics Processing Units) provides a massively parallel
framework for general-purpose computation. We present GPUs as an alternative
for obtaining better performance in the simulation of P systems, and we
illustrate this by giving a solution to the N-Queens problem as an example.

Funding: Ministerio de Educación y Ciencia TIN2006-13425; Junta de Andalucía P08-TIC-0420
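The paper encodes N-Queens as a P systems computation; as a conventional illustration of why the problem parallelizes well, here is a bitmask backtracking counter whose first-row branches are fully independent and can run on separate workers. This is not the P systems encoding used in the paper, just the standard search formulation.

```python
from multiprocessing.dummy import Pool  # thread pool stands in for GPU threads

def count_from_first_row(args):
    """Count N-Queens solutions whose first-row queen sits in column c.
    Branches for different first columns share no state, which is the
    kind of independence a massively parallel simulation exploits."""
    n, c = args
    def solve(row, cols, diag1, diag2):
        if row == n:
            return 1
        total = 0
        for col in range(n):
            if not (cols & (1 << col) or
                    diag1 & (1 << (row + col)) or
                    diag2 & (1 << (row - col + n))):
                total += solve(row + 1, cols | (1 << col),
                               diag1 | (1 << (row + col)),
                               diag2 | (1 << (row - col + n)))
        return total
    return solve(1, 1 << c, 1 << c, 1 << (0 - c + n))

def n_queens(n, workers=4):
    with Pool(workers) as pool:
        return sum(pool.map(count_from_first_row, [(n, c) for c in range(n)]))

solutions = n_queens(8)   # the classic 8-Queens board has 92 solutions
```

The three bitmasks track attacked columns and the two diagonal directions, so each placement test is a few bit operations, which is also why this style of search maps well onto GPU hardware.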
GHOST: Building blocks for high performance sparse linear algebra on heterogeneous systems
While many of the architectural details of future exascale-class high
performance computer systems are still a matter of intense research, there
appears to be a general consensus that they will be strongly heterogeneous,
featuring "standard" as well as "accelerated" resources. Today, such resources
are available as multicore processors, graphics processing units (GPUs), and
other accelerators such as the Intel Xeon Phi. Any software infrastructure that
claims usefulness for such environments must be able to meet their inherent
challenges: massive multi-level parallelism, topology, asynchronicity, and
abstraction. The "General, Hybrid, and Optimized Sparse Toolkit" (GHOST) is a
collection of building blocks that targets algorithms dealing with sparse
matrix representations on current and future large-scale systems. It implements
the "MPI+X" paradigm, has a pure C interface, and provides hybrid-parallel
numerical kernels, intelligent resource management, and truly heterogeneous
parallelism for multicore CPUs, Nvidia GPUs, and the Intel Xeon Phi. We
describe the details of its design with respect to the challenges posed by
modern heterogeneous supercomputers and recent algorithmic developments.
Implementation details which are indispensable for achieving high efficiency
are pointed out and their necessity is justified by performance measurements or
predictions based on performance models. The library code and several
applications are available as open source. We also provide instructions on how
to make use of GHOST in existing software packages, together with a case study
which demonstrates the applicability and performance of GHOST as a component
within a larger software stack.

Comment: 32 pages, 11 figures
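GHOST's kernels are hybrid-parallel C with hardware-specific storage formats; as a minimal illustration of the kind of sparse building block such a library provides, here is a CSR (compressed sparse row) matrix-vector product in Python. The layout is deliberately simplified; a heterogeneous library uses more elaborate blocked formats tuned per device.

```python
def csr_from_dense(dense):
    """Build CSR arrays (values, column indices, row pointers) from a
    dense matrix, storing only the nonzero entries."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def spmv(values, col_idx, row_ptr, x):
    """y = A @ x for a CSR matrix. The rows are independent, so this outer
    loop is what an MPI+X library distributes across ranks, threads, and
    accelerators."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

A = [[4.0, 0.0, 1.0],
     [0.0, 3.0, 0.0],
     [2.0, 0.0, 5.0]]
y = spmv(*csr_from_dense(A), [1.0, 2.0, 3.0])
```

Sparse matrix-vector multiplication is memory-bandwidth-bound, which is why the abstract stresses data layout and performance models: the storage format, not the arithmetic, usually decides the achieved fraction of peak.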