Somoclu: An Efficient Parallel Library for Self-Organizing Maps
Somoclu is a massively parallel tool for training self-organizing maps on
large data sets written in C++. It builds on OpenMP for multicore execution,
and on MPI for distributing the workload across the nodes in a cluster. It is
also able to boost training by using CUDA if graphics processing units are
available. A sparse kernel is included, which is useful for high-dimensional
but sparse data, such as the vector spaces common in text mining workflows.
Python, R and MATLAB interfaces facilitate interactive use. Apart from fast
execution, memory use is highly optimized, enabling training large emergent
maps even on a single computer.

Comment: 26 pages, 9 figures. The code is available at
https://peterwittek.github.io/somoclu
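Somoclu itself implements batch SOM training in C++/CUDA; purely as an illustration of what a self-organizing map learns, here is a minimal pure-Python sketch of the simpler online variant. The map size, decay schedules, and Gaussian neighborhood here are illustrative assumptions, not Somoclu's actual defaults or algorithm.

```python
import math
import random

def train_som(data, rows, cols, epochs=10, lr0=0.5, radius0=None):
    """Minimal online SOM trainer: for each sample, find the best-matching
    unit (BMU) and pull nearby codebook vectors toward the sample."""
    dim = len(data[0])
    radius0 = radius0 or max(rows, cols) / 2.0
    rng = random.Random(0)
    codebook = [[rng.random() for _ in range(dim)]
                for _ in range(rows * cols)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                # linearly decaying learning rate
        radius = radius0 * (1.0 - frac) + 1e-9
        for x in data:
            # Best-matching unit: the codebook vector closest to the sample.
            bmu = min(range(rows * cols),
                      key=lambda i: sum((codebook[i][d] - x[d]) ** 2
                                        for d in range(dim)))
            br, bc = divmod(bmu, cols)
            for i in range(rows * cols):
                r, c = divmod(i, cols)
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))  # Gaussian neighborhood
                for d in range(dim):
                    codebook[i][d] += lr * h * (x[d] - codebook[i][d])
    return codebook

# Two well-separated clusters should end up mapped to different units.
data = [[0.0, 0.0]] * 5 + [[1.0, 1.0]] * 5
codebook = train_som(data, rows=4, cols=4, epochs=20)
```

The shrinking neighborhood is what produces the topology-preserving "emergent map" the abstract refers to; Somoclu parallelizes the BMU search and the codebook update across cores, cluster nodes, and GPUs.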
Simulation of Recognizer P Systems by Using Manycore GPUs
Software development for cellular computing is growing, yielding new
applications. In this paper, we describe a simulator for the class of recognizer P systems
with active membranes, which exploits the massively parallel nature of the P systems
computations by using a massively parallel computer architecture, such as Compute
Unified Device Architecture (CUDA) from Nvidia, to obtain better performance in the
simulations. We illustrate it by giving a solution to the N-Queens problem as an example.

Funding: Ministerio de Educación y Ciencia TIN2006-13425; Junta de Andalucía P08-TIC-0420
Scalability Validation of Parallel Sorting Algorithms
As single-core processor performance is no longer improving significantly, the computer industry is moving towards increasing the number of cores per processor or, in the case of large-scale computers, installing more processors per machine. Applications now need to scale with this growth in parallel computing power, and software developers need to take advantage of it. Parallel sorting algorithms are basic building blocks for many such complex applications.
In this thesis, we will validate the expected execution time complexities of five state-of-the-art parallel sorting algorithms, implemented in C using MPI for parallelization, by using a scalability validation framework based on Score-P and Extra-P. For each of the parallel sorting algorithms, we will create a performance model. These models will allow us to compare their scalability behaviour to the expectations. Furthermore, we will attempt to parallelize the local sorting step of the splitter-based parallel sorting algorithms via C++11 threads, OpenMP tasks, and CUDA acceleration.
We construct the performance models, on which we base our evaluations, using uniformly randomly generated data. For most of the parallel sorting algorithms, we show that the given expectations match the created models. We discuss any remaining discrepancies in detail.
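The thesis's implementations are in C with MPI; as a rough, language-agnostic illustration of the splitter-based scheme it studies, here is a sketch of a sample sort in Python. The oversampling factor and splitter selection are simplified assumptions, not the thesis's actual parameters.

```python
import random

def sample_sort(data, p=4, oversample=8):
    """Sketch of splitter-based (sample) sort: choose p-1 splitters from a
    sorted random sample, partition the input into p buckets, sort each
    bucket locally, and concatenate the results."""
    if len(data) <= p:
        return sorted(data)
    rng = random.Random(42)
    sample = sorted(rng.sample(data, min(len(data), p * oversample)))
    # Every oversample-th sample element becomes a splitter.
    splitters = [sample[i * oversample] for i in range(1, p)]
    buckets = [[] for _ in range(p)]
    for x in data:
        # Find the first splitter greater than x (linear scan for clarity;
        # real implementations use binary search).
        b = 0
        while b < p - 1 and x >= splitters[b]:
            b += 1
        buckets[b].append(x)
    # The local sorts are independent of one another -- this is the step
    # the thesis parallelizes with C++11 threads, OpenMP tasks, or CUDA.
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result

rng = random.Random(7)
data = [rng.randint(0, 999) for _ in range(200)]
sorted_data = sample_sort(data)
```

Because the buckets are disjoint and ordered by the splitters, the final concatenation needs no merge step, which is why the local sort dominates and is the natural target for intra-node parallelization.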
Fast, Scalable, and Interactive Software for Landau-de Gennes Numerical Modeling of Nematic Topological Defects
Numerical modeling of nematic liquid crystals using the tensorial Landau-de
Gennes (LdG) theory provides detailed insights into the structure and
energetics of the enormous variety of possible topological defect
configurations that may arise when the liquid crystal is in contact with
colloidal inclusions or structured boundaries. However, these methods can be
computationally expensive, making it challenging to predict (meta)stable
configurations involving several colloidal particles, and they are often
restricted to system sizes well below the experimental scale. Here we present
an open-source software package that exploits the embarrassingly parallel
structure of the lattice discretization of the LdG approach. Our
implementation, combining CUDA/C++ and OpenMPI, allows users to accelerate
simulations using both CPU and GPU resources in either single- or multiple-core
configurations. We make use of an efficient minimization algorithm, the Fast
Inertial Relaxation Engine (FIRE) method, that is well-suited to large-scale
parallelization, requiring little additional memory or computational cost while
offering performance competitive with other commonly used methods. In
multi-core operation we are able to scale simulations up to supra-micron length
scales of experimental relevance, and in single-core operation the simulation
package includes a user-friendly GUI environment for rapid prototyping of
interfacial features and the multifarious defect states they can promote. To
demonstrate this software package, we examine in detail the competition between
curvilinear disclinations and point-like hedgehog defects as size scale,
material properties, and geometric features are varied. We also study the
effects of an interface patterned with an array of topological point-defects.

Comment: 16 pages, 6 figures, 1 youtube link. The full catastroph…
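FIRE is a published minimization algorithm; the following is a minimal sketch of its update rules applied to a toy quadratic energy rather than the Landau-de Gennes functional. The parameter values are common defaults from the FIRE literature, not necessarily those used by this package.

```python
import math

def fire_minimize(grad, x, dt=0.1, dt_max=1.0, alpha0=0.1,
                  f_inc=1.1, f_dec=0.5, f_alpha=0.99, n_min=5,
                  max_steps=10000, tol=1e-8):
    """Sketch of the Fast Inertial Relaxation Engine (FIRE): damped
    molecular dynamics in which velocities are mixed toward the force
    direction; the timestep grows while moving downhill and is cut back
    whenever the system starts moving uphill."""
    n = len(x)
    v = [0.0] * n
    alpha = alpha0
    steps_since_uphill = 0
    for _ in range(max_steps):
        f = [-g for g in grad(x)]                   # force = -gradient
        if math.sqrt(sum(fi * fi for fi in f)) < tol:
            break
        p = sum(fi * vi for fi, vi in zip(f, v))    # power P = F . v
        if p > 0:                                   # still going downhill
            steps_since_uphill += 1
            vnorm = math.sqrt(sum(vi * vi for vi in v))
            fnorm = math.sqrt(sum(fi * fi for fi in f))
            v = [(1 - alpha) * vi + alpha * vnorm * fi / fnorm
                 for vi, fi in zip(v, f)]
            if steps_since_uphill > n_min:
                dt = min(dt * f_inc, dt_max)
                alpha *= f_alpha
        else:                                       # uphill: freeze and restart
            v = [0.0] * n
            dt *= f_dec
            alpha = alpha0
            steps_since_uphill = 0
        v = [vi + dt * fi for vi, fi in zip(v, f)]  # semi-implicit Euler step
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Minimize f(x) = |x - (1, 2, 3)|^2, whose gradient is 2 * (x - target).
target = [1.0, 2.0, 3.0]
xmin = fire_minimize(lambda x: [2 * (xi - ti) for xi, ti in zip(x, target)],
                     [0.0, 0.0, 0.0])
```

The appeal for a lattice LdG solver is visible even in this sketch: FIRE needs only the current gradient and velocity, so per-site memory overhead is small and every update is local, which suits GPU parallelization.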
GPU Accelerated Particle Visualization with Splotch
Splotch is a rendering algorithm for exploration and visual discovery in
particle-based datasets coming from astronomical observations or numerical
simulations. The strengths of the approach are production of high quality
imagery and support for very large-scale datasets through an effective mix of
the OpenMP and MPI parallel programming paradigms. This article reports our
experiences in re-designing Splotch for exploiting emerging HPC architectures
nowadays increasingly populated with GPUs. A performance model is introduced
for data transfers, computations and memory access, to guide our re-factoring
of Splotch. A number of parallelization issues are discussed, in particular
relating to race conditions and workload balancing, towards achieving optimal
performances. Our implementation was accomplished by using the CUDA programming
paradigm. Our strategy is founded on novel schemes achieving optimized data
organisation and classification of particles. We deploy a reference simulation
to present performance results on acceleration gains and scalability. We
finally outline our vision for future work, including possibilities for
further optimisation and exploitation of emerging technologies.

Comment: 25 pages, 9 figures. Astronomy and Computing (2014)
Implementing P Systems Parallelism by Means of GPUs
Software development for Membrane Computing is growing, yielding new
applications. Nowadays, the efficiency of P systems simulators has become a
critical point when working with instances of large size. The newest
generation of GPUs (Graphics Processing Units) provides a massively parallel
framework for general-purpose computation. We present GPUs as an alternative
for obtaining better performance in the simulation of P systems, and we
illustrate this by giving a solution to the N-Queens problem as an example.

Funding: Ministerio de Educación y Ciencia TIN2006-13425; Junta de Andalucía P08-TIC-0420
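The paper encodes N-Queens as a P systems computation; as a conventional illustration of why the problem parallelizes well, here is a bitmask backtracking counter whose first-row branches are fully independent and can run on separate workers. This is not the P systems encoding used in the paper, just the standard search formulation.

```python
from multiprocessing.dummy import Pool  # thread pool stands in for GPU threads

def count_from_first_row(args):
    """Count N-Queens solutions whose first-row queen sits in column c.
    Branches for different first columns share no state, which is the
    kind of independence a massively parallel simulation exploits."""
    n, c = args
    def solve(row, cols, diag1, diag2):
        if row == n:
            return 1
        total = 0
        for col in range(n):
            if not (cols & (1 << col) or
                    diag1 & (1 << (row + col)) or
                    diag2 & (1 << (row - col + n))):
                total += solve(row + 1, cols | (1 << col),
                               diag1 | (1 << (row + col)),
                               diag2 | (1 << (row - col + n)))
        return total
    return solve(1, 1 << c, 1 << c, 1 << (0 - c + n))

def n_queens(n, workers=4):
    with Pool(workers) as pool:
        return sum(pool.map(count_from_first_row, [(n, c) for c in range(n)]))

solutions = n_queens(8)   # the classic 8-Queens board has 92 solutions
```

The three bitmasks track attacked columns and the two diagonal directions, so each placement test is a few bit operations, which is also why this style of search maps well onto GPU hardware.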
GHOST: Building blocks for high performance sparse linear algebra on heterogeneous systems
While many of the architectural details of future exascale-class high
performance computer systems are still a matter of intense research, there
appears to be a general consensus that they will be strongly heterogeneous,
featuring "standard" as well as "accelerated" resources. Today, such resources
are available as multicore processors, graphics processing units (GPUs), and
other accelerators such as the Intel Xeon Phi. Any software infrastructure that
claims usefulness for such environments must be able to meet their inherent
challenges: massive multi-level parallelism, topology, asynchronicity, and
abstraction. The "General, Hybrid, and Optimized Sparse Toolkit" (GHOST) is a
collection of building blocks that targets algorithms dealing with sparse
matrix representations on current and future large-scale systems. It implements
the "MPI+X" paradigm, has a pure C interface, and provides hybrid-parallel
numerical kernels, intelligent resource management, and truly heterogeneous
parallelism for multicore CPUs, Nvidia GPUs, and the Intel Xeon Phi. We
describe the details of its design with respect to the challenges posed by
modern heterogeneous supercomputers and recent algorithmic developments.
Implementation details which are indispensable for achieving high efficiency
are pointed out and their necessity is justified by performance measurements or
predictions based on performance models. The library code and several
applications are available as open source. We also provide instructions on how
to make use of GHOST in existing software packages, together with a case study
which demonstrates the applicability and performance of GHOST as a component
within a larger software stack.

Comment: 32 pages, 11 figures
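GHOST's kernels are hybrid-parallel C with hardware-specific storage formats; as a minimal illustration of the kind of sparse building block such a library provides, here is a CSR (compressed sparse row) matrix-vector product in Python. The layout is deliberately simplified; a heterogeneous library uses more elaborate blocked formats tuned per device.

```python
def csr_from_dense(dense):
    """Build CSR arrays (values, column indices, row pointers) from a
    dense matrix, storing only the nonzero entries."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def spmv(values, col_idx, row_ptr, x):
    """y = A @ x for a CSR matrix. The rows are independent, so this outer
    loop is what an MPI+X library distributes across ranks, threads, and
    accelerators."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

A = [[4.0, 0.0, 1.0],
     [0.0, 3.0, 0.0],
     [2.0, 0.0, 5.0]]
y = spmv(*csr_from_dense(A), [1.0, 2.0, 3.0])
```

Sparse matrix-vector multiplication is memory-bandwidth-bound, which is why the abstract stresses data layout and performance models: the storage format, not the arithmetic, usually decides the achieved fraction of peak.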