Dependability Analysis of Control Systems using SystemC and Statistical Model Checking
Stochastic Petri nets are commonly used for modeling distributed systems in
order to study their performance and dependability. This paper proposes a
realization of stochastic Petri nets in SystemC for modeling large embedded
control systems. Statistical model checking is then used to analyze the
dependability of the constructed model. Our verification framework allows users
to express a wide range of useful properties to be verified, as illustrated
through a case study.
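The abstract does not show its SystemC realization. The following is only a minimal Python sketch of the general idea: simulate a stochastic Petri net (SPN) as a race of exponential delays, and estimate the probability of a property by Monte Carlo. The sample size comes from the standard Chernoff-Hoeffding bound. The transition encoding, constant-rate (single-server) semantics, the helper names and the two-component failure/repair net are all hypothetical illustrations, not the paper's model.

```python
import math
import random

# Hypothetical encoding: a transition is (inputs, outputs, rate), where
# inputs/outputs map place names to arc weights and rates are constant
# (single-server semantics, an assumption made for this sketch).

def enabled(marking, inputs):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(p, 0) >= w for p, w in inputs.items())

def violates_before(marking, transitions, horizon, bad, rng):
    """Simulate one SPN trajectory; True if `bad` holds at some time <= horizon."""
    m = dict(marking)
    t = 0.0
    while not bad(m):
        live = [tr for tr in transitions if tr[2] > 0.0 and enabled(m, tr[0])]
        total = sum(tr[2] for tr in live)
        if total == 0.0:
            return False  # dead marking: nothing can change any more
        t += rng.expovariate(total)  # delay until the first transition fires
        if t > horizon:
            return False
        # The race of exponential delays is won with probability rate / total.
        r = rng.random() * total
        chosen = live[-1]
        for tr in live:
            if r < tr[2]:
                chosen = tr
                break
            r -= tr[2]
        inputs, outputs, _ = chosen
        for p, w in inputs.items():
            m[p] -= w
        for p, w in outputs.items():
            m[p] = m.get(p, 0) + w
    return True

def samples_needed(eps, delta):
    """Chernoff-Hoeffding: runs so that |p_hat - p| <= eps with prob. >= 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

def estimate(marking, transitions, horizon, bad, eps=0.05, delta=0.05, seed=0):
    """Statistical model checking by plain Monte Carlo estimation."""
    rng = random.Random(seed)
    n = samples_needed(eps, delta)
    hits = sum(violates_before(marking, transitions, horizon, bad, rng)
               for _ in range(n))
    return hits / n

def make_net(fail_rate, repair_rate):
    """Hypothetical two-place failure/repair net."""
    return [({"up": 1}, {"down": 1}, fail_rate),
            ({"down": 1}, {"up": 1}, repair_rate)]

if __name__ == "__main__":
    p = estimate({"up": 2, "down": 0}, make_net(0.1, 1.0), horizon=10.0,
                 bad=lambda m: m["up"] == 0)
    print(f"P(all components down within 10 time units) ~ {p:.3f}")
```

With eps = delta = 0.05 the bound asks for 738 independent runs. This is why statistical model checking scales to large models: the cost depends on the requested accuracy, not on the size of the state space.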
PULP-HD: Accelerating Brain-Inspired High-Dimensional Computing on a Parallel Ultra-Low Power Platform
Computing with high-dimensional (HD) vectors, also referred to as
hypervectors, is a brain-inspired alternative to computing with
scalars. Key properties of HD computing include a well-defined set of
arithmetic operations on hypervectors, generality, scalability, robustness,
fast learning, and ubiquitous parallel operations. HD computing is about
manipulating and comparing large patterns (binary hypervectors with 10,000
dimensions), making its efficient realization on minimalistic ultra-low-power
platforms challenging. This paper describes HD computing's acceleration and its
optimization of memory accesses and operations on a silicon prototype of the
PULPv3 4-core platform (1.5 mm², 2 mW), surpassing the state-of-the-art
classification accuracy (on average 92.4%) with a simultaneous 3.7×
end-to-end speed-up and 2× energy saving compared to its single-core
execution. We further explore the scalability of our accelerator by increasing
the number of inputs and the classification window on a new generation of the PULP
architecture featuring bit-manipulation instruction extensions and a larger
number of cores (8). Together, these enable a near-ideal speed-up of 18.4×
compared to the single-core PULPv3.
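To make the hypervector operations concrete, here is a minimal Python sketch of the common binary HD primitives. Binding is XOR, bundling is bit-wise majority, permutation is cyclic rotation, and similarity is Hamming distance. Each 10,000-bit hypervector is stored as a Python int. These function names are hypothetical, and this is not the PULP-HD kernel, which runs on the PULP cores with bit-manipulation instructions.

```python
import random

D = 10_000                # hypervector dimensionality quoted in the abstract
MASK = (1 << D) - 1

def random_hv(rng):
    """Dense binary hypervector with i.i.d. random bits, stored as a D-bit int."""
    return rng.getrandbits(D)

def bind(a, b):
    """Binding: component-wise XOR; self-inverse, so bind(bind(a, b), b) == a."""
    return a ^ b

def permute(a, k=1):
    """Cyclic left rotation by k positions, commonly used to encode order."""
    k %= D
    return ((a << k) | (a >> (D - k))) & MASK

def bundle(hvs):
    """Bundling: bit-wise majority over an odd number of hypervectors."""
    if len(hvs) % 2 == 0:
        raise ValueError("use an odd number of hypervectors to avoid ties")
    half = len(hvs) // 2
    out = 0
    for i in range(D):
        if sum((h >> i) & 1 for h in hvs) > half:
            out |= 1 << i
    return out

def hamming(a, b):
    """Number of differing bits between two hypervectors."""
    return bin(a ^ b).count("1")

if __name__ == "__main__":
    rng = random.Random(42)
    a, b, c = (random_hv(rng) for _ in range(3))
    print("unrelated:", hamming(a, b))
    print("bundle vs. member:", hamming(bundle([a, b, c]), a))
```

Two independent random hypervectors differ in about half of their 10,000 bits. A majority bundle of three stays much closer to each member (about a quarter of the bits differ). This is why comparing a query against stored class prototypes by Hamming distance works as a classifier. All of these operations are bit-parallel, which suits word-wide integer and bit-manipulation instructions.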
Cost-effective HPC clustering for computer vision applications
We will present a cost-effective and flexible realization of high performance computing (HPC) clustering and its potential for solving computationally intensive problems in computer vision. The software foundation supporting the parallel programming is the GNU parallel Knoppix package, with message passing interface (MPI)-based Octave, Python and C interfaces. The implementation is of particular interest in applications where the main objective is to reuse the existing hardware infrastructure and keep the overall cost within budget. We will present benchmark results and compare the performance of Octave and MATLAB.
Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM
We explore the utilization of the Apache TVM open source framework to
automatically generate a family of algorithms that follow the approach taken by
popular linear algebra libraries, such as GotoBLAS2, BLIS and OpenBLAS, in
order to obtain high-performance blocked formulations of the general matrix
multiplication (GEMM). In addition, we fully automate the generation
process by also leveraging the Apache TVM framework to derive a complete
family of processor-specific micro-kernels for GEMM. This is in contrast
with the convention in high-performance libraries, which hand-encode a single
micro-kernel per architecture in assembly code. Overall, the combination
of our TVM-generated blocked algorithms and micro-kernels for GEMM (1) improves
portability and maintainability and streamlines the software life
cycle; (2) provides high flexibility to easily tailor and optimize the solution
to different data types, processor architectures, and matrix operand shapes,
yielding performance on par with that of hand-tuned libraries (or even superior
for specific matrix shapes); and (3) features a small memory footprint.
Comment: 35 pages, 22 figures. Submitted to ACM TOM
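For readers unfamiliar with the GotoBLAS2/BLIS approach the abstract refers to, the following minimal Python sketch shows the five-loop blocking around a micro-kernel. It does not use TVM, and it omits the packing of operand panels into contiguous buffers that real libraries perform. The cache parameters `mc`, `nc`, `kc` and register-block sizes `mr`, `nr` take arbitrary small illustrative values; in practice they are tuned per processor.

```python
def micro_kernel(A, B, C, i0, j0, p0, mr, nr, kc):
    """Rank-kc update of the mr x nr block of C whose top-left corner is (i0, j0)."""
    for p in range(kc):
        row_b = B[p0 + p]
        for i in range(mr):
            a = A[i0 + i][p0 + p]
            row_c = C[i0 + i]
            for j in range(nr):
                row_c[j0 + j] += a * row_b[j0 + j]

def gemm_blocked(A, B, C, mc=4, nc=6, kc=3, mr=2, nr=3):
    """C += A @ B with the five-loop GotoBLAS/BLIS blocking (packing omitted)."""
    m, k, n = len(A), len(B), len(B[0])
    for jc in range(0, n, nc):                   # loop 5: nc-wide panels of B and C
        nb = min(nc, n - jc)
        for pc in range(0, k, kc):               # loop 4: kc-deep slices along k
            kb = min(kc, k - pc)
            # a real library packs B[pc:pc+kb, jc:jc+nb] into a buffer here
            for ic in range(0, m, mc):           # loop 3: mc-tall blocks of A
                mb = min(mc, m - ic)
                # ... and packs A[ic:ic+mb, pc:pc+kb] here
                for jr in range(0, nb, nr):      # loop 2: nr-wide micro-panels
                    for ir in range(0, mb, mr):  # loop 1: mr-tall micro-panels
                        micro_kernel(A, B, C, ic + ir, jc + jr, pc,
                                     min(mr, mb - ir), min(nr, nb - jr), kb)
    return C
```

Only the micro-kernel touches individual elements. That is why libraries hand-write it in assembly per architecture, and why generating it automatically, as the abstract describes with TVM, removes most of the per-architecture effort. The outer loops exist purely to keep operand blocks resident in the cache hierarchy.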
A High-Performance Implementation of Atomistic Spin Dynamics Simulations on x86 CPUs
Atomistic spin dynamics simulations provide valuable information about the
energy spectrum of magnetic materials in different phases, allowing one to
identify instabilities and the nature of their excitations. However, the time
cost of evaluating the dynamical correlation function increases quadratically
with the number of spins, leading to significant computational effort and making
the simulation of large spin systems very challenging. In this work, we propose
to use a highly optimized general matrix multiply (GEMM) subroutine to calculate
the dynamical spin-spin correlation function, which can achieve near-optimal
hardware utilization. Furthermore, we fuse the element-wise operations in the
calculation of the correlation function into the in-house GEMM kernel, which
yields further performance improvements of 44% to 71% on several relatively
large lattice sizes compared to the implementation that uses the GEMM subroutine
in OpenBLAS, the state-of-the-art open-source library for Basic Linear Algebra
Subprograms (BLAS).
Comment: 18 (short) pages, 6 figures
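As an illustration of how a spin-pair correlation can be recast as matrix multiplication, here is a minimal Python sketch. It is not the paper's formulation. It assumes a 1D chain and a correlation of the hypothetical form C(q) = (1/N) Σ_ij cos(q(r_i − r_j)) s_i(t)·s_j(0). Using cos(a − b) = cos a cos b + sin a sin b, the double sum becomes two products of a phase matrix with the spin matrices, followed by a cheap element-wise step. With one q-point per site the total work is still quadratic in N, but it now lives inside GEMM. A plain triple loop stands in for an optimized GEMM.

```python
import math

def matmul(X, Y):
    """Plain triple loop standing in for an optimized GEMM (e.g. OpenBLAS)."""
    k, m = len(Y), len(Y[0])
    return [[sum(row[p] * Y[p][j] for p in range(k)) for j in range(m)] for row in X]

def corr_naive(r, s_t, s_0, qs):
    """Direct double sum over spin pairs for every q."""
    N = len(r)
    out = []
    for q in qs:
        acc = 0.0
        for i in range(N):
            for j in range(N):
                dot = sum(a * b for a, b in zip(s_t[i], s_0[j]))
                acc += math.cos(q * (r[i] - r[j])) * dot
        out.append(acc / N)
    return out

def corr_gemm(r, s_t, s_0, qs):
    """Same quantity via cos(a - b) = cos a cos b + sin a sin b and two GEMMs."""
    N, nq = len(r), len(qs)
    # Phase matrix: first nq rows hold cos(q r_i), last nq rows hold sin(q r_i).
    P = ([[math.cos(q * x) for x in r] for q in qs] +
         [[math.sin(q * x) for x in r] for q in qs])
    F_t = matmul(P, s_t)  # shape (2 nq) x 3
    F_0 = matmul(P, s_0)
    # Element-wise epilogue: the kind of step the abstract describes fusing into
    # the GEMM kernel. Combine the cos/sin halves and sum over spin components.
    out = []
    for k in range(nq):
        c = sum(a * b for a, b in zip(F_t[k], F_0[k]))
        s = sum(a * b for a, b in zip(F_t[nq + k], F_0[nq + k]))
        out.append((c + s) / N)
    return out
```

Fusing the epilogue means the GEMM output tile is consumed while it is still in registers or cache, instead of being written out and read back for a separate element-wise pass. This is one plausible source of the extra speed-up reported over calling OpenBLAS directly.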
ARKCoS: Artifact-Suppressed Accelerated Radial Kernel Convolution on the Sphere
We describe a hybrid Fourier/direct space convolution algorithm for compact
radial (azimuthally symmetric) kernels on the sphere. For high resolution maps
covering a large fraction of the sky, our implementation takes advantage of the
inexpensive massive parallelism afforded by consumer graphics processing units
(GPUs). Applications include modeling of instrumental beam shapes in terms of
compact kernels, computation of fine-scale wavelet transformations, and optimal
filtering for the detection of point sources. Our algorithm works for any
pixelization where pixels are grouped into isolatitude rings. Even for kernels
that are not bandwidth limited, ringing features are completely absent on an
ECP grid. We demonstrate that they can be strongly suppressed on the popular
HEALPix pixelization, for which we develop a freely available implementation of
the algorithm. As an example application, we show that, running on a high-end
consumer graphics card, our method speeds up beam convolution for simulations of
a characteristic Planck high frequency instrument channel by two orders of
magnitude compared to the commonly used HEALPix implementation on one CPU core,
while maintaining a typical fractional RMS accuracy of about 1 part in 10^5.
Comment: 10 pages, 6 figures. Submitted to Astronomy and Astrophysics.
Replaced to match published version. Code can be downloaded at
https://github.com/elsner/arkco
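The hybrid Fourier/direct idea can be illustrated on an ECP grid, where every isolatitude ring has the same number of equally spaced pixels. For an azimuthally symmetric kernel, the contribution of input ring j to output ring i depends only on the azimuth difference. It is therefore a circular convolution along the ring, which can be done in Fourier space. Compactness means only nearby rings need to be visited in direct space. The minimal Python sketch below rests on several assumptions: a naive O(n²) DFT stands in for an FFT, the kernel shape is hypothetical, pixel solid-angle weights are omitted, and none of it is taken from the ARKCoS code. It checks itself against brute-force summation.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive DFT (stand-in for an FFT); inverse includes the 1/n factor."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def angle(t1, t2, dphi):
    """Great-circle distance between (t1, 0) and (t2, dphi), colatitudes in radians."""
    c = math.cos(t1) * math.cos(t2) + math.sin(t1) * math.sin(t2) * math.cos(dphi)
    return math.acos(max(-1.0, min(1.0, c)))

def convolve_rings(m, thetas, kernel, support):
    """m[j][b]: map value on ring j (colatitude thetas[j]) at azimuth 2*pi*b/n."""
    n = len(m[0])
    M = [dft(ring) for ring in m]  # Fourier transform of every ring, once
    out = []
    for ti in thetas:
        acc = [0j] * n
        for tj, Mj in zip(thetas, M):
            # Direct-space part: a compact kernel cannot reach rings farther
            # than `support`, since the closest pair of pixels has dphi = 0.
            if abs(ti - tj) > support:
                continue
            k = [kernel(angle(ti, tj, 2 * math.pi * d / n)) for d in range(n)]
            K = dft(k)
            # Fourier part: circular convolution along azimuth = pointwise product.
            acc = [a + x * y for a, x, y in zip(acc, Mj, K)]
        out.append([v.real for v in dft(acc, inverse=True)])
    return out

def convolve_brute(m, thetas, kernel):
    """Reference: sum the kernel over every input pixel for every output pixel."""
    n = len(m[0])
    phis = [2 * math.pi * b / n for b in range(n)]
    return [[sum(m[j][b] * kernel(angle(ti, tj, pa - phis[b]))
                 for j, tj in enumerate(thetas) for b in range(n))
             for pa in phis] for ti in thetas]
```

Usage: with `thetas = [math.pi * (i + 0.5) / 6 for i in range(6)]`, eight pixels per ring and a hypothetical compact kernel such as `lambda g: max(0.0, 1.0 - g / 0.9)` with `support=0.9`, the two functions agree to rounding error. The Fourier step turns each ring-pair interaction from O(n²) into O(n log n) with an FFT. The ring-skipping step is where compactness pays off.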
A GPU-Computing Approach to Solar Stokes Profile Inversion
We present a new computational approach to the inversion of solar
photospheric Stokes polarization profiles, under the Milne-Eddington model, for
vector magnetography. Our code, named GENESIS (GENEtic Stokes Inversion
Strategy), employs multi-threaded parallel-processing techniques to harness the
computing power of graphics processing units (GPUs), along with algorithms
designed to exploit the inherent parallelism of the Stokes inversion problem.
Using a genetic algorithm (GA) engineered specifically for use with a GPU, we
produce full-disc maps of the photospheric vector magnetic field from polarized
spectral line observations recorded by the Synoptic Optical Long-term
Investigations of the Sun (SOLIS) Vector Spectromagnetograph (VSM) instrument.
We show the advantages of pairing a population-parallel genetic algorithm with
data-parallel GPU-computing techniques, and present an overview of the Stokes
inversion problem, including a description of our adaptation to the
GPU-computing paradigm. Full-disc vector magnetograms derived by this method
are shown, using SOLIS/VSM data observed on 2008 March 28 at 15:45 UT
- …
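The pairing of a population-based genetic algorithm with data-parallel hardware can be sketched in a few lines of Python. Each individual's fitness is computed independently, and that map over the population is the step a GPU would parallelize. This is only an illustration under heavy assumptions. A single toy Gaussian absorption line with three parameters replaces the Milne-Eddington Stokes model. The wavelength grid, bounds, operators and all names are hypothetical, not GENESIS's actual design.

```python
import math
import random

WAVES = [i * 0.1 for i in range(-30, 31)]         # hypothetical wavelength offsets
BOUNDS = [(0.0, 1.0), (-1.0, 1.0), (0.05, 1.0)]   # depth, centre, width

def profile(params):
    """Toy absorption line standing in for a Milne-Eddington forward model."""
    depth, centre, width = params
    return [1.0 - depth * math.exp(-((w - centre) / width) ** 2) for w in WAVES]

def chi2(params, observed):
    return sum((a - b) ** 2 for a, b in zip(profile(params), observed))

def random_individual(rng):
    return [rng.uniform(lo, hi) for lo, hi in BOUNDS]

def crossover(a, b, rng):
    """Uniform crossover: each gene comes from either parent with probability 1/2."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def mutate(ind, rng, scale=0.05):
    """Gaussian mutation, clamped to the parameter bounds."""
    return [min(hi, max(lo, g + rng.gauss(0.0, scale * (hi - lo))))
            for g, (lo, hi) in zip(ind, BOUNDS)]

def genetic_fit(observed, pop_size=40, generations=60, elite=2, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Independent fitness evaluations: the data-parallel step on a GPU.
        scores = [chi2(ind, observed) for ind in pop]
        ranked = [ind for _, ind in sorted(zip(scores, pop), key=lambda p: p[0])]
        history.append(min(scores))
        parents = ranked[:pop_size // 2]          # truncation selection
        children = ranked[:elite]                 # elitism: keep the best as-is
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = children
    best = min(pop, key=lambda ind: chi2(ind, observed))
    return best, history
```

Usage: `best, history = genetic_fit(profile([0.6, 0.2, 0.3]))` fits the toy line to synthetic data. Because the elite individuals are carried over unchanged, the best chi-squared in `history` can never increase from one generation to the next. A GA needs no derivatives of the forward model, and each pixel of a full-disc map is an independent inversion, so both the population and the pixels expose parallelism.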