5,436 research outputs found

Dependability Analysis of Control Systems using SystemC and Statistical Model Checking

Author: Legay Axel
Ngo Van Chan
Publication date: 28/07/2015

Stochastic Petri nets are commonly used for modeling distributed systems in order to study their performance and dependability. This paper proposes a realization of stochastic Petri nets in SystemC for modeling large embedded control systems. Statistical model checking is then used to analyze the dependability of the constructed model. Our verification framework allows users to express a wide range of useful properties to be verified, as illustrated through a case study.

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1
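The abstract does not spell out how a stochastic Petri net is executed or how statistical model checking bounds its error. As a minimal sketch, assuming exponentially timed transitions with single-server semantics and a made-up two-component failure/repair model (the names `MODEL`, `run_once`, `estimate`, and the checked property are hypothetical, and this is plain Python rather than the authors' SystemC realization), the code below estimates the probability that a property holds by simulating many independent runs, with the number of runs chosen from the Chernoff-Hoeffding (Okamoto) bound.

```python
import math
import random

# Hypothetical two-component model. Each transition is (name, rate, pre, post),
# where pre/post map place names to token counts (input/output arcs).
# A transition's rate is constant whenever it is enabled (single-server semantics).
MODEL = [
    ("fail", 0.1, {"up": 1}, {"down": 1}),
    ("repair", 1.0, {"down": 1}, {"up": 1}),
]


def required_runs(eps, delta):
    """Chernoff-Hoeffding (Okamoto) bound: number of runs n such that
    P(|estimate - p| > eps) <= delta."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))


def run_once(model, initial, horizon, rng):
    """Simulate one trajectory. Returns True if at least one component stayed
    up during [0, horizon] (this property is an illustrative assumption)."""
    marking = dict(initial)
    t = 0.0
    while True:
        if marking["up"] == 0:
            return False  # property violated: every component is down
        enabled = [tr for tr in model
                   if all(marking.get(p, 0) >= n for p, n in tr[2].items())]
        total = sum(tr[1] for tr in enabled)
        if total <= 0:
            return True  # nothing can fire any more; the marking is frozen
        t += rng.expovariate(total)  # time until the first exponential clock expires
        if t > horizon:
            return True
        # The winner of the race is chosen with probability rate / total.
        _, _, pre, post = rng.choices(enabled, weights=[tr[1] for tr in enabled])[0]
        for p, n in pre.items():
            marking[p] -= n
        for p, n in post.items():
            marking[p] = marking.get(p, 0) + n


def estimate(model, initial, horizon, eps=0.01, delta=0.05, seed=0):
    """Monte Carlo estimate of P(property), accurate to +/- eps with confidence 1 - delta."""
    rng = random.Random(seed)
    n = required_runs(eps, delta)
    hits = sum(run_once(model, initial, horizon, rng) for _ in range(n))
    return hits / n


if __name__ == "__main__":
    p = estimate(MODEL, {"up": 2, "down": 0}, horizon=100.0)
    print(f"P(available over [0, 100]) ~ {p:.3f} (+/- 0.01 with 95% confidence)")
```

The bound depends only on `eps` and `delta`, not on the size of the model, which is what makes simulation-based checking attractive for large systems; the price is that the answer is a statistical guarantee rather than an exact probability.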

PULP-HD: Accelerating Brain-Inspired High-Dimensional Computing on a Parallel Ultra-Low Power Platform

Author: Benatti Simone
Benini Luca
Montagna Fabio
Rahimi Abbas
Rossi Davide
Publication date: 01/01/2018

Computing with high-dimensional (HD) vectors, also referred to as hypervectors, is a brain-inspired alternative to computing with scalars. Key properties of HD computing include a well-defined set of arithmetic operations on hypervectors, generality, scalability, robustness, fast learning, and ubiquitous parallel operations. HD computing manipulates and compares large patterns (binary hypervectors with 10,000 dimensions), which makes its efficient realization on minimalistic ultra-low-power platforms challenging. This paper describes the acceleration of HD computing, including the optimization of its memory accesses and operations, on a silicon prototype of the PULPv3 4-core platform (1.5 mm$^2$, 2 mW), surpassing the state-of-the-art classification accuracy (92.4% on average) with a simultaneous $3.7\times$ end-to-end speed-up and $2\times$ energy saving compared to its single-core execution. We further explore the scalability of our accelerator by increasing the number of inputs and the classification window on a new generation of the PULP architecture featuring bit-manipulation instruction extensions and a larger number of cores (8). Together, these enable a near-ideal speed-up of $18.4\times$ compared to the single-core PULPv3.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna
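To make "arithmetic operations on hypervectors" concrete, the sketch below implements the commonly used binary HD operations: XOR for binding, bit-wise majority for bundling, cyclic rotation for permutation, and normalized Hamming distance for similarity. This is a generic textbook formulation, not the PULP kernels described in the paper; storing each 10,000-bit hypervector as one Python integer loosely mirrors packing bits into machine words, and the function names are hypothetical.

```python
import random

D = 10_000            # hypervector dimensionality, as in the abstract
MASK = (1 << D) - 1   # keeps results within D bits


def random_hv(rng):
    """Dense random binary hypervector, packed into a D-bit Python int."""
    return rng.getrandbits(D)


def bind(a, b):
    """Binding: component-wise XOR. It is self-inverse: bind(bind(a, b), b) == a."""
    return a ^ b


def permute(a, k=1):
    """Permutation: cyclic left rotation by k bit positions."""
    k %= D
    return ((a << k) | (a >> (D - k))) & MASK


def bundle(hvs):
    """Bundling: bit-wise majority vote (pass an odd number of inputs to avoid ties)."""
    threshold = len(hvs) // 2
    out = 0
    for i in range(D):
        ones = sum((h >> i) & 1 for h in hvs)
        if ones > threshold:
            out |= 1 << i
    return out


def hamming(a, b):
    """Normalized Hamming distance; about 0.5 for unrelated random hypervectors."""
    return bin(a ^ b).count("1") / D


if __name__ == "__main__":
    rng = random.Random(42)
    keys = [random_hv(rng) for _ in range(3)]
    vals = [random_hv(rng) for _ in range(3)]
    # Store three key-value pairs in a single hypervector.
    record = bundle([bind(k, v) for k, v in zip(keys, vals)])
    # Unbinding with keys[0] yields a noisy copy of vals[0]:
    # its distance to vals[0] is well below 0.5, while the others stay near 0.5.
    noisy = bind(record, keys[0])
    print([round(hamming(noisy, v), 3) for v in vals])
```

The robustness claimed in the abstract shows up here: the recovered vector differs from the stored value in roughly a quarter of its bits, yet it is still unambiguously closest to the correct item. Every operation is bit-parallel, and the similarity check reduces to XOR plus a population count, which is one reason bit-manipulation instructions matter on such platforms.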

Cost-effective HPC clustering for computer vision applications

Author: Begley Seán
Dietlmeier Julia
Whelan Paul F.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2008

We will present a cost-effective and flexible realization of high performance computing (HPC) clustering and its potential in solving computationally intensive problems in computer vision. The software foundation supporting the parallel programming is the GNU parallel Knoppix package, with message passing interface (MPI) based Octave, Python and C interface capabilities. The implementation is of particular interest in applications where the main objective is to reuse the existing hardware infrastructure and keep the overall cost within budget. We will present the benchmark results and compare the performance of Octave and MATLAB.

Crossref

Irish Universities

DCU Online Research Access Service

Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM

Author: Alaejos Guillermo
Alonso-Jordá Pedro
Castelló Adrián
Igual Francisco D.
Martínez Héctor
Quintana-Ortí Enrique S.
Publication date: 31/10/2023

We explore the use of the Apache TVM open source framework to automatically generate a family of algorithms that follow the approach taken by popular linear algebra libraries, such as GotoBLAS2, BLIS and OpenBLAS, in order to obtain high-performance blocked formulations of the general matrix multiplication (GEMM). In addition, we fully automate the generation process by also leveraging the Apache TVM framework to derive a complete variety of processor-specific micro-kernels for GEMM. This contrasts with the convention in high-performance libraries, which hand-encode a single micro-kernel per architecture in assembly code. Overall, the combination of our TVM-generated blocked algorithms and micro-kernels for GEMM (1) improves portability and maintainability and, more generally, streamlines the software life cycle; (2) provides the flexibility to easily tailor and optimize the solution to different data types, processor architectures, and matrix operand shapes, yielding performance on par with (or, for specific matrix shapes, superior to) that of hand-tuned libraries; and (3) has a small memory footprint.
Comment: 35 pages, 22 figures. Submitted to ACM TOM

arXiv.org e-Print Archive
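For readers unfamiliar with the "blocked formulation" used by GotoBLAS2 and BLIS, the sketch below shows its loop structure: five loops around a micro-kernel that updates a small `mr x nr` block of C. It is an illustrative plain-Python version, not TVM-generated code; the block sizes are arbitrary small values chosen so that edge cases are exercised (real ones are tuned to the cache hierarchy and registers), and the packing of A and B into contiguous buffers is only indicated by comments.

```python
import random


def micro_kernel(A, B, C, i0, j0, p0, rows, cols, kb):
    """Update the rows x cols block of C at (i0, j0) with kb rank-1 updates."""
    for p in range(p0, p0 + kb):
        for i in range(i0, i0 + rows):
            a = A[i][p]
            c_row = C[i]
            for j in range(j0, j0 + cols):
                c_row[j] += a * B[p][j]


def gemm_blocked(A, B, C, mc=4, nc=6, kc=3, mr=2, nr=3):
    """C += A @ B with the GotoBLAS/BLIS loop ordering (packing omitted).
    A is m x k, B is k x n, C is m x n, all lists of lists."""
    m, k, n = len(A), len(B), len(B[0])
    for jc in range(0, n, nc):                 # loop 5: column panels of B and C
        nb = min(nc, n - jc)
        for pc in range(0, k, kc):             # loop 4: slices of the shared dimension
            kb = min(kc, k - pc)
            # A BLIS-style code would pack B[pc:pc+kb][jc:jc+nb] here.
            for ic in range(0, m, mc):         # loop 3: row blocks of A and C
                mb = min(mc, m - ic)
                # ...and pack A[ic:ic+mb][pc:pc+kb] here.
                for jr in range(0, nb, nr):    # loop 2: micro-panels of B
                    for ir in range(0, mb, mr):  # loop 1: micro-panels of A
                        micro_kernel(A, B, C, ic + ir, jc + jr, pc,
                                     min(mr, mb - ir), min(nr, nb - jr), kb)
    return C


def naive_gemm(A, B):
    """Reference triple loop, used to check the blocked version."""
    k, n = len(B), len(B[0])
    return [[sum(row[p] * B[p][j] for p in range(k)) for j in range(n)] for row in A]


if __name__ == "__main__":
    rng = random.Random(0)
    A = [[rng.randint(-3, 3) for _ in range(5)] for _ in range(7)]
    B = [[rng.randint(-3, 3) for _ in range(11)] for _ in range(5)]
    C = gemm_blocked(A, B, [[0] * 11 for _ in range(7)])
    print(C == naive_gemm(A, B))
```

In this structure only `micro_kernel` is architecture-specific; the outer loops and block sizes are the parameters that a generator, such as the TVM-based one described above, can vary per data type, processor, and matrix shape.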

A High-Performance Implementation of Atomistic Spin Dynamics Simulations on x86 CPUs

Author: Chen Hongwei
Feiguin Adrian
Turner Joshua J.
Zhai Yujia
Publication date: 21/04/2023

Atomistic spin dynamics simulations provide valuable information about the energy spectrum of magnetic materials in different phases, allowing one to identify instabilities and the nature of their excitations. However, the time cost of evaluating the dynamical correlation function $S(\mathbf{q}, t)$ increases quadratically with the number of spins $N$, which demands significant computational effort and makes the simulation of large spin systems very challenging. In this work, we propose using a highly optimized general matrix multiply (GEMM) subroutine, which can achieve near-optimal hardware utilization, to calculate the dynamical spin-spin correlation function. Furthermore, we fuse the element-wise operations in the calculation of $S(\mathbf{q}, t)$ into the in-house GEMM kernel, which yields further performance improvements of 44% to 71% on several relatively large lattice sizes compared with an implementation that uses the GEMM subroutine in OpenBLAS, the state-of-the-art open source Basic Linear Algebra Subprograms (BLAS) library.
Comment: 18 (short) pages, 6 figures

arXiv.org e-Print Archive
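The abstract does not give the formula for $S(\mathbf{q}, t)$ or say exactly how it is mapped onto GEMM. The sketch below assumes the common single-snapshot form $S(\mathbf{q}, t) = \frac{1}{N}\sum_{i,j} e^{i\mathbf{q}\cdot(\mathbf{r}_i - \mathbf{r}_j)}\, \mathbf{S}_i(t)\cdot\mathbf{S}_j(0)$ on a 1-D chain, with no averaging over time origins or thermal samples. Because the phase factor splits as $e^{i\mathbf{q}\cdot\mathbf{r}_i} e^{-i\mathbf{q}\cdot\mathbf{r}_j}$, the double sum can be rewritten as products of a phase matrix with the spin matrices, i.e. as GEMM calls. This is one possible formulation, not necessarily the authors' kernel, and the helper names are hypothetical.

```python
import cmath
import math
import random


def matmul(X, Y):
    """Plain triple-loop GEMM, standing in for an optimized BLAS routine."""
    k, m = len(Y), len(Y[0])
    return [[sum(row[p] * Y[p][j] for p in range(k)) for j in range(m)] for row in X]


def sqt_direct(qs, r, s_t, s_0):
    """Direct double sum over all spin pairs: O(N^2) work per wave vector q."""
    N = len(r)
    out = []
    for q in qs:
        acc = 0j
        for i in range(N):
            for j in range(N):
                dot = sum(a * b for a, b in zip(s_t[i], s_0[j]))
                acc += cmath.exp(1j * q * (r[i] - r[j])) * dot
        out.append(acc / N)
    return out


def sqt_gemm(qs, r, s_t, s_0):
    """Same quantity via matrix products with the phase matrix P[q][i] = exp(i q r_i)."""
    N = len(r)
    P = [[cmath.exp(1j * q * x) for x in r] for q in qs]  # shape (len(qs), N)
    F_t = matmul(P, s_t)  # Fourier-transformed spins at time t, shape (len(qs), 3)
    F_0 = matmul(P, s_0)  # ... and at time 0
    # Spins are real, so sum_j exp(-i q r_j) S_j(0) is the complex conjugate of F_0.
    return [sum(a * b.conjugate() for a, b in zip(F_t[k], F_0[k])) / N
            for k in range(len(qs))]


def random_unit_spins(n, rng):
    """Random classical spins: unit vectors with isotropic directions."""
    spins = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        spins.append([c / norm for c in v])
    return spins


if __name__ == "__main__":
    rng = random.Random(3)
    N = 12
    r = list(range(N))                           # 1-D chain with unit spacing
    qs = [2 * math.pi * k / N for k in range(N)]
    s_0, s_t = random_unit_spins(N, rng), random_unit_spins(N, rng)
    diff = max(abs(x - y) for x, y in zip(sqt_direct(qs, r, s_t, s_0),
                                          sqt_gemm(qs, r, s_t, s_0)))
    print(f"max |direct - GEMM| = {diff:.2e}")
```

Recasting the pair sum as dense matrix products is what lets a tuned GEMM (OpenBLAS or an in-house kernel) do the heavy lifting; the final element-wise conjugate-and-multiply step is the kind of operation that could be fused into the GEMM kernel.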

ARKCoS: Artifact-Suppressed Accelerated Radial Kernel Convolution on the Sphere

Author: Elsner Franz
Wandelt Benjamin D.
Publication venue: EDP Sciences
Publication date: 01/01/2011

We describe a hybrid Fourier/direct space convolution algorithm for compact radial (azimuthally symmetric) kernels on the sphere. For high-resolution maps covering a large fraction of the sky, our implementation takes advantage of the inexpensive massive parallelism afforded by consumer graphics processing units (GPUs). Applications include modeling of instrumental beam shapes in terms of compact kernels, computation of fine-scale wavelet transformations, and optimal filtering for the detection of point sources. Our algorithm works for any pixelization in which pixels are grouped into isolatitude rings. Even for kernels that are not band-limited, ringing features are completely absent on an ECP grid. We demonstrate that they can be strongly suppressed on the popular HEALPix pixelization, for which we develop a freely available implementation of the algorithm. As an example application, we show that, running on a high-end consumer graphics card, our method speeds up beam convolution for simulations of a characteristic Planck high frequency instrument channel by two orders of magnitude compared with the commonly used HEALPix implementation on one CPU core, while maintaining a typical fractional RMS accuracy of about 1 part in $10^5$.
Comment: 10 pages, 6 figures. Submitted to Astronomy and Astrophysics. Replaced to match published version. Code can be downloaded at https://github.com/elsner/arkco

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

HAL-INSU
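Why isolatitude rings help can be shown on an ECP grid, where every ring has the same number of equally spaced pixels. For a radial kernel, the weight between a pixel on ring $i$ and one on ring $j$ depends only on the azimuthal offset between them, so each ring-to-ring contribution is a 1-D circular convolution along the ring (the part an FFT can accelerate), while the compact kernel limits the direct-space sum to nearby rings. The sketch below is our reading of that idea, not the ARKCoS code: the kernel shape and cutoff are assumptions, the circular convolution is computed directly rather than by FFT, and HEALPix rings (which vary in pixel count and azimuthal offset) are not handled. It checks the ring-based result against a brute-force sum over all pixel pairs.

```python
import math
import random


def ecp_rings(n_theta, n_phi):
    """Pixel-centre colatitudes and longitudes of an equidistant cylindrical (ECP) grid."""
    thetas = [math.pi * (i + 0.5) / n_theta for i in range(n_theta)]
    phis = [2 * math.pi * (p + 0.5) / n_phi for p in range(n_phi)]
    return thetas, phis


def kernel(gamma, cutoff=0.6):
    """Assumed compact radial kernel: smooth bump that vanishes beyond `cutoff` radians."""
    return (1.0 - (gamma / cutoff) ** 2) ** 2 if gamma < cutoff else 0.0


def ang_dist(t1, p1, t2, p2):
    """Great-circle distance between (theta, phi) points on the unit sphere."""
    c = math.cos(t1) * math.cos(t2) + math.sin(t1) * math.sin(t2) * math.cos(p1 - p2)
    return math.acos(max(-1.0, min(1.0, c)))


def convolve_brute(m, thetas, phis):
    """Direct sum over all pixel pairs, weighted by pixel solid angle: O(N^2)."""
    nt, nphi = len(thetas), len(phis)
    dth, dph = math.pi / nt, 2 * math.pi / nphi
    out = [[0.0] * nphi for _ in range(nt)]
    for i in range(nt):
        for p in range(nphi):
            acc = 0.0
            for j in range(nt):
                w = math.sin(thetas[j]) * dth * dph
                for q in range(nphi):
                    acc += kernel(ang_dist(thetas[i], phis[p], thetas[j], phis[q])) * m[j][q] * w
            out[i][p] = acc
    return out


def convolve_rings(m, thetas, phis):
    """Same result via ring pairs: each pair contributes a circular convolution in phi."""
    nt, nphi = len(thetas), len(phis)
    dth, dph = math.pi / nt, 2 * math.pi / nphi
    out = [[0.0] * nphi for _ in range(nt)]
    for i in range(nt):
        for j in range(nt):
            # Kernel between rings i and j as a function of the azimuthal offset d.
            kd = [kernel(ang_dist(thetas[i], 0.0, thetas[j], d * dph)) for d in range(nphi)]
            if not any(kd):
                continue  # compact support: ring j is out of reach of ring i
            w = math.sin(thetas[j]) * dth * dph
            for p in range(nphi):
                out[i][p] += w * sum(kd[(p - q) % nphi] * m[j][q] for q in range(nphi))
    return out


if __name__ == "__main__":
    rng = random.Random(7)
    thetas, phis = ecp_rings(8, 16)
    sky = [[rng.uniform(-1.0, 1.0) for _ in phis] for _ in thetas]
    a, b = convolve_brute(sky, thetas, phis), convolve_rings(sky, thetas, phis)
    print(max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)))
```

Replacing the inner circular convolution with an FFT-based one turns the per-ring-pair cost from $O(n_\phi^2)$ into $O(n_\phi \log n_\phi)$, and the `continue` on out-of-reach rings is where compactness of the kernel pays off.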

A GPU-Computing Approach to Solar Stokes Profile Inversion

Author: Brian J. Harker
Kenneth J. Mighell
Publication venue: IOP Publishing
Publication date: 01/08/2012

We present a new computational approach to the inversion of solar photospheric Stokes polarization profiles, under the Milne-Eddington model, for vector magnetography. Our code, named GENESIS (GENEtic Stokes Inversion Strategy), employs multi-threaded parallel-processing techniques to harness the computing power of graphics processing units (GPUs), along with algorithms designed to exploit the inherent parallelism of the Stokes inversion problem. Using a genetic algorithm (GA) engineered specifically for use with a GPU, we produce full-disc maps of the photospheric vector magnetic field from polarized spectral line observations recorded by the Synoptic Optical Long-term Investigations of the Sun (SOLIS) Vector Spectromagnetograph (VSM) instrument. We show the advantages of pairing a population-parallel genetic algorithm with data-parallel GPU-computing techniques, and present an overview of the Stokes inversion problem, including a description of our adaptation to the GPU-computing paradigm. Full-disc vector magnetograms derived by this method are shown, using SOLIS/VSM data observed on 2008 March 28 at 15:45 UT.

arXiv.org e-Print Archive

Crossref
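The phrase "population-parallel genetic algorithm" can be illustrated with a small profile-fitting problem. The sketch below is not GENESIS and does not use the Milne-Eddington model: it fits a toy Gaussian absorption line (depth, centre, width) on a made-up wavelength grid, using truncation selection, uniform crossover, Gaussian mutation, and elitism, which are generic GA choices we assume for illustration. The point of interest is `fitness_batch`, which scores the whole population in one call; on a GPU, each individual (and each wavelength point) could be assigned its own thread.

```python
import math
import random

WAVES = [i * 0.1 - 2.0 for i in range(41)]       # hypothetical wavelength grid (arbitrary units)
BOUNDS = [(0.0, 1.0), (-1.0, 1.0), (0.1, 1.0)]   # assumed ranges for depth, centre, width


def profile(params):
    """Toy absorption line (NOT a Milne-Eddington profile)."""
    depth, centre, width = params
    return [1.0 - depth * math.exp(-((w - centre) / width) ** 2) for w in WAVES]


def fitness_batch(population, observed):
    """Score every individual at once (negative sum of squared residuals).
    Individuals are independent, which is what makes the GA population-parallel."""
    return [-sum((a - b) ** 2 for a, b in zip(profile(ind), observed)) for ind in population]


def clip(x, lo, hi):
    return max(lo, min(hi, x))


def evolve(observed, pop_size=60, generations=100, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    for _ in range(generations):
        fit = fitness_batch(pop, observed)
        ranked = [ind for _, ind in sorted(zip(fit, pop), key=lambda t: t[0], reverse=True)]
        elite = ranked[: pop_size // 5]                 # truncation selection: best 20%
        children = [list(ind) for ind in elite[:2]]      # elitism: keep the two best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
            child = [clip(g + rng.gauss(0.0, 0.05 * (hi - lo)), lo, hi)     # Gaussian mutation
                     for g, (lo, hi) in zip(child, BOUNDS)]
            children.append(child)
        pop = children
    fit = fitness_batch(pop, observed)
    return max(zip(fit, pop), key=lambda t: t[0])[1]


if __name__ == "__main__":
    observed = profile([0.6, 0.2, 0.4])   # synthetic "observation" with known parameters
    print([round(x, 3) for x in evolve(observed)])
```

A GA needs no derivatives of the forward model and is less prone than a local least-squares fit to stalling in a poor local minimum, at the cost of many forward-model evaluations; those evaluations are exactly the embarrassingly parallel work that a GPU absorbs well.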