Search CORE

3,892 research outputs found

Affine Vector Cache for memory bandwidth savings

Author: Collange Caroline
Kouyoumdjian Alexandre
Publication venue: HAL CCSD
Publication date: 01/01/2011

Preserving memory locality is a major issue in highly multithreaded architectures such as GPUs. These architectures hide latency by keeping a large number of threads in flight. As each thread needs to maintain a private working set, all threads collectively put tremendous pressure on on-chip memory arrays, at significant cost in area and power. We show that thread-private data in GPU-like implicit SIMD architectures can be compressed by a factor of up to 16 by taking advantage of correlations between values held by different threads. We propose the Affine Vector Cache (AVC), a compressed cache design that complements the first-level cache. Evaluation by simulation on the SDK and Rodinia benchmarks shows that a 32KB L1 cache assisted by a 16KB AVC offers 59% more usable capacity on average than a single 48KB L1 cache. This results in an overall performance increase of 5.7% and an energy reduction of 11%, for a negligible hardware cost.
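
To make the cross-thread correlation concrete: when the values that the threads of a warp hold for one variable form an affine sequence base + stride × thread index (the pattern the cache's name suggests it targets), the whole vector can be stored as two scalars. The sketch below assumes 32 threads per warp and 32-bit values, neither of which the abstract states; `compress_affine` and `decompress_affine` are hypothetical helpers, not the AVC's actual hardware interface. Under these assumptions the ratio is 32 × 32 bits / (2 × 32 bits) = 16, consistent with the factor of up to 16 quoted in the abstract.

```python
WARP_SIZE = 32  # assumption: threads per warp
WORD_BITS = 32  # assumption: width of each per-thread value


def compress_affine(values):
    """Return (base, stride) if values[i] == base + stride * i for every lane, else None."""
    if not values:
        return None
    base = values[0]
    stride = values[1] - values[0] if len(values) > 1 else 0
    for lane, v in enumerate(values):
        if v != base + stride * lane:
            return None  # not affine: would stay in the regular (uncompressed) cache
    return base, stride


def decompress_affine(base, stride, lanes=WARP_SIZE):
    """Expand a (base, stride) pair back into one value per lane."""
    # Python ints are unbounded; real hardware would wrap at WORD_BITS.
    return [base + stride * lane for lane in range(lanes)]


# Example: per-thread addresses of a[tid] with 4-byte elements are affine.
addresses = [0x1000 + 4 * tid for tid in range(WARP_SIZE)]
packed = compress_affine(addresses)
assert decompress_affine(*packed) == addresses

uncompressed_bits = WARP_SIZE * WORD_BITS  # 1024 bits for the full vector
compressed_bits = 2 * WORD_BITS            # base + stride = 64 bits
ratio = uncompressed_bits // compressed_bits
```

Uniform vectors, where every thread holds the same value, are the special case stride = 0 and compress the same way.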

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

GPU-Based Data Processing for 2-D Microwave Imaging on MAST

Author: BITTNER R.
CASTRO R.
DAVIS W. M.
EIDIETIS N. W.
FREETHY S. J.
GARELLI N.
HUANG B. K.
LUJAN P.
MONTEIRO E.
NAVARRO C. A.
NAYLOR G. A.
NICKOLLS J.
OWENS J. D.
PELL O.
SALMON N. A.
SHEVCHENKO V. F.
THOMAS D. A.
THOUTI K.
URBAN J.
VAN CITTERT P. H.
VERMIJ E.
WYNTERS E.
XU C.
YANG L.
YUE X.
ZERNIKE F.
Publication venue: American Nuclear Society
Publication date: 01/05/2016

The Synthetic Aperture Microwave Imaging (SAMI) diagnostic is a Mega Amp Spherical Tokamak (MAST) diagnostic based at the Culham Centre for Fusion Energy. We present the acceleration of the SAMI data-processing code using a graphics processing unit, achieving speedups of up to 60 times over the original IDL (Interactive Data Language) data-processing code. SAMI will now be capable of intershot processing, enabling pseudo-real-time control so that adjustments and optimizations can be made between shots. Additionally, for the first time, the analysis of many shots will be possible.

Crossref

White Rose Research Online

NASA JSC neural network survey results

Author: Greenwood Dan

A survey of artificial neural systems was conducted in support of NASA's (Johnson Space Center) Automatic Perception for Mission Planning and Flight Control Research Program. Several of the world's leading researchers contributed papers containing their most recent results on artificial neural systems. These papers were grouped into categories, and descriptive accounts of the results make up a large part of this report. Also included is material on sources of information on artificial neural systems, such as books, technical reports, software tools, etc.

NASA Technical Reports Server

Parallel mutual information estimation for inferring gene regulatory networks on GPUs

Author: AJ Butte
AM Fraser
Bertil Schmidt
CO Daub
E Lindholm
Haixiang Shi
I Arsic
J Schäfer
J Wilson
J Zola
JPW Pluim
M Tebmann
N CUDA
N Friedman
P D'Haeseleer
SA Manavski
W Liu
Weiguo Liu
Wolfgang Müller-Wittig
X Chen
X Zhou
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Mutual information is a measure of the statistical dependence between two variables. It has been widely used in various application domains, including computational biology, machine learning, statistics, image processing, and financial computing. The simple histogram-based mutual information estimators used previously lack the precision of kernel-based methods. The recently introduced B-spline function based mutual information estimation method is competitive with kernel-based methods in terms of quality, but at a lower computational complexity.

Results: We present a new approach to accelerate the B-spline function based mutual information estimation algorithm with commodity graphics hardware. To derive an efficient mapping onto this type of architecture, we have used the Compute Unified Device Architecture (CUDA) programming model to design and implement a new parallel algorithm. Our implementation, called CUDA-MI, achieves speedups of up to 82× using double precision on a single GPU compared to a multi-threaded implementation on a quad-core CPU for large microarray datasets. We have used the results obtained by CUDA-MI to infer gene regulatory networks (GRNs) from microarray data. Comparisons with existing methods, including ARACNE and TINGe, show that CUDA-MI produces GRNs of higher quality in less time.

Conclusions: CUDA-MI is publicly available open-source software, written in CUDA and C++. It obtains a significant speedup over the multi-threaded CPU implementation by fully exploiting the compute capability of commonly used, low-cost CUDA-enabled GPUs.
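
The B-spline estimator mentioned in the Background can be sketched compactly. In this minimal pure-Python illustration (not CUDA-MI's code), each sample spreads fractional weights over several neighboring bins through B-spline basis functions instead of adding a single histogram count. Marginal and joint probabilities are averages of these weights, and the estimate is I(X;Y) = Σ p(i,j) log2[p(i,j) / (p(i) p(j))]. The bin count (10), spline order (3), base-2 logarithm, and all function names are assumptions chosen for the example.

```python
import math


def bspline_weights(z, num_bins, order):
    """B-spline basis values B_{i,order}(z) for i in range(num_bins).

    Uses a clamped uniform knot vector on [0, num_bins - order + 1]; the
    returned weights are non-negative and sum to 1 for z in that range.
    """
    n_knots = num_bins + order
    knots = [float(min(max(i - order + 1, 0), num_bins - order + 1)) for i in range(n_knots)]
    t_max = knots[-1]
    # Order 1: indicator of the knot interval containing z; z == t_max is
    # assigned to the last non-empty interval so the weights still sum to 1.
    basis = [
        1.0 if knots[i] <= z < knots[i + 1]
        or (z == t_max and knots[i] < knots[i + 1] == t_max) else 0.0
        for i in range(n_knots - 1)
    ]
    # Cox-de Boor recursion up to the requested order (0/0 is treated as 0).
    for d in range(2, order + 1):
        nxt = []
        for i in range(n_knots - d):
            left_den = knots[i + d - 1] - knots[i]
            right_den = knots[i + d] - knots[i + 1]
            left = (z - knots[i]) / left_den * basis[i] if left_den > 0 else 0.0
            right = (knots[i + d] - z) / right_den * basis[i + 1] if right_den > 0 else 0.0
            nxt.append(left + right)
        basis = nxt
    return basis


def sample_weights(values, num_bins, order):
    """Map each sample to z in [0, num_bins - order + 1] and return its bin weights."""
    lo, hi = min(values), max(values)
    z_max = float(num_bins - order + 1)
    scale = z_max / (hi - lo) if hi > lo else 0.0
    return [bspline_weights(min((v - lo) * scale, z_max), num_bins, order) for v in values]


def mutual_information(x, y, num_bins=10, order=3):
    """B-spline estimate of I(X;Y) in bits from paired samples x and y."""
    wx = sample_weights(x, num_bins, order)
    wy = sample_weights(y, num_bins, order)
    n = len(x)
    px = [sum(w[i] for w in wx) / n for i in range(num_bins)]
    py = [sum(w[j] for w in wy) / n for j in range(num_bins)]
    mi = 0.0
    for i in range(num_bins):
        for j in range(num_bins):
            # Joint probability: each sample spreads its mass over bin pairs.
            pxy = sum(a[i] * b[j] for a, b in zip(wx, wy)) / n
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi
```

For an evenly spaced `x`, `mutual_information(x, x)` comes out larger than the estimate for `x` paired with a scrambled copy of itself. Because the joint weights sum exactly to the marginals, the estimate is a Kullback-Leibler divergence and never negative (up to rounding).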

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A Comprehensive Workflow for General-Purpose Neural Modeling with Highly Configurable Neuromorphic Hardware Systems

Author: A Burkitt
A Destexhe
A Kuhn
A Kumar
A Morrison
AP Davison
B Connors
CA Mead
D Buxhoeveden
G Indiveri
H Markram
HP Langtangen
J Costas-Santos
J Kremkow
JD Hunter
JM Eppler
L Tao
M Diesmann
M Lundqvist
M Norris
M Pfeiffer
M Pospischil
M Steriade
MCW Rossum van
ML Hines
MO Gewaltig
N Brunel
P Häfliger
PJ Sjöström
R Brette
R Gütig
R Horak
R Naud
R Serrano-Gotarredona
RJ Douglas
RJ Vogelstein
RS Zucker
S El Boustani
S Mitra
T Binzegger
T Delbrück
TE Oliphant
TP Vogels
V Dante
VB Mountcastle
Y Aviel
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

In this paper we present a methodological framework that meets novel requirements emerging from upcoming types of accelerated and highly configurable neuromorphic hardware systems. We describe in detail a device with 45 million programmable and dynamic synapses that is currently under development, and we sketch the conceptual challenges that arise from taking this platform into operation. More specifically, we aim to establish this neuromorphic system as a flexible and neuroscientifically valuable modeling tool that can be used by non-hardware experts. We consider various functional aspects to be crucial for this purpose, and we introduce a consistent workflow with detailed descriptions of all involved modules that implement the suggested steps: the integration of the hardware interface into the simulator-independent model description language PyNN; a fully automated translation between the PyNN domain and appropriate hardware configurations; an executable specification of the future neuromorphic system that can be seamlessly integrated into this biology-to-hardware mapping process as a test bench for all software layers and possible hardware design modifications; and an evaluation scheme that deploys models from a dedicated benchmark library, compares the results generated by virtual or prototype hardware devices with reference software simulations, and analyzes the differences. The integration of these components into one hardware-software workflow provides an ecosystem for ongoing preparative studies that support the hardware design process, and it represents the basis for the maturity of the model-to-hardware mapping software. The functionality and flexibility of the latter are demonstrated with a variety of experimental results.

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Publikationsserver der RWTH Aachen University

Juelich Shared Electronic Resources

MIRAGE: A system for distributed image generation on workstation clusters

Author: Weber Darrin
Publication venue: Lehigh Preserve

Lehigh University: Lehigh Preserve

Improving GPU Shared Memory Access Efficiency

Author: Gao Shuang
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/12/2014

Graphics Processing Units (GPUs) often employ shared memory to provide efficient storage for threads within a computational block. This shared memory is divided into multiple banks to improve performance by enabling concurrent accesses across the banks. Conflicts occur when multiple memory accesses attempt to access the same bank simultaneously, resulting in serialized access and a corresponding performance reduction. Identifying and eliminating these bank conflicts is critical for achieving high performance on GPUs; however, for common 1D and 2D access patterns, predicting the potential bank conflicts can prove difficult. Current GPUs support bank accesses with configurable bit-widths; optimizing these bit-widths could yield data layouts with fewer conflicts and better performance. This dissertation presents a framework for bank conflict analysis and automatic optimization. Given static access pattern information for a kernel, the tool analyzes the number of conflicts for each pattern and then searches for an optimized layout for all shared memory buffers. This data layout solution is based on parameters for inter-padding, intra-padding, and the bank access bit-width. The experimental results show that static bank conflict analysis is a practical solution and is independent of the workload size of a given access pattern. For 13 kernels from 6 benchmarks in the RODINIA and NVIDIA CUDA SDK suites that suffer from shared memory bank conflicts, tests indicated that this approach can improve runtime by 5% to 35%.
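
The kind of static analysis described above can be illustrated with a simplified model. The sketch assumes 32 banks of 4-byte words (as on many NVIDIA GPUs) and defines the conflict degree of one warp-wide access as the largest number of distinct words that fall into a single bank; threads reading the same word are counted once, modeling a broadcast. The helper names are hypothetical and this is not the dissertation's tool. It shows how padding each row of a 32×32 float tile by one element turns a 32-way conflict on column access into a conflict-free access.

```python
NUM_BANKS = 32   # assumption: banks in shared memory
BANK_WIDTH = 4   # assumption: bytes per bank word (configurable on some GPUs)


def conflict_degree(byte_addresses, num_banks=NUM_BANKS, bank_width=BANK_WIDTH):
    """Return how many serialized passes one warp-wide access needs (1 = conflict-free).

    Threads reading the same word are served by a broadcast, so only distinct
    words mapping to the same bank count as conflicts.
    """
    words_in_bank = {}
    for addr in byte_addresses:
        word = addr // bank_width
        words_in_bank.setdefault(word % num_banks, set()).add(word)
    return max(len(words) for words in words_in_bank.values())


def column_access(row_stride, column=0, threads=32, elem_size=4):
    """Byte addresses when thread t reads tile[t][column] of a row-major tile."""
    return [(t * row_stride + column) * elem_size for t in range(threads)]


# 32x32 float tile, every thread reads column 0: all addresses land in bank 0.
unpadded = conflict_degree(column_access(row_stride=32))   # 32-way conflict
# Padding each row by one float shifts consecutive rows to consecutive banks.
padded = conflict_degree(column_access(row_stride=33))     # conflict-free
```

The bank width matters too: evaluating the unpadded pattern with `bank_width=8` gives a degree of 16 instead of 32, which is why the bit-width is one of the parameters a layout search can tune alongside padding.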

University of Tennessee, Knoxville: Trace

PERICLES Deliverable 4.3: Content Semantics and Use Context Analysis Techniques

Author: Chatzilari E
Corubolo F
Darányi Sandor
De Weerdt David
Gill Alastair
Kontopoulos Efstratios
Maronidis A
Mitzias P
Nikopoulos S
Riga M
Sauter Christine
Tonkin Emma L.
Waddington Simon
Wittek Peter
Publication date: 01/01/2016

The current deliverable summarises the work conducted within task T4.3 of WP4, focusing on the extraction and subsequent analysis of semantic information from digital content, which is imperative for its preservability. More specifically, the deliverable defines content semantic information from a visual and textual perspective, explains how this information can be exploited in long-term digital preservation, and proposes novel approaches for extracting this information in a scalable manner. Additionally, the deliverable discusses novel techniques for retrieving and analysing the context of use of digital objects. Although this topic has not been studied extensively in the existing literature, we believe use context is vital in augmenting the semantic information and maintaining the usability and preservability of digital objects, as well as their ability to be interpreted accurately as initially intended.

University of Borås

Digitala Vetenskapliga Arkivet - Academic Archive On-line

King's Research Portal

Explore Bristol Research

ICASE/LaRC Symposium on Visualizing Time-Varying Data

Author: Banks D. C.
Crockett T. W.
Stacy K.

Time-varying datasets present difficult problems for both analysis and visualization. For example, the data may be terabytes in size, distributed across mass storage systems at several sites, with time scales ranging from femtoseconds to eons. In response to these challenges, ICASE and NASA Langley Research Center, in cooperation with ACM SIGGRAPH, organized the first symposium on visualizing time-varying data. The purpose was to bring the producers of time-varying data together with visualization specialists to assess open issues in the field, present new solutions, and encourage collaborative problem-solving. These proceedings contain the peer-reviewed papers that were presented at the symposium. They cover a broad range of topics, from methods for modeling and compressing data to systems for visualizing CFD simulations and World Wide Web traffic. Because the subject matter is inherently dynamic, printed proceedings cannot adequately convey all aspects of the work. The accompanying video proceedings provide additional context for several of the papers.

NASA Technical Reports Server

The HiPEAC Vision, 2010

Author: Cohen Albert
De Bosschere Koen
De Sutter Bjorn
Duranton Marc
Falsafi Babak
Gaydadjiev Georgi
Katevenis Manolis
Maebe Jonas
Munk Harm
Navarro Nacho
Ramirez Alex
Temam Olivier
Valero Mateo
Yehia Sami
Publication venue: HiPEAC
Publication date: 01/01/2010

Ghent University Academic Bibliography

Archivsystem Ask23