Search CORE

784 research outputs found

Programming models, compilers, and runtime systems for accelerator computing

Author: Sabne Amit
Publication venue: 'Purdue University (bepress)'
Publication date: 01/01/2016
Field of study

Accelerators, such as GPUs and Intel Xeon Phis, have become the workhorses of high-performance computing. Typically, the accelerators act as co-processors, with discrete memory spaces. They possess massive parallelism, along with many other unique architectural features. In order to obtain high performance, these features must be carefully exploited, which requires high programmer expertise. This thesis presents new programming models, and the necessary compiler and runtime systems to ease the accelerator programming process, while obtaining high performance

Purdue E-Pubs

A distributed Quadtree Dictionary approach to multi-resolution visualization of scattered neutron data

Author: Dooley Rion
Publication venue: LSU Digital Commons
Publication date: 01/01/2004
Field of study

Grid computing is described as dependable, seamless, pervasive access to resources and services, whereas mobile computing allows the movement of people from place to place while staying connected to resources at each location. Mobile grid computing is a new computing paradigm, which joins these two technologies by enabling access to the collection of resources within a user\u27s virtual organization while still maintaining the freedom of mobile computing through a service paradigm. A major problem in virtual organization is needs mismatch, in which one resources requests a service from another resources it is unable to fulfill, since virtual organizations are necessarily heterogeneous collections of resources. In this dissertation we propose a solution to the needs mismatch problem in the case of high energy physics data. Specifically, we propose a Quadtree Dictionary (QTD) algorithm to provide lossless, multi-resolution compression of datasets and enable their visualization on devices of all capabilities. As a prototype application, we extend the Integrated Spectral Analysis Workbench (ISAW) developed at the Intense Pulsed Neutron Source Division of the Argonne National Laboratory into a mobile Grid application, Mobile ISAW. In this dissertation we compare our QTD algorithm with several existing compression techniques on ISAW\u27s Single-Crystal Diffractometer (SCD) datasets. We then extend our QTD algorithm to a distributed setting and examine its effectiveness on the next generation of SCD datasets. In both a serial and distributed setting, our QTD algorithm performs no worse than existing techniques such as the square wavelet transform in terms of energy conservation, while providing the worst-case savings of 8:1

Louisiana State University

Linux kernel support for Micro-Heterogeneous Computing

Author: Schuttenberg Kim R.
Publication venue: RIT Scholar Works
Publication date: 01/01/2004
Field of study

Heterogeneous Computing (HC) is a technique speeds the computation of large tasks by utilizing multiple computers or supercomputers, each of which is best suited to a particular type of computation. Micro-Heterogeneous Computing (MHC) has been proposed to bring this practice to individual computers. With the aid of the higher speed, short distance interconnects found within the next generation of personal computers and commodity servers, it should be possible to apply HC to relatively small grain sizes. MHC draws on the observations that in order for a new computing technology to become widely accepted, and cost effective, there must be a suitable abstraction layer that frees the application writers from the need of precise technical knowledge about the system. MHC provides such an abstraction layer, referred to as the MHC framework, which provides automated solutions to many of the problems that must be overcome when utilizing HC. The framework was designed with the goals of user transparency, flexibility, and performance. The problems addressed include matching tasks to devices and scheduling them (collectively known as mapping), dependency analysis, and parallelization of serial code. All of these problems are solved dynamically at run time by the framework whose implementation is discussed herein. In support of this framework, this thesis specifies a format for libraries that provide common functions and free their users from the tasks of code profiling and analytical benchmarking. This thesis provides the first implementation for such an abstraction layer by utilizing the Linux operating system. This thesis provides not only the kernel level support necessary to schedule tasks to hardware, but also implements the entire core framework, with functioning solutions to the problems mentioned above. This thesis provides well-defined interfaces and methods to expand the MHC system with new scheduling heuristics, function libraries, and device drivers

RIT Scholar Works

libcloudph++ 0.2: single-moment bulk, double-moment bulk, and particle-based warm-rain microphysics library in C++

Author: Arabas Sylwester
Grabowski Wojciech W.
Jaruga Anna
Pawlowska Hanna
Publication venue: 'Copernicus GmbH'
Publication date: 18/08/2014
Field of study

This paper introduces a library of algorithms for representing cloud microphysics in numerical models. The library is written in C++, hence the name libcloudph++. In the current release, the library covers three warm-rain schemes: the single- and double-moment bulk schemes, and the particle-based scheme with Monte-Carlo coalescence. The three schemes are intended for modelling frameworks of different dimensionality and complexity ranging from parcel models to multi-dimensional cloud-resolving (e.g. large-eddy) simulations. A two-dimensional prescribed-flow framework is used in example simulations presented in the paper with the aim of highlighting the library features. The libcloudph++ and all its mandatory dependencies are free and open-source software. The Boost.units library is used for zero-overhead dimensional analysis of the code at compile time. The particle-based scheme is implemented using the Thrust library that allows to leverage the power of graphics processing units (GPU), retaining the possibility to compile the unchanged code for execution on single or multiple standard processors (CPUs). The paper includes complete description of the programming interface (API) of the library and a performance analysis including comparison of GPU and CPU setups.Comment: The library description has been updated to the new library API (i.e. v0.1 -> v0.2 update). The key difference is that the model state variables are now mixing ratios as opposed to densities. The particle-based scheme was supplemented with the "particle recycling" process. Numerous editorial corrections were mad

arXiv.org e-Print Archive

Directory of Open Access Journals

Performance Optimization Strategies for Transactional Memory Applications

Author: Schindewolf Martin Otto
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2013
Field of study

This thesis presents tools for Transactional Memory (TM) applications that cover multiple TM systems (Software, Hardware, and hybrid TM) and use information of all different layers of the TM software stack. Therefore, this thesis addresses a number of challenges to extract static information, information about the run time behavior, and expert-level knowledge to develop these new methods and strategies for the optimization of TM applications

KITopen

Hierarchical Bin Buffering: Online Local Moments for Dynamic External Memory Arrays

Author: Chakrabarti K.
Daniel Lemire
Geffner S.
Gray J.
Lemire D.
Li B.-C.
Moerkotte G.
Owen Kaser
Schmidt R. R.
Scott D.
Vitter J. S.
Zhou F.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/05/2008
Field of study

Local moments are used for local regression, to compute statistical measures such as sums, averages, and standard deviations, and to approximate probability distributions. We consider the case where the data source is a very large I/O array of size n and we want to compute the first N local moments, for some constant N. Without precomputation, this requires O(n) time. We develop a sequence of algorithms of increasing sophistication that use precomputation and additional buffer space to speed up queries. The simpler algorithms partition the I/O array into consecutive ranges called bins, and they are applicable not only to local-moment queries, but also to algebraic queries (MAX, AVERAGE, SUM, etc.). With N buffers of size sqrt{n}, time complexity drops to O(sqrt n). A more sophisticated approach uses hierarchical buffering and has a logarithmic time complexity (O(b log_b n)), when using N hierarchical buffers of size n/b. Using Overlapped Bin Buffering, we show that only a single buffer is needed, as with wavelet-based algorithms, but using much less storage. Applications exist in multidimensional and statistical databases over massive data sets, interactive image processing, and visualization

arXiv.org e-Print Archive

R-libre

Crossref