Programming models, compilers, and runtime systems for accelerator computing
Accelerators, such as GPUs and Intel Xeon Phis, have become the workhorses of high-performance computing. Typically, accelerators act as co-processors with discrete memory spaces. They offer massive parallelism, along with many other unique architectural features. To obtain high performance, these features must be carefully exploited, which requires considerable programmer expertise. This thesis presents new programming models, along with the necessary compiler and runtime systems, to ease the accelerator programming process while obtaining high performance.
A distributed Quadtree Dictionary approach to multi-resolution visualization of scattered neutron data
Grid computing is described as dependable, seamless, pervasive access to resources and services, whereas mobile computing allows the movement of people from place to place while staying connected to resources at each location. Mobile grid computing is a new computing paradigm, which joins these two technologies by enabling access to the collection of resources within a user's virtual organization while still maintaining the freedom of mobile computing through a service paradigm. A major problem in virtual organizations is needs mismatch, in which one resource requests a service that another resource is unable to fulfill, since virtual organizations are necessarily heterogeneous collections of resources. In this dissertation we propose a solution to the needs mismatch problem in the case of high energy physics data. Specifically, we propose a Quadtree Dictionary (QTD) algorithm to provide lossless, multi-resolution compression of datasets and enable their visualization on devices of all capabilities. As a prototype application, we extend the Integrated Spectral Analysis Workbench (ISAW) developed at the Intense Pulsed Neutron Source Division of the Argonne National Laboratory into a mobile Grid application, Mobile ISAW. In this dissertation we compare our QTD algorithm with several existing compression techniques on ISAW's Single-Crystal Diffractometer (SCD) datasets. We then extend our QTD algorithm to a distributed setting and examine its effectiveness on the next generation of SCD datasets. In both a serial and a distributed setting, our QTD algorithm performs no worse than existing techniques such as the square wavelet transform in terms of energy conservation, while providing worst-case savings of 8:1.
Linux kernel support for Micro-Heterogeneous Computing
Heterogeneous Computing (HC) is a technique that speeds the computation of large tasks by utilizing multiple computers or supercomputers, each of which is best suited to a particular type of computation. Micro-Heterogeneous Computing (MHC) has been proposed to bring this practice to individual computers. With the aid of the higher-speed, short-distance interconnects found within the next generation of personal computers and commodity servers, it should be possible to apply HC to relatively small grain sizes. MHC draws on the observation that in order for a new computing technology to become widely accepted and cost effective, there must be a suitable abstraction layer that frees application writers from the need for precise technical knowledge about the system. MHC provides such an abstraction layer, referred to as the MHC framework, which provides automated solutions to many of the problems that must be overcome when utilizing HC. The framework was designed with the goals of user transparency, flexibility, and performance. The problems addressed include matching tasks to devices and scheduling them (collectively known as mapping), dependency analysis, and parallelization of serial code. All of these problems are solved dynamically at run time by the framework, whose implementation is discussed herein. In support of this framework, this thesis specifies a format for libraries that provide common functions and free their users from the tasks of code profiling and analytical benchmarking. This thesis provides the first implementation of such an abstraction layer by utilizing the Linux operating system. It provides not only the kernel-level support necessary to schedule tasks to hardware, but also implements the entire core framework, with functioning solutions to the problems mentioned above. It also provides well-defined interfaces and methods to expand the MHC system with new scheduling heuristics, function libraries, and device drivers.
libcloudph++ 0.2: single-moment bulk, double-moment bulk, and particle-based warm-rain microphysics library in C++
This paper introduces a library of algorithms for representing cloud
microphysics in numerical models. The library is written in C++, hence the name
libcloudph++. In the current release, the library covers three warm-rain
schemes: the single- and double-moment bulk schemes, and the particle-based
scheme with Monte-Carlo coalescence. The three schemes are intended for
modelling frameworks of different dimensionality and complexity ranging from
parcel models to multi-dimensional cloud-resolving (e.g. large-eddy)
simulations. A two-dimensional prescribed-flow framework is used in example
simulations presented in the paper with the aim of highlighting the library
features. The libcloudph++ and all its mandatory dependencies are free and
open-source software. The Boost.units library is used for zero-overhead
dimensional analysis of the code at compile time. The particle-based scheme is
implemented using the Thrust library, which makes it possible to leverage the
power of graphics processing units (GPUs) while retaining the possibility to
compile the unchanged code for execution on single or multiple standard
processors (CPUs). The paper includes a complete description of the programming
interface (API) of the library and a performance analysis including a
comparison of GPU and CPU setups.
Comment: The library description has been updated to the new library API (i.e.
v0.1 -> v0.2 update). The key difference is that the model state variables
are now mixing ratios as opposed to densities. The particle-based scheme was
supplemented with the "particle recycling" process. Numerous editorial
corrections were made.
Performance Optimization Strategies for Transactional Memory Applications
This thesis presents tools for Transactional Memory (TM) applications that cover multiple TM systems (software, hardware, and hybrid TM) and use information from all layers of the TM software stack. To develop these new methods and strategies for the optimization of TM applications, the thesis addresses a number of challenges in extracting static information, information about run-time behavior, and expert-level knowledge.
Hierarchical Bin Buffering: Online Local Moments for Dynamic External Memory Arrays
Local moments are used for local regression, to compute statistical measures
such as sums, averages, and standard deviations, and to approximate probability
distributions. We consider the case where the data source is a very large I/O
array of size n and we want to compute the first N local moments, for some
constant N. Without precomputation, this requires O(n) time. We develop a
sequence of algorithms of increasing sophistication that use precomputation and
additional buffer space to speed up queries. The simpler algorithms partition
the I/O array into consecutive ranges called bins, and they are applicable not
only to local-moment queries, but also to algebraic queries (MAX, AVERAGE, SUM,
etc.). With N buffers of size sqrt(n), time complexity drops to O(sqrt(n)). A
more sophisticated approach uses hierarchical buffering and has a logarithmic
time complexity (O(b log_b n)), when using N hierarchical buffers of size n/b.
Using Overlapped Bin Buffering, we show that only a single buffer is needed, as
with wavelet-based algorithms, but using much less storage. Applications exist
in multidimensional and statistical databases over massive data sets,
interactive image processing, and visualization.
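
The simplest bin-based idea in this abstract can be illustrated with a minimal sketch for SUM queries only (a single buffer of per-bin sums). The class and method names below are hypothetical, the bin width of about sqrt(n) is an assumption taken from the buffer size quoted above, and this is not the paper's implementation; the hierarchical and overlapped variants are not shown.

```python
import math


class BinBufferedSum:
    """Range-sum queries over an array using one buffer of per-bin sums.

    Hypothetical illustration: bins are consecutive ranges of width about
    sqrt(n), so a query reads O(sqrt(n)) values instead of O(n).
    """

    def __init__(self, data):
        self.data = list(data)
        n = len(self.data)
        # Bin width near sqrt(n); at least 1 so an empty array still works.
        self.width = max(1, math.isqrt(n))
        # Precomputation: one stored sum per bin.
        self.bins = [
            sum(self.data[start:start + self.width])
            for start in range(0, n, self.width)
        ]

    def update(self, index, value):
        """Overwrite one element and adjust only the bin that contains it."""
        self.bins[index // self.width] += value - self.data[index]
        self.data[index] = value

    def range_sum(self, lo, hi):
        """Return the sum of data[lo:hi] (half-open range)."""
        total = 0
        i = lo
        while i < hi:
            if i % self.width == 0 and i + self.width <= hi:
                # The whole bin lies inside the range: use the stored sum.
                total += self.bins[i // self.width]
                i += self.width
            else:
                # Partial bin at either end: read individual elements.
                total += self.data[i]
                i += 1
        return total
```

A query touches at most two partial bins (fewer than 2 * width elements) plus at most n / width stored bin sums, which is where the O(sqrt(n)) bound comes from when the width is close to sqrt(n). An update changes one element and one bin sum.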