10 research outputs found

    Dynamic Analyses of Result Quality in Energy-Aware Approximate Programs

    No full text
    Thesis (Ph.D.)--University of Washington, 2014. Energy efficiency is a key concern in the design of modern computer systems. One promising approach to energy-efficient computation, approximate computing, trades off output precision for energy efficiency. However, this tradeoff can have unexpected effects on computation quality. This thesis presents dynamic analysis tools to study, debug, and monitor the quality and energy efficiency of approximate computations. We propose three styles of tools: prototyping tools that allow developers to experiment with approximation in their applications, offline tools that instrument code to determine the key sources of error, and online tools that monitor the quality of deployed applications in real time. Our prototyping tool is based on an extension to the functional language OCaml. We add approximation constructs to the language, an approximation simulator to the runtime, and profiling and auto-tuning tools for studying and experimenting with energy-quality tradeoffs. We also present two offline debugging tools and three online monitoring tools. The first offline tool identifies correlations between output quality and the total number of executions of, and errors in, individual approximate operations. The second tracks the number of approximate operations that flow into a particular value. Our online tools comprise three low-cost approaches to dynamic quality monitoring. They are designed to monitor quality in deployed applications without spending more energy than is saved by approximation. Online monitors can be used to make real-time adjustments to energy usage in order to meet specific quality goals. We present prototype implementations of all of these tools and describe their usage with several applications. Our prototyping, profiling, and auto-tuning tools allow us to experiment with approximation strategies and identify new ones; our offline tools provide new insights into the effects of approximation on output quality; and our monitors control output quality while still maintaining significant energy-efficiency gains.
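
    As a rough illustration of the online-monitoring idea described above, the sketch below samples a small fraction of the inputs, recomputes them precisely, and backs off the approximation when the observed error exceeds a quality goal. This is not the thesis's tool (which is built as an OCaml language extension); the kernel, quality goal, and sampling rate here are hypothetical stand-ins.

        #include <math.h>
        #include <stdio.h>

        /* Stand-ins for an application kernel: a precise version and an
           approximate version whose accuracy degrades as the level grows. */
        static double kernel_precise(double x) { return sin(x); }
        static double kernel_approx(double x, int level)
        {
            /* Truncated Taylor series for sin: fewer terms at higher levels. */
            double term = x, sum = x;
            int terms = 6 - level;                  /* level 0 = most precise */
            for (int n = 1; n < terms; n++) {
                term *= -x * x / ((2 * n) * (2 * n + 1));
                sum += term;
            }
            return sum;
        }

        static int approx_level = 4;                /* current aggressiveness */
        static const double QUALITY_GOAL = 0.01;    /* tolerated relative error */
        static const int SAMPLE_RATE = 100;         /* monitor 1 in 100 calls */

        /* Online quality monitor: occasionally recompute an input precisely
           and compare; sampling keeps the monitoring energy far below the
           energy saved by approximating the other calls. */
        static double run_monitored(double x, long call_count)
        {
            double approx = kernel_approx(x, approx_level);
            if (call_count % SAMPLE_RATE == 0) {
                double precise = kernel_precise(x);
                double err = fabs(precise) > 1e-12
                                 ? fabs((approx - precise) / precise)
                                 : fabs(approx - precise);
                if (err > QUALITY_GOAL && approx_level > 0)
                    approx_level--;                 /* tighten precision */
            }
            return approx;
        }

        int main(void)
        {
            for (long i = 0; i < 1000; i++)
                run_monitored(0.001 * i, i);
            printf("final approximation level: %d\n", approx_level);
            return 0;
        }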

    Applying the Vector Radix Method to Multidimensional, Multiprocessor, Out-of-Core Fast Fourier Transforms

    No full text
    We describe an efficient algorithm for calculating Fast Fourier Transforms on matrices of arbitrarily high dimension using the vector-radix method when the problem size is out-of-core (i.e., when the size of the data set is larger than the total available memory of the system). The algorithm takes advantage of multiple processors when they are present, but it is also efficient on single-processor systems. Our work is an extension of work done by Lauren Baptist in [Bapt99], which applied the vector-radix method to 2-dimensional out-of-core matrices. To determine the effectiveness of the algorithm, we present empirical results as well as an analysis of the I/O, communication, and computational complexity. We perform the empirical tests on a DEC 2100 server and on a cluster of Pentium-based Linux workstations. We compare our results with the traditional dimensional method of calculating multidimensional FFTs, and show that, as the number of dimensions increases, the vector-radix-based algorithm becomes increasingly effective relative to the dimensional method. In order to calculate the complexity of the algorithm, it was necessary to develop a method for analyzing the interprocessor communication costs of the BMMC data-permutation algorithm (presented in [CSW98]) used by our FFT algorithms. We present this analysis method and show how it was derived.
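
    As a reminder of the machinery behind the two methods compared above (the standard textbook definitions, not this paper's out-of-core or multiprocessor contributions): writing W_N = e^{-2\pi i/N}, the dimensional (row-column) method computes the 2-D DFT of an N_1 \times N_2 array by one-dimensional FFTs along each dimension in turn,

        X(k_1,k_2) = \sum_{n_1=0}^{N_1-1} W_{N_1}^{n_1 k_1} \left( \sum_{n_2=0}^{N_2-1} x(n_1,n_2)\, W_{N_2}^{n_2 k_2} \right),

    whereas one radix-(2 \times 2) vector-radix step splits the input by the parity of both indices at once,

        X(k_1,k_2) = S_{00}(k_1,k_2) + W_{N_1}^{k_1} S_{10}(k_1,k_2) + W_{N_2}^{k_2} S_{01}(k_1,k_2) + W_{N_1}^{k_1} W_{N_2}^{k_2} S_{11}(k_1,k_2),

    where S_{pq} is the (N_1/2) \times (N_2/2)-point DFT of the subarray x(2m_1+p,\, 2m_2+q); the same splitting applies recursively and extends to higher dimensions.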

    Preventing Format-String Attacks via Automatic and Efficient Dynamic Checking

    No full text
    We propose preventing format-string attacks with a combination of static dataflow analysis and dynamic white-lists of safe address ranges. The dynamic nature of our white-lists provides the flexibility necessary to encode a very precise security policy—namely, that %n-specifiers in printf-style functions should modify a memory location x only if the programmer explicitly passes a pointer to x. Our static dataflow analysis and source transformations let us maintain and check the white-list automatically, without any programmer effort beyond changing the Makefile. Our analysis also detects pointers passed to vprintf-style functions through (possibly multiple layers of) wrapper functions. Our results establish that our approach provides better protection than previous work and incurs little performance overhead.
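
    For readers unfamiliar with the attack being prevented, the snippet below shows the textbook pattern (the classic vulnerability, not the paper's instrumentation): a %n specifier makes printf write the number of characters emitted so far through a pointer argument, so an attacker-controlled format string becomes a memory write. Under the white-list policy described above, the %n in the last call is legitimate because the programmer explicitly passes &count.

        #include <stdio.h>

        int main(void)
        {
            char user_input[] = "%x %x %n";   /* pretend this arrived from an attacker */
            int count = 0;

            /* Vulnerable pattern: attacker data used directly as the format string.
               A %n in it writes through whatever happens to sit in the argument slot.
               printf(user_input);            <-- do NOT do this */

            /* Safe pattern: the untrusted data is an argument, never the format. */
            printf("%s\n", user_input);

            /* Legitimate %n: the destination &count was explicitly passed by the
               programmer, so a dynamic white-list of safe addresses permits it. */
            printf("hello%n\n", &count);
            printf("characters written before %%n: %d\n", count);
            return 0;
        }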

    Type Safety and Erasure Proofs for “A Type System for Coordinated Data Structures”

    No full text
    We prove the Type Safety and Erasure Theorems presented in Section 4 of Ringenburg and Grossman’s paper “A Type System for Coordinated Data Structures” [1]. We also remind the reader of the syntax, semantics, and typing rules for the coordinated list language described in Section 3 of the same paper. We refer the reader to the original paper for a detailed presentation of the coordinated data structure type system. 1 The Language: Figures 1, 2, and 3 present, respectively, the syntax, semantics, and typing rules for our coordinated list language. We implicitly assume ∆ and Γ do not have repeated elements; for example, ∆, α:κ is ill-formed if α ∈ Dom(∆). To avoid conflicts, we can systematically rename constructs with binding occurrences. We therefore treat ∆ and Γ as partial functions. All explicit occurrences of α and x in the grammar are binding (except when they constitute the entire type or expression, of course). Substitution is defined as usual.
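
    For orientation, a type-safety theorem of this kind is conventionally factored into preservation and progress lemmas of roughly the following shape (the generic form only; see the paper for the exact statements for the coordinated list language):

        Preservation: if \Delta; \Gamma \vdash e : \tau and e \rightarrow e', then \Delta; \Gamma \vdash e' : \tau.
        Progress: if \cdot; \cdot \vdash e : \tau, then either e is a value or there exists e' such that e \rightarrow e'.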

    AtomCaml

    No full text

    CosmoFlow: Using deep learning to learn the universe at scale

    No full text
    Deep learning is a promising tool to determine the physical model that describes our universe. To handle the considerable computational cost of this problem, we present CosmoFlow: a highly scalable deep learning application built on top of the TensorFlow framework. CosmoFlow uses efficient implementations of 3D convolution and pooling primitives, together with improvements in threading for many element-wise operations, to improve training performance on Intel® Xeon Phi™ processors. We also utilize the Cray PE Machine Learning Plugin for efficient scaling to multiple nodes. We demonstrate fully synchronous data-parallel training on 8192 nodes of Cori with 77% parallel efficiency, achieving 3.5 Pflop/s sustained performance. To our knowledge, this is the first large-scale science application of the TensorFlow framework at supercomputer scale with fully synchronous training. These enhancements enable us to process large 3D dark matter distributions and predict the cosmological parameters Ω_M, σ_8, and n_s with unprecedented accuracy.
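
    A quick back-of-the-envelope reading of the scaling figures quoted above: 3.5 Pflop/s sustained across 8192 nodes is

        \frac{3.5 \times 10^{15}\ \text{flop/s}}{8192\ \text{nodes}} \approx 4.3 \times 10^{11}\ \text{flop/s} \approx 427\ \text{Gflop/s per node},

    and 77% parallel efficiency means ideal linear scaling from the single-node rate would have delivered roughly 3.5 / 0.77 ≈ 4.5 Pflop/s.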