10 research outputs found

    Dynamic Analyses of Result Quality in Energy-Aware Approximate Programs

    No full text
    Thesis (Ph.D.)--University of Washington, 2014. Energy efficiency is a key concern in the design of modern computer systems. One promising approach to energy-efficient computation, approximate computing, trades off output precision for energy efficiency. However, this tradeoff can have unexpected effects on computation quality. This thesis presents dynamic analysis tools to study, debug, and monitor the quality and energy efficiency of approximate computations. We propose three styles of tools: prototyping tools that allow developers to experiment with approximation in their applications, offline tools that instrument code to determine the key sources of error, and online tools that monitor the quality of deployed applications in real time. Our prototyping tool is based on an extension to the functional language OCaml. We add approximation constructs to the language, an approximation simulator to the runtime, and profiling and auto-tuning tools for studying and experimenting with energy-quality tradeoffs. We also present two offline debugging tools and three online monitoring tools. The first offline tool identifies correlations between output quality and the total number of executions of, and errors in, individual approximate operations. The second tracks the number of approximate operations that flow into a particular value. Our online tools comprise three low-cost approaches to dynamic quality monitoring. They are designed to monitor quality in deployed applications without spending more energy than is saved by approximation. Online monitors can be used to make real-time adjustments to energy usage in order to meet specific quality goals. We present prototype implementations of all of these tools and describe their usage with several applications. Our prototyping, profiling, and auto-tuning tools allow us to experiment with approximation strategies and identify new ones; our offline tools provide new insights into the effects of approximation on output quality; and our monitors control output quality while still maintaining significant energy-efficiency gains.
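
    As a rough illustration of the online-monitoring idea described above, the sketch below samples a small fraction of the inputs, recomputes them precisely, and backs off the approximation when the observed error exceeds a quality goal. This is not the thesis's tool (which is built as an OCaml language extension); the kernel, quality goal, and sampling rate here are hypothetical stand-ins.

        #include <math.h>
        #include <stdio.h>

        /* Stand-ins for an application kernel: a precise version and an
           approximate version whose accuracy degrades as the level grows. */
        static double kernel_precise(double x) { return sin(x); }
        static double kernel_approx(double x, int level)
        {
            /* Truncated Taylor series for sin: fewer terms at higher levels. */
            double term = x, sum = x;
            int terms = 6 - level;                  /* level 0 = most precise */
            for (int n = 1; n < terms; n++) {
                term *= -x * x / ((2 * n) * (2 * n + 1));
                sum += term;
            }
            return sum;
        }

        static int approx_level = 4;                /* current aggressiveness */
        static const double QUALITY_GOAL = 0.01;    /* tolerated relative error */
        static const int SAMPLE_RATE = 100;         /* monitor 1 in 100 calls */

        /* Online quality monitor: occasionally recompute an input precisely
           and compare; sampling keeps the monitoring energy far below the
           energy saved by approximating the other calls. */
        static double run_monitored(double x, long call_count)
        {
            double approx = kernel_approx(x, approx_level);
            if (call_count % SAMPLE_RATE == 0) {
                double precise = kernel_precise(x);
                double err = fabs(precise) > 1e-12
                                 ? fabs((approx - precise) / precise)
                                 : fabs(approx - precise);
                if (err > QUALITY_GOAL && approx_level > 0)
                    approx_level--;                 /* tighten precision */
            }
            return approx;
        }

        int main(void)
        {
            for (long i = 0; i < 1000; i++)
                run_monitored(0.001 * i, i);
            printf("final approximation level: %d\n", approx_level);
            return 0;
        }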

    Applying the Vector Radix Method to Multidimensional, Multiprocessor, Out-of-Core Fast Fourier Transforms

    No full text
    We describe an efficient algorithm for calculating Fast Fourier Transforms on matrices of arbitrarily high dimension using the vector-radix method when the problem size is out-of-core (i.e., when the size of the data set is larger than the total available memory of the system). The algorithm takes advantage of multiple processors when they are present, but it is also efficient on single-processor systems. Our work is an extension of work done by Lauren Baptist in [Bapt99], which applied the vector-radix method to 2-dimensional out-of-core matrices. To determine the effectiveness of the algorithm, we present empirical results as well as an analysis of the I/O, communication, and computational complexity. We perform the empirical tests on a DEC 2100 server and on a cluster of Pentium-based Linux workstations. We compare our results with the traditional dimensional method of calculating multidimensional FFTs, and show that, as the number of dimensions increases, the vector-radix-based algorithm becomes increasingly effective relative to the dimensional method. In order to calculate the complexity of the algorithm, it was necessary to develop a method for analyzing the interprocessor communication costs of the BMMC data-permutation algorithm (presented in [CSW98]) used by our FFT algorithms. We present this analysis method and show how it was derived.
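
    As a reminder of the machinery behind the two methods compared above (the standard textbook definitions, not this paper's out-of-core or multiprocessor contributions): writing W_N = e^{-2\pi i/N}, the dimensional (row-column) method computes the 2-D DFT of an N_1 \times N_2 array by one-dimensional FFTs along each dimension in turn,

        X(k_1,k_2) = \sum_{n_1=0}^{N_1-1} W_{N_1}^{n_1 k_1} \left( \sum_{n_2=0}^{N_2-1} x(n_1,n_2)\, W_{N_2}^{n_2 k_2} \right),

    whereas one radix-(2 \times 2) vector-radix step splits the input by the parity of both indices at once,

        X(k_1,k_2) = S_{00}(k_1,k_2) + W_{N_1}^{k_1} S_{10}(k_1,k_2) + W_{N_2}^{k_2} S_{01}(k_1,k_2) + W_{N_1}^{k_1} W_{N_2}^{k_2} S_{11}(k_1,k_2),

    where S_{pq} is the (N_1/2) \times (N_2/2)-point DFT of the subarray x(2m_1+p,\, 2m_2+q); the same splitting applies recursively and extends to higher dimensions.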

    Preventing Format-String Attacks via Automatic and Efficient Dynamic Checking

    No full text
    We propose preventing format-string attacks with a combination of static dataflow analysis and dynamic white-lists of safe address ranges. The dynamic nature of our white-lists provides the flexibility necessary to encode a very precise security policy—namely, that %n-specifiers in printf-style functions should modify a memory location x only if the programmer explicitly passes a pointer to x. Our static dataflow analysis and source transformations let us maintain and check the white-list automatically, without any programmer effort beyond changing the Makefile. Our analysis also detects pointers passed to vprintf-style functions through (possibly multiple layers of) wrapper functions. Our results establish that our approach provides better protection than previous work and incurs little performance overhead.
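
    For readers unfamiliar with the attack being prevented, the snippet below shows the textbook pattern (the classic vulnerability, not the paper's instrumentation): a %n specifier makes printf write the number of characters emitted so far through a pointer argument, so an attacker-controlled format string becomes a memory write. Under the white-list policy described above, the %n in the last call is legitimate because the programmer explicitly passes &count.

        #include <stdio.h>

        int main(void)
        {
            char user_input[] = "%x %x %n";   /* pretend this arrived from an attacker */
            int count = 0;

            /* Vulnerable pattern: attacker data used directly as the format string.
               A %n in it writes through whatever happens to sit in the argument slot.
               printf(user_input);            <-- do NOT do this */

            /* Safe pattern: the untrusted data is an argument, never the format. */
            printf("%s\n", user_input);

            /* Legitimate %n: the destination &count was explicitly passed by the
               programmer, so a dynamic white-list of safe addresses permits it. */
            printf("hello%n\n", &count);
            printf("characters written before %%n: %d\n", count);
            return 0;
        }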

    Type Safety and Erasure Proofs for “A Type System for Coordinated Data Structures”

    No full text
    We prove the Type Safety and Erasure Theorems presented in Section 4 of Ringenburg and Grossman’s paper “A Type System for Coordinated Data Structures” [1]. We also remind the reader of the syntax, semantics, and typing rules for the coordinated list language described in Section 3 of the same paper. We refer the reader to the original paper for a detailed presentation of the coordinated data structure type system. 1 The Language: Figures 1, 2, and 3 present, respectively, the syntax, semantics, and typing rules for our coordinated list language. We implicitly assume ∆ and Γ do not have repeated elements; for example, ∆, α:κ is ill-formed if α ∈ Dom(∆). To avoid conflicts, we can systematically rename constructs with binding occurrences. We therefore treat ∆ and Γ as partial functions. All explicit occurrences of α and x in the grammar are binding (except when they constitute the entire type or expression, of course). Substitution is defined as usual.
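
    For orientation, a type-safety theorem of this kind is conventionally factored into preservation and progress lemmas of roughly the following shape (the generic form only; see the paper for the exact statements for the coordinated list language):

        Preservation: if \Delta; \Gamma \vdash e : \tau and e \rightarrow e', then \Delta; \Gamma \vdash e' : \tau.
        Progress: if \cdot; \cdot \vdash e : \tau, then either e is a value or there exists e' such that e \rightarrow e'.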

    AtomCaml

    No full text

    CosmoFlow: Using deep learning to learn the universe at scale

    No full text
    Deep learning is a promising tool to determine the physical model that describes our universe. To handle the considerable computational cost of this problem, we present CosmoFlow: a highly scalable deep learning application built on top of the TensorFlow framework. CosmoFlow uses efficient implementations of 3D convolution and pooling primitives, together with improvements in threading for many element-wise operations, to improve training performance on Intel® Xeon Phi™ processors. We also utilize the Cray PE Machine Learning Plugin for efficient scaling to multiple nodes. We demonstrate fully synchronous data-parallel training on 8192 nodes of Cori with 77% parallel efficiency, achieving 3.5 Pflop/s sustained performance. To our knowledge, this is the first large-scale science application of the TensorFlow framework at supercomputer scale with fully synchronous training. These enhancements enable us to process large 3D dark matter distributions and predict the cosmological parameters Ω_M, σ_8, and n_s with unprecedented accuracy.
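
    A quick back-of-the-envelope reading of the scaling figures quoted above: 3.5 Pflop/s sustained across 8192 nodes is

        \frac{3.5 \times 10^{15}\ \text{flop/s}}{8192\ \text{nodes}} \approx 4.3 \times 10^{11}\ \text{flop/s} \approx 427\ \text{Gflop/s per node},

    and 77% parallel efficiency means ideal linear scaling from the single-node rate would have delivered roughly 3.5 / 0.77 ≈ 4.5 Pflop/s.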