Embedding Data Mappers with Distributed Memory Machine Compilers
In scalable multiprocessor systems, high performance demands that computational load be balanced evenly among processors and that interprocessor communication be limited as much as possible. Compilation techniques for achieving these goals have been explored extensively in recent years [3, 9, 11, 13, 17, 18]. This research has produced a variety of useful techniques, but most of it has assumed that the programmer specifies the distribution of large data structures among processor memories. A few projects have attempted to automatically derive data distributions for regular problems [12, 10, 8, 1]. In this paper, we study the more challenging problem of automatically choosing data distributions for irregular problems.
Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications
MapReduce is a popular programming paradigm for developing large-scale,
data-intensive computation. Many frameworks that implement this paradigm have
recently been developed. To leverage these frameworks, however, developers must
become familiar with their APIs and rewrite existing code. Casper is a new tool
that automatically translates sequential Java programs into the MapReduce
paradigm. Casper identifies potential code fragments to rewrite and translates
them in two steps: (1) Casper uses program synthesis to search for a program
summary (i.e., a functional specification) of each code fragment. The summary
is expressed using a high-level intermediate language resembling the MapReduce
paradigm and verified to be semantically equivalent to the original using a
theorem prover. (2) Casper generates executable code from the summary, using
either the Hadoop, Spark, or Flink API. We evaluated Casper by automatically
converting real-world, sequential Java benchmarks to MapReduce. The resulting
benchmarks perform up to 48.2x faster compared to the original.
Comment: 12 pages, additional 4 pages of references and appendix
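As a rough illustration of the kind of rewrite the abstract describes, the sketch below places a sequential loop next to a hand-written map/reduce formulation of the same computation. It is not Casper's output, its intermediate language, or any Hadoop, Spark, or Flink code; the function names `word_count_sequential` and `word_count_mapreduce` are hypothetical and chosen only for this example.

```python
from functools import reduce


def word_count_sequential(words):
    """Imperative loop: the style of code a translator would start from."""
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts


def word_count_mapreduce(words):
    """The same computation phrased as a map step followed by a reduce step."""
    # Map: emit one (key, value) pair per input element.
    pairs = map(lambda w: (w, 1), words)

    # Reduce: fold the pairs into a dictionary, summing values per key.
    def combine(acc, pair):
        key, value = pair
        acc[key] = acc.get(key, 0) + value
        return acc

    return reduce(combine, pairs, {})
```

The two functions return equal dictionaries for any list of hashable items. The abstract states that Casper establishes this kind of equivalence with a theorem prover; the sketch does not attempt to show how.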
Improving Model-Based Software Synthesis: A Focus on Mathematical Structures
Computer hardware keeps increasing in complexity. Software design needs to keep up with this. The right models and abstractions empower developers to leverage the novelties of modern hardware. This thesis deals primarily with Models of Computation, as a basis for software design, in a family of methods called software synthesis.
We focus on Kahn Process Networks and dataflow applications as abstractions, both for programming and for deriving an efficient execution on heterogeneous multicores. The latter we accomplish by exploring the design space of possible mappings of computation and data to hardware resources. Mapping algorithms are not at the center of this thesis, however. Instead, we examine the mathematical structure of the mapping space, leveraging its inherent symmetries or geometric properties to improve mapping methods in general.
This thesis thoroughly explores the process of model-based design, aiming to go beyond the more established software synthesis on dataflow applications. We start with the problem of assessing these methods through benchmarking, and go on to formally examine the general goals of benchmarks. In this context, we also consider the role modern machine learning methods play in benchmarking.
We explore different established semantics, stretching the limits of Kahn Process Networks. We also discuss novel models, like Reactors, which are designed to be a deterministic, adaptive model with time as a first-class citizen. By investigating abstractions and transformations in the Ohua language for implicit dataflow programming, we also focus on programmability.
The focus of the thesis is on the models and methods, but we evaluate them in diverse use cases, generally centered around Cyber-Physical Systems. These include the 5G telecommunication standard and the automotive and signal processing domains. We even go beyond embedded systems and discuss use cases in GPU programming and microservice-based architectures.
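For readers unfamiliar with Kahn Process Networks, the following minimal sketch shows the basic idea under simplifying assumptions: processes communicate only through FIFO channels, and reads block until a token is available, so the output does not depend on thread scheduling. The process names, the `_DONE` end-of-stream sentinel, and `run_network` are hypothetical and are not taken from the thesis.

```python
import queue
import threading

_DONE = object()  # end-of-stream sentinel (an assumption of this sketch)


def producer(out_ch, n):
    """Emit the tokens 0..n-1, then signal end of stream."""
    for i in range(n):
        out_ch.put(i)
    out_ch.put(_DONE)


def doubler(in_ch, out_ch):
    """Read tokens with a blocking get, double them, and forward them."""
    while True:
        token = in_ch.get()  # blocks until a token arrives
        if token is _DONE:
            out_ch.put(_DONE)
            return
        out_ch.put(2 * token)


def run_network(n):
    """Wire producer -> doubler and collect the output stream in order."""
    a, b = queue.Queue(), queue.Queue()  # unbounded FIFO channels
    threads = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=doubler, args=(a, b)),
    ]
    for t in threads:
        t.start()
    result = []
    while True:
        token = b.get()
        if token is _DONE:
            break
        result.append(token)
    for t in threads:
        t.join()
    return result
```

Calling `run_network(4)` yields the same list on every run, regardless of how the threads interleave. Deciding which processor each process runs on is the mapping problem the abstract refers to; the sketch leaves that to the operating system.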
Actris: session-type based reasoning in separation logic
Message passing is a useful abstraction to implement concurrent programs. For real-world systems, however, it is often combined with other programming and concurrency paradigms, such as higher-order functions, mutable state, shared-memory concurrency, and locks. We present Actris: a logic for proving functional correctness of programs that use a combination of the aforementioned features. Actris combines the power of modern concurrent separation logics with a first-class protocol mechanism - based on session types - for reasoning about message passing in the presence of other concurrency paradigms. We show that Actris provides a suitable level of abstraction by proving functional correctness of a variety of examples, including a distributed merge sort, a distributed load-balancing mapper, and a variant of the map-reduce model, using relatively simple specifications. Soundness of Actris is proved using a model of its protocol mechanism in the Iris framework. We mechanised the theory of Actris, together with tactics for symbolic execution of programs, as well as all examples in the paper, in the Coq proof assistant.
CSI Neural Network: Using Side-channels to Recover Your Artificial Neural Network Information
Machine learning has become mainstream across industries. Numerous examples
have proved its validity for security applications. In this work, we
investigate how to reverse engineer a neural network by using only power
side-channel information. To this end, we consider a multilayer perceptron as
the machine learning architecture of choice and assume a non-invasive and
eavesdropping attacker capable of measuring only passive side-channel leakages
like power consumption, electromagnetic radiation, and reaction time.
We conduct all experiments on real data and common neural net architectures
in order to properly assess the applicability and extendability of those
attacks. Practical results are shown on an ARM CORTEX-M3 microcontroller. Our
experiments show that the side-channel attacker is capable of obtaining the
following information: the activation functions used in the architecture, the
number of layers and neurons in the layers, the number of output classes, and
weights in the neural network. Thus, the attacker can effectively reverse
engineer the network using side-channel information.
Next, we show that once the attacker has the knowledge about the neural
network architecture, he/she could also recover the inputs to the network with
only a single-shot measurement. Finally, we discuss several mitigations one
could use to thwart such attacks.
Comment: 15 pages, 16 figures
Actris 2.0: Asynchronous Session-Type Based Reasoning in Separation Logic
Message passing is a useful abstraction for implementing concurrent programs.
For real-world systems, however, it is often combined with other programming
and concurrency paradigms, such as higher-order functions, mutable state,
shared-memory concurrency, and locks. We present Actris: a logic for proving
functional correctness of programs that use a combination of the aforementioned
features. Actris combines the power of modern concurrent separation logics with
a first-class protocol mechanism -- based on session types -- for reasoning
about message passing in the presence of other concurrency paradigms. We show
that Actris provides a suitable level of abstraction by proving functional
correctness of a variety of examples, including a channel-based merge sort, a
channel-based load-balancing mapper, and a variant of the map-reduce model,
using concise specifications. While Actris was already presented in a
conference paper (POPL'20), this paper expands the prior presentation
significantly. Moreover, it extends Actris to Actris 2.0 with a notion of
subprotocols -- based on session-type subtyping -- that permits additional
flexibility when composing channel endpoints, and that takes full advantage of
the asynchronous semantics of message passing in Actris. Soundness of Actris
2.0 is proven using a model of its protocol mechanism in the Iris framework. We
have mechanised the theory of Actris, together with custom tactics, as well as
all examples in the paper, in the Coq proof assistant.
Comment: 60 pages, 24 figures
Full Stack Optimization of Transformer Inference: a Survey
Recent advances in state-of-the-art DNN architecture design have been moving
toward Transformer models. These models achieve superior accuracy across a wide
range of applications. This trend has been consistent over the past several
years since Transformer models were originally introduced. However, the amount
of compute and bandwidth required for inference of recent Transformer models is
growing at a significant rate, and this has made their deployment in
latency-sensitive applications challenging. As such, there has been an
increased focus on making Transformer models more efficient, with methods that
range from changing the architecture design, all the way to developing
dedicated domain-specific accelerators. In this work, we survey different
approaches for efficient Transformer inference, including: (i) analysis and
profiling of the bottlenecks in existing Transformer architectures and their
similarities and differences with previous convolutional models; (ii)
implications of Transformer architecture on hardware, including the impact of
non-linear operations such as Layer Normalization, Softmax, and GELU, as well
as linear operations, on hardware design; (iii) approaches for optimizing a
fixed Transformer architecture; (iv) challenges in finding the right mapping
and scheduling of operations for Transformer models; and (v) approaches for
optimizing Transformer models by adapting the architecture using neural
architecture search. Finally, we perform a case study by applying the surveyed
optimizations on Gemmini, the open-source, full-stack DNN accelerator
generator, and we show how each of these approaches can yield improvements,
compared to previous benchmark results on Gemmini. Among other things, we find
that a full-stack co-design approach with the aforementioned methods can result
in up to 88.7x speedup with a minimal performance degradation for Transformer
inference.
Toward practical argument systems for verifiable computation
How can a client extract useful work from a server without trusting it to compute correctly? A modern motivation for this classic question is third party computing models in which customers outsource their computations to service providers (as in cloud computing). In principle, deep results in complexity theory and cryptography imply that it is possible to verify that an untrusted entity executed a computation correctly. For instance, the server can employ probabilistically checkable proofs (PCPs) in conjunction with cryptographic commitments to generate a succinct proof of correct execution, which the client can efficiently check. However, these theoretical solutions are impractical: they require thousands of CPU years to verifiably execute even simple computations. This dissertation describes the design, implementation, and experimental evaluation of a system, called Pepper, that brings this theory into the realm of plausibility. Pepper incorporates a series of algorithmic improvements and systems engineering techniques to improve performance by over 20 orders of magnitude, relative to an implementation of the theory without our refinements. These include a new probabilistically checkable proof encoding with nearly optimal asymptotics, a concise representation for computations, a more efficient cryptographic commitment primitive, and a distributed implementation of the server with GPU acceleration to reduce latency. Additionally, Pepper extends the verification machinery to handle realistic applications of third party computing: those that interact with remote storage or state (e.g., MapReduce jobs, database queries). To do so, Pepper composes techniques from untrusted storage with the aforementioned technical machinery to verifiably offload both computations and state. Furthermore, to make it easy to use this technology, Pepper includes a compiler to automatically transform programs in a subset of C into executables that run verifiably.
One of the chief limitations of Pepper is that verifiable execution is still orders of magnitude slower than an unverifiable native execution. Nonetheless, Pepper takes powerful results from complexity theory and verifiable computation a few steps closer to practicality.
Geobase Information System Impacts on Space Image Formats
As Geobase Information Systems increase in number, size and complexity, the format compatibility of satellite remote sensing data becomes increasingly important. Because of the vast and continually increasing quantity of data available from remote sensing systems, the utility of these data is increasingly dependent on the degree to which their formats facilitate, or hinder, their incorporation into Geobase Information Systems. Merging satellite data into a geobase system requires that both have a compatible geographic referencing system. Greater acceptance of satellite data by the user community will be facilitated if the data are in a form which most readily corresponds to existing geobase data structures. The conference addressed a number of specific topics and made recommendations.