48 research outputs found

Embedding Data Mappers with Distributed Memory Machine Compilers

Author: Choudhary Alok
Das Raja
Koelbel Charles
Ponnusamy Ravi
Saltz Joel
Publication venue: SURFACE at Syracuse University
Publication date: 01/04/1992
Field of study

In scalable multiprocessor systems, high performance demands that computational load be balanced evenly among processors and that interprocessor communication be limited as much as possible. Compilation techniques for achieving these goals have been explored extensively in recent years [3, 9, 11, 13, 17, 18]. This research has produced a variety of useful techniques, but most of it has assumed that the programmer specifies the distribution of large data structures among processor memories. A few projects have attempted to automatically derive data distributions for regular problems [12, 10, 8, 1]. In this paper, we study the more challenging problem of automatically choosing data distributions for irregular problems

Syracuse University Research Facility and Collaborative Environment

Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications

Author: Cheung Alvin
Kemper Alfons
Palkar Shoumik
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 19/06/2018
Field of study

MapReduce is a popular programming paradigm for developing large-scale, data-intensive computation. Many frameworks that implement this paradigm have recently been developed. To leverage these frameworks, however, developers must become familiar with their APIs and rewrite existing code. Casper is a new tool that automatically translates sequential Java programs into the MapReduce paradigm. Casper identifies potential code fragments to rewrite and translates them in two steps: (1) Casper uses program synthesis to search for a program summary (i.e., a functional specification) of each code fragment. The summary is expressed using a high-level intermediate language resembling the MapReduce paradigm and verified to be semantically equivalent to the original using a theorem prover. (2) Casper generates executable code from the summary, using either the Hadoop, Spark, or Flink API. We evaluated Casper by automatically converting real-world, sequential Java benchmarks to MapReduce. The resulting benchmarks perform up to 48.2x faster compared to the original. Comment: 12 pages, additional 4 pages of references and appendi

arXiv.org e-Print Archive

Crossref

Improving Model-Based Software Synthesis: A Focus on Mathematical Structures

Author: Goens Jokisch Andres Wilhelm
Publication venue
Publication date: 14/05/2021
Field of study

Computer hardware keeps increasing in complexity. Software design needs to keep up with this. The right models and abstractions empower developers to leverage the novelties of modern hardware. This thesis deals primarily with Models of Computation, as a basis for software design, in a family of methods called software synthesis. We focus on Kahn Process Networks and dataflow applications as abstractions, both for programming and for deriving an efficient execution on heterogeneous multicores. The latter we accomplish by exploring the design space of possible mappings of computation and data to hardware resources. Mapping algorithms are not at the center of this thesis, however. Instead, we examine the mathematical structure of the mapping space, leveraging its inherent symmetries or geometric properties to improve mapping methods in general. This thesis thoroughly explores the process of model-based design, aiming to go beyond the more established software synthesis on dataflow applications. We start with the problem of assessing these methods through benchmarking, and go on to formally examine the general goals of benchmarks. In this context, we also consider the role modern machine learning methods play in benchmarking. We explore different established semantics, stretching the limits of Kahn Process Networks. We also discuss novel models, like Reactors, which are designed to be a deterministic, adaptive model with time as a first-class citizen. By investigating abstractions and transformations in the Ohua language for implicit dataflow programming, we also focus on programmability. The focus of the thesis is on the models and methods, but we evaluate them in diverse use-cases, generally centered around Cyber-Physical Systems. These include the 5G telecommunication standard, automotive and signal processing domains. We even go beyond embedded systems and discuss use-cases in GPU programming and microservice-based architectures.

Technische Universität Dresden: Qucosa

Actris: session-type based reasoning in separation logic

Author: Bengtson Jesper
Hinrichsen Jonas Kastberg
Krebbers Robbert
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020
Field of study

Message passing is a useful abstraction to implement concurrent programs. For real-world systems, however, it is often combined with other programming and concurrency paradigms, such as higher-order functions, mutable state, shared-memory concurrency, and locks. We present Actris: a logic for proving functional correctness of programs that use a combination of the aforementioned features. Actris combines the power of modern concurrent separation logics with a first-class protocol mechanism - based on session types - for reasoning about message passing in the presence of other concurrency paradigms. We show that Actris provides a suitable level of abstraction by proving functional correctness of a variety of examples, including a distributed merge sort, a distributed load-balancing mapper, and a variant of the map-reduce model, using relatively simple specifications. Soundness of Actris is proved using a model of its protocol mechanism in the Iris framework. We mechanised the theory of Actris, together with tactics for symbolic execution of programs, as well as all examples in the paper, in the Coq proof assistant.

Programming Language

Crossref

TU Delft Repository

The IT University of Copenhagen's Repository

CSI Neural Network: Using Side-channels to Recover Your Artificial Neural Network Information

Author: Batina Lejla
Bhasin Shivam
Jap Dirmanto
Picek Stjepan
Publication venue
Publication date: 23/05/2018
Field of study

Machine learning has become mainstream across industries. Numerous examples have proved its validity for security applications. In this work, we investigate how to reverse engineer a neural network by using only power side-channel information. To this end, we consider a multilayer perceptron as the machine learning architecture of choice and assume a non-invasive and eavesdropping attacker capable of measuring only passive side-channel leakages like power consumption, electromagnetic radiation, and reaction time. We conduct all experiments on real data and common neural net architectures in order to properly assess the applicability and extendability of those attacks. Practical results are shown on an ARM CORTEX-M3 microcontroller. Our experiments show that the side-channel attacker is capable of obtaining the following information: the activation functions used in the architecture, the number of layers and neurons in the layers, the number of output classes, and weights in the neural network. Thus, the attacker can effectively reverse engineer the network using side-channel information. Next, we show that once the attacker has the knowledge about the neural network architecture, he/she could also recover the inputs to the network with only a single-shot measurement. Finally, we discuss several mitigations one could use to thwart such attacks. Comment: 15 pages, 16 figure

arXiv.org e-Print Archive

Cryptology ePrint Archive

Actris 2.0: Asynchronous Session-Type Based Reasoning in Separation Logic

Author: Bengtson Jesper
Hinrichsen Jonas Kastberg
Krebbers Robbert
Publication venue
Publication date: 11/10/2021
Field of study

Message passing is a useful abstraction for implementing concurrent programs. For real-world systems, however, it is often combined with other programming and concurrency paradigms, such as higher-order functions, mutable state, shared-memory concurrency, and locks. We present Actris: a logic for proving functional correctness of programs that use a combination of the aforementioned features. Actris combines the power of modern concurrent separation logics with a first-class protocol mechanism -- based on session types -- for reasoning about message passing in the presence of other concurrency paradigms. We show that Actris provides a suitable level of abstraction by proving functional correctness of a variety of examples, including a channel-based merge sort, a channel-based load-balancing mapper, and a variant of the map-reduce model, using concise specifications. While Actris was already presented in a conference paper (POPL'20), this paper expands the prior presentation significantly. Moreover, it extends Actris to Actris 2.0 with a notion of subprotocols -- based on session-type subtyping -- that permits additional flexibility when composing channel endpoints, and that takes full advantage of the asynchronous semantics of message passing in Actris. Soundness of Actris 2.0 is proven using a model of its protocol mechanism in the Iris framework. We have mechanised the theory of Actris, together with custom tactics, as well as all examples in the paper, in the Coq proof assistant. Comment: 60 pages, 24 figure

arXiv.org e-Print Archive

TU Delft Repository

Episciences.org

Full Stack Optimization of Transformer Inference: a Survey

Author: Dinh Grace
Genc Hasan
Gholami Amir
Hooper Coleman
Huang Qijing
Kang Minwoo
Keutzer Kurt
Kim Sehoon
Mahoney Michael W.
Shao Yakun Sophia
Wattanawong Thanakul
Yan Ruohan
Publication venue
Publication date: 27/02/2023
Field of study

Recent advances in state-of-the-art DNN architecture design have been moving toward Transformer models. These models achieve superior accuracy across a wide range of applications. This trend has been consistent over the past several years since Transformer models were originally introduced. However, the amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate, and this has made their deployment in latency-sensitive applications challenging. As such, there has been an increased focus on making Transformer models more efficient, with methods that range from changing the architecture design, all the way to developing dedicated domain-specific accelerators. In this work, we survey different approaches for efficient Transformer inference, including: (i) analysis and profiling of the bottlenecks in existing Transformer architectures and their similarities and differences with previous convolutional models; (ii) implications of Transformer architecture on hardware, including the impact of non-linear operations such as Layer Normalization, Softmax, and GELU, as well as linear operations, on hardware design; (iii) approaches for optimizing a fixed Transformer architecture; (iv) challenges in finding the right mapping and scheduling of operations for Transformer models; and (v) approaches for optimizing Transformer models by adapting the architecture using neural architecture search. Finally, we perform a case study by applying the surveyed optimizations on Gemmini, the open-source, full-stack DNN accelerator generator, and we show how each of these approaches can yield improvements, compared to previous benchmark results on Gemmini. Among other things, we find that a full-stack co-design approach with the aforementioned methods can result in up to 88.7x speedup with a minimal performance degradation for Transformer inference

arXiv.org e-Print Archive

Sessions and Separation

Author: Hinrichsen Jonas Kastberg
Publication venue: IT-Universitetet i København
Publication date: 01/01/2021
Field of study

The IT University of Copenhagen's Repository

Toward practical argument systems for verifiable computation

Author: Setty Srinath T.V.
Publication venue
Publication date: 09/02/2015
Field of study

How can a client extract useful work from a server without trusting it to compute correctly? A modern motivation for this classic question is third party computing models in which customers outsource their computations to service providers (as in cloud computing). In principle, deep results in complexity theory and cryptography imply that it is possible to verify that an untrusted entity executed a computation correctly. For instance, the server can employ probabilistically checkable proofs (PCPs) in conjunction with cryptographic commitments to generate a succinct proof of correct execution, which the client can efficiently check. However, these theoretical solutions are impractical: they require thousands of CPU years to verifiably execute even simple computations. This dissertation describes the design, implementation, and experimental evaluation of a system, called Pepper, that brings this theory into the realm of plausibility. Pepper incorporates a series of algorithmic improvements and systems engineering techniques to improve performance by over 20 orders of magnitude, relative to an implementation of the theory without our refinements. These include a new probabilistically checkable proof encoding with nearly optimal asymptotics, a concise representation for computations, a more efficient cryptographic commitment primitive, and a distributed implementation of the server with GPU acceleration to reduce latency. Additionally, Pepper extends the verification machinery to handle realistic applications of third party computing: those that interact with remote storage or state (e.g., MapReduce jobs, database queries). To do so, Pepper composes techniques from untrusted storage with the aforementioned technical machinery to verifiably offload both computations and state. Furthermore, to make it easy to use this technology, Pepper includes a compiler to automatically transform programs in a subset of C into executables that run verifiably.
One of the chief limitations of Pepper is that verifiable execution is still orders of magnitude slower than an unverifiable native execution. Nonetheless, Pepper takes powerful results from complexity theory and verifiable computation a few steps closer to practicality.

Computer Science

Texas ScholarWorks

Geobase Information System Impacts on Space Image Formats

Author: Dozier J. C.
Frew J. E.
Marks D. G.
Simonett D. S.
Smith T. R.
Tobler W.
Publication venue
Publication date
Field of study

As Geobase Information Systems increase in number, size and complexity, the format compatibility of satellite remote sensing data becomes increasingly important. Because of the vast and continually increasing quantity of data available from remote sensing systems, the utility of these data is increasingly dependent on the degree to which their formats facilitate, or hinder, their incorporation into Geobase Information Systems. To merge satellite data into a geobase system requires that they both have a compatible geographic referencing system. Greater acceptance of satellite data by the user community will be facilitated if the data are in a form which most readily corresponds to existing geobase data structures. The conference addressed a number of specific topics and made recommendations.

NASA Technical Reports Server