
    Run Time Approximation of Non-blocking Service Rates for Streaming Systems

    Stream processing is a compute paradigm that promises safe and efficient parallelism. Modern big-data problems are often well suited for stream processing's throughput-oriented nature. Realization of efficient stream processing requires monitoring and optimization of multiple communication links. Most techniques to optimize these links use queueing network models or network flow models, which require some idea of the actual execution rate of each independent compute kernel within the system. What we want to know is how fast each kernel can process data independently of the other kernels with which it communicates. This is known as the "service rate" of the kernel within the queueing literature. Current approaches to divining service rates are static. Modern workloads, however, are often dynamic. Shared cloud systems also present applications with highly dynamic execution environments (multiple users, hardware migration, etc.). It is therefore desirable to continuously re-tune an application during run time (online) in response to changing conditions. Our approach enables online service rate monitoring under most conditions, obviating the need for reliance on steady-state predictions for what are probably non-steady-state phenomena. First, some of the difficulties associated with online service rate determination are examined. Second, the algorithm to approximate the online non-blocking service rate is described. Lastly, the algorithm is implemented within the open source RaftLib framework for validation using a simple microbenchmark as well as two full streaming applications. Comment: technical report.
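
    The central idea can be sketched as follows: accumulate only the time a kernel spends actually processing items, excluding intervals where it is blocked on an empty input queue or a full output queue, and divide items processed by that busy time. The sketch below is a minimal illustration of that general approach, not RaftLib's actual instrumentation; the ServiceRateEstimator type and its methods are hypothetical names.

        // Hedged sketch: approximate a kernel's non-blocking service rate by
        // accumulating only the time spent processing items, excluding time
        // spent blocked on an empty input or a full output queue.
        #include <chrono>
        #include <cstdint>
        #include <cstdio>

        struct ServiceRateEstimator {
            using clock = std::chrono::steady_clock;
            std::uint64_t items_processed = 0;
            std::chrono::nanoseconds busy_time{0};
            clock::time_point start;

            void begin_work()  { start = clock::now(); }   // call once an item has dequeued
            void end_work(std::uint64_t n = 1) {            // call once output has enqueued
                busy_time += std::chrono::duration_cast<std::chrono::nanoseconds>(clock::now() - start);
                items_processed += n;
            }
            double rate_per_sec() const {                   // items per non-blocked second
                double secs = std::chrono::duration<double>(busy_time).count();
                return secs > 0.0 ? items_processed / secs : 0.0;
            }
        };

        int main() {
            ServiceRateEstimator est;
            volatile double sink = 0.0;
            for (int i = 0; i < 1000000; ++i) {
                est.begin_work();
                sink += i * 0.5;   // stand-in for the kernel's per-item computation
                est.end_work();
            }
            std::printf("approx. service rate: %.0f items/s\n", est.rate_per_sec());
            return 0;
        }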

    LSIM User Manual

    Lsim is a gate/switch-level digital logic simulator. It enables users to model digital circuits at both the gate and switch level and incorporates features that support investigation of the simulation task itself. This user's manual describes the procedures used to specify a circuit to lsim and control the simulation of the circuit (i.e., specifying input vectors, running the simulation, and monitoring output signals).

    Performance Modeling of Virtualized Custom Logic Computations

    Virtualization of custom logic computations (i.e., by sharing a fixed function across distinct data streams) provides a means of reusing hardware resources, particularly when resources are limited. This is common practice in traditional processors, where more than one user can share processor resources. In this paper, we virtualize a custom logic block using C-slow techniques to support fine-grain context switching. We then develop and present an analytic model for several performance measures (throughput, latency, input queue occupancy) for both fine-grained and coarse-grained context switching (to a secondary memory). Next, we calibrate the analytic performance model with empirical measurements. We then validate the model via discrete-event simulation and use the model to predict the performance of, and develop optimal schedules for, virtualized logic computations. We present results for a Taylor series expansion of a cosine function with added feedback and an AES encryption cipher.
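
    As a rough point of reference, the usual first-order relations for a C-slowed pipeline shared round-robin by C streams can be written down directly. The sketch below uses those textbook approximations only; it is not the calibrated analytic model presented in the paper, and all names and parameter values are illustrative.

        // Hedged sketch: first-order performance estimates for a C-slowed pipeline
        // shared round-robin by C independent streams. Not the paper's calibrated
        // queueing model, just the usual back-of-envelope relations.
        #include <cstdio>

        struct CSlowEstimate {
            double per_stream_throughput;   // results per second for one stream
            double aggregate_throughput;    // results per second across all streams
            double per_stream_latency;      // seconds from input to result for one stream
        };

        CSlowEstimate estimate(double clock_hz, int pipeline_depth, int c_contexts) {
            CSlowEstimate e;
            e.aggregate_throughput  = clock_hz;                // one result per cycle when all contexts are busy
            e.per_stream_throughput = clock_hz / c_contexts;   // each stream advances once every C cycles
            e.per_stream_latency    = (pipeline_depth * c_contexts) / clock_hz;
            return e;
        }

        int main() {
            CSlowEstimate e = estimate(200e6 /* 200 MHz */, 10 /* stages */, 4 /* contexts */);
            std::printf("per-stream: %.2e results/s, latency %.2e s, aggregate %.2e results/s\n",
                        e.per_stream_throughput, e.per_stream_latency, e.aggregate_throughput);
            return 0;
        }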

    Hierarchical Discrete-Event Simulation on Hypercube Architecture

    This paper presents a model of a hierarchical discrete-event simulation algorithm running on a hypercube architecture. We assume a static allocation of system components to processors in the hypercube. We also assume a global clock algorithm with an event-based time increment. Following development of the performance model, we describe an application of the model in the area of digital systems simulation. The hierarchical levels included are the gate level (NAND, NOR, and NOT gates) and the MSI level (multiplexors, shift registers, etc.). Example values (gathered from simulations running on standard von Neumann architectures) are provided as the model inputs to show the effect of different model parameters and partitioning strategies on simulation performance.
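
    A global clock algorithm with an event-based time increment advances simulated time directly to the earliest pending event rather than stepping cycle by cycle. The sketch below illustrates that time-advance discipline for a trivial gate-level example on a single processor; the gate model, delays, and stimuli are illustrative and are not the paper's hypercube implementation.

        // Hedged sketch: a global-clock, event-driven gate-level simulation loop.
        // The shared clock jumps to the time of the earliest pending event and every
        // event scheduled at that time is evaluated. Gates and stimuli are invented.
        #include <cstdio>
        #include <map>
        #include <vector>

        struct Event { int gate; bool value; };

        int main() {
            // time -> events scheduled at that time (the global event list)
            std::map<long, std::vector<Event>> event_list;
            std::vector<bool> gate_output(3, false);

            event_list[0].push_back({0, true});   // stimulus: gate 0 output rises at t=0
            event_list[5].push_back({1, true});   // stimulus: gate 1 output rises at t=5

            const long gate_delay = 2;
            while (!event_list.empty()) {
                auto it = event_list.begin();     // event-based time increment:
                long now = it->first;             // jump straight to the next event time
                for (const Event& ev : it->second) {
                    gate_output[ev.gate] = ev.value;
                    // gate 2 is a NAND of gates 0 and 1; schedule its new output value
                    bool nand_out = !(gate_output[0] && gate_output[1]);
                    if (ev.gate != 2 && nand_out != gate_output[2])
                        event_list[now + gate_delay].push_back({2, nand_out});
                    std::printf("t=%ld gate %d -> %d\n", now, ev.gate, (int)ev.value);
                }
                event_list.erase(it);
            }
            return 0;
        }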

    A Unified Approach to Mixed-Mode Simulation

    This paper presents a unified approach to mixed-mode simulation. It investigates the algorithms for both logic and circuit simulation, considering their similarities and differences, and presents a general framework for integrating the two algorithms in a uniform manner. The time advance mechanisms and component functional evaluations of the algorithms are shown to be similar in nature, and mechanisms for translating information represented uniquely in the two algorithms are given. The resulting integrated algorithm is capable of performing mixed-mode simulation, in which a circuit is partitioned into discrete and continuous regions and each region is simulated at the appropriate level. In addition, several issues relating to the implementation of mixed-mode simulation on multiprocessors are presented.
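
    One way to picture the unified time advance is a single ordered event queue holding both discrete logic events and timestep requests from continuous (circuit-level) regions, so the two simulators share one notion of current time. The sketch below is only an illustration of that idea under stated assumptions; it is not the framework described in the paper, and the region behaviors are placeholders.

        // Hedged sketch of a unified mixed-mode time advance: discrete logic events
        // and timestep requests from a continuous (circuit-level) region share one
        // ordered event queue, so both simulators advance under a single clock.
        #include <cstdio>
        #include <queue>
        #include <vector>

        enum class Kind { LogicEvent, CircuitStep };
        struct Event {
            double time;
            Kind kind;
            bool operator>(const Event& o) const { return time > o.time; }
        };

        int main() {
            std::priority_queue<Event, std::vector<Event>, std::greater<Event>> q;
            q.push({0.0, Kind::LogicEvent});    // discrete region activity
            q.push({0.0, Kind::CircuitStep});   // continuous region requests its first step

            const double dt = 0.5, t_end = 2.0;
            while (!q.empty()) {
                Event ev = q.top(); q.pop();    // unified time advance for both modes
                if (ev.kind == Kind::LogicEvent) {
                    std::printf("t=%.2f evaluate logic region\n", ev.time);
                    // logic transitions here would be translated into circuit stimuli
                } else {
                    std::printf("t=%.2f integrate circuit region one step\n", ev.time);
                    // voltage threshold crossings here would become logic events
                    if (ev.time + dt <= t_end) q.push({ev.time + dt, Kind::CircuitStep});
                }
            }
            return 0;
        }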

    Collecting Data About Logic Simulation

    Design of high-performance hardware- and software-based gate/switch-level logic simulators requires knowledge about the logic simulation process itself. Unfortunately, little data is publicly available concerning key aspects of this process. An example of this is the lack of published empirical measurements relating to the time distribution of events generated by such simulators. This paper presents a gate/switch-level logic simulator, lsim, which is oriented towards the collection of data about the simulation process. The basic components of lsim are reviewed, and its relevant data gathering facilities are discussed. An example is presented which illustrates the use of lsim in gathering data on event distributions and on communication requirements under alternative logic circuit partitionings.

    LSIM2 User's Manual

    Lsim2 is a gate/switch-level digital logic simulator. It enables users to model digital circuits at both the gate and switch level and incorporates features that support investigation of the simulation task itself. Lsim2 is an augmented version of the original lsim, with the addition of several new MSI-type component models. This user's manual describes procedures for specifying a circuit in lsim2, mechanisms for controlling the simulation, and approaches to modeling systems.

    Split and Merge Functions for Supporting Multiple Processing Pipelines in Mercury BLASTN

    Biosequence similarity search is an important application in computational biology. Mercury BLASTN, an FPGA-based implementation of BLAST for DNA, is one of the alternatives for fast DNA sequence comparison. The re-design of BLAST into a streaming application, combined with a high-throughput hardware pipeline, has enabled Mercury BLAST to emerge as one of the fastest implementations of bio-sequence similarity search. This performance can be further enhanced by exploiting the data-level parallelism present within the application. Here we present a multiple-FPGA Mercury BLASTN design in order to double the speed and throughput of DNA sequence computation. This paper describes a dual Mercury BLASTN design, the detailed design of the split and merge functions, and simulation results.
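
    In outline, a split function deals incoming work across the two pipelines and a merge function restores a single, correctly ordered result stream. The sketch below illustrates that pattern in software with a round-robin split keyed by chunk identifiers; it is a hypothetical illustration, not the Mercury BLASTN hardware interface or its actual dispatch policy.

        // Hedged sketch: a split function that deals database chunks round-robin to
        // two processing pipelines and a merge function that recombines their result
        // streams in original chunk order. Illustrative only.
        #include <algorithm>
        #include <cstdio>
        #include <string>
        #include <utility>
        #include <vector>

        struct Result { int chunk_id; std::string hit; };

        // Split: alternate chunks between pipeline 0 and pipeline 1.
        void split(const std::vector<std::string>& chunks,
                   std::vector<std::pair<int, std::string>> out[2]) {
            for (int i = 0; i < (int)chunks.size(); ++i)
                out[i % 2].push_back({i, chunks[i]});
        }

        // Merge: gather per-pipeline results and restore global chunk order.
        std::vector<Result> merge(const std::vector<Result>& a, const std::vector<Result>& b) {
            std::vector<Result> all(a);
            all.insert(all.end(), b.begin(), b.end());
            std::sort(all.begin(), all.end(),
                      [](const Result& x, const Result& y) { return x.chunk_id < y.chunk_id; });
            return all;
        }

        int main() {
            std::vector<std::string> db = {"ACGTAC", "TTGATT", "GGCCAA", "ATATCG"};
            std::vector<std::pair<int, std::string>> lanes[2];
            split(db, lanes);
            // Pretend each pipeline reported one hit per chunk it received.
            std::vector<Result> r0, r1;
            for (auto& c : lanes[0]) r0.push_back({c.first, "hit in " + c.second});
            for (auto& c : lanes[1]) r1.push_back({c.first, "hit in " + c.second});
            for (const Result& r : merge(r0, r1))
                std::printf("chunk %d: %s\n", r.chunk_id, r.hit.c_str());
            return 0;
        }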

    Performance Tuning of Streaming Applications via Search-space Decomposition

    High-performance streaming applications are typically pipelined and deployed on architecturally diverse (hybrid) systems. Developers of such applications are interested in customizing the components used so as to benefit application performance. We present an efficient and automatic technique for design-space exploration of applications in this problem domain. We solve performance tuning as an optimization problem by formulating cost functions using results from queueing theory. This results in a mixed-integer nonlinear optimization problem, which is NP-hard. We reduce the search complexity by decomposing the search space. We have developed a domain-specific decomposition technique using topological information about the application embodied in the queueing network models. Our analysis includes conditions under which our decomposition preserves optimality. Our preliminary empirical results confirm two benefits: solving problems that are currently not solvable using state-of-the-art solvers and, in some problem instances, improving the solver's initial solution value by over two orders of magnitude.
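
    A queueing-theoretic cost function for a pipeline typically bounds sustained throughput by the slowest stage's service rate; performance tuning then becomes choosing a configuration for each stage under a resource budget. The sketch below illustrates that framing with an exhaustive search over a toy instance; it is not the paper's mixed-integer formulation, its solver, or its decomposition technique, and all numbers are invented for illustration.

        // Hedged sketch: treat a pipelined streaming application as a tandem queueing
        // network whose sustained throughput is limited by its slowest stage, and
        // search stage configurations under a resource budget by brute force.
        #include <cstdio>
        #include <vector>

        struct Option { double service_rate; int resource_cost; };   // one candidate stage implementation

        double best = 0.0;

        void search(const std::vector<std::vector<Option>>& stages, int i,
                    int budget, double min_rate) {
            if (i == (int)stages.size()) { if (min_rate > best) best = min_rate; return; }
            for (const Option& o : stages[i]) {
                if (o.resource_cost <= budget)
                    search(stages, i + 1, budget - o.resource_cost,
                           o.service_rate < min_rate ? o.service_rate : min_rate);
            }
        }

        int main() {
            // Three stages, each with a cheap and a fast implementation (rate, cost).
            std::vector<std::vector<Option>> stages = {
                {{100, 1}, {250, 3}},
                {{150, 2}, {300, 5}},
                {{120, 1}, {400, 4}},
            };
            search(stages, 0, /*budget=*/8, /*min_rate=*/1e18);
            std::printf("best achievable pipeline throughput: %.0f items/s\n", best);
            return 0;
        }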

    Throughput-optimal systolic arrays from recurrence equations

    Many compute-bound software kernels have seen order-of-magnitude speedups on special-purpose accelerators built on specialized architectures such as field-programmable gate arrays (FPGAs). These architectures are particularly good at implementing dynamic programming algorithms that can be expressed as systems of recurrence equations, which in turn can be realized as systolic array designs. To efficiently find good realizations of an algorithm for a given hardware platform, we pursue software tools that can search the space of possible parallel array designs to optimize various design criteria. Most existing design tools in this area produce a design that is latency-space optimal. However, we instead wish to target applications that operate on a large collection of small inputs, e.g., a database of biological sequences. For such applications, overall throughput rather than latency per input is the most important measure of performance. In this work, we introduce a new procedure to optimize throughput of a systolic array subject to resource constraints, in this case the area and bandwidth constraints of an FPGA device. We show that the throughput of an array is dependent on the maximum number of lattice points executed by any processor in the array, which to a close approximation is determined solely by the array’s projection vector. We describe a bounded search process to find throughput-optimal projection vectors and a tool to perform automated design space exploration, discovering a range of array designs that are optimal for inputs of different sizes. We apply our techniques to the Nussinov RNA folding algorithm to generate multiple mappings of this algorithm into systolic arrays. By combining our library of designs with run-time reconfiguration of an FPGA device to dynamically switch among them, we predict significant speedup over a single, latency-space optimal array.
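
    The key quantity is the maximum number of lattice points that any one processing element must execute, since points that differ by a multiple of the projection vector map to the same element. The sketch below counts that maximum for a few candidate projection vectors over a small triangular 2-D domain; the domain, the candidates, and the counting method are illustrative assumptions, not the Nussinov mapping or the bounded search procedure from the paper.

        // Hedged sketch: estimate per-input work for candidate projection vectors of a
        // 2-D triangular iteration domain {(i, j) : 0 <= i <= j < N}. Points that
        // differ by a multiple of the projection vector share a processing element,
        // so the largest class size bounds how often a new input can be accepted.
        #include <algorithm>
        #include <cstdio>
        #include <map>
        #include <utility>

        int max_points_per_pe(int N, int ux, int uy) {
            std::map<std::pair<int, int>, int> count;   // canonical PE coordinate -> #points
            int worst = 0;
            for (int i = 0; i < N; ++i)
                for (int j = i; j < N; ++j) {
                    // slide (i, j) backwards along u until leaving the domain to obtain a
                    // canonical representative of its processing element
                    int x = i, y = j;
                    while (x - ux >= 0 && y - uy >= x - ux && y - uy < N) { x -= ux; y -= uy; }
                    worst = std::max(worst, ++count[{x, y}]);
                }
            return worst;
        }

        int main() {
            const int N = 16;
            const int candidates[][2] = {{1, 0}, {0, 1}, {1, 1}, {1, -1}};
            for (auto& u : candidates)
                std::printf("projection (%d,%d): max %d lattice points on one PE\n",
                            u[0], u[1], max_points_per_pe(N, u[0], u[1]));
            return 0;
        }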