11 research outputs found
Developments and experimental evaluation of partitioning algorithms for adaptive computing systems
Multi-FPGA systems offer the potential to deliver higher performance solutions than traditional computers for some low-level computing tasks. This requires a flexible hardware substrate and an automated mapping system. CHAMPION is an automated mapping system for implementing image processing applications in multi-FPGA systems under development at the University of Tennessee. CHAMPION will map applications in the Khoros Cantata graphical programming environment to hardware. The work described in this dissertation involves the automation of the CHAMPION backend design flow, which includes the partitioning problem, netlist to structural VHDL conversion, synthesis and placement and routing, and host code generation. The primary goal is to investigate the development and evaluation of three different k-way partitioning approaches. In the first and the second approaches, we discuss the development and implementation of two existing algorithms. The first approach is a hierarchical partitioning method based on topological ordering (HP). The second approach is a recursive algorithm based on the Fiduccia and Mattheyses bipartitioning heuristic (RP). We extend these algorithms to handle the multiple constraints imposed by adaptive computing systems. We also introduce a new recursive partitioning method based on topological ordering and levelization (RPL). In addition to handling the partitioning constraints, the new approach efficiently addresses the problem of minimizing the number of FPGAs used and the amount of computation, thereby overcoming some of the weaknesses of the HP and RP algorithms
Algorithms and Hardware for Efficient Processing of Logic-based Neural Networks
Recent efforts to improve the performance of neural network (NN) accelerators
that meet today's application requirements have given rise to a new trend of
logic-based NN inference relying on fixed-function combinational logic (FFCL).
This paper presents an innovative optimization methodology for compiling and
mapping NNs utilizing FFCL into a logic processor. The presented method maps
FFCL blocks to a set of Boolean functions where Boolean operations in each
function are mapped to high-performance, low-latency, parallelized processing
elements. Graph partitioning and scheduling algorithms are presented to handle
FFCL blocks that cannot straightforwardly fit the logic processor. Our
experimental evaluations across several datasets and NNs demonstrate the
superior performance of our framework in terms of the inference throughput
compared to prior art NN accelerators. We achieve 25x higher throughput
compared with the XNOR-based accelerator for VGG16 model that can be amplified
5x deploying the graph partitioning and merging algorithms
Automatic mapping of graphical programming applications to microelectronic technologies
Adaptive computing systems (ACSs) and application-specific integrated circuits (ASICs) can serve as flexible hardware accelerators for applications in domains such as image processing and digital signal processing. However, the mapping of applications onto ACSs and ASICs using the traditional methods can take months for a hardware engineer to develop and debug. In this dissertation, a new approach for automatic mapping of software applications onto ACSs and ASICs has been developed, implemented and validated. This dissertation presents the design flow of the software environment called CHAMPION, which is being developed at the University of Tennessee. This environment permits high-level design entry using the Cantata graphical programming software fromKRI. Using Cantata as the design entry, CHAMPION hides from the user the low-level details of the hardware architecture and the finer issues of application mapping onto the hardware. Validation of the CHAMPION environment was performed using multiple applications of moderate complexity. In one case, theapplication mapping time which required six weeks to perform manually took only six minutes for CHAMPION, yet comparable results were produced. Furthermore, the CHAMPION environment was constructed such that retargeting to a new adaptive computing system could be accomplished in just a few hours as opposed to weeks using manual methods. Thus, CHAMPION permits both ACSs and ASICs to be utilized by a wider audience and application development accomplished in less time
Recommended from our members
Design of Hardware with Quantifiable Security against Reverse Engineering
Semiconductors are a 412 billion dollar industry and integrated circuits take on important roles in human life, from everyday use in smart-devices to critical applications like healthcare and aviation. Saving today\u27s hardware systems from attackers can be a huge concern considering the budget spent on designing these chips and the sensitive information they may contain. In particular, after fabrication, the chip can be subject to a malicious reverse engineer that tries to invasively figure out the function of the chip or other sensitive data. Subsequent to an attack, a system can be subject to cloning, counterfeiting, or IP theft. This dissertation addresses some issues concerning the security of hardware systems in such scenarios.
First, the issue of privacy risks from approximate computing is investigated in Chapter 2. Simulation experiments show that the erroneous outputs produced on each chip instance can reveal the identity of the chip that performed the computation, which jeopardizes user privacy.
The next two chapters deal with camouflaging, which is a technique to prevent reverse engineering from extracting circuit information from the layout. Chapter 3 provides a design automation method to protect camouflaged circuits against an adversary with prior knowledge about the circuit\u27s viable functions. Chapter 4 provides a method to reverse engineer camouflaged circuits. The proposed reverse engineering formulation uses Boolean Satisfiability (SAT) solving in a way that incorporates laser fault injection and laser voltage probing capabilities to figure out the function of an aggressively camouflaged circuit with unknown gate functions and connections.
Chapter 5 addresses the challenge of secure key storage in hardware by proposing a new key storage method that applies threshold-defined behavior of memory cells to store secret information in a way that achieves a high degree of protection against invasive reverse engineering. This approach requires foundry support to encode the secrets as threshold voltage offsets in transistors. In Chapter 6, a secret key storage approach is introduced that does not rely on a trusted foundry. This approach only relies on the foundry to fabricate the hardware infrastructure for key generation but not to encode the secret key. The key is programmed by the IP integrator or the user after fabrication via directed accelerated aging of transistors. Additionally, this chapter presents the design of a working hardware prototype on PCB that demonstrates this scheme.
Finally, chapter 7 concludes the dissertation and summarizes possible future research
Harnessing Simulation Acceleration to Solve the Digital Design Verification Challenge.
Today, design verification is by far the most resource and time-consuming activity of any new digital integrated circuit development. Within this area, the vast majority of the verification effort in industry relies on simulation platforms, which are implemented either in hardware or software. A "simulator" includes a model of each component of a design and has the capability of simulating its behavior under any input scenario provided by an engineer. Thus, simulators are deployed to evaluate the behavior of a design under as many input scenarios as possible and to identify and debug all incorrect functionality. Two features are critical in simulators for the validation effort to be effective: performance and checking/debugging capabilities. A wide range of simulator platforms are available today: on one end of the spectrum there are software-based simulators, providing a very rich software infrastructure for checking and debugging the design's functionality, but executing only at 1-10 simulation cycles per second (while actual chips operate at GHz speeds). At the other end of the spectrum, there are hardware-based platforms, such as accelerators, emulators and even prototype silicon chips, providing higher performances by 4 to 9 orders of magnitude, at the cost of very limited or non-existent checking/debugging capabilities. As a result, today, simulation-based validation is crippled: one can either have satisfactory performance on hardware-accelerated platforms or critical infrastructures for checking/debugging on software simulators, but not both.
This dissertation brings together these two ends of the spectrum by presenting solutions that offer high-performance simulation with effective checking and debugging capabilities. Specifically, it addresses the performance challenge of software simulators by leveraging inexpensive off-the-shelf graphics processors as massively parallel execution substrates, and then exposing the parallelism inherent in the design model to that architecture. For hardware-based platforms, the dissertation provides solutions that offer enhanced checking and debugging capabilities by abstracting the relevant data to be logged during simulation so to minimize the cost of collection, transfer and processing. Altogether, the contribution of this dissertation has the potential to solve the challenge of digital design verification by enabling effective high-performance simulation-based validation.PHDComputer Science and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/99781/1/dchatt_1.pd
Hardware Acceleration of Electronic Design Automation Algorithms
With the advances in very large scale integration (VLSI) technology, hardware is going
parallel. Software, which was traditionally designed to execute on single core microprocessors,
now faces the tough challenge of taking advantage of this parallelism, made available
by the scaling of hardware. The work presented in this dissertation studies the acceleration
of electronic design automation (EDA) software on several hardware platforms such
as custom integrated circuits (ICs), field programmable gate arrays (FPGAs) and graphics
processors. This dissertation concentrates on a subset of EDA algorithms which are heavily
used in the VLSI design flow, and also have varying degrees of inherent parallelism
in them. In particular, Boolean satisfiability, Monte Carlo based statistical static timing
analysis, circuit simulation, fault simulation and fault table generation are explored. The
architectural and performance tradeoffs of implementing the above applications on these
alternative platforms (in comparison to their implementation on a single core microprocessor)
are studied. In addition, this dissertation also presents an automated approach to
accelerate uniprocessor code using a graphics processing unit (GPU). The key idea is to
partition the software application into kernels in an automated fashion, such that multiple
instances of these kernels, when executed in parallel on the GPU, can maximally benefit
from the GPU?s hardware resources.
The work presented in this dissertation demonstrates that several EDA algorithms can
be successfully rearchitected to maximally harness their performance on alternative platforms
such as custom designed ICs, FPGAs and graphic processors, and obtain speedups upto 800X. The approaches in this dissertation collectively aim to contribute towards enabling
the computer aided design (CAD) community to accelerate EDA algorithms on arbitrary
hardware platforms
Design Space Re-Engineering for Power Minimization in Modern Embedded Systems
Power minimization is a critical challenge for modern embedded system design. Recently, due to the rapid increase of system's complexity and
the power density, there is a growing need for power control techniques at various design levels. Meanwhile, due to technology scaling, leakage power has become a significant part of power dissipation in the CMOS circuits and new techniques are needed to reduce leakage power.
As a result, many new power minimization techniques have been proposed such as voltage island, gate sizing, multiple supply and threshold voltage, power gating and input vector control, etc. These design options further
enlarge the design space and make it prohibitively expensive to explore
for the most energy efficient design solution.
Consequently, heuristic algorithms and randomized
algorithms are frequently used to explore the design space, seeking sub-optimal
solutions to meet the time-to-market requirements. These algorithms are
based on the idea of truncating the design space
and restricting the search in a subset of the original design space. While this approach can effectively reduce the runtime of searching, it may also exclude high-quality design solutions and cause design quality degradation.
When the solution to one problem is used as the base for another problem, such solution quality degradation will accumulate. In modern electronics system design, when several such algorithms are used in series to solve problems in different design levels, the final solution can be far off the optimal one.
In my Ph.D. work, I develop a {\em re-engineering} methodology to
facilitate exploring the design space of power efficient embedded systems design.
The direct goal is to enhance the performance of existing low power
techniques. The methodology is based on the idea that design quality can be improved
via iterative ``re-shaping'' the design space based on the ``bad'' structure
in the obtained design solutions; the searching run-time can be reduced
by the guidance from previous exploration. This approach can be
described in three phases: (1) apply the existing techniques to obtain
a sub-optimal solution; (2) analyze the solution and expand the design
space accordingly; and (3) re-apply the technique to re-explore the
enlarged design space.
We apply this methodology at different levels of embedded system design to
minimize power: (i) switching power reduction in sequential logic synthesis;
(ii) gate-level static leakage current reduction; (iii) dual threshold voltage CMOS
circuits design; and (iv) system-level energy-efficient detection scheme for
wireless sensor networks. An extensive amount of experiments have been conducted
and the results have shown that this methodology can effectively enhance
the power efficiency of the existing embedded system design flows with very little
overhead