Automatic Design of Efficient Application-centric Architectures.
Abstract
As the market for embedded devices continues to grow, the demand for high
performance, low cost, and low power computation grows as well. Many embedded
applications perform computationally intensive tasks such as processing streaming
video or audio, wireless communication, or speech recognition and must be
implemented within tight power budgets. Typically, general-purpose
processors are not able to meet these performance and power requirements.
Custom hardware in the form of loop accelerators is often used to execute the
compute-intensive portions of these applications because they can achieve significantly
higher levels of performance and power efficiency.
Automated hardware synthesis from high-level specifications is a key technology
used in designing these accelerators, because the resulting hardware is correct by
construction, easing verification and greatly decreasing time-to-market in the quickly
evolving embedded domain. In this dissertation, a compiler-directed approach is used
to design a loop accelerator from a C specification and a throughput requirement. The
compiler analyzes the loop and generates a virtual architecture containing sufficient
resources to sustain the required throughput. Next, a software pipelining scheduler
maps the operations in the loop to the virtual architecture. Finally, the accelerator
datapath is derived from the resulting schedule.
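
The resource-sizing step described above can be illustrated with a minimal sketch. It is not the dissertation's actual algorithm: it assumes the throughput requirement is expressed as an initiation interval (the number of cycles between the starts of successive loop iterations), that every function unit is fully pipelined and can accept one operation per cycle, and that recurrence constraints are ignored. The function name, the operation types, and the operation counts are hypothetical.

```python
from math import ceil


def virtual_architecture(op_counts, ii):
    """Estimate how many function units of each type a loop needs.

    op_counts: mapping from function-unit type to the number of
               operations of that type in one loop iteration.
    ii:        target initiation interval in cycles (assumed to be how
               the throughput requirement is expressed).

    Assumes fully pipelined units, so one unit can issue at most `ii`
    operations per iteration; ceil(count / ii) units are then enough.
    """
    if ii < 1:
        raise ValueError("initiation interval must be at least 1 cycle")
    return {fu: ceil(n / ii) for fu, n in op_counts.items() if n > 0}


# Hypothetical loop body: 6 additions, 4 multiplications, 3 memory accesses.
loop_ops = {"add": 6, "mul": 4, "mem": 3}
print(virtual_architecture(loop_ops, ii=2))
# prints {'add': 3, 'mul': 2, 'mem': 2}
```

Relaxing the throughput requirement (a larger initiation interval) shrinks the estimate, which is the cost and performance tradeoff that a software pipelining scheduler would then work within.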
In this dissertation, synthesis of different types of loop accelerators is investigated.
First, the system for synthesizing single-loop accelerators is detailed. In particular, a
scheduler is presented that is aware of the effects of its decisions on the resulting hardware,
and attempts to minimize hardware cost. Second, synthesis of multifunction
loop accelerators, or accelerators capable of executing multiple loops, is presented.
Such accelerators exploit coarse-grained hardware sharing across loops in order to reduce
overall cost. Finally, synthesis of post-programmable accelerators is presented,
allowing changes to be made to the software after an accelerator has been created.
The tradeoffs between the flexibility, cost, and energy efficiency of these different
types of accelerators are investigated. Automatically synthesized loop accelerators
are capable of achieving order-of-magnitude gains in performance, area efficiency,
and power efficiency over processors, and programmable accelerators allow software
changes while maintaining highly efficient levels of computation.
Ph.D.
Computer Science & Engineering
University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/61644/1/fank_1.pd