6 research outputs found
Real-time wavefront processors for the next generation of adaptive optics systems: a design and analysis
Adaptive optics (AO) systems currently under investigation will require at least two orders of magnitude more actuators, which in turn translates to effectively a 10^4 increase in compute latency. Since the performance of an AO system invariably improves as the compute latency decreases, it is important to study how today's computer systems will scale to address this expected increase in actuator count. This paper answers this question by characterizing the performance of a single deformable mirror (DM) Shack-Hartmann natural guide star AO system implemented on the present-generation digital signal processor (DSP) TMS320C6701 from Texas Instruments. We derive the compute latency of such a system in terms of a few basic parameters, such as the number of DM actuators, the number of data channels used to read out the camera pixels, the number of DSPs, the available memory bandwidth, as well as the inter-processor communication (IPC) bandwidth and the pixel transfer rate. We show how the results would scale for future systems that utilize multiple DMs and guide stars. We demonstrate that the principal performance bottleneck of such a system is the available memory bandwidth of the processors and, to a lesser extent, the IPC bandwidth. This paper concludes with suggestions for mitigating this bottleneck.
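As a rough illustration of the kind of parametric latency model the abstract describes, the sketch below sums a readout term, a memory-bound reconstruction term, and an IPC term. The function name, parameter names, additive structure, and constants are all assumptions for illustration; the paper derives its own formula.

```python
def ao_compute_latency(n_act, n_channels, n_dsp, mem_bw, ipc_bw, pixel_rate,
                       bytes_per_pixel=2, pixels_per_subaperture=16):
    """Rough per-frame latency estimate (seconds) for a Shack-Hartmann AO loop.

    All model terms are illustrative assumptions, not the paper's derivation.
    """
    # Camera readout: pixels are streamed out over n_channels parallel channels.
    n_pixels = n_act * pixels_per_subaperture
    t_readout = n_pixels / (n_channels * pixel_rate)
    # Wavefront reconstruction: a matrix-vector product with ~n_act^2 entries.
    # Assume the DSPs are memory-bound, so cost scales with bytes streamed
    # from memory, divided across the DSPs.
    t_compute = (n_act ** 2 * bytes_per_pixel) / (n_dsp * mem_bw)
    # Inter-processor communication: gather partial results from every DSP.
    t_ipc = (n_act * bytes_per_pixel * n_dsp) / ipc_bw
    return t_readout + t_compute + t_ipc
```

Because the reconstruction term grows as n_act^2 while readout and IPC grow linearly, doubling the actuator count more than doubles the estimated latency, consistent with the abstract's claim that memory bandwidth is the dominant bottleneck.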
Energy Minimization of System Pipelines Using Multiple Voltages
Modern computer and communication system design has to consider the timing constraints imposed by communication and system pipelines while minimizing energy consumption. We adopt the recently proposed model for communication pipeline latency [23] and address the problem of how to minimize the power consumption in system-level pipelines under latency constraints by selecting a supply voltage for each pipeline stage using the variable-voltage core-based system design methodology [11]. We define the problem, solve it optimally under realistic assumptions, and develop algorithms for power minimization of system pipeline designs based on our theoretical results. Applying this new approach to the 4-stage Myrinet GAM pipeline with the appropriate voltage profiles, we achieve 93.4%, 91.3%, and 26.9% power reduction on three pipeline stages over the traditional design.
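A minimal sketch of per-stage supply-voltage selection under a latency budget, assuming a standard alpha-power delay model and E ~ V^2 energy per switched cycle. The discrete voltage levels, the threshold voltage, and the exhaustive search are illustrative assumptions, not the paper's algorithm, which solves the problem analytically.

```python
from itertools import product

V_T = 0.4  # assumed threshold voltage (V); illustrative only


def stage_delay(cycles, v):
    # Alpha-power delay model with alpha = 2: delay per cycle ~ V / (V - V_T)^2.
    return cycles * v / (v - V_T) ** 2


def stage_energy(cycles, v):
    # Dynamic energy ~ C * V^2 per switched cycle (capacitance folded into units).
    return cycles * v ** 2


def min_energy_voltages(stage_cycles, latency_budget,
                        levels=(1.0, 1.5, 2.0, 2.5, 3.3)):
    """Brute-force the discrete per-stage voltage assignment that minimizes
    total energy while keeping end-to-end latency within the budget."""
    best = None
    for vs in product(levels, repeat=len(stage_cycles)):
        latency = sum(stage_delay(c, v) for c, v in zip(stage_cycles, vs))
        if latency <= latency_budget:
            energy = sum(stage_energy(c, v) for c, v in zip(stage_cycles, vs))
            if best is None or energy < best[0]:
                best = (energy, vs)
    return best  # (energy, per-stage voltages), or None if infeasible
```

A loose budget lets every stage run at the lowest voltage; tightening the budget forces some stages up, trading energy for speed, which is exactly the tradeoff the paper optimizes.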
Techniques for Energy-Efficient Communication Pipeline Design
The performance of many modern computer and communication systems is dictated by the latency of communication pipelines. At the same time, power/energy consumption is often another limiting factor in many portable systems. We address the problem of how to minimize the power consumption in system-level pipelines under latency constraints. In particular, we apply a fragmentation technique to achieve parallelism and exploit the advantages of the variable-voltage design methodology to optimally select the voltage, and therefore the speed, of each pipeline stage. We focus our study on the practical case in which each pipeline stage operates at a fixed speed. Unlike a conventional pipeline system, where all stages run at the same speed, our system may have different stages running at different speeds to conserve energy while providing guaranteed latency. For a given latency requirement, we find explicit solutions for the most energy-efficient fragmentation and voltage setting. We further study a less practical case in which each stage can dynamically change its speed to obtain further energy savings. We define the problem and transform it into a nonlinear system whose solution provides a lower bound on energy consumption. We apply the obtained theoretical results to develop algorithms for power/energy minimization of computer and communication systems. The experimental results suggest that significant power/energy reduction is possible without additional latency. In fact, we achieve almost 40% total energy saving over the combined minimal supply voltage selection and system shut-down technique, and 85% if neither of these two energy minimization methods is used.
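The core observation (that a pipeline with stages at different fixed speeds can meet the same latency with less energy than one uniform speed) can be shown with a toy example. The E ~ work * speed^2 energy model, the store-and-forward latency formula, and all numbers below are illustrative assumptions, not the paper's explicit solutions.

```python
def fragmented_latency(stage_work, speeds, k):
    """Latency of k equal fragments through a store-and-forward pipeline:
    the first fragment traverses every stage, then the remaining k - 1
    fragments drain through the bottleneck stage."""
    frag_times = [w / (k * s) for w, s in zip(stage_work, speeds)]
    return sum(frag_times) + (k - 1) * max(frag_times)


def energy(stage_work, speeds):
    # Convex energy model: energy ~ work * speed^2 per stage (assumed).
    return sum(w * s ** 2 for w, s in zip(stage_work, speeds))


work = [100.0, 400.0, 100.0]  # stage 2 is the bottleneck

# Conventional design: all three stages at one uniform speed.
uniform = [3.75, 3.75, 3.75]
# Energy-aware design: only the bottleneck runs fast; the others slow down.
mixed = [2.0, 4.0, 2.0]
```

With a latency budget of 125 time units and 4 fragments, both assignments meet the budget, but the mixed-speed one spends noticeably less energy because the non-bottleneck stages no longer burn speed they cannot use.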
Modeling Communication Pipeline Latency
In this paper, we study how to minimize the latency of a message through a network that consists of a number of store-and-forward stages. This research is especially relevant for today's low-overhead communication systems that employ dedicated processing elements for protocol processing. We develop an abstract pipeline model that reveals a crucial performance tradeoff involving the effects of the overhead of the bottleneck stage and the bandwidth of the remaining stages. We exploit this tradeoff to develop a suite of fragmentation algorithms designed to minimize message latency. We also provide an experimental methodology that enables the construction of customized pipeline algorithms that can adapt to specific system characteristics and application workloads. By applying this methodology to the Myrinet GAM system, we have improved its latency by up to 51%. Our theoretical framework is also applicable to pipelined systems beyond the context of high-speed networks.
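The tradeoff the abstract describes can be sketched with a simple homogeneous-stage model: fragmenting a message lets the stages overlap (helping), but every fragment pays a fixed per-stage overhead (hurting). The per-fragment cost formula (overhead + size/bandwidth) and the brute-force search are illustrative assumptions, not the paper's exact model or algorithms.

```python
def latency(msg_size, n_stages, overhead, bandwidth, k):
    """Latency of a message split into k equal fragments sent through
    n_stages identical store-and-forward stages."""
    t_frag = overhead + (msg_size / k) / bandwidth  # time per fragment per stage
    # First fragment fills the pipe (n_stages steps), then the remaining
    # k - 1 fragments drain out one per step.
    return (n_stages + k - 1) * t_frag


def best_fragment_count(msg_size, n_stages, overhead, bandwidth, k_max=256):
    """Pick the fragment count minimizing latency under this toy model."""
    return min(range(1, k_max + 1),
               key=lambda k: latency(msg_size, n_stages, overhead, bandwidth, k))
```

For example, with a 64 KB message, 4 stages, 5 us per-fragment overhead, and 100 MB/s per stage, the optimum lands near k = sqrt(3 * msg_size / (overhead * bandwidth)) ~ 20 fragments: too few fragments forfeit pipelining, too many drown in per-fragment overhead.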