A SIMD architecture for hard real-time systems
Emerging safety-critical systems require high-performance data-parallel architectures and, problematically, ones that can guarantee tight and safe worst-case execution times. Given the complexity of existing architectures like GPUs, it is unlikely that sufficiently accurate models and algorithms for timing analysis will emerge in the foreseeable future. This motivates a clean-slate approach to designing a real-time data-parallel architecture.
In this work I present Sim-D: a wide-SIMD architecture for hard real-time systems. Similar to GPUs, Sim-D performs hardware strip-mining to schedule the work for a compute kernel in entities called work-groups. Sim-D schedules the work for each work-group as a sequence of uninterruptible access and execute program phases, interleaving the phases of two work-groups. By providing performance isolation between the memory and compute resources, the execution time of each phase can be tightly bounded through static analysis.
I present a predictable closed-page DRAM controller that processes requests for large 1D- and 2D blocks of data, as well as indirect indexed transfers. These large transfers coalesce the data requests of a whole work-group. For a linear 4KiB transfer over a 64-bit data bus, the utilisation provably exceeds 78% for DDR4-3200AA DRAM. For 2D blocks, a well-chosen tiling configuration can achieve near-similar efficiency. I show that bounds on the execution time of indexed transfers are pessimistic by nature, but propose a novel snoopy indexed transfer mechanism that permits more reasonable bounds when the buffer size is limited.
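The utilisation figure above can be turned into a rough time budget. The following is a minimal back-of-the-envelope sketch, not the thesis's own analysis. It assumes that utilisation means the fraction of the transfer's total duration during which the data bus carries data, and that DDR4-3200 moves 3200 mega-transfers per second. The function names are hypothetical.

```python
# Figures taken from the abstract.
TRANSFER_BYTES = 4 * 1024   # linear 4 KiB transfer
BUS_BYTES = 64 // 8         # 64-bit data bus: 8 bytes per data beat
MIN_UTILISATION = 0.78      # stated lower bound on utilisation

# Assumption (not from the abstract): DDR4-3200 performs 3200e6 transfers/s.
TRANSFERS_PER_SEC = 3200e6


def data_beats(transfer_bytes=TRANSFER_BYTES, bus_bytes=BUS_BYTES):
    """Number of data-bus beats needed to move the whole transfer."""
    return transfer_bytes // bus_bytes


def data_time_ns(beats, rate=TRANSFERS_PER_SEC):
    """Time the data bus is busy, in nanoseconds."""
    return beats / rate * 1e9


def max_total_time_ns(busy_ns, utilisation=MIN_UTILISATION):
    """Upper bound on total transfer time implied by a utilisation lower bound."""
    return busy_ns / utilisation


beats = data_beats()
busy_ns = data_time_ns(beats)
total_ns = max_total_time_ns(busy_ns)
print(f"{beats} beats, {busy_ns:.1f} ns of data, at most {total_ns:.1f} ns in total")
```

Under these assumptions the data itself occupies the bus for about 160 ns, so a utilisation above 78% leaves roughly 45 ns at most for activation, precharge and other command overhead.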
Finally, I present a worst-case execution time calculation algorithm for Sim-D. This algorithm is paired with two hardware work-group scheduling policies that deterministically reduce run-time variance. The worst-case execution time analysis algorithm combines static control flow analysis with a simulation-based cost model for execution and DRAM transfers. Its key novelty is the addition of a stage that considers work-group scheduling effects. I show that the work-group scheduling policies degrade performance on average by 8.9%, but permit the calculation of worst-case execution time bounds that are tight within 14.3% on average for benchmarks that avoid inefficient indexed transfers.
Placement of dynamic data objects over heterogeneous memory organizations in embedded systems
Unpublished thesis of the Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadores y Automática, defended on 24-11-2015.
The 1992 4th NASA SERC Symposium on VLSI Design
Papers from the fourth annual NASA Symposium on VLSI Design, co-sponsored by the IEEE, are presented. Each year this symposium is organized by the NASA Space Engineering Research Center (SERC) at the University of Idaho and is held in conjunction with a quarterly meeting of the NASA Data System Technology Working Group (DSTWG). One task of the DSTWG is to develop new electronic technologies that will meet next generation electronic data system needs. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The NASA SERC is proud to offer, at its fourth symposium on VLSI design, presentations by an outstanding set of individuals from national laboratories, the electronics industry, and universities. These speakers share insights into next generation advances that will serve as a basis for future VLSI design.
The Deep Space Network, volume 3 Progress report, Mar. - Apr. 1971
Deep Space Network telecommunication and ground support equipment for planetary and interplanetary flight projects.
Compiler-Directed Energy Savings in Superscalar Processors
Institute for Computing Systems Architecture
Superscalar processors contain large, complex structures to hold data and instructions as
they wait to be executed. However, many of these structures consume large amounts of energy,
making them hotspots requiring sophisticated cooling systems. With the trend towards larger,
more complex processors, this will become more of a problem, having important implications
for future technology.
This thesis uses compiler-based optimisation schemes to target the issue queue and register
file. These are two of the most energy consuming structures in the processor. The algorithms
and hardware techniques developed in this work dynamically adapt the processor's resources
to the changing program phases, turning off parts of each structure when they are unused to
save dynamic and static energy.
To optimise the issue queue, the compiler analysis tracks data dependences through each
program procedure. It identifies the critical path through each program region and informs the
hardware of the minimum number of queue entries required to prevent it slowing down. This
reduces the occupancy of the queue and increases the opportunities to save energy. With just a
1.3% performance loss, 26% dynamic and 32% static energy savings are achieved.
Registers can be idle for many cycles after they are last read, before they are released and
put back on the free-list to be reused by another instruction. Alternatively, they can be turned
off for energy savings. Early register releasing can be used to perform this operation sooner
than usual, but hardware schemes must wait for the instruction redefining the relevant logical
register to enter the pipeline. This thesis presents an exploration of compiler-directed early
register releasing. The compiler can exactly identify the last use of each register and pass the
information to the hardware, based on simple data-flow and liveness analysis. The best scheme
achieves 15% dynamic and 19% static energy savings.
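The last-use identification described above can be illustrated with a backward liveness scan over a single straight-line block. This is a minimal sketch of the general data-flow technique, not the thesis's compiler pass. The instruction encoding, the `last_uses` function and the `live_out` parameter are hypothetical, and it works on logical register names only, ignoring control flow and register renaming.

```python
def last_uses(block, live_out):
    """Find, for each instruction, the registers it reads for the last time.

    block    -- list of (dest, sources) tuples in program order (hypothetical encoding)
    live_out -- registers still needed after the block ends
    Returns a dict mapping instruction index to the set of registers that die there.
    """
    live = set(live_out)
    dying = {}
    # Walk backwards so that `live` always holds the registers read later on.
    for idx in range(len(block) - 1, -1, -1):
        dest, sources = block[idx]
        # The value written here is not the value that earlier instructions read.
        live_before_def = live - {dest}
        # A source that is not read later has its last use at this instruction.
        dying[idx] = {r for r in sources if r not in live_before_def}
        live = live_before_def | set(sources)
    return dying


example = [
    ("r1", ["r0"]),        # r1 = f(r0)
    ("r2", ["r1", "r0"]),  # r2 = f(r1, r0)
    ("r3", ["r2", "r1"]),  # r3 = f(r2, r1)
]
result = last_uses(example, live_out={"r3"})
print(result)
```

In this example `r0` dies at the second instruction and both `r1` and `r2` die at the third, which is the kind of information a compiler could annotate so that the hardware may release or power down the corresponding registers early.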
Finally, the issue queue limiting and early register releasing schemes are combined for energy
savings in both processor structures. Four different configurations are evaluated, bringing
25% to 31% dynamic and 19% to 34% static issue queue energy savings and reductions of 18%
to 25% dynamic and 20% to 21% static energy in the register file.
The Fifth NASA Symposium on VLSI Design
The fifth annual NASA Symposium on VLSI Design had 13 sessions including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next generation advances that will serve as a basis for future VLSI design.