Probabilistic Analysis and Algorithms for Reconfiguration of Memory Arrays
Coordinated Science Laboratory was formerly known as Control Systems Laboratory. The publication date of November 1990 on the cover of some copies was evidently a typographical error; May 1990 is the correct date. Semiconductor Research Corporation (SRC) / 89-DP-10
Strategies for Optimising DRAM Repair
Dynamic Random Access Memories (DRAM) are large, complex devices, prone to
defects during manufacture. Yield is improved by the provision of redundant
structures used to repair these defects. This redundancy is often
implemented as excess memory capacity together with programmable
address logic allowing the replacement of faulty cells within the memory
array.
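A sketch of this repair mechanism, using hypothetical names and assuming a simple row-replacement scheme, in which programmable address logic redirects accesses to faulty rows onto spare rows:

```python
class RepairableArray:
    """Toy model of a memory array with spare rows (illustrative only)."""

    def __init__(self, rows, spare_rows):
        self.rows = rows
        # Spare rows sit beyond the nominal address space.
        self.spares = list(range(rows, rows + spare_rows))
        self.remap = {}  # faulty row -> spare row

    def repair_row(self, faulty_row):
        """Consume the next free spare row to replace a faulty one."""
        if not self.spares:
            raise RuntimeError("out of spare rows: device cannot be repaired")
        self.remap[faulty_row] = self.spares.pop(0)

    def translate(self, row):
        """Address logic: redirect accesses to repaired rows."""
        return self.remap.get(row, row)

array = RepairableArray(rows=1024, spare_rows=2)
array.repair_row(17)
print(array.translate(17))  # 1024 (first spare row)
print(array.translate(18))  # 18 (unrepaired rows pass through)
```

Real devices implement the remap table in fuse or antifuse logic, but the address-substitution principle is the same.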
As the memory capacity of DRAM devices has increased, so has the complexity
of their redundant structures, introducing increasingly complex restrictions
and interdependencies upon the use of this redundant capacity.
Currently, the redundancy analysis algorithms that solve the problem of
optimally allocating this redundant capacity must be manually customised for
each new device. Compromises made to reduce this complexity, together with
human error, reduce the efficacy of these algorithms.
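To illustrate the kind of allocation problem such algorithms solve, the classic "must-repair" pass (a standard simplification from the redundancy-analysis literature, not the thesis's own algorithm) marks any row with more faults than the available spare columns for repair by a spare row, and symmetrically for columns:

```python
from collections import Counter

def must_repair(faults, spare_rows, spare_cols):
    """Must-repair pass over a list of (row, col) fault coordinates.

    A row with more faults than there are spare columns can only be
    fixed by a spare row (and symmetrically for columns). Returns the
    forced repair sets, or None if they already exceed the spares.
    """
    row_faults = Counter(r for r, c in faults)
    col_faults = Counter(c for r, c in faults)
    repair_rows = {r for r, n in row_faults.items() if n > spare_cols}
    repair_cols = {c for c, n in col_faults.items() if n > spare_rows}
    if len(repair_rows) > spare_rows or len(repair_cols) > spare_cols:
        return None  # unrepairable
    return repair_rows, repair_cols

faults = [(0, 1), (0, 2), (0, 3), (5, 1)]
print(must_repair(faults, spare_rows=1, spare_cols=2))
# ({0}, {1}): row 0 has 3 faults > 2 spare columns, so it must
# use the spare row; column 1 has 2 faults > 1 spare row.
```

The residual faults left after this pass are what make full redundancy analysis NP-hard in general, which is where device-specific heuristics, and hence the manual customisation discussed above, come in.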
This thesis develops a methodology for automating the customisation of these
redundancy analysis algorithms. Included are: a modelling language
describing the redundant structures (including the restrictions and
interdependencies placed upon their use), algorithms manipulating this model
to generate redundancy analysis algorithms, and methods for translating
those algorithms into executable code.
Finally, these concepts are used to develop a prototype software tool capable
of generating redundancy analysis algorithms customised for a specified
device.
Fault Tolerant Task Mapping in Many-Core Systems
The advent of many-core systems, networks on chip containing hundreds or thousands of homogeneous processor cores, presents new challenges in managing the cores effectively in response to processing demands, hardware faults and the need for heat management.
The continually diminishing feature size of devices increases the probability of fabrication defects and the variability in performance of individual transistors. In many-core systems this can result in the failure of individual processing cores, routing nodes or communication links, which requires the use of fault-tolerant mechanisms. Diminishing feature size also increases the power density of devices, giving rise to the concept of dark silicon, where only a portion of the functionality available on a chip can be active at any one time.
Core fault tolerance and management of dark silicon can both be achieved by allocating a percentage of cores to be idle at any one time. Idle cores can be used as dark silicon to evenly distribute the heat generated by processing cores, and can also be used as spare cores to implement fault tolerance. Both can be achieved by the dynamic allocation of tasks to cores in response to changes in the status of hardware resources and the demands placed on the system, which in turn requires real-time task mapping.
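A minimal sketch of the spare-core idea (function and task names are assumptions, not the thesis's API): tasks mapped to failed cores are moved onto idle spares while the rest of the mapping is preserved:

```python
def remap_tasks(mapping, faulty_cores, idle_cores):
    """Move tasks off failed cores onto idle spares.

    mapping: dict of task -> core id
    faulty_cores: set of core ids that have failed
    idle_cores: list of currently idle (spare / dark) core ids
    """
    spares = list(idle_cores)
    new_mapping = {}
    for task, core in mapping.items():
        if core in faulty_cores:
            if not spares:
                raise RuntimeError("no spare cores left")
            core = spares.pop()  # promote a spare core
        new_mapping[task] = core
    return new_mapping

mapping = {"t0": 0, "t1": 1, "t2": 2}
print(remap_tasks(mapping, faulty_cores={1}, idle_cores=[7, 8]))
# {'t0': 0, 't1': 8, 't2': 2}
```

In practice the choice of which spare to promote matters (it affects heat distribution and communication cost), which is exactly what motivates the multi-objective optimisation described next.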
This research proposes the use of a continuous fault/recovery cycle to implement graceful degradation and amelioration to provide real-time fault tolerance. Objective measures for core fault tolerance, link fault tolerance, network power and excess traffic have been developed for use by a multi-objective evolutionary algorithm that uses knowledge of the processing demands and hardware status to identify optimal task mappings.
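Objective measures of this kind could be evaluated along the following lines. This is an illustrative sketch, assuming a square mesh network-on-chip with Manhattan-distance routing and using a spare-core count as a fault-tolerance proxy; the names and definitions are assumptions, not the thesis's exact measures:

```python
def hops(a, b, width=4):
    """Manhattan distance between two cores on a width x width mesh NoC."""
    ax, ay = a % width, a // width
    bx, by = b % width, b // width
    return abs(ax - bx) + abs(ay - by)

def objectives(mapping, traffic, idle_cores, width=4):
    """Illustrative objective vector for one candidate task mapping:
    (traffic-weighted communication distance, number of spare cores).
    An evolutionary algorithm would minimise the first and maximise
    the second across a population of mappings.
    """
    comm = sum(vol * hops(mapping[src], mapping[dst], width)
               for (src, dst), vol in traffic.items())
    return comm, len(idle_cores)

mapping = {"t0": 0, "t1": 1, "t2": 5}
traffic = {("t0", "t1"): 10, ("t1", "t2"): 4}
print(objectives(mapping, traffic, idle_cores=[14, 15]))
# (14, 2): both links are one hop, so cost is 10*1 + 4*1
```

A multi-objective evolutionary algorithm would evaluate many such vectors per generation and retain the Pareto-optimal mappings rather than collapsing the objectives into a single score.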
The fault/recovery cycle is shown to be effective in maintaining a high level of performance of a many-core array when presented with a series of hardware faults.