Repairability Enhancement of Scalable Systems with Locally Shared Spares
Future systems based on nano-scale devices will offer great potential for
scaling up in system complexity, yet they will be highly susceptible to
operational faults. While spare units can be generally used to enhance
reliability, they must be shared in a limited way among functional units to
keep overhead low as systems scale up. Furthermore, the efficiency of
achieving reliability using spare units heavily depends on the replacement
mechanisms of such spares. While global and chained replacement approaches can
take advantage of the full replacement capability of the network, they
usually disturb all the functional units in the
system during the repair process, and are thus very expensive in terms of
performance overhead for systems with high fault rates. In this paper, we focus
on a low-cost, fast, immediate replacement mechanism that can be implemented
locally with minimum disturbance to the system. The proposed schemes aim to
keep a system with high fault rates in this low-cost, quickly
repairable state across many faults before the more expensive, yet
optimal, approaches must be invoked. First, we propose an online repair algorithm: as faults
occur during the run-time of the system, the proposed algorithm chooses
a spare unit (among several candidates) such that the overall impact on
future system repairability is minimized. Second, we propose a network
enhancement approach, which identifies and connects the vulnerable units to the
exploitable spares, thus strengthening the entire system at a low cost.
Comment: 10 page
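
The online spare choice described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not state the repairability metric, so the sketch assumes a simple greedy rule (pick the free spare whose use leaves the worst-off neighbouring unit with the most remaining free spares). The names `choose_spare`, `repair_online`, and the `access` map are hypothetical.

```python
def choose_spare(failed_unit, access, free_spares):
    """Pick a free local spare for failed_unit, or None if none is reachable.

    access: dict mapping each live unit to the spares it is wired to.
    Assumed heuristic (not from the paper): prefer the spare whose use
    leaves the most vulnerable sharing unit with the most free spares.
    """
    candidates = [s for s in access[failed_unit] if s in free_spares]
    if not candidates:
        return None

    def cost(spare):
        # Other live units that are also wired to this spare.
        sharers = [u for u, spares in access.items()
                   if u != failed_unit and spare in spares]
        # Free spares each sharer would keep if this spare were consumed.
        remaining = [sum(1 for s in access[u] if s in free_spares and s != spare)
                     for u in sharers]
        worst = min(remaining) if remaining else float("inf")
        # Maximise the worst case, then prefer less-shared spares, then by name.
        return (-worst, len(sharers), spare)

    return min(candidates, key=cost)


def repair_online(faults, access):
    """Process faults in arrival order; assumes each unit fails at most once."""
    free = {s for spares in access.values() for s in spares}
    live = dict(access)
    assignment = {}
    for unit in faults:
        spare = choose_spare(unit, live, free)
        assignment[unit] = spare  # None: local repair failed, escalate to global
        if spare is not None:
            free.discard(spare)
        live.pop(unit)  # the failed unit no longer competes for spares
    return assignment


# Hypothetical 3-unit network: u1 shares s2 with u2 and s1 with u3.
access = {"u1": ["s1", "s2"], "u2": ["s2"], "u3": ["s1", "s3"]}
result = repair_online(["u1", "u2", "u3"], access)
```

In this toy network, giving `s2` to `u1` would leave `u2` with no local spare, so the sketch assigns `s1` instead and all three faults are repaired locally. A unit mapped to `None` would be the point at which one of the more expensive global or chained approaches is invoked.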