19,467 research outputs found
Fault-tolerant sub-lithographic design with rollback recovery
Shrinking feature sizes and energy levels coupled with high clock rates and decreasing node capacitance lead us into a regime where transient errors in logic cannot be ignored. Consequently, several recent studies have focused on feed-forward spatial redundancy techniques to combat these high transient fault rates. To complement these studies, we analyze fine-grained rollback techniques and show that they can offer lower spatial redundancy factors with no significant impact on system performance for fault rates up to one fault per device per ten million cycles of operation (Pf = 10^-7) in systems with 10^12 susceptible devices. Further, we concretely demonstrate these claims on nanowire-based programmable logic arrays. Despite expensive rollback buffers and general-purpose, conservative analysis, we show the area overhead factor of our technique is roughly an order of magnitude lower than a gate level feed-forward redundancy scheme
POOR MAN’S TRACE CACHE: A VARIABLE DELAY SLOT ARCHITECTURE
We introduce a novel fetch architecture called Poor Man’s Trace Cache (PMTC). PMTC constructs taken-path instruction traces via instruction replication in static code and inserts them after unconditional direct and select conditional direct control transfer instructions. These traces extend to the end of the cache line. Since available space for trace insertion may vary by the position of the control transfer instruction within the line, we refer to these fetch slots as variable delay slots. This approach ensures traces are fetched along with the control transfer instruction that initiated the trace. Branch, jump and return instruction semantics as well as the fetch unit are modified to utilize traces in delay slots. PMTC yields the following benefits: 1. Average fetch bandwidth increases as the front end can fetch across taken control transfer instructions in a single cycle. 2. The dynamic number of instruction cache lines fetched by the processor is reduced as multiple non contiguous basic blocks along a given path are encountered in one fetch cycle. 3. Replication of a branch instruction along multiple paths provides path separability for branches, which positively impacts branch prediction accuracy. PMTC mechanism requires minimal modifications to the processor’s fetch unit and the trace insertion algorithm can easily be implemented within the assembler without compiler support
Building on Quicksand
Reliable systems have always been built out of unreliable components. Early
on, the reliable components were small such as mirrored disks or ECC (Error
Correcting Codes) in core memory. These systems were designed such that
failures of these small components were transparent to the application. Later,
the size of the unreliable components grew larger and semantic challenges crept
into the application when failures occurred.
As the granularity of the unreliable component grows, the latency to
communicate with a backup becomes unpalatable. This leads to a more relaxed
model for fault tolerance. The primary system will acknowledge the work request
and its actions without waiting to ensure that the backup is notified of the
work. This improves the responsiveness of the system.
There are two implications of asynchronous state capture: 1) Everything
promised by the primary is probabilistic. There is always a chance that an
untimely failure shortly after the promise results in a backup proceeding
without knowledge of the commitment. Hence, nothing is guaranteed! 2)
Applications must ensure eventual consistency. Since work may be stuck in the
primary after a failure and reappear later, the processing order for work
cannot be guaranteed.
Platform designers are struggling to make this easier for their applications.
Emerging patterns of eventual consistency and probabilistic execution may soon
yield a way for applications to express requirements for a "looser" form of
consistency while providing availability in the face of ever larger failures.
This paper recounts portions of the evolution of these trends, attempts to
show the patterns that span these changes, and talks about future directions as
we continue to "build on quicksand".Comment: CIDR 200
- …