2 research outputs found
Evaluating critical bits in arithmetic operations due to timing violations
Various error models are being used in simulation of voltage-scaled arithmetic units to examine application-level tolerance of timing violations. The selection of an error model needs further consideration, as differences in error models drastically affect the performance of the application. Specifically, floating-point arithmetic units (FPUs) have architectural characteristics that shape their behavior. We examine the architecture of FPUs and design a new error model, which we call Critical Bit. We run selected benchmark applications with Critical Bit and other widely used error injection models to demonstrate the differences.
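To make the idea of an error injection model concrete, here is a minimal sketch of a simple one: a uniformly random single-bit flip applied to the IEEE 754 binary64 encoding of a result. This is not the Critical Bit model, which the abstract does not specify; the function names `flip_bit` and `inject_random_flip` and the `error_rate` parameter are assumptions made for illustration.

```python
import random
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its IEEE 754 binary64 encoding inverted.

    Bit 0 is the least significant mantissa bit; bit 63 is the sign bit.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in the range 0..63")
    # Reinterpret the double as a 64-bit unsigned integer.
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    # Invert the selected bit and reinterpret the result as a double.
    (out,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return out


def inject_random_flip(value: float, rng: random.Random, error_rate: float) -> float:
    """With probability `error_rate` (hypothetical parameter), flip one
    uniformly chosen bit of `value`; otherwise return it unchanged."""
    if rng.random() < error_rate:
        return flip_bit(value, rng.randrange(64))
    return value
```

A uniform flip treats all 64 bit positions as equally likely to fail, which is one reason the choice of model matters: a flip in the exponent or sign changes a result far more than a flip in the low mantissa bits.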
Improving resilience to timing errors by exposing variability effects to software in tightly-coupled processor clusters
Manufacturing and environmental variations cause
timing errors in microelectronic processors that are typically
avoided by ultra-conservative multi-corner design margins or
corrected by error detection and recovery mechanisms at the
circuit-level. In contrast, we present here runtime software support
for cost-effective countermeasures against hardware timing
failures during system operation. We propose a variability-aware
OpenMP (VOMP) programming environment, suitable for
tightly-coupled shared memory processor clusters, that relies
upon modeling across the hardware/software interface. VOMP is
implemented as an extension to the OpenMP v3.0 programming
model that covers various parallel constructs, including ,
, and . Using the notion of work-unit vulnerability
(WUV) proposed here, we capture timing errors caused by
circuit-level variability as high-level software knowledge. WUV
consists of descriptive metadata to characterize the impact of variability
on different work-unit types running on various cores. As
such, WUV provides a useful abstraction of hardware variability
to efficiently allocate a given work-unit to a suitable core for
execution. VOMP enables hardware/software collaboration with
online variability monitors in hardware and runtime scheduling
in software. The hardware provides online per-core characterization
of WUV metadata. This metadata is made available by
carefully placing key data structures in a shared L1 memory
and is used by VOMP schedulers. Our results show that VOMP
greatly reduces the cost of timing error recovery compared to the
baseline schedulers of OpenMP, yielding speedups of 3%–36%
for tasks and 26%–49% for sections. Further, VOMP achieves
energy savings of 2%–46% and 15%–50% for tasks and sections,
respectively.
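The allocation idea described above, assigning a work-unit to a suitable core based on WUV metadata, can be sketched as follows. The abstract does not say how WUV is encoded or how the VOMP schedulers consume it, so this sketch assumes WUV is a per-core, per-work-unit-type error rate in [0, 1] and that the scheduler simply prefers the least vulnerable idle core. The name `pick_core` and the table layout are hypothetical.

```python
def pick_core(wuv, work_type, idle_cores):
    """Return the idle core with the lowest assumed vulnerability.

    `wuv` maps (core_id, work_type) to an assumed error rate in [0, 1].
    Pairs with no entry are treated as fully vulnerable (1.0), and ties
    are broken by the lower core id. This layout is an assumption, not
    the format used by VOMP.
    """
    if not idle_cores:
        raise ValueError("no idle cores available")
    return min(
        idle_cores,
        key=lambda core: (wuv.get((core, work_type), 1.0), core),
    )


# Hypothetical per-core characterization for two work-unit types.
example_wuv = {
    (0, "task"): 0.30,
    (1, "task"): 0.05,
    (2, "task"): 0.10,
    (0, "sections"): 0.02,
}
```

With this table, `pick_core(example_wuv, "task", [0, 1, 2])` selects core 1, the core on which the task type is assumed to incur the fewest timing errors and therefore the least recovery cost.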