Improving the Performance and Endurance of Persistent Memory with Loose-Ordering Consistency
Persistent memory provides high-performance data persistence at main memory.
Memory writes need to be performed in strict order to satisfy storage
consistency requirements and enable correct recovery from system crashes.
Unfortunately, adhering to such a strict order significantly degrades system
performance and persistent memory endurance. This paper introduces a new
mechanism, Loose-Ordering Consistency (LOC), that satisfies the ordering
requirements at significantly lower performance and endurance loss. LOC
consists of two key techniques. First, Eager Commit eliminates the need to
perform a persistent commit record write within a transaction. We do so by
ensuring that we can determine the status of all committed transactions during
recovery by storing necessary metadata information statically with blocks of
data written to memory. Second, Speculative Persistence relaxes the write
ordering between transactions by allowing writes to be speculatively written to
persistent memory. A speculative write is made visible to software only after
its associated transaction commits. To enable this, our mechanism supports the
tracking of committed transaction ID and multi-versioning in the CPU cache. Our
evaluations show that LOC reduces the average performance overhead of memory
persistence from 66.9% to 34.9% and the memory write traffic overhead from
17.1% to 3.4% on a variety of workloads.
Comment: This paper has been accepted by IEEE Transactions on Parallel and
Distributed System
Pedestal and Er profile evolution during an edge localized mode cycle at ASDEX Upgrade
The upgrade of the edge charge exchange recombination spectroscopy diagnostic at ASDEX
Upgrade has enabled highly spatially resolved measurements of the impurity ion dynamics
during an edge-localized mode (ELM) cycle with unprecedented temporal resolution, i.e. 65 μs.
The increase of transport during an ELM induces a relaxation of the ion, electron edge
gradients in impurity density and flows. Detailed characterization of the recovery of the
edge temperature gradients reveals a difference in the ion and electron channel: the maximum
ion temperature gradient ∇T_i is re-established on similar timescales as ∇n_e, which is
faster than the recovery of ∇T_e. After the clamping of the maximum gradient, T_i and T_e at
the pedestal top continue to rise up to the next ELM while n_e stays constant, which means
that the temperature pedestal and the resulting pedestal pressure widen until the next ELM.
The edge radial electric field E_r at the ELM crash is found to reduce to typical L-mode
values and its maximum recovers to its pre-ELM conditions on a similar time scale as for
∇n_e and ∇T_i. Within the uncertainties, the measurements of E_r align with their
neoclassical predictions E_r,neo for most of the ELM cycle, thus indicating that E_r is
dominated by collisional processes. However, between 2 and 4 ms after the ELM crash, other
contributions to E×B flow, e.g. zonal flows or ion orbit effects, could not be excluded
within the uncertainties.
European Commission (EUROfusion 633053
Flexible Rollback Recovery in Dynamic Heterogeneous Grid Computing
Abstract—Large applications executing on Grid or cluster architectures consisting of hundreds or thousands of computational nodes create problems with respect to reliability. The sources of the problems are node failures and the need for dynamic configuration over extensive runtime. This paper presents two fault-tolerance mechanisms called Theft-Induced Checkpointing and Systematic Event Logging. These are transparent protocols capable of overcoming problems associated with both benign faults, i.e., crash faults, and node or subnet volatility. Specifically, the protocols base the state of the execution on a dataflow graph, allowing for efficient recovery in dynamic heterogeneous systems as well as multithreaded applications. By allowing recovery even under different numbers of processors, the approaches are especially suitable for applications with a need for adaptive or reactionary configuration control. The low-cost protocols offer the capability of controlling or bounding the overhead. A formal cost model is presented, followed by an experimental evaluation. It is shown that the overhead of the protocol is very small, and the maximum work lost by a crashed process is small and bounded. Index Terms—Grid computing, rollback recovery, checkpointing, event logging.
A Reliable Instant Messenger in Erlang: Design and Evaluation
This document describes the design and evaluation of two Erlang-based instant messenger systems using Distributed Erlang (D-Erlang) and Scalable Distributed Erlang (SD-Erlang). The purpose of these systems is to serve as real-world benchmarks to test the performance of the SD-Erlang library.
Fault tolerance via diversity for off-the-shelf products: A study with SQL database servers
If an off-the-shelf software product exhibits poor dependability due to design faults, then software fault tolerance is often the only way available to users and system integrators to alleviate the problem. Thanks to low acquisition costs, even using multiple versions of software in a parallel architecture, which is a scheme formerly reserved for few and highly critical applications, may become viable for many applications. We have studied the potential dependability gains from these solutions for off-the-shelf database servers. We based the study on the bug reports available for four off-the-shelf SQL servers plus later releases of two of them. We found that many of these faults cause systematic noncrash failures, which is a category ignored by most studies and standard implementations of fault tolerance for databases. Our observations suggest that diverse redundancy would be effective for tolerating design faults in this category of products. Only in very few cases would demands that triggered a bug in one server cause failures in another one, and there were no coincident failures in more than two of the servers. Use of different releases of the same product would also tolerate a significant fraction of the faults. We report our results and discuss their implications, the architectural options available for exploiting them, and the difficulties that they may present
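As a rough illustration of the diverse-redundancy idea in the abstract above, the sketch below compares the results that several different SQL servers return for the same query and takes a majority vote, so that a noncrash failure (one server returning a wrong result) is outvoted. It is a minimal sketch, not the architecture studied in the paper: the function name `vote`, the server labels, and the representation of a result set as a list of row tuples are all assumptions made for illustration.

```python
from collections import Counter


def vote(results):
    """Majority-vote over result sets returned by diverse servers.

    `results` maps a (hypothetical) server label to the list of row tuples
    it returned. Rows are compared as an order-insensitive multiset, since
    SQL does not guarantee row order without ORDER BY. Row values are
    assumed to be mutually comparable so that they can be sorted.

    Returns (majority_rows, dissenting_servers). majority_rows is None
    when no strict majority of servers agree.
    """
    if not results:
        return None, []
    # Normalise each result set so that equal answers compare equal.
    normalised = {name: tuple(sorted(rows)) for name, rows in results.items()}
    counts = Counter(normalised.values())
    winner, votes = counts.most_common(1)[0]
    if votes * 2 <= len(results):
        # No strict majority: every server is treated as suspect.
        return None, sorted(results)
    dissenters = sorted(n for n, r in normalised.items() if r != winner)
    return list(winner), dissenters
```

With three servers, a single dissenting answer is masked and reported; with only two, a disagreement can be detected but not resolved, which is why the function returns `None` in that case.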
Implementing Performance Competitive Logical Recovery
New hardware platforms, e.g. cloud, multi-core, etc., have led to a
reconsideration of database system architecture. Our Deuteronomy project
separates transactional functionality from data management functionality,
enabling a flexible response to exploiting new platforms. This separation
requires, however, that recovery is described logically. In this paper, we
extend current recovery methods to work in this logical setting. While this is
straightforward in principle, performance is an issue. We show how ARIES style
recovery optimizations can work for logical recovery where page information is
not captured on the log. In side-by-side performance experiments using a common
log, we compare logical recovery with a state-of-the-art ARIES style recovery
implementation and show that logical redo performance can be competitive.
Comment: VLDB201
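In ARIES-style redo, a log record is reapplied only if its LSN is newer than the LSN stored on the affected page. The sketch below shows one way an analogous test could look when the log carries no page information: the check is keyed on the logical record instead. This is a minimal sketch under stated assumptions, not the method of the paper above. The function name `logical_redo`, the `(lsn, key, value)` record layout, and the per-key `applied_lsn` map are all hypothetical.

```python
def logical_redo(log, store, applied_lsn):
    """Replay logical log records idempotently.

    log         -- iterable of (lsn, key, value) records in LSN order
    store       -- dict standing in for the data component (key -> value)
    applied_lsn -- dict of key -> LSN of the last update already in `store`
                   (an assumed substitute for the page LSN used by ARIES)

    Returns the number of records that were actually redone.
    """
    redone = 0
    for lsn, key, value in log:
        # Skip updates the store already reflects; this makes replay idempotent.
        if lsn <= applied_lsn.get(key, 0):
            continue
        store[key] = value
        applied_lsn[key] = lsn
        redone += 1
    return redone
```

Running the same log through the function twice redoes nothing the second time, which is the property a redo pass needs if recovery itself can be interrupted and restarted.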