14,366 research outputs found
Energy-efficient and high-performance lock speculation hardware for embedded multicore systems
Embedded systems are becoming increasingly common in everyday life and like their general-purpose counterparts, they have shifted towards shared memory multicore architectures. However, they are much more resource constrained, and as they often run on batteries, energy efficiency becomes critically important. In such systems, achieving high concurrency is a key demand for delivering satisfactory performance at low energy cost. In order to achieve this high concurrency, consistency across the shared memory hierarchy must be accomplished in a cost-effective manner in terms of performance, energy, and implementation complexity. In this article, we propose Embedded-Spec, a hardware solution for supporting transparent lock speculation, without the requirement for special supporting instructions. Using this approach, we evaluate the energy consumption and performance of a suite of benchmarks, exploring a range of contention management and retry policies. We conclude that for resource-constrained platforms, lock speculation can provide real benefits in terms of improved concurrency and energy efficiency, as long as the underlying hardware support is carefully configured.This work is supported in part by NSF under Grants CCF-0903384, CCF-0903295, CNS-1319495, and CNS-1319095 as well the Semiconductor Research Corporation under grant number 1983.001. (CCF-0903384 - NSF; CCF-0903295 - NSF; CNS-1319495 - NSF; CNS-1319095 - NSF; 1983.001 - Semiconductor Research Corporation
Improving the Performance and Endurance of Persistent Memory with Loose-Ordering Consistency
Persistent memory provides high-performance data persistence at main memory.
Memory writes need to be performed in strict order to satisfy storage
consistency requirements and enable correct recovery from system crashes.
Unfortunately, adhering to such a strict order significantly degrades system
performance and persistent memory endurance. This paper introduces a new
mechanism, Loose-Ordering Consistency (LOC), that satisfies the ordering
requirements at significantly lower performance and endurance loss. LOC
consists of two key techniques. First, Eager Commit eliminates the need to
perform a persistent commit record write within a transaction. We do so by
ensuring that we can determine the status of all committed transactions during
recovery by storing necessary metadata information statically with blocks of
data written to memory. Second, Speculative Persistence relaxes the write
ordering between transactions by allowing writes to be speculatively written to
persistent memory. A speculative write is made visible to software only after
its associated transaction commits. To enable this, our mechanism supports the
tracking of committed transaction ID and multi-versioning in the CPU cache. Our
evaluations show that LOC reduces the average performance overhead of memory
persistence from 66.9% to 34.9% and the memory write traffic overhead from
17.1% to 3.4% on a variety of workloads.Comment: This paper has been accepted by IEEE Transactions on Parallel and
Distributed System
Incremental Consistency Guarantees for Replicated Objects
Programming with replicated objects is difficult. Developers must face the
fundamental trade-off between consistency and performance head on, while
struggling with the complexity of distributed storage stacks. We introduce
Correctables, a novel abstraction that hides most of this complexity, allowing
developers to focus on the task of balancing consistency and performance. To
aid developers with this task, Correctables provide incremental consistency
guarantees, which capture successive refinements on the result of an ongoing
operation on a replicated object. In short, applications receive both a
preliminary---fast, possibly inconsistent---result, as well as a
final---consistent---result that arrives later.
We show how to leverage incremental consistency guarantees by speculating on
preliminary values, trading throughput and bandwidth for improved latency. We
experiment with two popular storage systems (Cassandra and ZooKeeper) and three
applications: a Twissandra-based microblogging service, an ad serving system,
and a ticket selling system. Our evaluation on the Amazon EC2 platform with
YCSB workloads A, B, and C shows that we can reduce the latency of strongly
consistent operations by up to 40% (from 100ms to 60ms) at little cost (10%
bandwidth increase, 6% throughput drop) in the ad system. Even if the
preliminary result is frequently inconsistent (25% of accesses), incremental
consistency incurs a bandwidth overhead of only 27%.Comment: 16 total pages, 12 figures. OSDI'16 (to appear
Thread partitioning and value prediction for exploiting speculative thread-level parallelism
Speculative thread-level parallelism has been recently proposed as a source of parallelism to improve the performance in applications where parallel threads are hard to find. However, the efficiency of this execution model strongly depends on the performance of the control and data speculation techniques. Several hardware-based schemes for partitioning the program into speculative threads are analyzed and evaluated. In general, we find that spawning threads associated to loop iterations is the most effective technique. We also show that value prediction is critical for the performance of all of the spawning policies. Thus, a new value predictor, the increment predictor, is proposed. This predictor is specially oriented for this kind of architecture and clearly outperforms the adapted versions of conventional value predictors such as the last value, the stride, and the context-based, especially for small-sized history tables.Peer ReviewedPostprint (published version
- …