Concurrent accesses to shared data structures must be synchronized to avoid data races. Coarse-grained synchronization, which locks the entire data structure, is easy to implement but does not scale. Fine-grained synchronization can scale well, but can be hard to reason about. Hand-over-hand locking, in which operations are pipelined as they traverse the data structure, combines fine-grained synchronization with ease of use. However, the traditional implementation suffers from inherent overheads. This paper introduces snapshot-based synchronization (SBS), a novel hand-over-hand locking mechanism. SBS decouples the synchronization state from the data, significantly improving cache utilization. Further, it relies on guarantees provided by pipelining to minimize synchronization that requires cross-thread communication. Snapshot-based synchronization thus scales much better than traditional hand-over-hand locking, while maintaining the same ease of use.
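For context, the traditional hand-over-hand (lock-coupling) traversal that the paper improves on looks roughly like the following minimal sketch; the node layout and lock placement here are illustrative assumptions, not the paper's SBS mechanism.

```c
/* A minimal sketch of traditional hand-over-hand (lock-coupling) traversal
 * of a sorted linked list with a per-node lock; illustrative only, not the
 * paper's snapshot-based synchronization. */
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
    int key;
    struct node *next;
    pthread_mutex_t lock;           /* one lock per node */
};

/* Returns true if key is present. 'head' is a sentinel node. */
bool contains(struct node *head, int key)
{
    pthread_mutex_lock(&head->lock);
    struct node *prev = head;
    struct node *curr = prev->next;
    while (curr != NULL) {
        /* Acquire the next lock before releasing the previous one, so the
         * traversal always holds at least one lock ("hand over hand"). */
        pthread_mutex_lock(&curr->lock);
        pthread_mutex_unlock(&prev->lock);
        if (curr->key >= key)
            break;
        prev = curr;
        curr = curr->next;
    }
    bool found = (curr != NULL && curr->key == key);
    if (curr != NULL)
        pthread_mutex_unlock(&curr->lock);
    else
        pthread_mutex_unlock(&prev->lock);
    return found;
}
```

The repeated lock/unlock on every node along the path is the inherent overhead the abstract refers to, since each handoff touches a lock word embedded in the data.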
Efficient Symmetry Reduction and the Use of State Symmetries for Symbolic Model Checking
One technique to reduce the state-space explosion problem in temporal logic model checking is symmetry reduction. The combination of symmetry reduction and symbolic model checking using BDDs long suffered from the prohibitively large BDD for the orbit relation. Dynamic symmetry reduction calculates representatives of equivalence classes of states dynamically and thus avoids the construction of the orbit relation. In this paper, we present a new, efficient model checking algorithm based on dynamic symmetry reduction. Our experiments show that the algorithm is very fast and allows the verification of larger systems. We additionally implemented the use of state symmetries for symbolic symmetry reduction. To our knowledge, we are the first to investigate state symmetries in combination with BDD-based symbolic model checking.
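To illustrate the underlying idea of computing representatives dynamically rather than building an orbit relation: for a system of n fully interchangeable processes, sorting the per-process local states yields a canonical orbit representative. The sketch below shows only that explicit-state intuition under the assumption of full symmetry; the paper's algorithm works symbolically on BDDs.

```c
/* A minimal sketch of computing an orbit representative under full process
 * symmetry: two states are equivalent iff one is a permutation of the other,
 * so sorting the per-process local states yields a canonical representative.
 * Illustrative only; the paper's algorithm operates on BDDs, not on
 * explicit states. */
#include <stdio.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* state[i] encodes the local state of process i; sorting in place maps the
 * state to the unique representative of its equivalence class (orbit). */
static void canonicalize(int *state, size_t n)
{
    qsort(state, n, sizeof state[0], cmp_int);
}

int main(void)
{
    int s1[] = {2, 0, 1};   /* process 0 in local state 2, ...        */
    int s2[] = {1, 2, 0};   /* a permutation of s1: same orbit        */
    canonicalize(s1, 3);
    canonicalize(s2, 3);
    printf("%d%d%d == %d%d%d\n", s1[0], s1[1], s1[2], s2[0], s2[1], s2[2]);
    return 0;
}
```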
Disrupted nitric oxide signaling due to GUCY1A3 mutations increases risk for moyamoya disease, achalasia and hypertension
Moyamoya disease (MMD) is a progressive vasculopathy characterized by occlusion of the terminal portion of the internal carotid arteries and their branches, and by the formation of compensatory moyamoya collateral vessels. Homozygous mutations in GUCY1A3 have been reported as a cause of MMD and achalasia. Probands (n = 96) from unrelated families underwent sequencing of GUCY1A3. Functional studies were performed to confirm the pathogenicity of identified GUCY1A3 variants. Two affected individuals from unrelated families were found to have compound heterozygous mutations in GUCY1A3. MM041 was diagnosed with achalasia at 4 years of age, and with hypertension and MMD at 18 years of age. MM149 was diagnosed with MMD and hypertension at the age of 20 months. Both individuals carry one allele that is predicted to lead to haploinsufficiency and a second allele that is predicted to produce a mutated protein. Biochemical studies of one of these alleles, GUCY1A3 Cys517Tyr, showed that the mutant protein (a subunit of soluble guanylate cyclase) has a significantly blunted signaling response upon exposure to nitric oxide (NO). GUCY1A3 missense and haploinsufficiency mutations disrupt NO signaling, leading to MMD and hypertension, with or without achalasia.
Active memory controller
Inability to hide main memory latency has been increasingly limiting the performance of modern processors. The problem is worse in large-scale shared memory systems, where remote memory latencies are hundreds, and soon thousands, of processor cycles. To mitigate this problem, we propose an intelligent memory and cache coherence controller (AMC) that can execute Active Memory Operations (AMOs). AMOs are select operations sent to and executed on the home memory controller of data. AMOs can eliminate a significant number of coherence messages, minimize intranode and internode memory traffic, and create opportunities for parallelism. Our implementation of AMOs is cache-coherent and requires no changes to the processor core or DRAM chips. In this paper, we present the microarchitecture design of the AMC and the programming model of AMOs. We compare AMOs' performance to that of several other memory architectures on a variety of scientific and commercial benchmarks. Through simulation, we show that AMOs offer dramatic performance improvements for an important set of data-intensive operations, e.g., up to 50x faster barriers, 12x faster spinlocks, 8.5x-15x faster stream/array operations, and 3x faster database queries. We also present an analytical model that can predict the performance benefits of using AMOs with decent accuracy. The silicon cost required to support AMOs is less than 1% of the die area of a typical high-performance processor, based on a standard cell implementation.
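The contrast between the conventional shared-memory idiom and the AMO programming model can be sketched as follows; amo_fetch_add() is a hypothetical intrinsic standing in for an operation shipped to the data's home memory controller, and is not a real compiler builtin or the paper's actual interface.

```c
/* A hedged sketch contrasting a conventional atomic update with an
 * AMO-style update. amo_fetch_add() is hypothetical. */
#include <stdatomic.h>
#include <stdint.h>

_Atomic uint64_t counter;   /* shared counter, e.g. a barrier arrival count */

/* Conventional update: the cache line holding 'counter' migrates to the
 * updating processor and ping-pongs between nodes under contention. */
void update_conventional(void)
{
    atomic_fetch_add_explicit(&counter, 1, memory_order_acq_rel);
}

/* AMO-style update (hypothetical intrinsic): the add is sent as a message
 * to the home memory controller of 'counter' and executed there, so the
 * line never moves and contended updates are serialized at memory rather
 * than bounced between caches. */
extern uint64_t amo_fetch_add(_Atomic uint64_t *addr, uint64_t val);

void update_amo(void)
{
    (void)amo_fetch_add(&counter, 1);
}
```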
A Comparison of the M-PCP, D-PCP, and FMLP on LITMUS^RT
This paper presents a performance comparison of three multiprocessor real-time locking protocols: the multiprocessor priority ceiling protocol (M-PCP), the distributed priority ceiling protocol (D-PCP), and the flexible multiprocessor locking protocol (FMLP). In the FMLP, blocking is implemented via either suspending or spinning, while in the M-PCP and D-PCP, all blocking is by suspending. The presented comparison was conducted using a UNC-produced Linux extension called LITMUS^RT. In this comparison, schedulability experiments were conducted in which runtime overheads as measured on LITMUS^RT were used. In these experiments, the spin-based FMLP variant always exhibited the best performance, and the M-PCP and D-PCP almost always exhibited poor performance. These results call into question the practical viability of the M-PCP and D-PCP, which have been the de facto standard for real-time multiprocessor locking for the last 20 years.
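The two blocking behaviours being compared, spinning versus suspending, correspond to the two generic lock idioms sketched below; this is plain POSIX/C11 code, not the LITMUS^RT or FMLP implementation.

```c
/* A minimal sketch of the two blocking behaviours compared in the paper:
 * busy-waiting (spinning) versus suspending the task until the lock is
 * free. Generic POSIX/C11 code, not the LITMUS^RT kernel implementation. */
#include <pthread.h>
#include <stdatomic.h>

/* Spin-based blocking: the waiter keeps its processor and retries, as in
 * the spin-based FMLP variant for short critical sections. */
static atomic_flag spin = ATOMIC_FLAG_INIT;

void spin_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&spin, memory_order_acquire))
        ;                               /* busy-wait */
}

void spin_unlock(void)
{
    atomic_flag_clear_explicit(&spin, memory_order_release);
}

/* Suspension-based blocking: the waiter is descheduled by the kernel until
 * the lock is released, as in the M-PCP, D-PCP, and suspension-based FMLP. */
static pthread_mutex_t susp = PTHREAD_MUTEX_INITIALIZER;

void suspend_lock(void)   { pthread_mutex_lock(&susp); }
void suspend_unlock(void) { pthread_mutex_unlock(&susp); }
```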
Low-Overhead, High-Speed Multi-core Barrier Synchronization
Whereas efficient barrier implementations were once a concern only in high-performance computing, recent trends in core integration make the topic relevant even for general-purpose CMPs. While the nature of CMP applications requires low latency, the cost of low-latency barrier implementations using hardware-based techniques can be prohibitive for CMPs, where die area represents opportunities for throughput and yield. Similarly, whereas traditional multiprocessor barrier implementations were developed primarily for dedicated environments, scheduling and multi-programming on CMPs require more adaptable barrier implementations. In this paper, we present and evaluate three barrier implementations that are hybrids of software and dedicated hardware barriers and are specifically tailored for CMPs. The implementations leverage the unique characteristics of CMPs and provide low latency comparable to that of dedicated hardware networks at a fraction of the cost. The implementations also support adaptability, enabling efficient multi-programming and dynamic remapping of the barrier network.
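As a point of reference for the all-software end of the design space, a centralized sense-reversing barrier looks like the sketch below; this is a standard baseline, assumed here for illustration, not the paper's hybrid hardware/software design.

```c
/* A minimal sketch of a centralized sense-reversing software barrier, the
 * kind of all-software baseline that hardware and hybrid barriers are
 * typically measured against. */
#include <stdatomic.h>
#include <stdbool.h>

struct barrier {
    atomic_int  remaining;      /* threads yet to arrive in this episode */
    atomic_bool sense;          /* flips each time the barrier completes */
    int         nthreads;
};

void barrier_init(struct barrier *b, int nthreads)
{
    atomic_init(&b->remaining, nthreads);
    atomic_init(&b->sense, false);
    b->nthreads = nthreads;
}

/* Each thread keeps a thread-local 'local_sense' that it flips on entry. */
void barrier_wait(struct barrier *b, bool *local_sense)
{
    *local_sense = !*local_sense;
    if (atomic_fetch_sub_explicit(&b->remaining, 1,
                                  memory_order_acq_rel) == 1) {
        /* Last arrival: reset the count, then release the waiters. */
        atomic_store_explicit(&b->remaining, b->nthreads,
                              memory_order_relaxed);
        atomic_store_explicit(&b->sense, *local_sense, memory_order_release);
    } else {
        while (atomic_load_explicit(&b->sense, memory_order_acquire)
               != *local_sense)
            ;                   /* spin until the episode completes */
    }
}
```

The spinning on a single shared flag is what makes purely software barriers slow at scale, which is the latency gap the paper's hybrids aim to close at low silicon cost.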
A Practical Multi-Word Compare-and-Swap Operation
Work on non-blocking data structures has proposed extending processor designs with a compare-and-swap primitive, CAS2, which acts on two arbitrary memory locations. Experience suggested that current operations, typically single-word compare-and-swap (CAS1), are not expressive enough to be used alone in an efficient manner. In this paper we build CAS2 from CAS1 and, in fact, build an arbitrary multi-word compare-and-swap (CASN). Our design requires only the primitives available on contemporary systems, reserves a small and constant amount of space in each word updated (either 0 or 2 bits), and permits non-overlapping updates to occur concurrently. This provides compelling evidence that current primitives are not only universal in the theoretical sense introduced by Herlihy, but are also universal in their use as foundations for practical algorithms. It also provides a straightforward mechanism for deploying many of the interesting non-blocking data structures presented in the literature that have previously required CAS2.
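For readers unfamiliar with the operation itself, the sketch below shows one plausible CASN interface together with a lock-based reference implementation that merely defines its semantics: all n words are updated atomically iff every word holds its expected value. The paper's contribution is a non-blocking construction of this operation from CAS1, which this sketch deliberately does not attempt to reproduce; the function names and signature are assumptions for illustration.

```c
/* A hedged sketch of a multi-word compare-and-swap interface plus a
 * lock-based reference implementation that only defines the semantics.
 * The paper builds this operation non-blocking from single-word CAS. */
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

static pthread_mutex_t casn_lock = PTHREAD_MUTEX_INITIALIZER;

bool casn(uintptr_t **addrs, const uintptr_t *expected,
          const uintptr_t *desired, int n)
{
    bool ok = true;
    pthread_mutex_lock(&casn_lock);
    for (int i = 0; i < n; i++) {
        if (*addrs[i] != expected[i]) {     /* any mismatch aborts */
            ok = false;
            break;
        }
    }
    if (ok) {                               /* all matched: commit */
        for (int i = 0; i < n; i++)
            *addrs[i] = desired[i];
    }
    pthread_mutex_unlock(&casn_lock);
    return ok;
}

/* Example use: atomically move a value between two distinct words. */
bool move_value(uintptr_t *src, uintptr_t *dst, uintptr_t v)
{
    uintptr_t *addrs[2]   = { src, dst };
    uintptr_t expected[2] = { v,   0   };
    uintptr_t desired[2]  = { 0,   v   };
    return casn(addrs, expected, desired, 2);
}
```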