665 research outputs found

    TANDEM: taming failures in next-generation datacenters with emerging memory

    Get PDF
    The explosive growth of online services, leading to unforeseen scales, has made modern datacenters highly prone to failures. Taming these failures hinges on fast and correct recovery, minimizing service interruptions. Applications, owing to recovery, entail additional measures to maintain a recoverable state of data and computation logic during their failure-free execution. However, these precautionary measures have severe implications on performance, correctness, and programmability, making recovery incredibly challenging to realize in practice. Emerging memory, particularly non-volatile memory (NVM) and disaggregated memory (DM), offers a promising opportunity to achieve fast recovery with maximum performance. However, incorporating these technologies into datacenter architecture presents significant challenges; Their distinct architectural attributes, differing significantly from traditional memory devices, introduce new semantic challenges for implementing recovery, complicating correctness and programmability. Can emerging memory enable fast, performant, and correct recovery in the datacenter? This thesis aims to answer this question while addressing the associated challenges. When architecting datacenters with emerging memory, system architects face four key challenges: (1) how to guarantee correct semantics; (2) how to efficiently enforce correctness with optimal performance; (3) how to validate end-to-end correctness including recovery; and (4) how to preserve programmer productivity (Programmability). This thesis aims to address these challenges through the following approaches: (a) defining precise consistency models that formally specify correct end-to-end semantics in the presence of failures (consistency models also play a crucial role in programmability); (b) developing new low-level mechanisms to efficiently enforce the prescribed models given the capabilities of emerging memory; and (c) creating robust testing frameworks to validate end-to-end correctness and recovery. We start our exploration with non-volatile memory (NVM), which offers fast persistence capabilities directly accessible through the processor’s load-store (memory) interface. Notably, these capabilities can be leveraged to enable fast recovery for Log-Free Data Structures (LFDs) while maximizing performance. However, due to the complexity of modern cache hierarchies, data hardly persist in any specific order, jeop- ardizing recovery and correctness. Therefore, recovery needs primitives that explicitly control the order of updates to NVM (known as persistency models). We outline the precise specification of a novel persistency model – Release Persistency (RP) – that provides a consistency guarantee for LFDs on what remains in non-volatile memory upon failure. To efficiently enforce RP, we propose a novel microarchitecture mechanism, lazy release persistence (LRP). Using standard LFDs benchmarks, we show that LRP achieves fast recovery while incurring minimal overhead on performance. We continue our discussion with memory disaggregation which decouples memory from traditional monolithic servers, offering a promising pathway for achieving very high availability in replicated in-memory data stores. Achieving such availability hinges on transaction protocols that can efficiently handle recovery in this setting, where compute and memory are independent. However, there is a challenge: disaggregated memory (DM) fails to work with RPC-style protocols, mandating one-sided transaction protocols. Exacerbating the problem, one-sided transactions expose critical low-level ordering to architects, posing a threat to correctness. We present a highly available transaction protocol, Pandora, that is specifically designed to achieve fast recovery in disaggregated key-value stores (DKVSes). Pandora is the first one-sided transactional protocol that ensures correct, non-blocking, and fast recovery in DKVS. Our experimental implementation artifacts demonstrate that Pandora achieves fast recovery and high availability while causing minimal disruption to services. Finally, we introduce a novel target litmus-testing framework – DART – to validate the end-to-end correctness of transactional protocols with recovery. Using DART’s target testing capabilities, we have found several critical bugs in Pandora, highlighting the need for robust end-to-end testing methods in the design loop to iteratively fix correctness bugs. Crucially, DART is lightweight and black-box, thereby eliminating any intervention from the programmers

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 251, ITCS 2023, Complete Volum

    Current and Future Challenges in Knowledge Representation and Reasoning

    Full text link
    Knowledge Representation and Reasoning is a central, longstanding, and active area of Artificial Intelligence. Over the years it has evolved significantly; more recently it has been challenged and complemented by research in areas such as machine learning and reasoning under uncertainty. In July 2022 a Dagstuhl Perspectives workshop was held on Knowledge Representation and Reasoning. The goal of the workshop was to describe the state of the art in the field, including its relation with other areas, its shortcomings and strengths, together with recommendations for future progress. We developed this manifesto based on the presentations, panels, working groups, and discussions that took place at the Dagstuhl Workshop. It is a declaration of our views on Knowledge Representation: its origins, goals, milestones, and current foci; its relation to other disciplines, especially to Artificial Intelligence; and on its challenges, along with key priorities for the next decade

    Robust and Listening-Efficient Contention Resolution

    Full text link
    This paper shows how to achieve contention resolution on a shared communication channel using only a small number of channel accesses -- both for listening and sending -- and the resulting algorithm is resistant to adversarial noise. The shared channel operates over a sequence of synchronized time slots, and in any slot agents may attempt to broadcast a packet. An agent's broadcast succeeds if no other agent broadcasts during that slot. If two or more agents broadcast in the same slot, then the broadcasts collide and both broadcasts fail. An agent listening on the channel during a slot receives ternary feedback, learning whether that slot had silence, a successful broadcast, or a collision. Agents are (adversarially) injected into the system over time. The goal is to coordinate the agents so that each is able to successfully broadcast its packet. A contention-resolution protocol is measured both in terms of its throughput and the number of slots during which an agent broadcasts or listens. Most prior work assumes that listening is free and only tries to minimize the number of broadcasts. This paper answers two foundational questions. First, is constant throughput achievable when using polylogarithmic channel accesses per agent, both for listening and broadcasting? Second, is constant throughput still achievable when an adversary jams some slots by broadcasting noise in them? Specifically, for NN packets arriving over time and JJ jammed slots, we give an algorithm that with high probability in N+JN+J guarantees Θ(1)\Theta(1) throughput and achieves on average O(polylog(N+J))O(\texttt{polylog}(N+J)) channel accesses against an adaptive adversary. We also have per-agent high-probability guarantees on the number of channel accesses -- either O(polylog(N+J))O(\texttt{polylog}(N+J)) or O((J+1)polylog(N))O((J+1) \texttt{polylog}(N)), depending on how quickly the adversary can react to what is being broadcast

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 261, ICALP 2023, Complete Volum

    Authentic alignment : toward an Interpretative Phenomenological Analysis (IPA) informed model of the learning environment in health professions education

    Get PDF
    It is well established that the goals of education can only be achieved through the constructive alignment of instruction, learning and assessment. There is a gap in research interpreting the lived experiences of stakeholders within the UK learning environment toward understanding the real impact – authenticity – of curricular alignment. This investigation uses a critical realist framework to explore the emergent quality of authenticity as a function of alignment.This project deals broadly with alignment of anatomy pedagogy within UK undergraduate medical education. The thread of alignment is woven through four aims: 1) to understand the alignment of anatomy within the medical curriculum via the relationships of its stakeholders; 2) to explore the apparent complexity of the learning environment (LE); 3) to generate a critical evaluation of the methodology, Interpretative Phenomenological Analysis as an approach appropriate for realist research in the complex fields of medical and health professions education; 4) to propose a functional, authentic model of the learning environment.Findings indicate that the complexity and uncertainty inherent in the LE can be reflected in spatiotemporal models. Findings meet the thesis aims, suggesting: 1) the alignment of anatomy within the medical curriculum is complex and forms a multiplicity of perspectives; 2) this complexity is ripe for phenomenological exploration; 3) IPA is particularly suitable for realist research exploring complexity in HPE; 4) Authentic Alignment theory offers a spatiotemporal model of the complex HPE learning environment:the T-icosa

    Process-Algebraic Models of Multi-Writer Multi-Reader Non-Atomic Registers

    Get PDF
    We present process-algebraic models of multi-writer multi-reader safe, regular and atomic registers. We establish the relationship between our models and alternative versions presented in the literature. We use our models to formally analyse by model checking to what extent several well-known mutual exclusion algorithms are robust for relaxed atomicity requirements. Our analyses refute correctness claims made about some of these algorithms in the literature
    • …
    corecore