TANDEM: taming failures in next-generation datacenters with emerging memory
The explosive growth of online services, leading to unforeseen scales, has made modern datacenters highly prone to failures. Taming these failures hinges on fast and correct recovery, minimizing service interruptions.
To be recoverable, applications must take additional measures to maintain a recoverable state of data and computation logic during failure-free execution. However, these precautionary measures have
severe implications for performance, correctness, and programmability, making recovery incredibly challenging to realize in practice.
Emerging memory, particularly non-volatile memory (NVM) and disaggregated memory (DM), offers a promising opportunity to achieve fast recovery with maximum performance. However, incorporating these technologies into datacenter architecture presents significant challenges: their distinct architectural attributes, which differ significantly from those of traditional memory devices, introduce new semantic challenges for
implementing recovery, complicating correctness and programmability.
Can emerging memory enable fast, performant, and correct recovery in the datacenter? This thesis aims to answer this question while addressing the associated challenges.
When architecting datacenters with emerging memory, system architects face four key challenges: (1) how to guarantee correct semantics; (2) how to efficiently enforce correctness with optimal performance; (3) how to validate end-to-end correctness, including recovery; and (4) how to preserve programmer productivity (programmability).
This thesis aims to address these challenges through the following approaches: (a)
defining precise consistency models that formally specify correct end-to-end semantics
in the presence of failures (consistency models also play a crucial role in programmability); (b) developing new low-level mechanisms to efficiently enforce the prescribed models given the capabilities of emerging memory; and (c) creating robust testing frameworks to validate end-to-end correctness and recovery.
We start our exploration with non-volatile memory (NVM), which offers fast persistence capabilities directly accessible through the processor's load-store (memory) interface. Notably, these capabilities can be leveraged to enable fast recovery for Log-Free Data Structures (LFDs) while maximizing performance. However, due to the complexity of modern cache hierarchies, data rarely persist in any specific order, jeopardizing recovery and correctness. Recovery therefore needs primitives that explicitly control the order of updates to NVM (known as persistency models). We outline the precise specification of a novel persistency model – Release Persistency (RP) – that provides a consistency guarantee for LFDs on what remains in non-volatile memory upon failure. To efficiently enforce RP, we propose a novel microarchitecture mechanism,
lazy release persistence (LRP). Using standard LFD benchmarks, we show that LRP achieves fast recovery while incurring minimal performance overhead.
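The ordering problem above can be illustrated with a toy model. The sketch below simulates an LFD-style linked-list push to NVM, where stores sit in a cache and an unordered crash loses an arbitrary subset of them; an explicit persist barrier between writing a node's payload and publishing it keeps recovery safe. All names and semantics here are illustrative simplifications, not the thesis's Release Persistency model.

```python
import random

# Toy model of persistence ordering for a log-free linked-list push.
# Stores land in a simulated cache; only explicitly persisted lines are
# guaranteed to survive a crash. This is a hypothetical sketch, not RP.

class NVM:
    def __init__(self):
        self.persisted = {}   # contents that survive a crash
        self.cache = {}       # dirty lines not yet persisted

    def store(self, addr, val):
        self.cache[addr] = val

    def persist(self, addr):  # explicit ordering point (e.g. flush + fence)
        if addr in self.cache:
            self.persisted[addr] = self.cache.pop(addr)

    def crash(self):          # an arbitrary subset of dirty lines is lost
        for addr in list(self.cache):
            if random.random() < 0.5:
                self.persisted[addr] = self.cache[addr]
        self.cache = {}

def push(nvm, head, value, ordered):
    node = f"node_{value}"
    nvm.store(node, value)
    if ordered:
        nvm.persist(node)     # payload must persist before it is published
    nvm.store(head, node)     # publish: head now points at the new node
    if ordered:
        nvm.persist(head)

def recover_ok(nvm, head):
    node = nvm.persisted.get(head)
    return node is None or node in nvm.persisted  # no dangling head pointer

random.seed(1)
nvm = NVM()
push(nvm, "head", 42, ordered=True)
nvm.crash()
print(recover_ok(nvm, "head"))  # True: explicit ordering keeps recovery safe
```

With `ordered=False`, the head pointer can reach NVM while the node payload is lost, leaving a dangling pointer after recovery; that is the hazard persistency models exist to rule out.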
We continue our discussion with memory disaggregation, which decouples memory from traditional monolithic servers, offering a promising pathway to very high availability in replicated in-memory data stores. Achieving such availability hinges on transaction protocols that can efficiently handle recovery in this setting, where compute and memory are independent. However, there is a challenge: disaggregated memory (DM) does not support RPC-style protocols, mandating one-sided transaction protocols. Exacerbating the problem, one-sided transactions expose critical low-level
ordering to architects, posing a threat to correctness. We present a highly available transaction protocol, Pandora, that is specifically designed to achieve fast recovery in disaggregated key-value stores (DKVSes).
Pandora is the first one-sided transactional protocol that ensures correct, non-blocking, and fast recovery in DKVSes. Our experimental implementation demonstrates that Pandora achieves fast recovery and high availability while causing minimal disruption to services.
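To see the low-level ordering that one-sided designs expose, consider a generic lock-based commit against passive remote memory, modelled here as a dict with CAS-style per-key locks. This is a hypothetical sketch of the one-sided style only; Pandora's actual protocol (non-blocking recovery, replication) is substantially more involved.

```python
import threading

# Sketch of a one-sided transaction commit against passive (disaggregated)
# memory. The RemoteMemory class stands in for memory nodes that execute no
# protocol logic; the compute side drives lock -> write -> unlock ordering.

class RemoteMemory:
    def __init__(self):
        self.mem = {}            # key -> (version, value)
        self.locks = {}          # key -> owner
        self._guard = threading.Lock()

    def cas_lock(self, key, owner):
        with self._guard:        # stands in for an atomic one-sided CAS
            if self.locks.get(key) is None:
                self.locks[key] = owner
                return True
            return False

    def write(self, key, version, value):
        self.mem[key] = (version, value)

    def unlock(self, key, owner):
        with self._guard:
            if self.locks.get(key) == owner:
                del self.locks[key]

def commit(dm, txn_id, updates):
    locked = []
    for key in sorted(updates):          # fixed lock order avoids deadlock
        if not dm.cas_lock(key, txn_id):
            for k in locked:             # abort: release everything taken
                dm.unlock(k, txn_id)
            return False
        locked.append(key)
    for key, value in updates.items():
        ver = dm.mem.get(key, (0, None))[0] + 1
        dm.write(key, ver, value)        # data must land before unlock
    for key in locked:
        dm.unlock(key, txn_id)
    return True

dm = RemoteMemory()
print(commit(dm, "t1", {"x": 1, "y": 2}))  # True
print(dm.mem["x"])                          # (1, 1)
```

The correctness threat mentioned above lives in exactly this kind of sequence: if a compute node crashes between locking and unlocking, or writes become visible out of order, the store's state is ambiguous, which is why recovery must be designed into the protocol rather than bolted on.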
Finally, we introduce a novel targeted litmus-testing framework – DART – to validate the end-to-end correctness of transactional protocols with recovery. Using DART's targeted testing capabilities, we have found several critical bugs in Pandora, highlighting the need for robust end-to-end testing methods in the design loop to iteratively fix correctness bugs. Crucially, DART is lightweight and black-box, requiring no intervention from programmers.
LIPIcs, Volume 251, ITCS 2023, Complete Volume
Current and Future Challenges in Knowledge Representation and Reasoning
Knowledge Representation and Reasoning is a central, longstanding, and active
area of Artificial Intelligence. Over the years it has evolved significantly;
more recently it has been challenged and complemented by research in areas such
as machine learning and reasoning under uncertainty. In July 2022 a Dagstuhl
Perspectives workshop was held on Knowledge Representation and Reasoning. The
goal of the workshop was to describe the state of the art in the field,
including its relation with other areas, its shortcomings and strengths,
together with recommendations for future progress. We developed this manifesto
based on the presentations, panels, working groups, and discussions that took
place at the Dagstuhl Workshop. It is a declaration of our views on Knowledge
Representation: its origins, goals, milestones, and current foci; its relation
to other disciplines, especially to Artificial Intelligence; and on its
challenges, along with key priorities for the next decade.
Robust and Listening-Efficient Contention Resolution
This paper shows how to achieve contention resolution on a shared
communication channel using only a small number of channel accesses -- both for
listening and sending -- and the resulting algorithm is resistant to
adversarial noise.
The shared channel operates over a sequence of synchronized time slots, and
in any slot agents may attempt to broadcast a packet. An agent's broadcast
succeeds if no other agent broadcasts during that slot. If two or more agents
broadcast in the same slot, then the broadcasts collide and both broadcasts
fail. An agent listening on the channel during a slot receives ternary
feedback, learning whether that slot had silence, a successful broadcast, or a
collision. Agents are (adversarially) injected into the system over time. The
goal is to coordinate the agents so that each is able to successfully broadcast
its packet.
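The channel model described above is easy to simulate. The sketch below implements slots with ternary feedback and a simple randomized strategy in which each pending agent broadcasts with probability 1/(number of pending agents); this classic strategy is only an illustration of the model, not the paper's listening-efficient algorithm.

```python
import random

# Minimal simulation of the slotted shared channel with ternary feedback.
# Each slot yields SILENCE, SUCCESS, or COLLISION, exactly as in the model.

SILENCE, SUCCESS, COLLISION = "silence", "success", "collision"

def run_slot(agents, p, rng):
    senders = [a for a in agents if rng.random() < p]
    if not senders:
        return SILENCE, None
    if len(senders) == 1:
        return SUCCESS, senders[0]   # a lone broadcast succeeds
    return COLLISION, None           # two or more collide; all fail

def resolve(n_agents, rng):
    """Run slots until every agent has broadcast successfully."""
    pending = set(range(n_agents))
    slots = 0
    while pending:
        # illustrative strategy: broadcast with probability ~ 1/|pending|
        feedback, winner = run_slot(pending, 1 / len(pending), rng)
        if feedback == SUCCESS:
            pending.discard(winner)
        slots += 1
    return slots

slots = resolve(50, random.Random(0))
print(slots >= 50)  # each success clears exactly one agent, so >= n slots
```

In this model every agent implicitly listens in every slot, which is precisely the cost the paper refuses to pay: the contribution is keeping throughput constant while charging each agent only a small number of listening and sending slots, even under jamming.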
A contention-resolution protocol is measured both in terms of its throughput
and the number of slots during which an agent broadcasts or listens. Most prior
work assumes that listening is free and only tries to minimize the number of
broadcasts.
This paper answers two foundational questions. First, is constant throughput
achievable when using polylogarithmic channel accesses per agent, both for
listening and broadcasting? Second, is constant throughput still achievable
when an adversary jams some slots by broadcasting noise in them? Specifically,
for packets arriving over time and slots jammed by the adversary, we give an algorithm that with high probability guarantees constant throughput and achieves a polylogarithmic number of channel accesses on average against an adaptive adversary. We also give per-agent high-probability guarantees on the number of channel accesses, with the precise bound depending on how quickly the adversary can react to what is being broadcast.
LIPIcs, Volume 261, ICALP 2023, Complete Volume
Authentic alignment : toward an Interpretative Phenomenological Analysis (IPA) informed model of the learning environment in health professions education
It is well established that the goals of education can only be achieved through the constructive alignment of instruction, learning, and assessment. There is a gap in research interpreting the lived experiences of stakeholders within the UK learning environment toward understanding the real impact – authenticity – of curricular alignment. This investigation uses a critical realist framework to explore the emergent quality of authenticity as a function of alignment.

This project deals broadly with the alignment of anatomy pedagogy within UK undergraduate medical education. The thread of alignment is woven through four aims: 1) to understand the alignment of anatomy within the medical curriculum via the relationships of its stakeholders; 2) to explore the apparent complexity of the learning environment (LE); 3) to generate a critical evaluation of the methodology, Interpretative Phenomenological Analysis (IPA), as an approach appropriate for realist research in the complex fields of medical and health professions education (HPE); 4) to propose a functional, authentic model of the learning environment.

Findings indicate that the complexity and uncertainty inherent in the LE can be reflected in spatiotemporal models. Findings meet the thesis aims, suggesting: 1) the alignment of anatomy within the medical curriculum is complex and forms a multiplicity of perspectives; 2) this complexity is ripe for phenomenological exploration; 3) IPA is particularly suitable for realist research exploring complexity in HPE; 4) Authentic Alignment theory offers a spatiotemporal model of the complex HPE learning environment: the T-icosa.
Proceedings of the 33rd Annual Workshop of the Psychology of Programming Interest Group
This is the Proceedings of the 33rd Annual Workshop of the Psychology of Programming Interest Group (PPIG). This was the first PPIG to be held physically since 2019, following the two online-only PPIGs in 2020 and 2021 during the Covid pandemic. It was also the first PPIG conference designed specifically for hybrid attendance. Reflecting the theme, it was hosted by the Music Computing Lab at the Open University in Milton Keynes.
Process-Algebraic Models of Multi-Writer Multi-Reader Non-Atomic Registers
We present process-algebraic models of multi-writer multi-reader safe, regular, and atomic registers. We establish the relationship between our models and alternative versions presented in the literature. We use our models to formally analyse, by model checking, to what extent several well-known mutual exclusion algorithms are robust to relaxed atomicity requirements. Our analyses refute correctness claims made about some of these algorithms in the literature.
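The model-checking approach described here can be illustrated with a minimal explicit-state exploration of Peterson's mutual exclusion algorithm under the *atomic* register assumption. This sketch only demonstrates the verification technique; the paper's point is precisely that weakening atomicity to regular or safe registers can invalidate such correctness claims.

```python
from collections import deque

# Explicit-state model checking of Peterson's algorithm for two processes,
# assuming atomic registers. Shared state: flag[0], flag[1], turn.
# Program for process i (with j = 1 - i):
#   0: flag[i] = True
#   1: turn = j
#   2: wait until not flag[j] or turn == i
#   3: critical section
#   4: flag[i] = False; restart

def step(state, i):
    pc, flag, turn = list(state[0]), list(state[1]), state[2]
    j = 1 - i
    if pc[i] == 0:
        flag[i] = True; pc[i] = 1
    elif pc[i] == 1:
        turn = j; pc[i] = 2
    elif pc[i] == 2:
        if not flag[j] or turn == i:
            pc[i] = 3            # enter critical section
        # otherwise spin: state is unchanged
    elif pc[i] == 3:
        pc[i] = 4                # leave critical section
    elif pc[i] == 4:
        flag[i] = False; pc[i] = 0
    return (tuple(pc), tuple(flag), turn)

def check_mutex():
    """Breadth-first search of all interleavings; report mutual exclusion."""
    init = ((0, 0), (False, False), 0)
    seen, frontier = {init}, deque([init])
    while frontier:
        s = frontier.popleft()
        if s[0] == (3, 3):
            return False         # both processes in the critical section
        for i in (0, 1):
            t = step(s, i)
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return True

print(check_mutex())  # True: with atomic registers, mutual exclusion holds
```

Replacing the atomic reads and writes here with safe-register semantics, where a read overlapping a write may return any value, enlarges the state space and is exactly where refutations of published correctness claims can emerge.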