3,440 research outputs found

    TANDEM: taming failures in next-generation datacenters with emerging memory

    Get PDF
    The explosive growth of online services, leading to unforeseen scales, has made modern datacenters highly prone to failures. Taming these failures hinges on fast and correct recovery, minimizing service interruptions. Applications, owing to recovery, entail additional measures to maintain a recoverable state of data and computation logic during their failure-free execution. However, these precautionary measures have severe implications on performance, correctness, and programmability, making recovery incredibly challenging to realize in practice. Emerging memory, particularly non-volatile memory (NVM) and disaggregated memory (DM), offers a promising opportunity to achieve fast recovery with maximum performance. However, incorporating these technologies into datacenter architecture presents significant challenges; Their distinct architectural attributes, differing significantly from traditional memory devices, introduce new semantic challenges for implementing recovery, complicating correctness and programmability. Can emerging memory enable fast, performant, and correct recovery in the datacenter? This thesis aims to answer this question while addressing the associated challenges. When architecting datacenters with emerging memory, system architects face four key challenges: (1) how to guarantee correct semantics; (2) how to efficiently enforce correctness with optimal performance; (3) how to validate end-to-end correctness including recovery; and (4) how to preserve programmer productivity (Programmability). This thesis aims to address these challenges through the following approaches: (a) defining precise consistency models that formally specify correct end-to-end semantics in the presence of failures (consistency models also play a crucial role in programmability); (b) developing new low-level mechanisms to efficiently enforce the prescribed models given the capabilities of emerging memory; and (c) creating robust testing frameworks to validate end-to-end correctness and recovery. We start our exploration with non-volatile memory (NVM), which offers fast persistence capabilities directly accessible through the processor’s load-store (memory) interface. Notably, these capabilities can be leveraged to enable fast recovery for Log-Free Data Structures (LFDs) while maximizing performance. However, due to the complexity of modern cache hierarchies, data hardly persist in any specific order, jeop- ardizing recovery and correctness. Therefore, recovery needs primitives that explicitly control the order of updates to NVM (known as persistency models). We outline the precise specification of a novel persistency model – Release Persistency (RP) – that provides a consistency guarantee for LFDs on what remains in non-volatile memory upon failure. To efficiently enforce RP, we propose a novel microarchitecture mechanism, lazy release persistence (LRP). Using standard LFDs benchmarks, we show that LRP achieves fast recovery while incurring minimal overhead on performance. We continue our discussion with memory disaggregation which decouples memory from traditional monolithic servers, offering a promising pathway for achieving very high availability in replicated in-memory data stores. Achieving such availability hinges on transaction protocols that can efficiently handle recovery in this setting, where compute and memory are independent. However, there is a challenge: disaggregated memory (DM) fails to work with RPC-style protocols, mandating one-sided transaction protocols. Exacerbating the problem, one-sided transactions expose critical low-level ordering to architects, posing a threat to correctness. We present a highly available transaction protocol, Pandora, that is specifically designed to achieve fast recovery in disaggregated key-value stores (DKVSes). Pandora is the first one-sided transactional protocol that ensures correct, non-blocking, and fast recovery in DKVS. Our experimental implementation artifacts demonstrate that Pandora achieves fast recovery and high availability while causing minimal disruption to services. Finally, we introduce a novel target litmus-testing framework – DART – to validate the end-to-end correctness of transactional protocols with recovery. Using DART’s target testing capabilities, we have found several critical bugs in Pandora, highlighting the need for robust end-to-end testing methods in the design loop to iteratively fix correctness bugs. Crucially, DART is lightweight and black-box, thereby eliminating any intervention from the programmers

    ENHANCING CLOUD SYSTEM RUNTIME TO ADDRESS COMPLEX FAILURES

    Get PDF
    As the reliance on cloud systems intensifies in our progressively digital world, understanding and reinforcing their reliability becomes more crucial than ever. Despite impressive advancements in augmenting the resilience of cloud systems, the growing incidence of complex failures now poses a substantial challenge to the availability of these systems. With cloud systems continuing to scale and increase in complexity, failures not only become more elusive to detect but can also lead to more catastrophic consequences. Such failures question the foundational premises of conventional fault-tolerance designs, necessitating the creation of novel system designs to counteract them. This dissertation aims to enhance distributed systems’ capabilities to detect, localize, and react to complex failures at runtime. To this end, this dissertation makes contributions to address three emerging categories of failures in cloud systems. The first part delves into the investigation of partial failures, introducing OmegaGen, a tool adept at generating tailored checkers for detecting and localizing such failures. The second part grapples with silent semantic failures prevalent in cloud systems, showcasing our study findings, and introducing Oathkeeper, a tool that leverages past failures to infer rules and expose these silent issues. The third part explores solutions to slow failures via RESIN, a framework specifically designed to detect, diagnose, and mitigate memory leaks in cloud-scale infrastructures, developed in collaboration with Microsoft Azure. The dissertation concludes by offering insights into future directions for the construction of reliable cloud systems

    A clinical decision support system for detecting and mitigating potentially inappropriate medications

    Get PDF
    Background: Medication errors are a leading cause of preventable harm to patients. In older adults, the impact of ageing on the therapeutic effectiveness and safety of drugs is a significant concern, especially for those over 65. Consequently, certain medications called Potentially Inappropriate Medications (PIMs) can be dangerous in the elderly and should be avoided. Tackling PIMs by health professionals and patients can be time-consuming and error-prone, as the criteria underlying the definition of PIMs are complex and subject to frequent updates. Moreover, the criteria are not available in a representation that health systems can interpret and reason with directly. Objectives: This thesis aims to demonstrate the feasibility of using an ontology/rule-based approach in a clinical knowledge base to identify potentially inappropriate medication(PIM). In addition, how constraint solvers can be used effectively to suggest alternative medications and administration schedules to solve or minimise PIM undesirable side effects. Methodology: To address these objectives, we propose a novel integrated approach using formal rules to represent the PIMs criteria and inference engines to perform the reasoning presented in the context of a Clinical Decision Support System (CDSS). The approach aims to detect, solve, or minimise undesirable side-effects of PIMs through an ontology (knowledge base) and inference engines incorporating multiple reasoning approaches. Contributions: The main contribution lies in the framework to formalise PIMs, including the steps required to define guideline requisites to create inference rules to detect and propose alternative drugs to inappropriate medications. No formalisation of the selected guideline (Beers Criteria) can be found in the literature, and hence, this thesis provides a novel ontology for it. Moreover, our process of minimising undesirable side effects offers a novel approach that enhances and optimises the drug rescheduling process, providing a more accurate way to minimise the effect of drug interactions in clinical practice

    Improving the Specification of Business Application Requirements Based on Executable Models

    Get PDF
    Istraživanje predstavljeno u okviru ove disertacije imalo je za cilj unapređenje procesa specifikacije korisničkih zahteva poslovnih aplikacija na bazi detaljnih, izvršivih prototipova koji se mogu kreirati uz minimalan utrošak vremena i energije. Radi postizanja ovog cilja je implementiran alat otvorenog koda pod nazivom Kroki (fr. croquis – skica) čija je arhitektura projektovana tako da obezbedi: (1) kolaborativni razvoj specifikacije poslovne aplikacije sa korisnicima koji nemaju znanje projektovanja i programiranja softverskih sistema, (2) efikasno pokretanje prototipa direktno iz sopstvenog razvojnog okruženja, dajući mogućnost korisniku da isproba prototip tokom modelovanja kad god poželi, (3) ponovno korišćenje informacija dobijenih prilikom razvoja prototipova u kasnijim fazama razvoja, kako bi se smanjilo nepotrebno trošenje resursa. Eksperiment za proveru da li razvijeni alat zadovoljava postavljene ciljeve je dizajniran kao serija od deset eksplorativnih studija slučaja čiji je cilj specifikacija poslovnih aplikacija sa učesnicima koji dolaze iz različitih poslovnih domena koji nisu poznati projektantima. Pojedinačne studije su obavljene sa po jednim učesnikom u okviru dvočasovnih projektantskih sesija, gde su ulogu projektanata imali autor ove disertacije i njegov mentor. Kvalitativni i kvantitativni podaci prikupljeni tokom sesija i posle njih, putem upitnika, su iskorišćeni za izvođenje pozitivnih zaključaka o efikasnosti predloženog pristupa i alata. Dizajn istraživanja je baziran na konceptima MEM-a (Method Evaluation Model) koji definiše kriterijum za uspeh određene metodologije u praksi. Upitnici koji evaluiraju jezik za modelovanje i Kroki alat su formulisani tako da odgovaraju izabranim konceptima FQUAD (Framework for qualitative assessment of domain-specific languages) okvira za evaluaciju jezika specifičnih za domen.The research presented in this dissertation aimed to improve the process of specification of user requirements of business applications based on detailed, executable prototypes that can be created with minimal expenditure of time and energy. To achieve this goal, an open-source tool called Kroki (fr. croquis - sketch) was implemented, the architecture of which is designed to provide: (1) Collaborative development of business application specifications with users who do not have knowledge of designing and programming software systems, (2) Efficient prototyping directly from Kroki’s development environment, enabling the user to try out the prototype during modeling whenever they want, and (3) Reuse of information obtained during the development of prototypes in later stages of development, to reduce unnecessary consumption of time and energy. The experiment to validate whether the developed Kroki tool meets the set goals was designed as a series of ten exploratory case studies to specify business applications with participants from different business domains unknown to the designers. Individual studies were carried out within two-hour design sessions, where the author of this dissertation and his mentor played the designer role, with a single participant in the user role in each session. Qualitative and quantitative data collected during and after the sessions, through questionnaires, were used to draw positive conclusions about the effectiveness of the proposed approach and tools. The research design is based on the concepts of MEM (Method Evaluation Model), which defines the criteria for the success of a certain methodology in practice. Questionnaires that evaluate the modeling language and the Kroki tool are formulated to correspond to the selected concepts of the FQUAD (Framework for qualitative assessment of domain-specific languages) for evaluating DSLs

    Complete and easy type Inference for first-class polymorphism

    Get PDF
    The Hindley-Milner (HM) typing discipline is remarkable in that it allows statically typing programs without requiring the programmer to annotate programs with types themselves. This is due to the HM system offering complete type inference, meaning that if a program is well typed, the inference algorithm is able to determine all the necessary typing information. Let bindings implicitly perform generalisation, allowing a let-bound variable to receive the most general possible type, which in turn may be instantiated appropriately at each of the variable’s use sites. As a result, the HM type system has since become the foundation for type inference in programming languages such as Haskell as well as the ML family of languages and has been extended in a multitude of ways. The original HM system only supports prenex polymorphism, where type variables are universally quantified only at the outermost level. This precludes many useful programs, such as passing a data structure to a function in the form of a fold function, which would need to be polymorphic in the type of the accumulator. However, this would require a nested quantifier in the type of the overall function. As a result, one direction of extending the HM system is to add support for first-class polymorphism, allowing arbitrarily nested quantifiers and instantiating type variables with polymorphic types. In such systems, restrictions are necessary to retain decidability of type inference. This work presents FreezeML, a novel approach for integrating first-class polymorphism into the HM system, focused on simplicity. It eschews sophisticated yet hard to grasp heuristics in the type systems or extending the language of types, while still requiring only modest amounts of annotations. In particular, FreezeML leverages the mechanisms for generalisation and instantiation that are already at the heart of ML. Generalisation and instantiation are performed by let bindings and variables, respectively, but extended to types beyond prenex polymorphism. The defining feature of FreezeML is the ability to freeze variables, which prevents the usual instantiation of their types, allowing them instead to keep their original, fully polymorphic types. We demonstrate that FreezeML is as expressive as System F by providing a translation from the latter to the former; the reverse direction is also shown. Further, we prove that FreezeML is indeed a conservative extension of ML: When considering only ML programs, FreezeML accepts exactly the same programs as ML itself. # We show that type inference for FreezeML can easily be integrated into HM-like type systems by presenting a sound and complete inference algorithm for FreezeML that extends Algorithm W, the original inference algorithm for the HM system. Since the inception of Algorithm W in the 1970s, type inference for the HM system and its descendants has been modernised by approaches that involve constraint solving, which proved to be more modular and extensible. In such systems, a term is translated to a logical constraint, whose solutions correspond to the types of the original term. A solver for such constraints may then be defined independently. To this end, we demonstrate such a constraint-based inference approach for FreezeML. We also discuss the effects of integrating the value restriction into FreezeML and provide detailed comparisons with other approaches towards first-class polymorphism in ML alongside a collection of examples found in the literature

    Performance Anomalies in Concurrent Data Structure Microbenchmarks

    Get PDF
    Recent decades have witnessed a surge in the development of concurrent data structures with an increasing interest in data structures implementing concurrent sets (CSets). Microbenchmarking tools are frequently utilized to evaluate and compare the performance differences across concurrent data structures. The underlying structure and design of the microbenchmarks themselves can play a hidden but influential role in performance results. However, the impact of microbenchmark design has not been well investigated. In this work, we illustrate instances where concurrent data structure performance results reported by a microbenchmark can vary 10-100x depending on the microbenchmark implementation details. We investigate factors leading to performance variance across three popular microbenchmarks and outline cases in which flawed microbenchmark design can lead to an inversion of performance results between two concurrent data structure implementations. We further derive a set of recommendations for best practices in the design and usage of concurrent data structure microbenchmarks and explore advanced features in the Setbench microbenchmark

    Guided rewriting and constraint satisfaction for parallel GPU code generation

    Get PDF
    Graphics Processing Units (GPUs) are notoriously hard to optimise for manually due to their scheduling and memory hierarchies. What is needed are good automatic code generators and optimisers for such parallel hardware. Functional approaches such as Accelerate, Futhark and LIFT leverage a high-level algorithmic Intermediate Representation (IR) to expose parallelism and abstract the implementation details away from the user. However, producing efficient code for a given accelerator remains challenging. Existing code generators depend on the user input to choose a subset of hard-coded optimizations or automated exploration of implementation search space. The former suffers from the lack of extensibility, while the latter is too costly due to the size of the search space. A hybrid approach is needed, where a space of valid implementations is built automatically and explored with the aid of human expertise. This thesis presents a solution combining user-guided rewriting and automatically generated constraints to produce high-performance code. The first contribution is an automatic tuning technique to find a balance between performance and memory consumption. Leveraging its functional patterns, the LIFT compiler is empowered to infer tuning constraints and limit the search to valid tuning combinations only. Next, the thesis reframes parallelisation as a constraint satisfaction problem. Parallelisation constraints are extracted automatically from the input expression, and a solver is used to identify valid rewriting. The constraints truncate the search space to valid parallel mappings only by capturing the scheduling restrictions of the GPU in the context of a given program. A synchronisation barrier insertion technique is proposed to prevent data races and improve the efficiency of the generated parallel mappings. The final contribution of this thesis is the guided rewriting method, where the user encodes a design space of structural transformations using high-level IR nodes called rewrite points. These strongly typed pragmas express macro rewrites and expose design choices as explorable parameters. The thesis proposes a small set of reusable rewrite points to achieve tiling, cache locality, data reuse and memory optimisation. A comparison with the vendor-provided handwritten kernel ARM Compute Library and the TVM code generator demonstrates the effectiveness of this thesis' contributions. With convolution as a use case, LIFT-generated direct and GEMM-based convolution implementations are shown to perform on par with the state-of-the-art solutions on a mobile GPU. Overall, this thesis demonstrates that a functional IR yields well to user-guided and automatic rewriting for high-performance code generation

    SCALING UP TASK EXECUTION ON RESOURCE-CONSTRAINED SYSTEMS

    Get PDF
    The ubiquity of executing machine learning tasks on embedded systems with constrained resources has made efficient execution of neural networks on these systems under the CPU, memory, and energy constraints increasingly important. Different from high-end computing systems where resources are abundant and reliable, resource-constrained systems only have limited computational capability, limited memory, and limited energy supply. This dissertation focuses on how to take full advantage of the limited resources of these systems in order to improve task execution efficiency from different aspects of the execution pipeline. While the existing literature primarily aims at solving the problem by shrinking the model size according to the resource constraints, this dissertation aims to improve the execution efficiency for a given set of tasks from the following two aspects. Firstly, we propose SmartON, which is the first batteryless active event detection system that considers both the event arrival pattern as well as the harvested energy to determine when the system should wake up and what the duty cycle should be. Secondly, we propose Antler, which exploits the affinity between all pairs of tasks in a multitask inference system to construct a compact graph representation of the task set for a given overall size budget. To achieve the aforementioned algorithmic proposals, we propose the following hardware solutions. One is a controllable capacitor array that can expand the system’s energy storage on-the-fly. The other is a FRAM array that can accommodate multiple neural networks running on one system.Doctor of Philosoph

    "What It Wants Me To Say": Bridging the Abstraction Gap Between End-User Programmers and Code-Generating Large Language Models

    Full text link
    Code-generating large language models translate natural language into code. However, only a small portion of the infinite space of naturalistic utterances is effective at guiding code generation. For non-expert end-user programmers, learning this is the challenge of abstraction matching. We examine this challenge in the specific context of data analysis in spreadsheets, in a system that maps the users natural language query to Python code using the Codex generator, executes the code, and shows the result. We propose grounded abstraction matching, which bridges the abstraction gap by translating the code back into a systematic and predictable naturalistic utterance. In a between-subjects, think-aloud study (n=24), we compare grounded abstraction matching to an ungrounded alternative based on previously established query framing principles. We find that the grounded approach improves end-users' understanding of the scope and capabilities of the code-generating model, and the kind of language needed to use it effectively
    corecore