439 research outputs found

    TANDEM: taming failures in next-generation datacenters with emerging memory

    The explosive growth of online services, reaching unforeseen scales, has made modern datacenters highly prone to failures. Taming these failures hinges on fast and correct recovery that minimizes service interruptions. To be recoverable, applications must take additional measures during failure-free execution to maintain a recoverable state of their data and computation logic. However, these precautionary measures have severe implications for performance, correctness, and programmability, making recovery incredibly challenging to realize in practice. Emerging memory, particularly non-volatile memory (NVM) and disaggregated memory (DM), offers a promising opportunity to achieve fast recovery with maximum performance. However, incorporating these technologies into datacenter architecture presents significant challenges: their architectural attributes, which differ significantly from those of traditional memory devices, introduce new semantic challenges for implementing recovery, complicating correctness and programmability. Can emerging memory enable fast, performant, and correct recovery in the datacenter? This thesis aims to answer this question while addressing the associated challenges. When architecting datacenters with emerging memory, system architects face four key challenges: (1) how to guarantee correct semantics; (2) how to enforce correctness efficiently, with optimal performance; (3) how to validate end-to-end correctness, including recovery; and (4) how to preserve programmer productivity (programmability). This thesis addresses these challenges through three approaches: (a) defining precise consistency models that formally specify correct end-to-end semantics in the presence of failures (consistency models also play a crucial role in programmability); (b) developing new low-level mechanisms that efficiently enforce the prescribed models given the capabilities of emerging memory; and (c) creating robust testing frameworks to validate end-to-end correctness and recovery.
    We start our exploration with non-volatile memory (NVM), which offers fast persistence directly accessible through the processor's load-store (memory) interface. Notably, this capability can be leveraged to enable fast recovery for Log-Free Data Structures (LFDs) while maximizing performance. However, due to the complexity of modern cache hierarchies, data rarely persists in any specific order, jeopardizing recovery and correctness. Recovery therefore needs primitives that explicitly control the order of updates to NVM, known as persistency models. We outline the precise specification of a novel persistency model, Release Persistency (RP), that provides a consistency guarantee for LFDs on what remains in non-volatile memory upon failure. To enforce RP efficiently, we propose a novel microarchitectural mechanism, lazy release persistence (LRP). Using standard LFD benchmarks, we show that LRP achieves fast recovery while incurring minimal performance overhead.
    We continue with memory disaggregation, which decouples memory from traditional monolithic servers and offers a promising pathway to very high availability in replicated in-memory data stores. Achieving such availability hinges on transaction protocols that can efficiently handle recovery in a setting where compute and memory are independent. However, there is a challenge: disaggregated memory (DM) does not work with RPC-style protocols, mandating one-sided transaction protocols. Exacerbating the problem, one-sided transactions expose critical low-level ordering to architects, posing a threat to correctness. We present a highly available transaction protocol, Pandora, specifically designed to achieve fast recovery in disaggregated key-value stores (DKVSes). Pandora is the first one-sided transactional protocol to ensure correct, non-blocking, and fast recovery in a DKVS. Our experimental implementation demonstrates that Pandora achieves fast recovery and high availability while causing minimal disruption to services. Finally, we introduce a novel targeted litmus-testing framework, DART, to validate the end-to-end correctness of transactional protocols with recovery. Using DART's targeted testing capabilities, we found several critical bugs in Pandora, highlighting the need for robust end-to-end testing in the design loop to iteratively fix correctness bugs. Crucially, DART is lightweight and black-box, eliminating any intervention from programmers.
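    The ordering problem at the heart of the NVM discussion can be made concrete with a toy model. The sketch below is illustrative only, not the thesis's RP/LRP mechanism: it simulates a volatile cache that drains to persistent memory in arbitrary order at a crash, and shows how a persist barrier (the analogue of flush-and-fence instructions) placed before the publishing "release" store keeps a log-free append recoverable.

```python
# Toy simulation of why persist ordering matters for log-free data
# structures on NVM. All names are illustrative, not the thesis's API:
# "nvm" stands in for persistent memory, "cache" for volatile cache
# lines that may reach NVM in any order.
import random

cache = {}   # volatile: writes buffered here, drained in ANY order
nvm = {}     # persistent: what survives a crash

def store(addr, val):
    cache[addr] = val

def persist_barrier():
    # Analogue of flushing cache lines and fencing (e.g. clwb + sfence):
    # everything written so far is now durably in NVM.
    nvm.update(cache)
    cache.clear()

def crash():
    # A crash drains an arbitrary subset of the cache to NVM.
    for addr in random.sample(list(cache), random.randint(0, len(cache))):
        nvm[addr] = cache[addr]
    cache.clear()

def append(value, use_barrier):
    store("node.payload", value)       # 1. write the new node
    if use_barrier:
        persist_barrier()              # 2. persist it BEFORE publishing
    store("head", "node")              # 3. "release" store publishes it

def recover_ok():
    # Recovery is correct if a published node always has its payload.
    return nvm.get("head") != "node" or "node.payload" in nvm

append(42, use_barrier=False); crash()
print("without barrier, recoverable:", recover_ok())  # may print False
nvm.clear()
append(42, use_barrier=True); crash()
print("with barrier, recoverable:", recover_ok())     # always True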

    Objcache: An Elastic Filesystem over External Persistent Storage for Container Clusters

    Container virtualization enables emerging AI workloads, such as model serving, highly parallelized training, and machine learning pipelines, to be easily scaled on demand on elastic cloud infrastructure. In particular, AI workloads require persistent storage for data such as training inputs, models, and checkpoints. An external storage system like cloud object storage is a common choice because of its elasticity and scalability. To mitigate the access latency of external storage, caching in a local filesystem is essential. However, building local caches on scaling clusters must cope with explosive disk usage, redundant networking, and unexpected failures. We propose objcache, an elastic filesystem over external storage. Objcache introduces an internal transaction protocol over Raft logging to enable atomic updates of distributed persistent states placed with consistent hashing. The transaction protocol also manages inode dirtiness by maintaining consistency between the local cache and external storage. Objcache supports scaling down to zero by automatically evicting dirty files to external storage. Our evaluation reports that objcache sped up model-serving startup by 98.9% compared to direct copies via S3 interfaces, and that scaling up with 1024 dirty files completed in 2 to 14 seconds.
    Comment: 13 pages.
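    As a sketch of the placement scheme the abstract mentions, the following minimal consistent-hashing ring shows how keys (for instance, cached file chunks or metadata) map to nodes so that adding or removing a node remaps only a small fraction of keys. The names and parameters (HashRing, VNODES) are illustrative assumptions, not objcache's actual code.

```python
# Minimal consistent-hashing sketch of the kind a distributed cache
# could use to place persistent state on nodes.
import bisect, hashlib

VNODES = 64  # virtual nodes per server, smooths the key distribution

def _h(key: str) -> int:
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes):
        self._ring = sorted((_h(f"{n}#{v}"), n)
                            for n in nodes for v in range(VNODES))
        self._keys = [k for k, _ in self._ring]

    def owner(self, key: str) -> str:
        # The first virtual node clockwise from the key's hash owns it.
        i = bisect.bisect(self._keys, _h(key)) % len(self._ring)
        return self._ring[i][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.owner("/models/resnet50.ckpt"))
```

    Because adding or removing a node only remaps the keys on the affected arcs of the ring, a cache cluster can scale up or down (even to zero) without reshuffling all cached state.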

    Deep Learning Methods for Estimation of Elasticity and Backscatter Quantitative Ultrasound

    Ultrasound (US) imaging is increasingly attracting the attention of both academic and industrial researchers because it is a real-time, nonionizing imaging modality that is also less expensive and more portable than other medical imaging techniques. However, the granular appearance of US images complicates their interpretation, hindering wider adoption. This granular appearance (also referred to as speckle) arises from echoes backscattered by microstructural components smaller than the ultrasound wavelength, called scatterers. While significant effort has gone into reducing the appearance of speckle, it encodes scatterer properties that are highly correlated with the microstructure of the tissue and can be employed to diagnose different types of disease. Many clinically valuable properties can be extracted from speckle, such as the elasticity and organization of scatterers. Analyzing the motion of scatterers in the presence of an internal or external force yields the elastic properties of the tissue; this technique, called elastography, has been widely used to characterize tissue. Estimating the scatterer organization (scatterer number density and coherent-to-diffuse scattering power) is also crucial, as it provides information about tissue microstructure and can potentially aid disease diagnosis and treatment monitoring. This thesis proposes several deep learning-based methods to facilitate and improve the estimation of speckle motion and scatterer properties, potentially simplifying the interpretation of US images. In particular, we propose new methods for displacement estimation in Chapters 2 to 6 and introduce novel techniques in Chapters 7 to 11 to quantify scatterer number density and organization.
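    As context for the displacement-estimation chapters, here is a minimal sketch of the classical baseline that deep learning methods aim to improve: window-based speckle tracking with normalized cross-correlation (NCC). All window and search sizes are illustrative assumptions.

```python
# 1D speckle tracking: estimate the axial shift of an RF segment
# between pre- and post-compression frames via normalized
# cross-correlation; the gradient of displacement gives strain.
import numpy as np

def displacement(pre, post, start, win=64, search=16):
    ref = pre[start:start + win] - pre[start:start + win].mean()
    best, best_ncc = 0, -np.inf
    for lag in range(-search, search + 1):
        seg = post[start + lag:start + lag + win]
        seg = seg - seg.mean()
        ncc = np.dot(ref, seg) / (
            np.linalg.norm(ref) * np.linalg.norm(seg) + 1e-12)
        if ncc > best_ncc:
            best, best_ncc = lag, ncc
    return best  # displacement in samples at this window

rng = np.random.default_rng(0)
pre = rng.standard_normal(1024)     # synthetic speckle signal
post = np.roll(pre, 5)              # uniform 5-sample shift
print(displacement(pre, post, start=300))  # -> 5
```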

    Optimistic Adaptation of Decentralised Role-based Software Systems

    The complexity of computer networks has been rising over the last decades. Increasing interconnectivity between devices, the growing complexity of performed tasks, and strong collaboration between nodes drive this phenomenon. Internet-of-Things devices, whose relevance has risen in recent years, are one example. The increasing number of devices requiring updates and supervision makes maintenance more difficult, and human intervention is costly and time-consuming. To overcome this, self-adaptive software systems (SAS) can be used. SAS are a subset of autonomous systems that monitor themselves and their environment and adapt to changes without human interaction. In the literature, different approaches for engineering SAS have been proposed, including techniques for executing adaptations on multiple devices based on generated plans for reacting to changes. Among these are decentralised approaches. To the best of our knowledge, however, no existing approach to engineering a SAS tolerates errors during adaptation execution in a decentralised setting. While some approaches for role-based execution reset the application after a single failure during the adaptation process, others make no assumptions about errors or do not consider an erroneous environment. In a real-world environment, errors will likely occur at run-time and disturb the adaptation process.
    This work performs adaptations in a decentralised way on role-based systems under a relaxed consistency constraint, i.e., errors during the adaptation phase are tolerated. This increases the availability of nodes, since no rollbacks are required in case of a failure. Moreover, a subset of applications, such as drone swarms, benefits from relaxed consistency, since parts of the system that adapted successfully can already operate in the adapted configuration instead of waiting for other peers to apply the changes in a later iteration. Eliminating the need for atomic adaptation execution also makes asynchronous execution possible: the adaptation process can be supervised over a long period, ensuring that every peer takes the planned actions as soon as its internal task execution allows. To enable this relaxed-consistency execution, we develop a decentralised adaptation execution protocol that supports the notion of eventual consistency. As soon as devices reconnect after network congestion or restore their internal state after local failures, the protocol coordinates the recovery process among multiple devices to re-establish a globally consistent state. Because no central instance is needed, every peer that receives information about failing peers can start the recovery process. The approach can restore a consistent global configuration even if almost all peers fail. Moreover, it supports asynchronous adaptations, i.e., peers execute planned adaptations as soon as they are ready, which increases overall availability when single nodes adapt late. The protocol is evaluated with a proof-of-concept implementation, run in five different experiments with thousands of iterations to show its applicability and reliability.
    The protocol's execution time and the number of exchanged messages were measured to compare different error cases and system sizes and to show the scalability of the approach. The solution was also compared to a blocking, atomic approach to show its feasibility. Applicability in a real-world scenario is illustrated by an empirical study of a fire-extinguishing drone swarm. The results show that an optimistic approach to adaptation is suitable and that specific scenarios benefit from the improved availability, since no rollbacks are required: systems can continue their work in large-scale settings regardless of failures of participating nodes.
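    A toy sketch of the optimistic idea follows; the state names, recovery policy, and message flow are illustrative assumptions, not the thesis's actual protocol.

```python
# Optimistic, eventually consistent adaptation: peers apply planned
# changes independently and keep operating; any peer that learns of a
# failure may trigger compensation, with no coordinator or rollback.
from enum import Enum, auto

class State(Enum):
    PLANNED = auto()
    ADAPTED = auto()
    FAILED = auto()

class Peer:
    def __init__(self, name):
        self.name, self.state = name, State.PLANNED

    def execute(self, ok=True):
        # Optimistic: apply the planned role change locally; there is
        # no global commit barrier, and successful peers are not
        # rolled back when others fail.
        self.state = State.ADAPTED if ok else State.FAILED

def compensate(peers):
    # Eventual consistency: failed peers simply re-attempt the planned
    # adaptation once they are able to.
    for p in peers:
        if p.state is State.FAILED:
            p.execute(ok=True)

swarm = [Peer("drone-1"), Peer("drone-2"), Peer("drone-3")]
swarm[0].execute()            # adapts and keeps operating immediately
swarm[1].execute(ok=False)    # a local failure disturbs the adaptation
swarm[2].execute()
compensate(swarm)             # converges to a consistent configuration
assert all(p.state is State.ADAPTED for p in swarm)
```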

    Design and Evaluation of a Novel Lens-Based SPECT System Based on Laue Lens Gamma Diffraction: GEANT4/GAMOS Monte Carlo Study

    While improvements in SPECT imaging techniques constitute a significant advance in biomedical science and cancer diagnosis, their limited spatial resolution has hindered their application to small-animal research and early tumour detection. Building on recent breakthroughs from the high-energy astrophysics community, focusing X-ray optics offers a way past this low-resolution paradigm and the possibility of imaging small objects with sub-millimetre resolution. This thesis tackles the constraints of current SPECT designs by focusing high-energy photons through Laue lens diffraction, developing a means of gamma-ray imaging that does not rely on parallel-hole or pinhole collimators. The gradual development of the novel system is discussed, progressing from single-lens to modular and multi-lens Laue-based SPECT. A customized 3D reconstruction algorithm was developed to reconstruct an accurate 3D radioactivity distribution from focused projections. A plug-in implementing the Laue diffraction concept was developed to model gamma-ray focusing in the GEANT4 toolkit; the plug-in will be incorporated into GEANT4 upon final approval from its developers. The single-lens, modular-lens, and multi-lens SPECT models detected one hit per 42 source photons (a sensitivity of 790), three hits per 42 source photons (a sensitivity of 2,373), and one hit per 20 source photons (a sensitivity of 1,670), respectively. Based on the generated 3D reconstructed images, the achievable spatial resolution was found to be 0.1 mm full width at half maximum (FWHM). The proposed design's performance parameters were compared against the existing Siemens parallel-hole LEHR and multi-pinhole (5-MWB-1.0) Inveon SPECT systems. The achievable spatial resolution is decoupled from the sensitivity of the system, in stark contrast with existing collimators, which suffer from a resolution-sensitivity trade-off and are limited to a resolution of about 2 mm. The proposed system allows discrimination between adjacent volumes as small as 0.113 mm³, substantially smaller than what can be imaged by any existing SPECT or PET system. The proposed design could lay the foundation for a new SPECT imaging technology akin to a combination of tomosynthesis and lightfield imaging.
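    For context, Laue lenses focus photons via Bragg diffraction through the volume of crystals. The worked example below uses assumed, illustrative values (not figures from the thesis) to show the small deflection angles involved:

```latex
% Bragg condition for Laue diffraction; \lambda follows from photon energy E:
\[
  n\lambda = 2d\sin\theta, \qquad
  \lambda\,[\mathrm{nm}] \approx \frac{1.2398}{E\,[\mathrm{keV}]}
\]
% Example with assumed values: a 140.5 keV Tc-99m photon has
% \lambda \approx 8.8 \times 10^{-3}\,\mathrm{nm}; for Cu(111) planes with
% d \approx 0.209\,\mathrm{nm} and n = 1,
% \theta = \arcsin(\lambda / 2d) \approx 0.021\,\mathrm{rad} \approx 1.2^{\circ},
% a deflection so small that a lens must stack many crystal tiles to focus.
```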

    A Survey on Transactional Stream Processing

    Transactional stream processing (TSP) strives to create a cohesive model that merges the advantages of both transactional and stream-oriented guarantees. Over the past decade, numerous endeavors have contributed to the evolution of TSP solutions, uncovering similarities and distinctions among them. Despite these advances, a universally accepted standard approach for integrating transactional functionality with stream processing remains to be established. Existing TSP solutions predominantly concentrate on specific application characteristics and involve complex design trade-offs. This survey intends to introduce TSP and present our perspective on its future progression. Our primary goals are twofold: to provide insights into the diverse TSP requirements and methodologies, and to inspire the design and development of groundbreaking TSP systems.

    Fault-based Analysis of Industrial Cyber-Physical Systems

    The fourth industrial revolution, known as Industry 4.0, tries to bridge the gap between traditional Electronic Design Automation (EDA) technologies and the need to innovate in many industrial fields, e.g., automotive, avionics, and manufacturing. This complex digitalization process involves every industrial facility and comprises the transformation of methodologies, techniques, and tools to improve the efficiency of every industrial process. Enhancing functional safety in Industry 4.0 applications requires exploiting both model-based and data-driven analyses of the deployed Industrial Cyber-Physical System (ICPS). An ICPS can be modeled at different abstraction levels, depending on the physical details included in the model and necessary to describe specific system behaviors. Modeling is nevertheless extremely complicated, because an ICPS is composed of heterogeneous components from different physical domains, e.g., digital, electrical, and mechanical. In addition, not only nominal but also faulty behaviors must be considered to perform more specific analyses, e.g., predictive maintenance of specific assets. Such faulty data, however, are usually absent or not directly available from the industrial machinery. To overcome these limitations, constructing a virtual model of an ICPS extended with different classes of faults enables characterizing the faulty behaviors of the system under different faults. In the literature, these topics are addressed with non-uniform approaches and without standardized, automated methodologies for describing and simulating faults across the domains composing an ICPS.
    This thesis attempts to close these state-of-the-art gaps by proposing novel methodologies, techniques, and tools to: model and simulate analog and multi-domain systems; abstract low-level models to higher-level behavioral models; and monitor industrial systems based on the Industrial Internet of Things (IIoT) paradigm. Specifically, the contributions involve extending state-of-the-art fault injection practices to improve ICPS safety, developing frameworks to automate safety operations, and defining a monitoring framework for ICPSs. Fault injection in analog and digital models is the state of the practice for ensuring functional safety, as referenced in the automotive ISO 26262 standard. Starting from state-of-the-art defects defined for analog descriptions, new defects are proposed to enhance the IEEE P2427 draft standard for analog defect modeling and coverage. Moreover, different techniques for abstracting a transistor-level model to a behavioral model are proposed to speed up the simulation of faulty circuits. Unlike the electrical domain, the mechanical domain makes no extensive use of fault injection techniques; extending fault injection to the mechanical and thermal fields thus supports the definition and evaluation of more reliable safety mechanisms. Hence, a taxonomy of mechanical faults is derived from the electrical domain by exploiting physical analogies, and specific tools are built for automatically instrumenting different descriptions with multi-domain faults. The entire work is proposed as a basis for creating increasingly resilient and secure ICPSs that preserve functional safety in any operating context.
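    As an illustration of cross-domain fault injection via physical analogies, and assuming nothing about the thesis's actual tools or taxonomy, the sketch below simulates a mass-spring-damper (the mechanical analogue of a series RLC circuit) and injects a parametric stiffness-degradation fault mid-run; comparing nominal and faulty traces is the kind of characterization such a fault-extended virtual model enables.

```python
# Parametric fault injection into a simulated mechanical model:
# m x'' = -k x - c x', with the spring stiffness k degraded at t=3 s.
import numpy as np

def simulate(t_end=10.0, dt=1e-3, fault_at=None, k_faulty=2.0):
    k, m, c = 20.0, 1.0, 0.5          # stiffness, mass, damping
    x, v = 1.0, 0.0                   # initial displacement, velocity
    trace = []
    for step in range(int(t_end / dt)):
        if fault_at is not None and step * dt >= fault_at:
            k = k_faulty              # injected fault: weakened spring
        a = (-k * x - c * v) / m      # Newton's second law
        v += a * dt                   # semi-implicit Euler integration
        x += v * dt
        trace.append(x)
    return np.array(trace)

nominal = simulate()
faulty = simulate(fault_at=3.0)
# A monitor could flag the fault from the diverging behaviour, e.g. the
# drop in oscillation frequency (omega = sqrt(k/m)) after t = 3 s:
print("max divergence:", np.abs(nominal - faulty).max())
```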

    Diffeomorphic Transformations for Time Series Analysis: An Efficient Approach to Nonlinear Warping

    The proliferation and ubiquity of temporal data across many disciplines have sparked interest in similarity, classification, and clustering methods specifically designed for time series. A core issue when dealing with time series is determining their pairwise similarity, i.e., the degree to which a given time series resembles another. Traditional distance measures such as the Euclidean distance are not well suited due to the time-dependent nature of the data. Elastic metrics such as dynamic time warping (DTW) offer a promising approach but are limited by their computational complexity, non-differentiability, and sensitivity to noise and outliers. This thesis proposes novel elastic alignment methods that use parametric and diffeomorphic warping transformations to overcome the shortcomings of DTW-based metrics. The proposed method is differentiable and invertible, well suited for deep learning architectures, robust to noise and outliers, computationally efficient, and expressive and flexible enough to capture complex patterns. Furthermore, a closed-form solution was developed for the gradient of these diffeomorphic transformations, allowing an efficient search in the parameter space and leading to better solutions at convergence. Leveraging these closed-form diffeomorphic transformations, this thesis proposes a suite of advancements that include: (a) an enhanced temporal transformer network for time series alignment and averaging; (b) a deep learning-based time series classification model that simultaneously aligns and classifies signals with high accuracy; (c) an incremental time series clustering algorithm that is warping-invariant and scalable and can operate under limited computational and time resources; and (d) a normalizing flow model that enhances the flexibility of affine transformations in coupling and autoregressive layers.
    Comment: PhD thesis, defended at the University of Navarra on July 17, 2023. 277 pages, 8 chapters, 1 appendix.
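    A minimal sketch of the core idea follows, using a one-parameter warp as a stand-in for the thesis's closed-form diffeomorphic transformations: gamma_theta(t) = t + theta*sin(pi*t) is smooth and invertible for |theta| < 1/pi, and the gradient of an alignment loss with respect to theta has a closed form, so alignment becomes plain gradient descent rather than dynamic programming. The warp family, signals, and learning rate are illustrative assumptions.

```python
# Differentiable, parametric time warping as an alternative to DTW.
import numpy as np

t = np.linspace(0.0, 1.0, 200)
target = np.sin(2 * np.pi * t)                              # reference
source = np.sin(2 * np.pi * (t + 0.2 * np.sin(np.pi * t)))  # warped copy

def loss_and_grad(theta):
    g = t + theta * np.sin(np.pi * t)   # gamma_theta(t), monotone warp
    warped = np.interp(g, t, source)    # (source o gamma_theta)(t)
    resid = warped - target
    dsource = np.gradient(source, t)    # numeric source'(t)
    # Chain rule: d warped / d theta = source'(gamma_theta(t)) * sin(pi t)
    grad = 2.0 * np.mean(resid * np.interp(g, t, dsource) * np.sin(np.pi * t))
    return np.mean(resid ** 2), grad

theta = 0.0
for _ in range(300):                    # gradient descent on theta
    loss, grad = loss_and_grad(theta)
    theta -= 0.02 * grad
print(f"loss={loss:.5f}  theta={theta:.3f}")  # theta converges near -0.2
```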

    Mobile Forensics – The File Format Handbook

    This open access book summarizes knowledge about several file systems and file formats commonly used in mobile devices. In addition to the fundamental description of the formats, it gives hints about the forensic value of possible artefacts, along with an outline of tools that can decode the relevant data. The book is organized into two distinct parts.
    Part I describes several file systems that are commonly used in mobile devices:
    · APFS is the file system used in all modern Apple devices, including iPhones, iPads, and even Apple computers such as the MacBook series.
    · Ext4 is very common in Android devices and is the successor of the Ext2 and Ext3 file systems that were commonly used on Linux-based computers.
    · The Flash-Friendly File System (F2FS) is a Linux file system that Samsung Electronics developed in 2012 explicitly for the NAND flash memory common in removable storage and mobile devices.
    · The QNX6 file system is present in smartphones delivered by Blackberry (e.g., devices running Blackberry 10) and in modern vehicle infotainment systems that use QNX as their operating system.
    Part II describes five file formats that are commonly used on mobile devices:
    · SQLite is nearly omnipresent in mobile devices, with an overwhelming majority of all mobile applications storing their data in such databases.
    · The second leading file format in the mobile world is Property Lists, which are predominantly found on Apple devices.
    · Java Serialization is a popular technique for storing object states in the Java programming language; mobile application (app) developers very often resort to it to make their application state persistent.
    · The Realm database format has emerged in recent years as a possible successor to the now ageing SQLite format and has begun to appear in some modern applications on mobile devices.
    · Protocol Buffers provide a compact binary format for serializing structured data, a technique commonly used in mobile devices.
    The aim of this book is to act as a knowledge base and reference guide for digital forensic practitioners who need knowledge about a specific file system or file format. It is also hoped to provide useful insight and knowledge for students and other aspiring professionals who want to work in the field of digital forensics. The book is written with the assumption that the reader has some existing knowledge and understanding of computers, mobile devices, file systems, and file formats.
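    As a small illustration of the Protocol Buffers wire format mentioned above (the standard varint rule, not an example taken from the book): integers are stored as base-128 varints, seven payload bits per byte, with the high bit set while more bytes follow.

```python
# Decode a Protocol Buffers base-128 varint from a byte buffer.
def decode_varint(buf, pos=0):
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift   # low 7 bits carry the payload
        if not b & 0x80:                # high bit clear: last byte
            return result, pos
        shift += 7

# 300 = 0b100101100 is encoded as the two bytes 0xAC 0x02 on the wire:
print(decode_varint(bytes([0xAC, 0x02])))  # -> (300, 2)
```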