ENHANCING CLOUD SYSTEM RUNTIME TO ADDRESS COMPLEX FAILURES
As the reliance on cloud systems intensifies in our progressively digital world, understanding and reinforcing their reliability becomes more crucial than ever. Despite impressive advancements in augmenting the resilience of cloud systems, the growing incidence of complex failures now poses a substantial challenge to the availability of these systems. With cloud systems continuing to scale and increase in complexity, failures not only become more elusive to detect but can also lead to more catastrophic consequences. Such failures question the foundational premises of conventional fault-tolerance designs, necessitating the creation of novel system designs to counteract them.
This dissertation aims to enhance distributed systems’ capabilities to detect, localize, and react to complex failures at runtime. To this end, it makes contributions addressing three emerging categories of failures in cloud systems. The first part investigates partial failures, introducing OmegaGen, a tool that generates tailored checkers for detecting and localizing such failures. The second part tackles silent semantic failures prevalent in cloud systems, presenting our study findings and introducing Oathkeeper, a tool that leverages past failures to infer rules and expose these silent issues. The third part explores solutions to slow failures via RESIN, a framework designed to detect, diagnose, and mitigate memory leaks in cloud-scale infrastructures, developed in collaboration with Microsoft Azure. The dissertation concludes by offering insights into future directions for the construction of reliable cloud systems.
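For illustration, the kind of partial-failure checking that OmegaGen automates can be reduced to a watchdog that probes a module's core operation and flags it when the probe stalls or raises. The sketch below is our own minimal illustration of that watchdog idea; the names `run_checker` and `probe`, and the timeout policy, are hypothetical and not OmegaGen's actual API:

```python
import threading
import time

def run_checker(probe, timeout_s=2.0):
    """Run a module-level probe; report a partial failure if it
    stalls or raises. A minimal sketch of the watchdog idea, not
    OmegaGen's implementation."""
    result = {"status": "stalled"}  # assume the worst until the probe returns

    def target():
        try:
            probe()
            result["status"] = "healthy"
        except Exception as exc:
            result["status"] = f"failed: {exc}"

    t = threading.Thread(target=target, daemon=True)
    t.start()
    t.join(timeout_s)   # a stalled probe never updates its status
    return result["status"]

# A healthy probe returns promptly; a hung one is flagged as stalled.
print(run_checker(lambda: None))                 # healthy
print(run_checker(lambda: time.sleep(10), 0.2))  # stalled
```

The key property is that the checker distinguishes "slow or stuck" from "crashed", which is exactly what makes partial failures elusive for conventional crash detectors.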
RackBlox: A Software-Defined Rack-Scale Storage System with Network-Storage Co-Design
Software-defined networking (SDN) and software-defined flash (SDF) have been
serving as the backbone of modern data centers. They are managed separately to
handle I/O requests. At first glance, this is a reasonable design by following
the rack-scale hierarchical design principles. However, it suffers from
suboptimal end-to-end performance, due to the lack of coordination between SDN
and SDF.
In this paper, we co-design the SDN and SDF stacks by redefining the functions
of their control and data planes, and splitting them up within a new
architecture named RackBlox. RackBlox decouples the storage management
functions of flash-based solid-state drives (SSDs), allowing the SDN to track
and manage the states of SSDs in a rack. Therefore, we can enable state
sharing between SDN and SDF, and facilitate global storage resource management.
RackBlox has three major components: (1) coordinated I/O scheduling, in which
it dynamically adjusts I/O scheduling in the storage stack using measured and
predicted network latency, so that it can coordinate I/O scheduling across the
network and storage stacks to achieve predictable end-to-end performance;
(2) coordinated garbage collection (GC), in which it coordinates GC activities
across the SSDs in a rack to minimize their impact on incoming I/O requests;
and (3) rack-scale wear leveling, in which it enables global wear leveling
among the SSDs in a rack by periodically swapping data, improving device
lifetime for the entire rack. We implement RackBlox using programmable SSDs
and a programmable switch. Our experiments demonstrate that RackBlox can
reduce the tail latency of I/O requests by up to 5.8x over state-of-the-art
rack-scale storage systems. (14 pages; published in the ACM SIGOPS 29th
Symposium on Operating Systems Principles, SOSP'23.)
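The coordinated-scheduling idea, deriving each request's storage-side budget by subtracting its predicted network latency from an end-to-end target, can be sketched as follows. The function names and the simple earliest-budget-first policy are our illustrative assumptions, not RackBlox's actual implementation:

```python
def storage_deadline(e2e_target_ms, predicted_net_ms):
    """Budget left for the storage stack once the predicted network
    latency is subtracted from the end-to-end latency target."""
    return max(e2e_target_ms - predicted_net_ms, 0.0)

def schedule(requests, e2e_target_ms):
    """Order queued I/O by remaining storage budget so that requests
    that already spent most of their target on the network go first."""
    return sorted(requests,
                  key=lambda r: storage_deadline(e2e_target_ms, r["net_ms"]))

reqs = [{"id": "a", "net_ms": 0.3},
        {"id": "b", "net_ms": 1.1},
        {"id": "c", "net_ms": 0.6}]
print([r["id"] for r in schedule(reqs, 2.0)])  # ['b', 'c', 'a']
```

Coordinating on a shared latency budget like this is what lets the storage stack absorb network variability instead of compounding it at the tail.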
EbbRT: a framework for building per-application library operating systems
General purpose operating systems sacrifice per-application performance in order to preserve generality.
On the other hand, substantial effort is required to customize or construct an operating system to meet the needs of an application. This paper describes the design and implementation of the Elastic Building Block Runtime (EbbRT), a framework for building per-application library operating systems. EbbRT reduces the effort required to construct and maintain library operating systems without hindering the degree of specialization required for high performance. We combine several techniques in order to achieve this, including a distributed OS architecture, a low-overhead component model, a lightweight event-driven runtime, and many language-level primitives. EbbRT is able to simultaneously enable performance specialization, support for a broad range of
applications, and ease the burden of systems development.
An EbbRT prototype demonstrates the degree of customization made possible by our framework approach.
In an evaluation of memcached, EbbRT is able to attain 2.08× higher throughput than Linux. The node.js runtime, ported to EbbRT, demonstrates the broad applicability and ease of development enabled by our approach.
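The "lightweight event-driven runtime" mentioned above can be illustrated with a toy single-threaded event loop: handlers are queued and dispatched to completion, with no preemption or per-request threads. This is our own didactic sketch of the general pattern, not EbbRT's API (which is C++):

```python
from collections import deque

class EventLoop:
    """Toy run-to-completion event loop in the spirit of
    event-driven runtimes; a sketch, not EbbRT's interface."""
    def __init__(self):
        self._queue = deque()

    def spawn(self, fn, *args):
        """Queue a handler to run on the loop."""
        self._queue.append((fn, args))

    def run(self):
        """Dispatch queued handlers in FIFO order until drained."""
        while self._queue:
            fn, args = self._queue.popleft()
            fn(*args)

loop = EventLoop()
log = []
loop.spawn(log.append, "handle-rx")
loop.spawn(log.append, "handle-tx")
loop.run()
print(log)  # ['handle-rx', 'handle-tx']
```

Run-to-completion dispatch avoids per-request stacks and context switches, which is one reason event-driven library OSes can specialize for throughput.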
Artificial intelligence for advanced manufacturing quality
100 p.This Thesis addresses the challenge of AI-based image quality control systems applied to manufacturing industry, aiming to improve this field through the use of advanced techniques for data acquisition and processing, in order to obtain robust, reliable and optimal systems. This Thesis presents contributions onthe use of complex data acquisition techniques, the application and design of specialised neural networks for the defect detection, and the integration and validation of these systems in production processes. It has been developed in the context of several applied research projects that provided a practical feedback of the usefulness of the proposed computational advances as well as real life data for experimental validation
Efficient Geo-Distributed Transaction Processing
Distributed deterministic database systems support OLTP workloads over geo-replicated data. Providing these transactions with ACID guarantees requires a delay of multiple wide-area network (WAN) round trips of messaging to totally order transactions globally. This thesis presents Sloth, a geo-replicated database system that can serializably commit transactions after a delay of only a single WAN round trip of messaging. Sloth reduces the cost of determining the total global order for all transactions by leveraging deterministic merging of partial sequences of transactions per geographic region. Using popular workload benchmarks over geo-replicated Azure, this thesis shows that Sloth outperforms state-of-the-art comparison systems to deliver low-latency transaction execution.
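The core trick, deterministically merging per-region partial sequences so that every replica derives the same total order without extra WAN rounds, can be sketched as a fixed interleaving by slot and region identifier. This is our illustration of the general idea, not Sloth's actual merge protocol:

```python
def deterministic_merge(regional_logs):
    """Merge per-region transaction sequences into one total order
    using only local information: interleave by (slot, region id).
    Any replica holding the same inputs computes the same order."""
    merged = []
    slots = max(map(len, regional_logs.values()), default=0)
    for slot in range(slots):
        for region in sorted(regional_logs):   # fixed region order
            log = regional_logs[region]
            if slot < len(log):
                merged.append(log[slot])
    return merged

logs = {"us-east": ["t1", "t4"],
        "eu-west": ["t2"],
        "ap-south": ["t3", "t5"]}
print(deterministic_merge(logs))  # ['t3', 't2', 't1', 't5', 't4']
```

Because the merge is a pure function of the per-region sequences, no additional coordination round is needed to agree on the global order, which is where the single-WAN-round-trip commit latency comes from.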
Generalized GM-MDS: Polynomial Codes are Higher Order MDS
The GM-MDS theorem, conjectured by Dau-Song-Dong-Yuen and proved by Lovett
and Yildiz-Hassibi, shows that the generator matrices of Reed-Solomon codes can
attain every possible configuration of zeros for an MDS code. The recently
emerging theory of higher order MDS codes has connected the GM-MDS theorem to
other important properties of Reed-Solomon codes, including showing that
Reed-Solomon codes can achieve list decoding capacity, even over fields of size
linear in the message length.
A few works have extended the GM-MDS theorem to other families of codes,
including Gabidulin and skew polynomial codes. In this paper, we generalize all
these previous results by showing that the GM-MDS theorem applies to any
\emph{polynomial code}, i.e., a code where the columns of the generator matrix
are obtained by evaluating linearly independent polynomials at different
points. We also show that the GM-MDS theorem applies to dual codes of such
polynomial codes, which is non-trivial since the dual of a polynomial code may
not be a polynomial code. More generally, we show that GM-MDS theorem also
holds for algebraic codes (and their duals) where columns of the generator
matrix are chosen to be points on some irreducible variety which is not
contained in a hyperplane through the origin. Our generalization has
applications to constructing capacity-achieving list-decodable codes as shown
in a follow-up work by Brakensiek-Dhar-Gopi-Zhang, where it is proved that
randomly punctured algebraic-geometric (AG) codes achieve list-decoding
capacity over constant-sized fields. (34 pages.)
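For reference, the zero-pattern condition at the heart of the GM-MDS theorem can be stated as follows; this is our paraphrase of the standard formulation, not a statement taken from this abstract:

```latex
% Prescribed zero sets S_1, \dots, S_k \subseteq [n] for the rows of a
% k \times n generator matrix are attainable by some MDS code iff
\[
  \Bigl|\, \bigcap_{i \in I} S_i \,\Bigr| \;\le\; k - |I|
  \qquad \text{for every nonempty } I \subseteq [k],
\]
% and the GM-MDS theorem states that Reed-Solomon codes over a
% sufficiently large field attain every such pattern.
```

The generalization described above extends this attainability from Reed-Solomon codes to arbitrary polynomial codes and their duals.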
Novel Architectures for Offloading and Accelerating Computations in Artificial Intelligence and Big Data
Due to the end of Moore's Law and Dennard Scaling, performance gains in general-purpose architectures have significantly slowed in recent years. While raising the number of cores has been a viable approach for further performance increases, Amdahl's Law and its implications on parallelization also limit further performance gains. Consequently, research has shifted towards different approaches, including domain-specific custom architectures tailored to specific workloads.
This has led to a new golden age for computer architecture, as noted in the Turing Award Lecture by Hennessy and Patterson, which has spawned several new architectures and architectural advances specifically targeted at today's most prominent workloads, including Machine Learning. This thesis introduces a hierarchy of architectural improvements ranging from minor incremental changes, such as High-Bandwidth Memory, to more complex architectural extensions that offload workloads from the general-purpose CPU to more specialized accelerators. Finally, we introduce novel architectural paradigms, namely Near-Data and In-Network Processing, as the most complex architectural improvements.
This cumulative dissertation then investigates several architectural improvements to accelerate Sum-Product Networks, a novel Machine Learning approach from the class of Probabilistic Graphical Models. Furthermore, we use these improvements as case studies to discuss the impact of novel architectures, showing that minor and major architectural changes can significantly increase performance in Machine Learning applications.
In addition, this thesis presents recent works on Near-Data Processing, which introduces Smart Storage Devices as a novel architectural paradigm that is especially interesting in the context of Big Data. We discuss how Near-Data Processing can be applied to improve performance in different database settings by offloading database operations to smart storage devices. Offloading data-reductive operations, such as selections, reduces the amount of data transferred, thus improving performance and alleviating bandwidth-related bottlenecks.
Using Near-Data Processing as a use case, we also discuss how Machine Learning approaches, like Sum-Product Networks, can improve novel architectures. Specifically, we introduce an approach for offloading Cardinality Estimation using Sum-Product Networks that could enable more intelligent decision-making in smart storage devices. Overall, we show that Machine Learning can benefit from the development of novel architectures, while also showing that Machine Learning can be applied to improve the applications of novel architectures.
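The payoff of offloading data-reductive operations can be made concrete with a small simulation: shipping whole pages to the host versus pushing the selection predicate down so only matching rows cross the interconnect. The device interface here is simulated and our own invention, not a real smart-storage API:

```python
def scan_on_host(pages, predicate):
    """Baseline: ship every page to the host, then filter there."""
    transferred = sum(len(p) for p in pages)   # rows moved over the link
    rows = [r for p in pages for r in p if predicate(r)]
    return rows, transferred

def scan_with_pushdown(pages, predicate):
    """Offload the selection to the (simulated) smart storage device:
    only rows matching the predicate cross the interconnect."""
    filtered = [[r for r in p if predicate(r)] for p in pages]
    transferred = sum(len(p) for p in filtered)
    rows = [r for p in filtered for r in p]
    return rows, transferred

pages = [list(range(0, 100)), list(range(100, 200))]
pred = lambda r: r % 50 == 0
host_rows, host_moved = scan_on_host(pages, pred)
push_rows, push_moved = scan_with_pushdown(pages, pred)
assert host_rows == push_rows       # same answer either way...
print(host_moved, push_moved)       # 200 4 -- far less data moved
```

For selective predicates the transferred volume drops by orders of magnitude, which is exactly the bandwidth-bottleneck relief the abstract describes.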
CoRD: Converged RDMA Dataplane for High-Performance Clouds
High-performance networking is often characterized by kernel bypass which is
considered mandatory in high-performance parallel and distributed applications.
But kernel bypass comes at a price because it breaks the traditional OS
architecture, requiring applications to use special APIs and limiting the OS
control over existing network connections. We make the case that kernel bypass
is not mandatory. Rather, high-performance networking relies on multiple
performance-improving techniques, with kernel bypass being the least effective.
CoRD removes kernel bypass from RDMA networks, enabling efficient OS-level
control over the RDMA dataplane. (11 pages.)
A Survey on the Integration of NAND Flash Storage in the Design of File Systems and the Host Storage Software Stack
With the ever-increasing amount of data generated in the world, estimated to reach over 200 Zettabytes by 2025, pressure on efficient data storage systems is intensifying. The shift from HDDs to flash-based SSDs represents one of the most fundamental changes in storage technology, increasing performance capabilities significantly. However, flash storage has different characteristics than prior HDD storage technology; as a result, existing storage software was unsuitable for leveraging the capabilities of flash storage, and a plethora of storage applications have been designed to better integrate with flash storage and align with flash characteristics. In this literature study, we evaluate the effect the introduction of flash storage has had on the design of file systems, which provide one of the most essential mechanisms for managing persistent storage. We analyze the mechanisms for effectively managing flash storage, managing the overheads of introduced design requirements, and leveraging the capabilities of flash storage. Numerous methods have been adopted in file systems; however, they prominently revolve around similar design decisions: adhering to flash hardware constraints and limiting software intervention. The design of future storage software remains important given the constant growth in flash-based storage devices and interfaces, which provide increasing opportunities to enhance flash integration in the host storage software stack.
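A defining flash characteristic driving these file-system designs is that pages cannot be rewritten in place: overwrites go to a fresh page and the old copy is invalidated for later garbage collection. The toy flash translation layer below is a didactic sketch of that out-of-place update pattern, not any real SSD's FTL:

```python
class TinyFTL:
    """Toy flash translation layer illustrating out-of-place updates:
    a logical overwrite appends to a fresh physical page and merely
    invalidates the old one, leaving reclamation to GC."""
    def __init__(self, num_pages):
        self.flash = [None] * num_pages   # physical pages
        self.l2p = {}                     # logical -> physical mapping
        self.invalid = set()              # stale pages awaiting GC
        self.next_free = 0                # append-only write frontier

    def write(self, lpn, data):
        if lpn in self.l2p:               # overwrite: invalidate old copy
            self.invalid.add(self.l2p[lpn])
        self.flash[self.next_free] = data
        self.l2p[lpn] = self.next_free    # remap logical page
        self.next_free += 1               # never rewrite a page in place

    def read(self, lpn):
        return self.flash[self.l2p[lpn]]

ftl = TinyFTL(8)
ftl.write(0, "v1")
ftl.write(0, "v2")                        # logical overwrite of page 0
print(ftl.read(0), len(ftl.invalid))      # v2 1
```

Log-structured and flash-aware file systems essentially lift this same out-of-place discipline into software, which is why their designs converge on the decisions the survey identifies.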