Balancing Privacy, Precision and Performance in Distributed Systems
Privacy, Precision, and Performance (the 3Ps) are three fundamental design objectives in distributed systems. However, these properties tend to compete with one another, and none of them is absolute: each must be defined and justified in terms of a system, its resources, its stakeholders' concerns, and the security threat model.
To date, distributed systems research has considered the trade-offs among privacy, precision, and performance only in a pairwise fashion. This dissertation instead formally explores the space of trade-offs among all 3Ps by examining three representative classes of distributed systems: Wireless Sensor Networks (WSNs), cloud systems, and Data Stream Management Systems (DSMSs). These classes underpin a large part of modern, mission-critical distributed systems.
WSNs are real-time systems characterized by unreliable network interconnections and highly constrained computational and power resources. The dissertation proposes a privacy-preserving in-network aggregation protocol for WSNs, demonstrating that the 3Ps can be balanced by adopting appropriate algorithms and cryptographic techniques that are not prohibitively expensive.
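As a rough illustration of how such a protocol can keep individual readings private while still permitting in-network aggregation, the sketch below uses additively homomorphic masking in the spirit of Castelluccia-style stream ciphers (an assumption; the dissertation's actual construction may differ). Intermediate nodes add ciphertexts without learning any reading; only the sink, which knows every node's keystream, recovers the sum.

```python
# Minimal sketch of additively homomorphic in-network aggregation.
# Each sensor masks its reading with a keystream shared with the sink;
# any intermediate node can fold ciphertexts together.
import random

M = 2**32  # modulus; must exceed the largest possible aggregate sum

def encrypt(reading: int, keystream: int) -> int:
    return (reading + keystream) % M

def aggregate(ciphertexts: list[int]) -> int:
    # Performed in-network, without access to any key.
    return sum(ciphertexts) % M

def decrypt_sum(agg: int, keystreams: list[int]) -> int:
    # Only the sink, which knows every node's keystream, recovers the sum.
    return (agg - sum(keystreams)) % M

readings = [17, 23, 5]                         # hypothetical sensor values
keys = [random.randrange(M) for _ in readings]
cts = [encrypt(r, k) for r, k in zip(readings, keys)]
assert decrypt_sum(aggregate(cts), keys) == sum(readings)
```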
Next, the dissertation highlights the privacy and precision issues that arise in cloud databases under the cloud's eventual consistency models. To address these issues, consistency enforcement techniques across cloud servers are proposed, and the trade-offs among the 3Ps are discussed to guide cloud database users in balancing these properties.
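The abstract does not name the specific enforcement techniques; one common family is quorum-based reads and writes, sketched generically below as an illustration rather than the dissertation's mechanism. With R + W > N, every read quorum intersects every write quorum, so reads observe the latest acknowledged write at some performance cost.

```python
# A generic sketch of quorum-based consistency enforcement.
N, W, R = 3, 2, 2
replicas = [{"version": 0, "value": None} for _ in range(N)]

def write(value, version):
    acked = 0
    for rep in replicas:
        rep.update(version=version, value=value)
        acked += 1
        if acked == W:          # return once a write quorum acks;
            break               # the remaining replicas stay stale

def read():
    votes = replicas[-R:]       # any R replicas form a read quorum
    return max(votes, key=lambda rep: rep["version"])["value"]

write("v1", version=1)
print(read())   # "v1": the read quorum must overlap the write quorum
```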
Lastly, the 3Ps are examined in DSMSs, which are characterized by high volumes of unbounded input data streams and strict real-time processing constraints. Here, the 3Ps are balanced through a proposed simple and efficient technique that applies access control policies over shared operator networks, achieving privacy and precision without sacrificing the system's performance.
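A minimal sketch of the general idea (the operators, policies, and tuple schema below are hypothetical): the expensive shared operators run once over the stream for all subscribers, and a cheap per-subscriber policy filter is applied only at the egress, so privacy costs little performance.

```python
# Access-control policies applied over a shared operator network.
def shared_pipeline(stream):
    # Shared work, executed once for all subscribers (e.g., alerting).
    return (t for t in stream if t["hr"] > 100)

def deliver(stream, policies):
    for t in shared_pipeline(stream):
        for user, allowed in policies.items():
            if allowed(t):                  # per-subscriber predicate
                print(f"{user} <- {t}")

deliver(
    [{"ward": "ICU", "hr": 118}, {"ward": "ER", "hr": 140}],
    {"icu_nurse": lambda t: t["ward"] == "ICU",   # sees only ICU tuples
     "supervisor": lambda t: True},               # sees everything
)
```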
Although this dissertation shows that, with the right set of protocols and algorithms, the desirable 3P properties can coexist in a balanced way in well-established distributed systems, it also promotes a new 3Ps-by-design concept. This concept is meant to encourage distributed systems designers to proactively consider the interplay among the 3Ps from the initial stages of the system design lifecycle, rather than treating them as add-on properties.
A Formal Characterization of Epsilon Serializability
Epsilon Serializability (ESR) is a generalization of classic serializability (SR). ESR allows some limited amount of inconsistency in transaction processing (TP), through an interface called epsilon-transactions (ETs). For example, some query ETs may view inconsistent data due to non-SR interleaving with concurrent updates. In this paper, we restrict our attention to the situation where query-only ETs run concurrently with consistent update transactions that are SR without the ETs. This paper presents a formal characterization of ESR and ETs. Using the ACTA framework, the first part of this characterization formally expresses the inter-transaction conflicts that are recognized by ESR and, through that, defines ESR, analogous to the manner in which conflict-based serializability is defined. The second part of the paper is devoted to deriving expressions for: (1) the inconsistency in the values of data, arising from ongoing updates; (2) the inconsistency of the results of a query, arising from the inconsistency of the data read in order to process the query; and (3) the inconsistency exported by an update ET, arising from ongoing queries reading uncommitted data produced by the update ET. These expressions are used to determine the preconditions that ET operations have to satisfy in order to maintain the limits on the inconsistency in the data read by query ETs, the inconsistency exported by update ETs, and the inconsistency in the results of queries. This determination suggests possible mechanisms that can be used to realize ESR.
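As a rough illustration of the kind of precondition such mechanisms might enforce (the accounting below is a simplified invention, not the paper's formal expressions): a query ET tracks the inconsistency it imports from concurrent updates and proceeds only while it stays within its epsilon limit.

```python
# Sketch of an epsilon-transaction precondition check.
class QueryET:
    def __init__(self, import_limit: float):
        self.import_limit = import_limit   # epsilon for this query ET
        self.imported = 0.0                # inconsistency read so far

    def try_read(self, value: float, pending_update_delta: float) -> float:
        # pending_update_delta: bound on how far concurrent, uncommitted
        # updates may have moved this item from a consistent state.
        if self.imported + abs(pending_update_delta) > self.import_limit:
            raise RuntimeError("read refused: epsilon limit exceeded")
        self.imported += abs(pending_update_delta)
        return value

q = QueryET(import_limit=10.0)
total = q.try_read(100.0, pending_update_delta=4.0)   # ok, imported = 4
total += q.try_read(50.0, pending_update_delta=5.0)   # ok, imported = 9
# q.try_read(25.0, pending_update_delta=3.0)  # would exceed epsilon
```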
Incremental Consistency Guarantees for Replicated Objects
Programming with replicated objects is difficult. Developers must face the fundamental trade-off between consistency and performance head on, while struggling with the complexity of distributed storage stacks. We introduce Correctables, a novel abstraction that hides most of this complexity, allowing developers to focus on the task of balancing consistency and performance. To aid developers with this task, Correctables provide incremental consistency guarantees, which capture successive refinements on the result of an ongoing operation on a replicated object. In short, applications receive both a preliminary result (fast, possibly inconsistent) and a final result (consistent) that arrives later.
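A minimal sketch of this abstraction (the weak/strong read functions are stand-ins for real reads against a store such as Cassandra or ZooKeeper):

```python
# One logical read yields a fast, possibly inconsistent preliminary
# result and, later, a final consistent result.
import threading, time

class Correctable:
    def __init__(self, weak_read, strong_read):
        self._weak, self._strong = weak_read, strong_read

    def get(self, on_preliminary, on_final):
        on_preliminary(self._weak())          # returns quickly
        def finish():
            on_final(self._strong())          # consistent, arrives later
        threading.Thread(target=finish).start()

def weak_read():   return "v1?"               # hypothetical stale value
def strong_read(): time.sleep(0.05); return "v2"

c = Correctable(weak_read, strong_read)
c.get(on_preliminary=lambda v: print("preliminary:", v),
      on_final=lambda v: print("final:", v))
time.sleep(0.1)   # let the final result arrive
```

An application can act speculatively on the preliminary value and reconcile when the final value arrives, which is exactly the latency-for-bandwidth trade discussed next.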
We show how to leverage incremental consistency guarantees by speculating on preliminary values, trading throughput and bandwidth for improved latency. We experiment with two popular storage systems (Cassandra and ZooKeeper) and three applications: a Twissandra-based microblogging service, an ad serving system, and a ticket selling system. Our evaluation on the Amazon EC2 platform with YCSB workloads A, B, and C shows that we can reduce the latency of strongly consistent operations by up to 40% (from 100ms to 60ms) at little cost (10% bandwidth increase, 6% throughput drop) in the ad system. Even if the preliminary result is frequently inconsistent (25% of accesses), incremental consistency incurs a bandwidth overhead of only 27%.
Building Scalable and Consistent Distributed Databases Under Conflicts
Distributed databases, which rely on redundant storage distributed across multiple servers, are able to provide mission-critical data management services at large scale. Parallelism is the key to the scalability of distributed databases, but concurrent queries with conflicts may block or abort each other when strong consistency is enforced using rigorous concurrency control protocols. This thesis studies techniques for building scalable distributed databases under strong consistency guarantees, even in the face of high-contention workloads. The techniques proposed in this thesis share a common idea, conflict mitigation: mitigating conflicts in the first place by rescheduling operations within the concurrency control, rather than resolving conflicts after they arise. Using this idea, concurrent queries under conflicts can be executed with high parallelism. This thesis explores this idea on both databases that support serializable ACID (atomicity, consistency, isolation, durability) transactions and eventually consistent NoSQL systems.
First, the epoch-based concurrency control (ECC) technique is proposed in ALOHA-KV, a new distributed key-value store that supports high-performance read-only and write-only distributed transactions. ECC demonstrates that concurrent serializable distributed transactions can be processed in parallel with low overhead even under high contention. With ECC, a new atomic commitment protocol is developed that requires only one round trip, amortized, for a distributed write-only transaction to commit in the absence of failures.
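A minimal sketch of the epoch-based idea as described here (the last-writer-wins merge and API names are simplifying assumptions): the store alternates between a write epoch, during which write-only transactions are buffered, and a read epoch, during which the store is immutable, so transactions within an epoch never conflict.

```python
# Epoch-based concurrency control: reads and writes never interleave.
store, pending_writes = {}, []

def submit_write_txn(writes: dict):
    pending_writes.append(writes)      # buffered during the write epoch

def close_write_epoch():
    # Epoch boundary: apply all buffered write-only transactions.
    for txn in pending_writes:
        store.update(txn)              # last-writer-wins per key (assumed)
    pending_writes.clear()

def read_txn(keys):
    # During the read epoch the store is immutable, so any interleaving
    # of read-only transactions is trivially serializable.
    return {k: store.get(k) for k in keys}

submit_write_txn({"x": 1, "y": 2})
submit_write_txn({"y": 3})
close_write_epoch()
print(read_txn(["x", "y"]))   # {'x': 1, 'y': 3}
```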
Second, a novel paradigm of serializable distributed transaction processing is developed to extend ECC with read-write transaction processing support. This paradigm uses a newly proposed database operator, the functor: a placeholder for the value of a key that can be computed asynchronously, in parallel with the functor computations of the same or other transactions. Functor-enabled ECC achieves finer-grained concurrency control than transaction-level concurrency control; it never aborts transactions due to read-write or write-write conflicts, but allows transactions to fail due to logic errors or constraint violations, while guaranteeing serializability.
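A rough sketch of the functor idea using Python futures as the placeholder mechanism (an illustrative stand-in, not the thesis's implementation): a write installs a placeholder whose value is computed asynchronously, so the transaction need not wait, and readers force the placeholder on demand.

```python
# Functors as asynchronous placeholders for key values.
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor()
store = {}

def write_functor(key, fn, *deps):
    # Install a placeholder; fn runs in parallel with other functors.
    store[key] = pool.submit(fn, *deps)

def read(key):
    v = store[key]
    return v.result() if hasattr(v, "result") else v   # force if needed

write_functor("balance", lambda: 100 - 25)      # e.g., debit computation
write_functor("fee", lambda b: b * 0.01, 75)    # independent functor
print(read("balance"), read("fee"))             # 75 0.75
```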
Lastly, this thesis explores consistency in an eventually consistent system, Apache Cassandra, investigating a form of consistency violation referred to as "consistency spikes". This investigation shows that the consistency spikes exhibited by Cassandra are strongly correlated with garbage collection, particularly the "stop-the-world" phase in the Java virtual machine. Thus, artificially delaying read operations at servers immediately after a garbage collection pause can virtually eliminate these spikes.
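A minimal sketch of this mitigation (the post-GC window and notification hook are hypothetical tuning points): reads arriving shortly after a stop-the-world pause are briefly delayed so lagging replicas can catch up.

```python
# Delay reads that arrive within a short window after a GC pause.
import time

last_gc_end = 0.0          # updated by a GC-notification hook (assumed)
POST_GC_WINDOW = 0.050     # seconds to protect after each pause

def on_gc_pause_end():
    global last_gc_end
    last_gc_end = time.monotonic()

def serve_read(key, do_read):
    since_gc = time.monotonic() - last_gc_end
    if since_gc < POST_GC_WINDOW:
        time.sleep(POST_GC_WINDOW - since_gc)   # artificial delay
    return do_read(key)

on_gc_pause_end()
print(serve_read("k", lambda k: f"value-of-{k}"))
```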
Altogether, these techniques allow distributed databases to provide a scalable and consistent storage service.
Characterizing Water and Water-Related Energy Use in Multi-Unit Residential Structures with High Resolution Smart Metering Data
As urban populations continue to grow and expand, localized demands on water supplies continue to increase as well. These water supplies, which have historically been stable, are also threatened by an increasingly erratic climate. Together, these two factors have significantly increased the likelihood of long-term drought conditions in the American West. In response, water suppliers are investigating new ways to record water use in urban areas to better understand how water is used. One of these methods is smart meters: advanced devices that can record and transmit water use information directly to the water supplier. However, these devices can produce extremely large amounts of data, which can be difficult to manage. This research investigated methods for data collection and management that advance the feasibility of larger smart meter networks. We describe the techniques we developed and how they were used to estimate water and water-related energy use in several student dormitories on Utah State University's campus. These results offer insight into how water and water-related energy are used in such buildings, which may interest water suppliers looking to extend their understanding of water use beyond just the number of gallons used.
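As a rough illustration of the kind of water-to-energy estimate involved (the volumes and temperatures below are textbook assumptions, not the study's measurements), water-related energy can be approximated from metered hot-water volume via E = V * rho * c * dT:

```python
# Estimate the energy embedded in metered hot-water use.
RHO = 1.0          # kg/L, density of water
C = 4186.0         # J/(kg*K), specific heat of water

def hot_water_energy_kwh(liters: float, delta_t_c: float) -> float:
    joules = liters * RHO * C * delta_t_c
    return joules / 3.6e6      # 1 kWh = 3.6e6 J

# e.g., 200 L of hot water heated from 15 C to 55 C:
print(round(hot_water_energy_kwh(200, 40), 2), "kWh")   # ~9.3 kWh
```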
Methods to Improve Applicability and Efficiency of Distributed Data-Centric Compute Frameworks
The success of modern applications depends on the insights they collect from their data repositories. Data repositories for such applications currently exceed exabytes and are rapidly increasing in size, as they collect data from varied sources: web applications, mobile phones, sensors, and other connected devices. Distributed storage and data-centric compute frameworks have been invented to store and analyze these large datasets. This dissertation focuses on extending the applicability and improving the efficiency of distributed data-centric compute frameworks.
Designing Scalable Mechanisms for Geo-Distributed Platform Services in the Presence of Client Mobility
Situation-awareness applications require low-latency response and high network bandwidth, hence benefiting from geo-distributed Edge infrastructures. The developers of these applications typically rely on several platform services, such as Kubernetes, Apache Cassandra and Pulsar, for managing their compute and data components across the geo-distributed Edge infrastructure. Situation-awareness applications impose peculiar requirements on the compute and data placement policies of the platform services. Firstly, the processing logic of these applications is closely tied to the physical environment it interacts with; hence, the access pattern to compute and data exhibits strong spatial affinity. Secondly, the network topology of Edge infrastructure is heterogeneous, wherein communication latency forms a significant portion of the end-to-end compute and data access latency. Therefore, the placement of compute and data components has to be cognizant of the spatial affinity and latency requirements of the applications. However, clients of situation-awareness applications, such as vehicles and drones, are typically mobile, which makes the compute and data access pattern dynamic and complicates the management of data and compute components. Constant changes in the network connectivity and spatial locality of clients due to client mobility render the current placement of compute and data components unsuitable for meeting the latency and spatial affinity requirements of the application. Constant client mobility necessitates that client location and the latency offered by the platform services be continuously monitored, to detect when application requirements are violated and to adapt the compute and data placement. The control and monitoring modules of off-the-shelf platform services do not have the necessary primitives to incorporate spatial affinity and network topology awareness into their compute and data placement policies. The spatial location of clients is not considered as an input for decision-making in their control modules. Furthermore, they do not perform fine-grained end-to-end monitoring of observed latency to detect and adapt to performance degradations due to client mobility.
This dissertation presents three mechanisms that inform the compute and data placement policies of platform services, so that application requirements can be met.
M1: Dynamic Spatial Context Management for system entities (clients, and data and compute components) to ensure spatial affinity requirements are satisfied.
M2: Network Proximity Estimation to provide topology-awareness to the data and compute placement policies of platform services.
M3: End-to-End Latency Monitoring to enable collection, aggregation and analysis of per-application metrics in a geo-distributed manner to provide end-to-end insights into application performance.
The thesis of our work is that the aforementioned mechanisms are fundamental building blocks for the compute and data management policies of platform services, and that by incorporating them, platform services can meet application requirements at the Edge. Furthermore, the proposed mechanisms can be implemented in a way that offers high scalability to handle high levels of client activity. We demonstrate by construction the efficacy and scalability of the proposed mechanisms for building dynamic compute and data orchestration policies by incorporating them in the control and monitoring modules of three different platform services. Specifically, we incorporate these mechanisms into a topic-based publish-subscribe system (ePulsar), an application orchestration platform (OneEdge), and a key-value store (FogStore). We conduct extensive performance evaluation of these enhanced platform services to showcase how the new mechanisms aid in dynamically adapting the compute/data orchestration decisions to satisfy the performance requirements of applications.
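As a rough illustration of mechanism M2 above, the sketch below estimates network proximity with virtual coordinates (in the spirit of Vivaldi-style network coordinate systems, a stand-in for the dissertation's exact scheme) and re-evaluates the nearest edge site as a mobile client moves:

```python
# Proximity-aware placement over hypothetical virtual coordinates
# learned from pairwise latency probes.
import math

sites = {"edge-A": (0.0, 0.0), "edge-B": (3.0, 4.0), "edge-C": (6.0, 1.0)}

def est_latency(a, b):
    return math.dist(a, b)     # coordinate distance approximates RTT

def nearest_site(client_coord):
    return min(sites, key=lambda s: est_latency(sites[s], client_coord))

# As the mobile client moves, the placement decision is re-evaluated.
for pos in [(0.5, 0.5), (5.5, 1.5)]:
    print(pos, "->", nearest_site(pos))   # edge-A, then edge-C
```

Virtual coordinates avoid all-pairs probing, which is what keeps such a mechanism scalable under high levels of client activity.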