Balancing Privacy, Precision and Performance in Distributed Systems
Privacy, Precision, and Performance (the 3Ps) are three fundamental design objectives in distributed systems. However, these properties tend to compete with one another, and none of them is absolute: each must be defined and justified in terms of a system, its resources, its stakeholders' concerns, and the security threat model.
To date, distributed systems research has considered the trade-offs among privacy, precision, and performance only in a pairwise fashion. This dissertation instead formally explores the space of trade-offs among all 3Ps by examining three representative classes of distributed systems: Wireless Sensor Networks (WSNs), cloud systems, and Data Stream Management Systems (DSMSs). These classes underpin a large part of modern, mission-critical distributed systems.
WSNs are real-time systems characterized by unreliable network interconnections and highly constrained computational and power resources. The dissertation proposes a privacy-preserving in-network aggregation protocol for WSNs, demonstrating that the 3Ps can be balanced by adopting appropriate algorithms and cryptographic techniques that are not prohibitively expensive.
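As a rough illustration of how such a protocol can keep individual readings private while still permitting in-network aggregation, the sketch below uses additively homomorphic masking in the spirit of Castelluccia-style stream ciphers (an assumption; the dissertation's actual construction may differ). Intermediate nodes add ciphertexts without learning any reading; only the sink, which knows every node's keystream, recovers the sum.

```python
# Minimal sketch of additively homomorphic in-network aggregation.
# Each sensor masks its reading with a keystream shared with the sink;
# any intermediate node can fold ciphertexts together.
import random

M = 2**32  # modulus; must exceed the largest possible aggregate sum

def encrypt(reading: int, keystream: int) -> int:
    return (reading + keystream) % M

def aggregate(ciphertexts: list[int]) -> int:
    # Performed in-network, without access to any key.
    return sum(ciphertexts) % M

def decrypt_sum(agg: int, keystreams: list[int]) -> int:
    # Only the sink, which knows every node's keystream, recovers the sum.
    return (agg - sum(keystreams)) % M

readings = [17, 23, 5]                         # hypothetical sensor values
keys = [random.randrange(M) for _ in readings]
cts = [encrypt(r, k) for r, k in zip(readings, keys)]
assert decrypt_sum(aggregate(cts), keys) == sum(readings)
```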
Next, the dissertation highlights the privacy and precision issues that arise in cloud databases under the cloud's eventual consistency models. To address these issues, consistency enforcement techniques across cloud servers are proposed, and the trade-offs among the 3Ps are discussed to guide cloud database users in balancing these properties.
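The abstract does not name the specific enforcement techniques; one common family is quorum-based reads and writes, sketched generically below as an illustration rather than the dissertation's mechanism. With R + W > N, every read quorum intersects every write quorum, so reads observe the latest acknowledged write at some performance cost.

```python
# A generic sketch of quorum-based consistency enforcement.
N, W, R = 3, 2, 2
replicas = [{"version": 0, "value": None} for _ in range(N)]

def write(value, version):
    acked = 0
    for rep in replicas:
        rep.update(version=version, value=value)
        acked += 1
        if acked == W:          # return once a write quorum acks;
            break               # the remaining replicas stay stale

def read():
    votes = replicas[-R:]       # any R replicas form a read quorum
    return max(votes, key=lambda rep: rep["version"])["value"]

write("v1", version=1)
print(read())   # "v1": the read quorum must overlap the write quorum
```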
Lastly, the 3Ps are examined in DSMSs, which are characterized by high volumes of unbounded input data streams and strict real-time processing constraints. Here, the 3Ps are balanced through a proposed simple and efficient technique that applies access control policies over shared operator networks, achieving privacy and precision without sacrificing the system's performance.
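A minimal sketch of the general idea (the operators, policies, and tuple schema below are hypothetical): the expensive shared operators run once over the stream for all subscribers, and a cheap per-subscriber policy filter is applied only at the egress, so privacy costs little performance.

```python
# Access-control policies applied over a shared operator network.
def shared_pipeline(stream):
    # Shared work, executed once for all subscribers (e.g., alerting).
    return (t for t in stream if t["hr"] > 100)

def deliver(stream, policies):
    for t in shared_pipeline(stream):
        for user, allowed in policies.items():
            if allowed(t):                  # per-subscriber predicate
                print(f"{user} <- {t}")

deliver(
    [{"ward": "ICU", "hr": 118}, {"ward": "ER", "hr": 140}],
    {"icu_nurse": lambda t: t["ward"] == "ICU",   # sees only ICU tuples
     "supervisor": lambda t: True},               # sees everything
)
```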
Although this dissertation shows that, with the right set of protocols and algorithms, the desirable 3P properties can coexist in a balanced way in well-established distributed systems, it also promotes a new 3Ps-by-design concept. This concept is meant to encourage distributed systems designers to proactively consider the interplay among the 3Ps from the initial stages of the system design lifecycle, rather than treating them as add-on properties.
A Formal Characterization of Epsilon Serializability
Epsilon Serializability (ESR) is a generalization of classic serializability (SR). ESR allows some limited amount of inconsistency in transaction processing (TP), through an interface called epsilon-transactions (ETs). For example, some query ETs may view inconsistent data due to non-SR interleaving with concurrent updates. In this paper, we restrict our attention to the situation where query-only ETs run concurrently with consistent update transactions that are SR without the ETs. This paper presents a formal characterization of ESR and ETs. Using the ACTA framework, the first part of this characterization formally expresses the inter-transaction conflicts that are recognized by ESR and, through that, defines ESR, analogous to the manner in which conflict-based serializability is defined. The second part of the paper is devoted to deriving expressions for: (1) the inconsistency in the values of data, arising from ongoing updates; (2) the inconsistency of the results of a query, arising from the inconsistency of the data read in order to process the query; and (3) the inconsistency exported by an update ET, arising from ongoing queries reading uncommitted data produced by the update ET. These expressions are used to determine the preconditions that ET operations have to satisfy in order to maintain the limits on the inconsistency in the data read by query ETs, the inconsistency exported by update ETs, and the inconsistency in the results of queries. This determination suggests possible mechanisms that can be used to realize ESR.
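As a rough illustration of the kind of precondition such mechanisms might enforce (the accounting below is a simplified invention, not the paper's formal expressions): a query ET tracks the inconsistency it imports from concurrent updates and proceeds only while it stays within its epsilon limit.

```python
# Sketch of an epsilon-transaction precondition check.
class QueryET:
    def __init__(self, import_limit: float):
        self.import_limit = import_limit   # epsilon for this query ET
        self.imported = 0.0                # inconsistency read so far

    def try_read(self, value: float, pending_update_delta: float) -> float:
        # pending_update_delta: bound on how far concurrent, uncommitted
        # updates may have moved this item from a consistent state.
        if self.imported + abs(pending_update_delta) > self.import_limit:
            raise RuntimeError("read refused: epsilon limit exceeded")
        self.imported += abs(pending_update_delta)
        return value

q = QueryET(import_limit=10.0)
total = q.try_read(100.0, pending_update_delta=4.0)   # ok, imported = 4
total += q.try_read(50.0, pending_update_delta=5.0)   # ok, imported = 9
# q.try_read(25.0, pending_update_delta=3.0)  # would exceed epsilon
```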
Incremental Consistency Guarantees for Replicated Objects
Programming with replicated objects is difficult. Developers must face the fundamental trade-off between consistency and performance head on, while struggling with the complexity of distributed storage stacks. We introduce Correctables, a novel abstraction that hides most of this complexity, allowing developers to focus on the task of balancing consistency and performance. To aid developers with this task, Correctables provide incremental consistency guarantees, which capture successive refinements on the result of an ongoing operation on a replicated object. In short, applications receive both a preliminary result (fast, possibly inconsistent) and a final result (consistent) that arrives later.
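A minimal sketch of this abstraction (the weak/strong read functions are stand-ins for real reads against a store such as Cassandra or ZooKeeper):

```python
# One logical read yields a fast, possibly inconsistent preliminary
# result and, later, a final consistent result.
import threading, time

class Correctable:
    def __init__(self, weak_read, strong_read):
        self._weak, self._strong = weak_read, strong_read

    def get(self, on_preliminary, on_final):
        on_preliminary(self._weak())          # returns quickly
        def finish():
            on_final(self._strong())          # consistent, arrives later
        threading.Thread(target=finish).start()

def weak_read():   return "v1?"               # hypothetical stale value
def strong_read(): time.sleep(0.05); return "v2"

c = Correctable(weak_read, strong_read)
c.get(on_preliminary=lambda v: print("preliminary:", v),
      on_final=lambda v: print("final:", v))
time.sleep(0.1)   # let the final result arrive
```

An application can act speculatively on the preliminary value and reconcile when the final value arrives, which is exactly the latency-for-bandwidth trade discussed next.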
We show how to leverage incremental consistency guarantees by speculating on preliminary values, trading throughput and bandwidth for improved latency. We experiment with two popular storage systems (Cassandra and ZooKeeper) and three applications: a Twissandra-based microblogging service, an ad serving system, and a ticket selling system. Our evaluation on the Amazon EC2 platform with YCSB workloads A, B, and C shows that we can reduce the latency of strongly consistent operations by up to 40% (from 100ms to 60ms) at little cost (10% bandwidth increase, 6% throughput drop) in the ad system. Even if the preliminary result is frequently inconsistent (25% of accesses), incremental consistency incurs a bandwidth overhead of only 27%.
Building Scalable and Consistent Distributed Databases Under Conflicts
Distributed databases, which rely on redundant storage distributed across multiple servers, are able to provide mission-critical data management services at large scale. Parallelism is the key to the scalability of distributed databases, but concurrent queries with conflicts may block or abort each other when strong consistency is enforced using rigorous concurrency control protocols. This thesis studies techniques for building scalable distributed databases under strong consistency guarantees, even in the face of high-contention workloads. The techniques proposed in this thesis share a common idea, conflict mitigation: mitigating conflicts in the first place by rescheduling operations within the concurrency control, rather than resolving conflicts after they arise. Using this idea, concurrent queries under conflicts can be executed with high parallelism. This thesis explores this idea on both databases that support serializable ACID (atomicity, consistency, isolation, durability) transactions and eventually consistent NoSQL systems.
First, the epoch-based concurrency control (ECC) technique is proposed in ALOHA-KV, a new distributed key-value store that supports high-performance read-only and write-only distributed transactions. ECC demonstrates that concurrent serializable distributed transactions can be processed in parallel with low overhead even under high contention. With ECC, a new atomic commitment protocol is developed that requires only one round trip, amortized, for a distributed write-only transaction to commit in the absence of failures.
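A minimal sketch of the epoch-based idea as described here (the last-writer-wins merge and API names are simplifying assumptions): the store alternates between a write epoch, during which write-only transactions are buffered, and a read epoch, during which the store is immutable, so transactions within an epoch never conflict.

```python
# Epoch-based concurrency control: reads and writes never interleave.
store, pending_writes = {}, []

def submit_write_txn(writes: dict):
    pending_writes.append(writes)      # buffered during the write epoch

def close_write_epoch():
    # Epoch boundary: apply all buffered write-only transactions.
    for txn in pending_writes:
        store.update(txn)              # last-writer-wins per key (assumed)
    pending_writes.clear()

def read_txn(keys):
    # During the read epoch the store is immutable, so any interleaving
    # of read-only transactions is trivially serializable.
    return {k: store.get(k) for k in keys}

submit_write_txn({"x": 1, "y": 2})
submit_write_txn({"y": 3})
close_write_epoch()
print(read_txn(["x", "y"]))   # {'x': 1, 'y': 3}
```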
Second, a novel paradigm of serializable distributed transaction processing is developed to extend ECC with read-write transaction processing support. This paradigm uses a newly proposed database operator, the functor: a placeholder for the value of a key that can be computed asynchronously, in parallel with the functor computations of the same or other transactions. Functor-enabled ECC achieves finer-grained concurrency control than transaction-level concurrency control; it never aborts transactions due to read-write or write-write conflicts, but allows transactions to fail due to logic errors or constraint violations, while guaranteeing serializability.
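A rough sketch of the functor idea using Python futures as the placeholder mechanism (an illustrative stand-in, not the thesis's implementation): a write installs a placeholder whose value is computed asynchronously, so the transaction need not wait, and readers force the placeholder on demand.

```python
# Functors as asynchronous placeholders for key values.
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor()
store = {}

def write_functor(key, fn, *deps):
    # Install a placeholder; fn runs in parallel with other functors.
    store[key] = pool.submit(fn, *deps)

def read(key):
    v = store[key]
    return v.result() if hasattr(v, "result") else v   # force if needed

write_functor("balance", lambda: 100 - 25)      # e.g., debit computation
write_functor("fee", lambda b: b * 0.01, 75)    # independent functor
print(read("balance"), read("fee"))             # 75 0.75
```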
Lastly, this thesis explores consistency in an eventually consistent system, Apache Cassandra, investigating a form of consistency violation referred to as "consistency spikes". This investigation shows that the consistency spikes exhibited by Cassandra are strongly correlated with garbage collection, particularly the "stop-the-world" phase in the Java virtual machine. Thus, artificially delaying read operations at servers immediately after a garbage collection pause can virtually eliminate these spikes.
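A minimal sketch of this mitigation (the post-GC window and notification hook are hypothetical tuning points): reads arriving shortly after a stop-the-world pause are briefly delayed so lagging replicas can catch up.

```python
# Delay reads that arrive within a short window after a GC pause.
import time

last_gc_end = 0.0          # updated by a GC-notification hook (assumed)
POST_GC_WINDOW = 0.050     # seconds to protect after each pause

def on_gc_pause_end():
    global last_gc_end
    last_gc_end = time.monotonic()

def serve_read(key, do_read):
    since_gc = time.monotonic() - last_gc_end
    if since_gc < POST_GC_WINDOW:
        time.sleep(POST_GC_WINDOW - since_gc)   # artificial delay
    return do_read(key)

on_gc_pause_end()
print(serve_read("k", lambda k: f"value-of-{k}"))
```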
Altogether, these techniques allow distributed databases to provide a scalable and consistent storage service.
Characterizing Water and Water-Related Energy Use in Multi-Unit Residential Structures with High Resolution Smart Metering Data
As urban populations continue to grow and expand, localized demands on water supplies continue to increase as well. These water supplies, which have historically been stable, are also threatened by an increasingly erratic climate. Together, these two factors have significantly increased the likelihood of long-term drought conditions in the American West. In response, water suppliers are investigating new ways to record water use in urban areas to better understand how water is used. One of these methods is smart meters: advanced devices that can record and transmit water use information directly to the water supplier. However, these devices can produce extremely large amounts of data, which can be difficult to manage. This research investigated methods for data collection and management that advance the feasibility of larger smart meter networks. We describe the techniques we developed and how they were used to estimate water and water-related energy use in several student dormitories on Utah State University's campus. These results offer insight into how water and water-related energy are used in such buildings, which may interest water suppliers looking to extend their understanding of water use beyond just the number of gallons used.
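As a rough illustration of the kind of water-to-energy estimate involved (the volumes and temperatures below are textbook assumptions, not the study's measurements), water-related energy can be approximated from metered hot-water volume via E = V * rho * c * dT:

```python
# Estimate the energy embedded in metered hot-water use.
RHO = 1.0          # kg/L, density of water
C = 4186.0         # J/(kg*K), specific heat of water

def hot_water_energy_kwh(liters: float, delta_t_c: float) -> float:
    joules = liters * RHO * C * delta_t_c
    return joules / 3.6e6      # 1 kWh = 3.6e6 J

# e.g., 200 L of hot water heated from 15 C to 55 C:
print(round(hot_water_energy_kwh(200, 40), 2), "kWh")   # ~9.3 kWh
```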
Methods to Improve Applicability and Efficiency of Distributed Data-Centric Compute Frameworks
The success of modern applications depends on the insights they collect from their data repositories. Data repositories for such applications currently exceed exabytes and are rapidly increasing in size, as they collect data from varied sources: web applications, mobile phones, sensors, and other connected devices. Distributed storage and data-centric compute frameworks have been invented to store and analyze these large datasets. This dissertation focuses on extending the applicability and improving the efficiency of distributed data-centric compute frameworks.
Designing Scalable Mechanisms for Geo-Distributed Platform Services in the Presence of Client Mobility
Situation-awareness applications require low-latency response and high network bandwidth, hence benefiting from geo-distributed Edge infrastructures. The developers of these applications typically rely on several platform services, such as Kubernetes, Apache Cassandra and Pulsar, for managing their compute and data components across the geo-distributed Edge infrastructure. Situation-awareness applications impose peculiar requirements on the compute and data placement policies of the platform services. Firstly, the processing logic of these applications is closely tied to the physical environment it interacts with; hence, the access pattern to compute and data exhibits strong spatial affinity. Secondly, the network topology of Edge infrastructure is heterogeneous, wherein communication latency forms a significant portion of the end-to-end compute and data access latency. Therefore, the placement of compute and data components has to be cognizant of the spatial affinity and latency requirements of the applications. However, clients of situation-awareness applications, such as vehicles and drones, are typically mobile, which makes the compute and data access pattern dynamic and complicates the management of data and compute components. Constant changes in the network connectivity and spatial locality of clients due to client mobility render the current placement of compute and data components unsuitable for meeting the latency and spatial affinity requirements of the application. Constant client mobility necessitates that client location and the latency offered by the platform services be continuously monitored, to detect when application requirements are violated and to adapt the compute and data placement. The control and monitoring modules of off-the-shelf platform services do not have the necessary primitives to incorporate spatial affinity and network topology awareness into their compute and data placement policies. The spatial location of clients is not considered as an input for decision-making in their control modules. Furthermore, they do not perform fine-grained end-to-end monitoring of observed latency to detect and adapt to performance degradations due to client mobility.
This dissertation presents three mechanisms that inform the compute and data placement policies of platform services, so that application requirements can be met.
M1: Dynamic Spatial Context Management for system entities (clients, and data and compute components) to ensure spatial affinity requirements are satisfied.
M2: Network Proximity Estimation to provide topology-awareness to the data and compute placement policies of platform services.
M3: End-to-End Latency Monitoring to enable collection, aggregation and analysis of per-application metrics in a geo-distributed manner to provide end-to-end insights into application performance.
The thesis of our work is that the aforementioned mechanisms are fundamental building blocks for the compute and data management policies of platform services, and that by incorporating them, platform services can meet application requirements at the Edge. Furthermore, the proposed mechanisms can be implemented in a way that offers high scalability to handle high levels of client activity. We demonstrate by construction the efficacy and scalability of the proposed mechanisms for building dynamic compute and data orchestration policies by incorporating them in the control and monitoring modules of three different platform services. Specifically, we incorporate these mechanisms into a topic-based publish-subscribe system (ePulsar), an application orchestration platform (OneEdge), and a key-value store (FogStore). We conduct extensive performance evaluation of these enhanced platform services to showcase how the new mechanisms aid in dynamically adapting the compute/data orchestration decisions to satisfy the performance requirements of applications.
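As a rough illustration of mechanism M2 above, the sketch below estimates network proximity with virtual coordinates (in the spirit of Vivaldi-style network coordinate systems, a stand-in for the dissertation's exact scheme) and re-evaluates the nearest edge site as a mobile client moves:

```python
# Proximity-aware placement over hypothetical virtual coordinates
# learned from pairwise latency probes.
import math

sites = {"edge-A": (0.0, 0.0), "edge-B": (3.0, 4.0), "edge-C": (6.0, 1.0)}

def est_latency(a, b):
    return math.dist(a, b)     # coordinate distance approximates RTT

def nearest_site(client_coord):
    return min(sites, key=lambda s: est_latency(sites[s], client_coord))

# As the mobile client moves, the placement decision is re-evaluated.
for pos in [(0.5, 0.5), (5.5, 1.5)]:
    print(pos, "->", nearest_site(pos))   # edge-A, then edge-C
```

Virtual coordinates avoid all-pairs probing, which is what keeps such a mechanism scalable under high levels of client activity.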