Invalidation-based protocols for replicated datastores
Distributed in-memory datastores underpin cloud applications that run within a datacenter and demand high performance, strong consistency, and availability. A key feature of datastores is data replication. The data are replicated across servers because a single server often cannot handle the request load. Replication is also necessary to guarantee that a server or link failure does not render a portion of the dataset inaccessible. A replication protocol is responsible for ensuring strong consistency between the replicas of a datastore, even when faults occur, by determining the actions necessary to access and manipulate the data. Consequently, a replication protocol also drives the datastore's performance.
Existing strongly consistent replication protocols deliver fault tolerance but fall short in terms of performance. Meanwhile, the opposite occurs in the world of multiprocessors, where data are replicated across the private caches of different cores. The multiprocessor regime uses invalidations to afford strongly consistent replication with high performance but neglects fault tolerance.
Although handling failures in the datacenter is critical for data availability, we observe that fault-free operation is the common case and vastly outweighs operation during faults. In other words, the common operating environment inside a datacenter closely resembles that of a multiprocessor. Based on this insight, we draw inspiration from the multiprocessor for high-performance, strongly consistent replication in the datacenter. The primary contribution of this thesis is adapting invalidating protocols to the nuances of replicated datastores, which include skewed data accesses, fault tolerance, and distributed transactions.
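To make the idea concrete, below is a minimal sketch of how an invalidation-based write might look in a replicated key-value store: a coordinator invalidates every copy before installing the new value, so no replica can serve a stale read. The class names and two-phase structure are illustrative assumptions, not the protocol developed in the thesis.

```python
# Illustrative sketch of an invalidation-based replicated write (not the
# thesis's exact protocol): the coordinator invalidates every replica of a
# key, installs the new value, then re-validates the copies so reads can
# again be served locally, as in multiprocessor invalidation protocols.

class Replica:
    def __init__(self):
        self.store = {}      # key -> value
        self.valid = set()   # keys currently readable locally

    def invalidate(self, key):
        self.valid.discard(key)

    def install(self, key, value):
        self.store[key] = value
        self.valid.add(key)

    def read(self, key):
        if key not in self.valid:
            raise RuntimeError("invalidated: fetch latest value from coordinator")
        return self.store[key]


class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        # Phase 1: invalidate all replicas so no stale copy remains readable.
        for r in self.replicas:
            r.invalidate(key)
        # Phase 2: install the new value and re-validate the copies.
        for r in self.replicas:
            r.install(key, value)


replicas = [Replica(), Replica(), Replica()]
Coordinator(replicas).write("x", 42)
print(replicas[0].read("x"))  # -> 42
```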
A Holistic Approach to Lowering Latency in Geo-distributed Web Applications
The user-perceived end-to-end latency of web applications has a huge impact on revenue for many businesses. The end-to-end latency of web applications is impacted by: (i) user-to-application-server (front-end) latency, which includes downloading and parsing web pages and retrieving further objects requested by JavaScript execution; and (ii) application-to-storage-server (back-end) latency, which includes retrieving the metadata required for an initial rendering and subsequent content based on user actions.
Improving the user-perceived performance of web applications is challenging, given their complex operating environments involving user-facing web servers, content distribution network (CDN) servers, multi-tiered application servers, and storage servers. Further, the application and storage servers are often deployed on multi-tenant cloud platforms that exhibit high performance variability. While many novel approaches, such as SPDY and geo-replicated datastores, have been developed to improve performance, many of these solutions are specific to certain layers and may have differing impacts on user-perceived performance.
The primary goal of this thesis is to address the above challenges in a holistic manner, focusing specifically on improving the end-to-end latency of geo-distributed multi-tiered web applications. This thesis makes the following contributions: (i) First, it reduces user-facing latency by helping CDNs identify the objects that are most critical for page-load latency and map them to the faster CDN cache layers. Through controlled experiments on real-world web pages, we show the potential of our approach to shave hundreds of milliseconds off latency without affecting overall CDN miss rates. (ii) Next, it reduces back-end latency by optimally adapting the datastore replication policies (including the number and location of replicas) to the heterogeneity in workloads. We show the benefits of our replication models using real-world traces from Twitter, Wikipedia, and Gowalla on an 8-datacenter Cassandra cluster deployed on EC2. (iii) Finally, it makes multi-tier applications resilient to the inherent performance variability of the cloud through fine-grained request redirection. We highlight the benefits of our approach by deploying three real-world applications on commercial cloud platforms.
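As a rough illustration of contribution (ii), the sketch below greedily places k replicas so that the request-weighted latency to the nearest replica is minimised. The datacenter names, request rates, and latencies are invented for the example; the thesis's actual replication models are more sophisticated.

```python
# Hypothetical sketch of workload-aware replica placement: given per-
# datacenter request rates and inter-datacenter latencies, greedily pick k
# replica locations that minimise the request-weighted latency to the
# nearest replica.

def weighted_latency(placement, rates, latency):
    return sum(rate * min(latency[dc][r] for r in placement)
               for dc, rate in rates.items())

def place_replicas(k, rates, latency):
    placement = []
    while len(placement) < k:
        best = min(set(latency) - set(placement),
                   key=lambda c: weighted_latency(placement + [c], rates, latency))
        placement.append(best)
    return placement

# Three datacenters with skewed request rates (requests/sec); latencies in ms.
rates = {"us-east": 900, "eu-west": 400, "ap-south": 100}
latency = {
    "us-east":  {"us-east": 1,   "eu-west": 80,  "ap-south": 180},
    "eu-west":  {"us-east": 80,  "eu-west": 1,   "ap-south": 120},
    "ap-south": {"us-east": 180, "eu-west": 120, "ap-south": 1},
}
print(place_replicas(2, rates, latency))  # -> ['us-east', 'eu-west']
```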
The Architecture of an Autonomic, Resource-Aware, Workstation-Based Distributed Database System
Distributed software systems that are designed to run over workstation machines within organisations are termed workstation-based. Workstation-based systems are characterised by dynamically changing sets of machines that are used primarily for other, user-centric tasks. They must be able to adapt to and utilize spare capacity when and where it is available, and ensure that the non-availability of an individual machine does not affect the availability of the system. This thesis focuses on the requirements and design of a workstation-based database system, which is motivated by an analysis of existing database architectures that are typically run over static, specially provisioned sets of machines. A typical clustered database system -- one that is run over a number of specially provisioned machines -- executes queries interactively, returning a synchronous response to applications, with its data made durable and resilient to the failure of machines. There are no existing workstation-based databases. Furthermore, other workstation-based systems do not attempt to achieve the requirements of interactivity and durability, because they are typically used to execute asynchronous batch processing jobs that tolerate data loss -- results can be re-computed. These systems use external servers to store the final results of computations rather than workstation machines. This thesis describes the design and implementation of a workstation-based database system and investigates its viability by evaluating its performance against existing clustered database systems and testing its availability during machine failures.
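As an illustration of the resource-aware placement such a system needs, the hypothetical sketch below chooses replica locations only from workstations that are currently online and lightly loaded. The thresholds and machine attributes are assumptions made for the example, not the thesis's actual policy.

```python
# Hedged sketch of resource-aware placement in a workstation-based database:
# place replicas only on machines that are online and have spare capacity,
# so user-facing work on the workstations is not disturbed and data survives
# individual machine failures.

def choose_replica_sites(machines, copies=3, max_load=0.5):
    eligible = [m for m in machines
                if m["online"] and m["cpu_load"] < max_load]
    # Prefer the least-loaded machines with the most free disk.
    eligible.sort(key=lambda m: (m["cpu_load"], -m["free_disk_gb"]))
    if len(eligible) < copies:
        raise RuntimeError("not enough spare capacity for the replication factor")
    return [m["host"] for m in eligible[:copies]]

machines = [
    {"host": "ws-01", "online": True,  "cpu_load": 0.10, "free_disk_gb": 120},
    {"host": "ws-02", "online": True,  "cpu_load": 0.80, "free_disk_gb": 300},
    {"host": "ws-03", "online": False, "cpu_load": 0.05, "free_disk_gb": 200},
    {"host": "ws-04", "online": True,  "cpu_load": 0.30, "free_disk_gb": 80},
    {"host": "ws-05", "online": True,  "cpu_load": 0.20, "free_disk_gb": 150},
]
print(choose_replica_sites(machines))  # -> ['ws-01', 'ws-05', 'ws-04']
```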
Research challenges in nextgen service orchestration
Fog/edge computing, function as a service, and programmable infrastructures, like software-defined networking or network function virtualisation, are becoming ubiquitous in modern Information Technology infrastructures. These technologies change the characteristics and capabilities of the underlying computational substrate where services run (e.g. higher volatility, scarcer computational power, or programmability). As a consequence, the nature of the services that can be run on them changes too (smaller codebases, more fragmented state, etc.). These changes bring new requirements for service orchestrators, which need to evolve so as to support new scenarios where a close interaction between service and infrastructure becomes essential to deliver a seamless user experience. Here, we present the challenges brought forward by this new breed of technologies and where current orchestration techniques stand with regard to these challenges. We also present a set of promising technologies that can help tame this brave new world.
Adaptive Data Storage and Placement in Distributed Database Systems
Distributed database systems are widely used to provide scalable storage, update and query facilities for application data. Distributed databases primarily use data replication and data partitioning to spread load across nodes or sites. The presence of hotspots in workloads, however, can result in imbalanced load on the distributed system resulting in performance degradation. Moreover, updates to partitioned and replicated data can require expensive distributed coordination to ensure that they are applied atomically and consistently. Additionally, data storage formats, such as row and columnar layouts, can significantly impact latencies of mixed transactional and analytical workloads. Consequently, how and where data is stored among the sites in a distributed database can significantly affect system performance, particularly if the workload is not known ahead of time. To address these concerns, this thesis proposes adaptive data placement and storage techniques for distributed database systems.
This thesis demonstrates that the performance of distributed database systems can be improved by automatically adapting how and where data is stored by leveraging online workload information. A two-tiered architecture for adaptive distributed database systems is proposed that includes an adaptation advisor that decides at which site(s) and how transactions execute. The adaptation advisor makes these decisions based on submitted transactions. This design is used in three adaptive distributed database systems presented in this thesis: (i) DynaMast that efficiently transfers data mastership to guarantee single-site transactions while maintaining well-understood and established transactional semantics, (ii) MorphoSys that selectively and adaptively replicates, partitions and remasters data based on a learned cost model to improve transaction processing, and (iii) Proteus that uses learned workload models to predictively and adaptively change storage layouts to support both high transactional throughput and low latency analytical queries.
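To illustrate the flavour of such an adaptation advisor, the sketch below routes each transaction to a single site and remasters keys so it can execute without distributed coordination. This is a deliberately simplified toy with invented names, not DynaMast's actual algorithm.

```python
# Illustrative sketch of an adaptation advisor that decides where a
# transaction executes: pick the site already mastering most of the
# transaction's keys, then remaster the rest so the transaction is
# single-site (no distributed coordination needed).

from collections import Counter, defaultdict

class AdaptationAdvisor:
    def __init__(self):
        self.master = {}                    # key -> site currently mastering it
        self.access = defaultdict(Counter)  # key -> Counter of accessing sites

    def route(self, txn_keys, origin_site):
        # Record the access pattern for later placement decisions.
        for k in txn_keys:
            self.access[k][origin_site] += 1
        # Execute at the site that already masters the most keys in this txn.
        counts = Counter(self.master[k] for k in txn_keys if k in self.master)
        target = counts.most_common(1)[0][0] if counts else origin_site
        # Remaster any remaining keys so the transaction is single-site.
        for k in txn_keys:
            self.master[k] = target
        return target

advisor = AdaptationAdvisor()
print(advisor.route({"x", "y"}, origin_site="A"))  # -> A (first access)
print(advisor.route({"y", "z"}, origin_site="B"))  # -> A (y is mastered at A)
```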
Collectively, this thesis is a concrete step towards autonomous database systems that allow users to specify only the data to store and the queries to execute, leaving the system to judiciously choose the storage and execution mechanisms to deliver high performance.
Scalable data management for web applications
Steen, M.R. van [Promotor]; Pierre, G.E.O. [Copromotor]; Chi, C.H. [Copromotor]
Managing Device and Platform Heterogeneity through the Web of Things
The chaotic growth of the IoT has produced a fragmented landscape with a huge number of devices, technologies, and platforms on the market, and consequent interoperability issues in many system deployments. The Web of Things (WoT) architecture recently proposed by the W3C consortium constitutes a novel solution for enabling interoperability across IoT platforms and application domains. At the same time, an effective improvement requires wide adoption of the W3C WoT solutions by the academic and industrial communities; this translates into the need for accurate and complete support tools to ease the deployment of W3C WoT applications, as well as reference guidelines on how to enable the WoT on top of existing IoT scenarios and how to deploy WoT scenarios from scratch. In this thesis, we make three main contributions to fill this gap: (1) we introduce the WoT Store, a novel platform for managing and easing the deployment of Things and applications on the W3C WoT, together with additional strategies for bringing legacy IoT systems into the WoT. The WoT Store allows the dynamic discovery of the resources available in the environment, i.e. the Things, and interaction with each of them through a dashboard by visualizing their properties, executing commands, or observing the notifications they produce. (2) We map three different IoT scenarios onto WoT scenarios: a generic heterogeneous environmental-monitoring scenario, a structural health monitoring scenario, and an Industry 4.0 scenario. (3) We make proposals to improve both the W3C standard and the node-wot software stack design: in the first case, new vocabularies are needed to handle particular protocols employed in industrial scenarios, while in the second case we present contributions required for the dynamic instantiation and migration of Web Things and WoT services in a cloud-to-edge continuum environment.
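As a hypothetical illustration of how an application might consume a W3C WoT Thing Description, the sketch below fetches a Thing Description over HTTP, reads a property, and invokes an action. The Thing's URL and the property and action names are invented for the example, and a real deployment would more likely use a WoT runtime such as node-wot rather than raw HTTP calls.

```python
# Minimal, assumption-laden sketch of consuming a WoT Thing Description:
# the TD is a JSON-LD document listing properties, actions and events,
# each with "forms" that describe the protocol binding to use.

import requests

TD_URL = "http://plant-sensor.example.local/.well-known/wot"  # hypothetical Thing

# Fetch the Thing Description.
td = requests.get(TD_URL, timeout=5).json()

def form_href(interaction, name):
    """Return the first form's target URL for a property or action."""
    return td[interaction][name]["forms"][0]["href"]

# Read a property (hypothetical name: the current temperature).
temperature = requests.get(form_href("properties", "temperature"), timeout=5).json()
print("temperature:", temperature)

# Invoke an action (hypothetical name: start a measurement cycle).
requests.post(form_href("actions", "startMeasurement"), json={}, timeout=5)
```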
Big Data Now, 2015 Edition
Now in its fifth year, O'Reilly's annual Big Data Now report recaps the trends, tools, applications, and forecasts we've talked about over the past year. For 2015, we've included a collection of blog posts, authored by leading thinkers and experts in the field, that reflect a unique set of themes we've identified as gaining significant attention and traction.
Our list of 2015 topics includes:
Data-driven cultures
Data science
Data pipelines
Big data architecture and infrastructure
The Internet of Things and real time
Applications of big data
Security, ethics, and governance
Is your organization on the right track? Get a hold of this free report now and stay in tune with the latest significant developments in big data.
Security Technologies and Methods for Advanced Cyber Threat Intelligence, Detection and Mitigation
The rapid growth of Internet interconnectivity and the complexity of communication systems have led to significant growth of cyberattacks globally, often with severe and disastrous consequences. The swift development of more innovative and effective (cyber)security solutions and approaches that can detect, mitigate, and prevent these serious consequences is vital. Cybersecurity is gaining momentum and is scaling up in very many areas. This book builds on the experience of the Cyber-Trust EU project's methods, use cases, technology development, testing and validation, and extends into broader science, the leading IT industry market, and applied research with practical cases. It offers new perspectives on advanced (cyber)security innovation (eco)systems covering several key perspectives. The book provides insights on new security technologies and methods for advanced cyber threat intelligence, detection and mitigation. We cover topics such as cyber-security and AI, cyber-threat intelligence, digital forensics, moving target defense, intrusion detection systems, post-quantum security, privacy and data protection, security visualization, smart contracts security, software security, blockchain, security architectures, system and data integrity, trust management systems, distributed systems security, dynamic risk management, privacy and ethics.