Middleware-based Database Replication: The Gaps between Theory and Practice
The need for high availability and performance in data management systems has
been fueling a long-running interest in database replication from both academia
and industry. However, academic groups often attack replication problems in
isolation, overlooking the need for completeness in their solutions, while
commercial teams take a holistic approach that often misses opportunities for
fundamental innovation. This has created over time a gap between academic
research and industrial practice.
This paper aims to characterize the gap along three axes: performance,
availability, and administration. We build on our own experience developing and
deploying replication systems in commercial and academic settings, as well as
on a large body of prior related work. We sift through representative examples
from the last decade of open-source, academic, and commercial database
replication systems and combine this material with case studies from real
systems deployed at Fortune 500 customers. We propose two agendas, one for
academic research and one for industrial R&D, which we believe can bridge the
gap within 5-10 years. This way, we hope to both motivate and help researchers
in making the theory and practice of middleware-based database replication more
relevant to each other.
Comment: 14 pages. Appears in Proc. ACM SIGMOD International Conference on
Management of Data, Vancouver, Canada, June 200
System Support For Stream Processing In Collaborative Cloud-Edge Environment
Stream processing is a critical technique for processing huge amounts of data in real time.
Cloud computing has been used for stream processing because of its virtually unlimited computation
resources. At the same time, we are entering the era of the Internet of Everything (IoE). The emerging
edge computing paradigm benefits low-latency applications by leveraging computation resources in
the proximity of data sources. Billions of sensors and actuators are being deployed worldwide,
and the huge amounts of data generated by things are becoming part of our daily life. It has become
essential for organizations to be able to stream and analyze data, and to provide low-latency analytics
on streaming data. However, processing all data centrally in the cloud is inefficient
in terms of network bandwidth cost and response latency. Although
edge computing offloads computation from the cloud to the edge of the Internet, there is no
data sharing and processing framework that efficiently utilizes computation resources in both the
cloud and the edge. Furthermore, the heterogeneity of edge devices further complicates the development of collaborative cloud-edge applications.
To explore and address the challenges of stream processing systems in a collaborative cloud-edge
environment, in this dissertation we design and develop a series of systems to support
stream processing applications in hybrid cloud-edge analytics. Specifically, we develop a
hierarchical and hybrid outlier detection model for multivariate time series streams that automatically
selects the best model for different time series. We optimize one stream
processing system (Spark Streaming) to reduce end-to-end latency. To facilitate the
development of collaborative cloud-edge applications, we propose and implement a new computing
framework, Firework, that allows stakeholders to share and process data by leveraging
both the cloud and the edge. A vision-based cloud-edge application is implemented to demonstrate
the capabilities of Firework. By combining all these studies, we provide comprehensive
system support for stream processing in a collaborative cloud-edge environment.
From Controlled Data-Center Environments to Open Distributed Environments: Scalable, Efficient, and Robust Systems with Extended Functionality
The past two decades have witnessed several paradigm shifts in computing environments. They began with cloud computing, which offers on-demand allocation of storage, network, compute, and memory resources, as well as other services, in a pay-as-you-go billing model, and they culminated in the rise of permissionless blockchain technology, a decentralized computing paradigm with lower trust assumptions and an unbounded number of participants. Unlike in the cloud, where all the computing resources are owned by some trusted cloud provider, permissionless blockchains allow computing resources owned by possibly malicious parties to join and leave their network without obtaining permission from a centralized trusted authority. Still, in the presence of malicious parties, permissionless blockchain networks can perform general computations and make progress. Cloud computing is powered by geographically distributed data centers controlled and managed by trusted cloud service providers, and it promises theoretically infinite computing resources. Permissionless blockchains, on the other hand, are powered by open networks of geographically distributed computing nodes owned by entities that are not necessarily known or trusted. This paradigm shift requires a reconsideration of distributed data management protocols and distributed system designs that assume low latency across system components, inelastic computing resources, or fully trusted computing resources.
In this dissertation, we propose new system designs and optimizations that address the scalability and efficiency of distributed data management systems in cloud environments. We also propose several protocols and new programming paradigms to extend the functionality and enhance the robustness of permissionless blockchains.
The work presented spans global-scale transaction processing, large-scale stream processing, atomic transaction processing across permissionless blockchains, and extending the functionality and the use cases of permissionless blockchains. In all these directions, the focus is on rethinking system and protocol designs to account for novel cloud and permissionless blockchain assumptions. For global-scale transaction processing, we propose GPlacer, a placement optimization framework that decides replica placement for fully and partially geo-replicated databases. For large-scale stream processing, we propose Cache-on-Track (CoT), an adaptive and elastic client-side cache that addresses the server-side load imbalances that occur in large-scale distributed storage layers. In permissionless blockchain transaction processing, we propose AC3WN, the first correct cross-chain commitment protocol that guarantees atomicity of cross-chain transactions. We also propose TXSC, a transactional smart contract programming framework. TXSC provides smart contract developers with transaction primitives. These primitives allow developers to write smart contracts without the need to reason about the anomalies that can arise from concurrent smart contract function executions. In addition, we propose a forward-looking architecture that unifies permissioned and permissionless blockchains and exploits the running infrastructure of permissionless blockchains to build global asset management systems.
Identifying and diagnosing video streaming performance issues
Online video streaming is an ever-evolving ecosystem of services and technologies, where content providers are in a constant race to satisfy users' demand for richer content, higher-bitrate streams, updated feature sets, and cross-platform compatibility. At the same time, network operators are required to ensure that the requested video streams are delivered through the network with satisfactory quality, in accordance with the existing Service Level Agreements (SLAs).
However, tracking and maintaining satisfactory video Quality of Experience (QoE) has become a greater challenge for operators than ever before. With the growing popularity of content consumption on handheld devices and over wireless connections, new points of failure have been added to the list of faults that can affect video quality. Moreover, the adoption of end-to-end encryption by major streaming services has rendered previously used QoE diagnosis methods obsolete.
In this thesis, we identify the current challenges in identifying and diagnosing video streaming issues, and we propose novel approaches to address them. More specifically, the thesis first presents methods and tools to identify a wide array of QoE problems and the severity with which they affect the users' experience. The next part of the thesis investigates methods to locate underperforming parts of the network that lead to a drop in the delivered quality of a service.
In this context, we propose a data-driven methodology for detecting the underperforming areas of cellular networks with sub-optimal Quality of Service (QoS) and video QoE. Moreover, we develop and evaluate a multi-vantage-point framework that is capable of diagnosing the underlying faults that disrupt the user's experience. The last part of this work further explores the detection of network performance anomalies and introduces a novel method for detecting such issues using contextual information. This approach provides higher accuracy when detecting network faults in the presence of high variation and can help providers detect anomalies early, before they result in QoE issues.
Postprint (published version)
Re-routing using Contraction Hierarchies in Software-Defined Networks
According to the Open Networking Foundation (ONF), one of the reasons to reexamine traditional network architectures is the growth in mobile devices and their data traffic. The global IP traffic forecast by Cisco estimates an overall traffic increase to 396 exabytes per month in 2022, more than three times the traffic in 2017 (122 exabytes per month). In this work, we research the similarities between vehicular networks and computer networks. These similarities allow us to implement the Contraction Hierarchies (CH) algorithm in computer networks. CH is an algorithm from vehicular networks that can provide us with the elements and logic to optimize specific routing problems in computer networks. In order to implement CH, we use Software-Defined Networking (SDN). SDN is a computer networking paradigm that separates the data and control planes. The data plane is left to the network devices, which forward the packets, and the control plane is centralized in a controller. With a controller that has a broad view of the network, we implement CH in order to optimize route selection. Once the route is determined, we study the possibility of using the advantages of CH to redistribute traffic in case network elements suffer from unforeseen circumstances.
Master of Science in Applied Computer Science
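The idea of precomputing shortcuts and then answering route queries on the contracted graph can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the four-switch topology, the contraction order, the function names, and the full rebuild after a link failure are all assumptions made for illustration, and the witness search that a practical CH uses to avoid unnecessary shortcuts is omitted.

```python
import heapq

INF = float("inf")

def build_ch(graph, order):
    """Contract nodes in the given order and return the upward graph.

    graph: {node: {neighbor: cost}} for an undirected topology (hypothetical input).
    Simplification: a shortcut is added for every pair of higher-ranked
    neighbors (no witness search), which is correct but not minimal.
    """
    rank = {v: i for i, v in enumerate(order)}
    adj = {v: dict(nbrs) for v, nbrs in graph.items()}  # working copy
    for v in order:
        higher = [u for u in adj[v] if rank[u] > rank[v]]
        for i, u in enumerate(higher):
            for w in higher[i + 1:]:
                via = adj[v][u] + adj[v][w]
                if via < adj[u].get(w, INF):
                    adj[u][w] = via  # shortcut that bypasses v
                    adj[w][u] = via
    # Keep only edges that lead to higher-ranked nodes.
    return {v: {u: c for u, c in adj[v].items() if rank[u] > rank[v]}
            for v in adj}

def upward_dijkstra(up, src):
    """Dijkstra restricted to upward edges."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, INF):
            continue
        for u, c in up[v].items():
            nd = d + c
            if nd < dist.get(u, INF):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist

def ch_distance(up, s, t):
    """Shortest-path cost: the best meeting node of the two upward searches."""
    fwd, bwd = upward_dijkstra(up, s), upward_dijkstra(up, t)
    return min((fwd[v] + bwd[v] for v in fwd if v in bwd), default=INF)

def without_link(graph, a, b):
    """Copy of the topology with the link a-b removed (simulated failure)."""
    return {v: {u: c for u, c in nbrs.items() if {v, u} != {a, b}}
            for v, nbrs in graph.items()}

# Hypothetical topology as seen by the controller.
topology = {
    "s1": {"s2": 1, "s4": 5},
    "s2": {"s1": 1, "s3": 1},
    "s3": {"s2": 1, "s4": 1},
    "s4": {"s1": 5, "s3": 1},
}
order = ["s2", "s3", "s1", "s4"]

up = build_ch(topology, order)
print("cost s1 -> s4:", ch_distance(up, "s1", "s4"))

# Re-route after the link s2-s3 fails by rebuilding the hierarchy.
up_failed = build_ch(without_link(topology, "s2", "s3"), order)
print("cost s1 -> s4 after failure:", ch_distance(up_failed, "s1", "s4"))
```

Rebuilding the whole hierarchy after a failure is the simplest option for a sketch; whether and how the hierarchy can be updated incrementally is exactly the kind of question the abstract raises.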
Managing Smartphone Testbeds with SmartLab
The explosive growth in the number of smartphones with ever-growing sensing and computing capabilities has brought a paradigm shift to many traditional domains of the computing field. Re-programming smartphones and instrumenting them for application testing and data gathering at scale is currently a tedious and time-consuming process that poses significant logistical challenges. In this paper, we make three major contributions. First, we propose a comprehensive architecture, coined SmartLab, for managing a cluster of both real and virtual smartphones that are either wired to a private cloud or connected over a wireless link. Second, we propose and describe a number of Android management optimizations (e.g., command pipelining, screen-capturing, file management), which can be useful to the community for building similar functionality into their systems. Third, we conduct extensive experiments and microbenchmarks to support our design choices, providing qualitative evidence on the expected performance of each module comprising our architecture. The paper also gives an overview of our experience using SmartLab in a research-oriented setting, as well as ongoing and future development efforts.
Analyzing Data-center Application Performance Via Constraint-based Models
Hyperscale Data Centers (HDCs) are the largest distributed computing machines ever constructed. They serve as the backbone for many popular applications, such as YouTube, Netflix, Meta, and Airbnb, which involve millions of users and generate billions in revenue. As the networking infrastructure plays a pivotal role in determining the performance of HDC applications, understanding and optimizing their networking performance is critical. This thesis proposes and evaluates a constraint-based approach to characterize the networking performance of HDC applications. Through extensive evaluations conducted in both controlled settings and real-world case studies within a production HDC, I demonstrate the effectiveness of the constraint-based approach in handling the immense volume of performance data in HDCs, achieving substantial dimensionality reduction, and providing useful interpretability.
Doctor of Philosophy
Data-Driven Methods for Data Center Operations Support
During the last decade, cloud technologies have been evolving at
an impressive pace, such that we are now living in a cloud-native
era where developers can leverage an unprecedented landscape
of (possibly managed) services for orchestration, compute, storage,
load-balancing, monitoring, etc. The possibility to have on-demand
access to a diverse set of configurable virtualized resources allows
for building more elastic, flexible and highly-resilient distributed
applications. Behind the scenes, cloud providers sustain the heavy
burden of maintaining the underlying infrastructures, consisting of
large-scale distributed systems, partitioned and replicated among
many geographically dispersed data centers to guarantee scalability,
robustness to failures, high availability and low latency. The larger the
scale, the more cloud providers have to deal with complex interactions
among the various components, such that monitoring, diagnosing and
troubleshooting issues become incredibly daunting tasks.
To keep up with these challenges, development and operations
practices have undergone significant transformations, especially in
terms of improving the automations that make releasing new software,
and responding to unforeseen issues, faster and more sustainable at scale.
The resulting paradigm is nowadays referred to as DevOps. However,
while such automations can be very sophisticated, traditional DevOps
practices fundamentally rely on reactive mechanisms that typically
require careful manual tuning and supervision from human experts.
To minimize the risk of outages—and the related costs—it is crucial to
provide DevOps teams with suitable tools that can enable a proactive
approach to data center operations.
This work presents a comprehensive data-driven framework to address
the most relevant problems that can be experienced in large-scale
distributed cloud infrastructures. These environments are indeed characterized
by a very large availability of diverse data, collected at each
level of the stack, such as: time-series (e.g., physical host measurements,
virtual machine or container metrics, networking components
logs, application KPIs); graphs (e.g., network topologies, fault graphs
reporting dependencies among hardware and software components,
performance issues propagation networks); and text (e.g., source code,
system logs, version control system history, code review feedback).
Such data are also typically updated with relatively high frequency,
and subject to distribution drifts caused by continuous configuration
changes to the underlying infrastructure. In such a highly dynamic scenario,
traditional model-driven approaches alone may be inadequate
at capturing the complexity of the interactions among system components.
DevOps teams would certainly benefit from having robust
data-driven methods to support their decisions based on historical
information. For instance, effective anomaly detection capabilities may
also help in conducting more precise and efficient root-cause analysis.
Also, leveraging accurate forecasting and intelligent control
strategies would improve resource management.
Given their ability to deal with high-dimensional, complex data,
Deep Learning-based methods are the most straightforward option for
the realization of the aforementioned support tools. On the other hand,
because of their complexity, models of this kind often require huge
processing power, and suitable hardware, to be operated effectively
at scale. These aspects must be carefully addressed when applying
such methods in the context of data center operations. Automated
operations approaches must be dependable and cost-efficient, so as not to
degrade the services they are built to improve.
Benchmarking Eventually Consistent Distributed Storage Systems
Cloud storage services and NoSQL systems typically offer only "eventual consistency", a rather weak guarantee covering a broad range of potential data consistency behavior. The degree of actual (in-)consistency, however, is unknown. This work presents novel solutions for determining the degree of (in-)consistency via simulation and benchmarking, as well as the necessary means to resolve inconsistencies using this information.
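To make "degree of (in-)consistency" concrete, the following Python sketch estimates by simulation how likely a read is to return stale data at a given time after a write. It is not the method of the work above: the replica count, the uniform replication-delay model, the read offsets, and the function name are all assumptions chosen for illustration.

```python
import random

def simulate_staleness(num_replicas=3, max_delay_ms=100.0,
                       read_offsets_ms=(0, 25, 50, 100),
                       trials=10_000, seed=0):
    """Estimate P(stale read) at several offsets after a write.

    Assumed model: replica 0 accepts the write at time 0; every other
    replica receives it after an independent uniform delay in
    [0, max_delay_ms]. A read goes to a uniformly chosen replica and is
    stale if the write has not yet arrived there.
    """
    rng = random.Random(seed)
    stale = {t: 0 for t in read_offsets_ms}
    for _ in range(trials):
        arrival = [0.0] + [rng.uniform(0.0, max_delay_ms)
                           for _ in range(num_replicas - 1)]
        for t in read_offsets_ms:
            replica = rng.randrange(num_replicas)
            if arrival[replica] > t:
                stale[t] += 1
    return {t: count / trials for t, count in stale.items()}

result = simulate_staleness()
for offset, probability in result.items():
    print(f"read {offset} ms after write: P(stale) ~ {probability:.3f}")
```

Under this assumed model the stale-read probability falls to zero once the read offset reaches the maximum replication delay, which corresponds to the length of the inconsistency window. A benchmark against a real store would replace the simulated delays with timestamps measured on actual writes and reads.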