
    Middleware-based Database Replication: The Gaps between Theory and Practice

    The need for high availability and performance in data management systems has been fueling a long-running interest in database replication from both academia and industry. However, academic groups often attack replication problems in isolation, overlooking the need for completeness in their solutions, while commercial teams take a holistic approach that often misses opportunities for fundamental innovation. Over time, this has created a gap between academic research and industrial practice. This paper aims to characterize the gap along three axes: performance, availability, and administration. We build on our own experience developing and deploying replication systems in commercial and academic settings, as well as on a large body of prior related work. We sift through representative examples from the last decade of open-source, academic, and commercial database replication systems and combine this material with case studies from real systems deployed at Fortune 500 customers. We propose two agendas, one for academic research and one for industrial R&D, which we believe can bridge the gap within 5-10 years. This way, we hope to both motivate and help researchers in making the theory and practice of middleware-based database replication more relevant to each other.
    Comment: 14 pages. Appears in Proc. ACM SIGMOD International Conference on Management of Data, Vancouver, Canada, June 2008.

    System Support For Stream Processing In Collaborative Cloud-Edge Environment

    Stream processing is a critical technique for processing huge amounts of data in real time. Cloud computing has been used for stream processing due to its virtually unlimited computation resources. At the same time, we are entering the era of the Internet of Everything (IoE). The emerging edge computing paradigm benefits low-latency applications by leveraging computation resources in the proximity of data sources. Billions of sensors and actuators are being deployed worldwide, and the huge amounts of data they generate permeate our daily life. It has become essential for organizations to be able to stream and analyze data and to provide low-latency analytics on streaming data. However, cloud computing is inefficient for processing all data in a centralized environment, in terms of both network bandwidth cost and response latency. Although edge computing offloads computation from the cloud to the edge of the Internet, there is no data-sharing and processing framework that efficiently utilizes computation resources in both the cloud and the edge. Furthermore, the heterogeneity of edge devices adds difficulty to the development of collaborative cloud-edge applications. To explore and attack the challenges of stream processing systems in a collaborative cloud-edge environment, in this dissertation we design and develop a series of systems to support stream processing applications in hybrid cloud-edge analytics. Specifically, we develop a hierarchical and hybrid outlier detection model for multivariate time series streams that automatically selects the best model for different time series. We optimize a stream processing system (Spark Streaming) to reduce its end-to-end latency. To facilitate the development of collaborative cloud-edge applications, we propose and implement a new computing framework, Firework, which allows stakeholders to share and process data by leveraging both the cloud and the edge. A vision-based cloud-edge application is implemented to demonstrate the capabilities of Firework. Together, these studies provide comprehensive system support for stream processing in a collaborative cloud-edge environment.
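    The automatic per-series model selection described above can be pictured with a minimal sketch: score a small pool of candidate detectors on a labeled validation window of each series and keep the winner. The two candidate detectors and the F1-based criterion below are illustrative assumptions, not the dissertation's actual hierarchical model.

```python
# Minimal sketch: pick the best outlier detector per time series,
# given a labeled validation window. Detectors are illustrative.
import numpy as np

def zscore_outliers(x, thresh=3.0):
    # Flag points more than `thresh` standard deviations from the mean.
    z = (x - x.mean()) / (x.std() + 1e-9)
    return np.abs(z) > thresh

def iqr_outliers(x, k=1.5):
    # Flag points beyond k * IQR outside the quartiles.
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

DETECTORS = {"zscore": zscore_outliers, "iqr": iqr_outliers}

def f1(pred, true):
    # Standard F1 score over boolean outlier masks.
    tp = np.sum(pred & true)
    prec = tp / max(pred.sum(), 1)
    rec = tp / max(true.sum(), 1)
    return 2 * prec * rec / max(prec + rec, 1e-9)

def select_detector(series, labels):
    # Keep whichever candidate scores best on the labeled window.
    return max(DETECTORS, key=lambda name: f1(DETECTORS[name](series), labels))
```

    A hypothetical call would be select_detector(cpu_series, cpu_labels), repeated for each series as new validation windows arrive.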

    Identifying and diagnosing video streaming performance issues

    On-line video streaming is an ever-evolving ecosystem of services and technologies, where content providers are in a constant race to satisfy users' demand for richer content, higher-bitrate streams, updated feature sets, and cross-platform compatibility. At the same time, network operators are required to ensure that the requested video streams are delivered through the network with satisfactory quality, in accordance with the existing Service Level Agreements (SLAs). However, tracking and maintaining satisfactory video Quality of Experience (QoE) has become a greater challenge for operators than ever before. With the growing popularity of content consumption on handheld devices and over wireless connections, new points of failure have been added to the list of failures that can affect video quality. Moreover, the adoption of end-to-end encryption by major streaming services has rendered previously used QoE diagnosis methods obsolete. In this thesis, we identify the current challenges in identifying and diagnosing video streaming issues, and we propose novel approaches to address them. More specifically, the thesis initially presents methods and tools to identify a wide array of QoE problems and the severity with which they affect the users' experience. The next part of the thesis investigates methods to locate under-performing parts of the network that lead to a drop in the delivered quality of a service. In this context, we propose a data-driven methodology for detecting under-performing areas of a cellular network with sub-optimal Quality of Service (QoS) and video QoE. Moreover, we develop and evaluate a multi-vantage-point framework that is capable of diagnosing the underlying faults that cause the disruption of the user's experience. The last part of this work further explores the detection of network performance anomalies and introduces a novel method for detecting such issues using contextual information. This approach provides higher accuracy when detecting network faults in the presence of high variation, and can help providers detect anomalies early, before they result in QoE issues.
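    The contextual detection idea in the last part can be sketched as follows, assuming each measurement carries a context key (here, a cell identifier and hour of day) and that a per-context baseline is maintained online with Welford's algorithm. Field names and thresholds are illustrative, not the thesis's actual method.

```python
# Minimal sketch: condition the anomaly baseline on context, so only
# deviations unusual *for that context* are flagged.
from collections import defaultdict
import math

class ContextualDetector:
    # Per-context running mean/variance (Welford) with a z-score test.
    def __init__(self, z_thresh=3.0):
        self.stats = defaultdict(lambda: [0, 0.0, 0.0])  # n, mean, M2
        self.z_thresh = z_thresh

    def update_and_check(self, context, value):
        n, mean, m2 = self.stats[context]
        n += 1
        delta = value - mean
        mean += delta / n
        m2 += delta * (value - mean)
        self.stats[context] = [n, mean, m2]
        if n < 30:                     # not enough history for this context yet
            return False
        std = math.sqrt(m2 / (n - 1))
        return abs(value - mean) > self.z_thresh * (std + 1e-9)

det = ContextualDetector()
# Hypothetical sample: cell "cell_42" at 18:00 with a throughput reading.
is_anomaly = det.update_and_check(("cell_42", 18), value=1.7)
```

    Conditioning the baseline on context is what absorbs expected variation (e.g., evening traffic peaks), which is the intuition behind the higher accuracy under high variation claimed above.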

    Re-routing using Contraction Hierarchies in Software-Defined Networks

    According to the Open Networking Foundation (ONF), one of the reasons to re-examine traditional network architectures is the growth of mobile devices and their data traffic. The global IP traffic forecast by Cisco estimates an overall increase to 396 exabytes per month in 2022, more than three times the traffic in 2017 (122 exabytes per month). In this work, we study the similarities between vehicular networks and computer networks. These similarities allow us to apply the Contraction Hierarchies (CH) algorithm to computer networks: CH is an interdisciplinary algorithm from vehicular routing that provides the elements and logic to optimize specific routing problems. To implement CH, we use Software-Defined Networking (SDN), a computer networking paradigm that separates the data and control planes: the data plane is left to the network devices, which forward packets, while the control plane is centralized in a controller. Because the controller has a broad view of the network, we can implement CH to optimize route selection. Once a route is determined, we study the possibility of using the advantages of CH to re-route traffic when network elements suffer failures or other unforeseen events.
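    To make the CH idea concrete, below is a minimal Python sketch of the preprocessing (contraction) step on a weighted topology graph. The adjacency-dict representation, the externally supplied node order, and the witness search over the full graph are simplifications for illustration; production CH uses refined priority terms (e.g., edge difference) and restricts witness searches to the remaining overlay graph. This is not the thesis's implementation.

```python
# Minimal sketch of Contraction Hierarchies preprocessing on an
# undirected weighted graph given as {node: {neighbor: weight}}.
import heapq

def dijkstra(graph, src, dst, exclude, limit):
    # Shortest src->dst distance ignoring node `exclude`, capped at `limit`.
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")) or d > limit:
            continue
        for v, w in graph[u].items():
            if v == exclude:
                continue
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return float("inf")

def contract(graph, order):
    # Contract nodes in `order`, adding shortcuts that preserve distances
    # among the not-yet-contracted (higher-ranked) neighbors.
    shortcuts = []
    rank = {v: i for i, v in enumerate(order)}
    for v in order:
        nbrs = [u for u in graph[v] if rank[u] > rank[v]]
        for i, u in enumerate(nbrs):
            for w in nbrs[i + 1:]:
                via = graph[u][v] + graph[v][w]
                # Witness search: shortcut only if every path avoiding v is longer.
                if dijkstra(graph, u, w, exclude=v, limit=via) > via:
                    if via < graph[u].get(w, float("inf")):
                        graph[u][w] = graph[w][u] = via
                        shortcuts.append((u, w, via))
    return shortcuts
```

    At query time, a bidirectional Dijkstra that relaxes only edges leading to higher-ranked nodes runs on the shortcut-augmented graph; the forward and backward searches meet at a high-ranked node, which is what makes CH queries fast enough for quick re-routing from a central controller.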

    Managing Smartphone Testbeds with SmartLab

    The explosive number of smartphones with ever-growing sensing and computing capabilities has brought a paradigm shift to many traditional domains of computing. Re-programming smartphones and instrumenting them for application testing and data gathering at scale is currently a tedious and time-consuming process that poses significant logistical challenges. In this paper, we make three major contributions. First, we propose a comprehensive architecture, coined SmartLab, for managing a cluster of both real and virtual smartphones that are either wired to a private cloud or connected over a wireless link. Second, we propose and describe a number of Android management optimizations (e.g., command pipelining, screen-capturing, file management), which can be useful to the community for building similar functionality into their systems. Third, we conduct extensive experiments and microbenchmarks to support our design choices, providing qualitative evidence on the expected performance of each module comprising our architecture. The paper also reports on experiences of using SmartLab in a research-oriented setting and on ongoing and future development efforts.
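    Two of the listed optimizations can be sketched with the stock adb tool, assuming devices are reachable from the management host: pipelining several shell commands into a single adb invocation amortizes connection setup, and exec-out streams a screenshot without touching device storage. The helper names are hypothetical, and SmartLab's internals may differ.

```python
# Minimal sketch of Android management helpers built on the adb CLI.
import subprocess

def shell_pipelined(serial, commands):
    # Run several shell commands over a single adb connection.
    joined = " && ".join(commands)
    result = subprocess.run(["adb", "-s", serial, "shell", joined],
                            capture_output=True, text=True, check=True)
    return result.stdout

def capture_screen(serial, out_path):
    # Capture the device screen as PNG via a single exec-out call.
    png = subprocess.run(["adb", "-s", serial, "exec-out", "screencap", "-p"],
                         capture_output=True, check=True).stdout
    with open(out_path, "wb") as f:
        f.write(png)

# Hypothetical usage against an emulator instance:
# out = shell_pipelined("emulator-5554", ["pm list packages", "dumpsys battery"])
```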

    Analyzing Data-center Application Performance Via Constraint-based Models

    Hyperscale Data Centers (HDCs) are the largest distributed computing machines ever constructed. They serve as the backbone for many popular applications, such as YouTube, Netflix, Meta, and Airbnb, which involve millions of users and generate billions in revenue. As the networking infrastructure plays a pivotal role in determining the performance of HDC applications, understanding and optimizing their networking performance is critical. This thesis proposes and evaluates a constraint-based approach to characterize the networking performance of HDC applications. Through extensive evaluations conducted in both controlled settings and real-world case studies within a production HDC, I demonstrate the effectiveness of the constraint-based approach in handling the immense volume of performance data in HDCs, achieving substantial dimensionality reduction, and providing useful interpretability.
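    The interpretability claim can be illustrated with a minimal sketch of how a constraint-based model explains a flow's throughput: the prediction is the minimum over candidate constraints, and the arg-min names the bottleneck. The constraint names and values below are illustrative assumptions, not the thesis's actual model.

```python
# Minimal sketch: predicted throughput is the tightest constraint,
# and the arg-min identifies the cause, which aids diagnosis.
def binding_constraint(constraints):
    # Return (name, bound) of the lowest bound: the predicted bottleneck.
    name = min(constraints, key=constraints.get)
    return name, constraints[name]

flow = {
    "sender_cpu_gbps": 9.0,    # rate the sender can sustain
    "recv_window_gbps": 4.2,   # rate allowed by the receiver's window
    "link_share_gbps": 6.5,    # fair share of the bottleneck link
}
bottleneck, rate = binding_constraint(flow)
print(f"{bottleneck} limits throughput to {rate} Gbps")
```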

    Data-Driven Methods for Data Center Operations Support

    During the last decade, cloud technologies have been evolving at an impressive pace, such that we are now living in a cloud-native era where developers can leverage an unprecedented landscape of (possibly managed) services for orchestration, compute, storage, load balancing, monitoring, etc. The possibility of on-demand access to a diverse set of configurable virtualized resources allows for building more elastic, flexible, and highly resilient distributed applications. Behind the scenes, cloud providers sustain the heavy burden of maintaining the underlying infrastructures, consisting of large-scale distributed systems partitioned and replicated among many geographically dislocated data centers to guarantee scalability, robustness to failures, high availability, and low latency. The larger the scale, the more cloud providers have to deal with complex interactions among the various components, such that monitoring, diagnosing, and troubleshooting issues become incredibly daunting tasks. To keep up with these challenges, development and operations practices have undergone significant transformations, especially in terms of improving the automation that makes releasing new software and responding to unforeseen issues faster and more sustainable at scale. The resulting paradigm is nowadays referred to as DevOps. However, while such automation can be very sophisticated, traditional DevOps practices fundamentally rely on reactive mechanisms that typically require careful manual tuning and supervision from human experts. To minimize the risk of outages, and the related costs, it is crucial to provide DevOps teams with suitable tools that can enable a proactive approach to data center operations.
    This work presents a comprehensive data-driven framework to address the most relevant problems that can be experienced in large-scale distributed cloud infrastructures. These environments are characterized by a very large availability of diverse data, collected at each level of the stack, such as: time series (e.g., physical host measurements, virtual machine or container metrics, networking component logs, application KPIs); graphs (e.g., network topologies, fault graphs reporting dependencies among hardware and software components, performance issue propagation networks); and text (e.g., source code, system logs, version control history, code review feedback). Such data are also typically updated with relatively high frequency and are subject to distribution drifts caused by continuous configuration changes to the underlying infrastructure. In such a highly dynamic scenario, traditional model-driven approaches alone may be inadequate at capturing the complexity of the interactions among system components. DevOps teams would certainly benefit from robust data-driven methods to support their decisions based on historical information. For instance, effective anomaly detection capabilities may help in conducting more precise and efficient root-cause analysis, while accurate forecasting and intelligent control strategies would improve resource management. Given their ability to deal with high-dimensional, complex data, Deep Learning-based methods are the most straightforward option for realizing the aforementioned support tools. On the other hand, because of their complexity, such models often require substantial processing power and suitable hardware to operate effectively at scale. These aspects must be carefully addressed when applying such methods in the context of data center operations: automated operations approaches must be dependable and cost-efficient, so as not to degrade the services they are built to improve.
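    As a stand-in for the deep models discussed, here is a minimal sketch of the proactive forecast-then-threshold pattern on an operational time series, assuming a plain Python list of metric samples. The EWMA forecaster and the threshold are illustrative baselines, not the framework's actual methods.

```python
# Minimal sketch: flag samples whose one-step-ahead forecast residual
# is extreme relative to the running residual variance.
def ewma_anomalies(values, alpha=0.3, z_thresh=3.0):
    # Return (index, value) pairs with an extreme forecast residual.
    level, var = values[0], None
    flagged = []
    for i, x in enumerate(values[1:], start=1):
        resid = x - level                          # one-step forecast error
        if var and abs(resid) > z_thresh * var ** 0.5:
            flagged.append((i, x))
        var = resid ** 2 if var is None else alpha * resid ** 2 + (1 - alpha) * var
        level = alpha * x + (1 - alpha) * level    # update the EWMA forecast
    return flagged

series = [0.5, 0.52, 0.49, 0.51, 0.50, 2.4, 0.53]  # synthetic CPU load
print(ewma_anomalies(series))                      # flags the spike at index 5
```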

    Benchmarking Eventually Consistent Distributed Storage Systems

    Cloud storage services and NoSQL systems typically offer only "eventual consistency", a rather weak guarantee covering a broad range of potential data consistency behaviors. The degree of actual (in-)consistency, however, is unknown. This work presents novel solutions for determining the degree of (in-)consistency via simulation and benchmarking, as well as the necessary means to resolve inconsistencies by leveraging this information.
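    One common way to benchmark the inconsistency window can be sketched as follows, assuming hypothetical writer and reader client handles bound to different replicas: write a uniquely versioned value, then poll until it becomes visible and record the elapsed time. The abstract does not detail this work's actual methodology; the `store` interface here is an assumption.

```python
# Minimal sketch of a staleness measurement for an eventually
# consistent store, using hypothetical put/get client handles.
import time

def measure_staleness(writer, reader, key, timeout=10.0, poll=0.01):
    # Return seconds between a write and its first visible read, or None.
    token = str(time.time_ns())          # unique version for this trial
    writer.put(key, token)
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if reader.get(key) == token:     # write has reached this replica
            return time.monotonic() - start
        time.sleep(poll)
    return None                          # window exceeded the timeout
```

    Repeating this trial many times yields a distribution of staleness values, which is the kind of quantitative (in-)consistency measure the abstract refers to.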