    It's about THYME: On the design and implementation of a time-aware reactive storage system for pervasive edge computing environments

    Scrub: Online TroubleShooting for Large Mission-Critical Applications

    Scrub is a troubleshooting tool for distributed applications that operate under strict SLOs common in production environments. It allows users to formulate queries on events occurring during execution in order to assess the correctness of the application's operation. Scrub has been in use for two years at Turn, where developers and users have relied on it to resolve numerous issues in its online advertisement bidding platform. This platform spans thousands of machines across the globe, serving several million bid requests per second and dispensing many millions of dollars in advertising budgets. Troubleshooting distributed applications is notoriously hard, and its difficulty is exacerbated by the presence of strict SLOs, which require the troubleshooting tool to have only minimal impact on the hosts running the application. Furthermore, with large amounts of money at stake, users expect to be able to run frequent diagnostics and demand quick evaluation and remediation of any problems. These constraints have led to a number of design and implementation decisions that run counter to conventional wisdom. In particular, Scrub supports only a restricted form of joins. Its query execution strategy eschews imposing any overhead on the application hosts. Specifically, joins, group-by operations and aggregations are sent to a dedicated centralized facility. In terms of implementation, Scrub avoids the overhead and security concerns of dynamic instrumentation. Finally, at all levels of the system, accuracy is traded for minimal impact on the hosts. We present the design and implementation of Scrub and contrast its choices to those made in earlier systems. We illustrate its power by describing a number of use cases, and we demonstrate its negligible overhead on the underlying application. On average, we observe a maximum CPU overhead of up to 2.5% on application hosts and a 1% increase in request latency. These overheads allow the advertisement bidding platform to operate well within its SLOs.
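
    The split described in this abstract, with group-by and aggregation performed at a central facility rather than on application hosts, can be illustrated with a minimal sketch. It is not Scrub's implementation or query language: the function names, the event fields, and the assumption that hosts perform only cheap filtering and projection are all hypothetical.

```python
from collections import defaultdict

def host_side(events, predicate, fields):
    """Hypothetical host-side step: filter and project only, keeping no
    grouping or aggregation state on the application host."""
    for event in events:
        if predicate(event):
            yield {f: event[f] for f in fields}

def central_group_count(streams, key):
    """Hypothetical central step: group-by and count over the rows
    forwarded by all hosts."""
    counts = defaultdict(int)
    for stream in streams:
        for row in stream:
            counts[row[key]] += 1
    return dict(counts)

# Made-up events from two hosts (illustrative data only).
host_a = [
    {"type": "bid", "campaign": "c1", "price": 2.0},
    {"type": "impression", "campaign": "c1", "price": 0.0},
]
host_b = [
    {"type": "bid", "campaign": "c2", "price": 1.0},
    {"type": "bid", "campaign": "c1", "price": 3.0},
]

def is_bid(event):
    return event["type"] == "bid"

forwarded = [host_side(h, is_bid, ["campaign"]) for h in (host_a, host_b)]
bids_per_campaign = central_group_count(forwarded, "campaign")
```

    In this sketch each host does constant work per event, and all state that grows with the number of groups lives at the central facility.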

    Storage and Ingestion Systems in Support of Stream Processing: A Survey

    Under the pressure of massive, exponentially increasing amounts of heterogeneous data that are generated faster and faster, Big Data analytics applications have seen a shift from batch processing to stream processing, which can reduce the time needed to obtain meaningful insight dramatically. Stream processing is particularly well suited to address the challenges of fog/edge computing: much of this massive data comes from Internet of Things (IoT) devices and needs to be continuously funneled through an edge infrastructure towards centralized clouds. Thus, it is only natural to process data on their way as much as possible rather than wait for streams to accumulate on the cloud. Unfortunately, state-of-the-art stream processing systems are not well suited for this role: the data are accumulated (ingested), processed and persisted (stored) separately, often using different services hosted on different physical machines/clusters. Furthermore, there is only limited support for advanced data manipulations, which often forces application developers to introduce custom solutions and workarounds. In this survey article, we characterize the main state-of-the-art stream storage and ingestion systems. We identify the key aspects and discuss limitations and missing features in the context of stream processing for fog/edge and cloud computing. The goal is to help practitioners understand and prepare for potential bottlenecks when using such state-of-the-art systems. In particular, we discuss both functional (partitioning, metadata, search support, message routing, backpressure support) and non-functional aspects (high availability, durability, scalability, latency vs. throughput). As a conclusion of our study, we advocate for a unified stream storage and ingestion system to speed up data management and reduce I/O redundancy (both in terms of storage space and network utilization).

    RAMP: RDMA Migration Platform

    Remote Direct Memory Access (RDMA) can be used to implement a shared storage abstraction or a shared-nothing abstraction for distributed applications. We argue that the shared storage abstraction is overkill for loosely coupled applications and that the shared-nothing abstraction does not leverage all the benefits of RDMA. In this thesis, we propose an alternative abstraction for such applications using a shared-on-demand architecture, and present the RDMA Migration Platform (RAMP). RAMP is a lightweight coordination service for building loosely coupled distributed applications. This thesis describes the RAMP system, its programming model and operations, and evaluates the performance of RAMP using microbenchmarks. Furthermore, we illustrate RAMP's load-balancing capabilities with a case study of a loosely coupled application that uses RAMP to balance a partition skew under load.

    Data stream processing meets the Advanced Metering Infrastructure: possibilities, challenges and applications

    Distribution of electricity is changing. Energy production is increasingly distributed, weather dependent and located in the distribution network, close to consumers. Energy consumption is increasing throughout society and the electrification of transportation is driving distribution networks closer to their limits. Operating the networks closer to their limits also increases the risk of faults. Continuous monitoring of the distribution network closest to the customers is needed in order to mitigate this risk. The Advanced Metering Infrastructure introduced smart meters throughout the distribution network. Data stream processing is a computing paradigm that offers low-latency results from analysis of large volumes of data. This thesis investigates the possibilities and challenges for continuous monitoring that are created when the Advanced Metering Infrastructure and data stream processing meet. The challenges addressed in the thesis are efficient processing of unordered (also called out-of-order) data and efficient usage of the computational resources present in the Advanced Metering Infrastructure. Contributions towards more efficient processing of out-of-order data are made with eChIDNA and TinTiN. Both are systems that utilize knowledge about smart meter data to produce results directly where possible and store only the data that is relevant for late data, in order to produce updated results when such late data arrives. 
eChIDNA is integrated in the streaming query itself, while TinTiN is a streaming middleware that can be applied to streaming queries in order to make them resilient against out-of-order data. Eventual determinism is defined in order to formally investigate the deterministic properties of output produced by such systems. Contributions towards efficient usage of the computational resources of the Advanced Metering Infrastructure are made with the application LoCoVolt. LoCoVolt implements a monitoring algorithm that can run on equipment that is localized in the communication infrastructure of the Advanced Metering Infrastructure and can take advantage of the overlap between the communication and distribution networks. All contributions are evaluated on hardware that is available in current AMI systems, using large scale data obtained from a real production AMI.

    Data Storage and Dissemination in Pervasive Edge Computing Environments

    Nowadays, smart mobile devices generate huge amounts of data in all sorts of gatherings. Much of that data has localized and ephemeral interest, but can be of great use if shared among co-located devices. However, mobile devices often experience poor connectivity, leading to availability issues if application storage and logic are fully delegated to a remote cloud infrastructure. In turn, the edge computing paradigm pushes computations and storage beyond the data center, closer to the end-user devices where data is generated and consumed, enabling certain components of edge-enabled systems to execute directly and cooperatively on edge devices. This thesis focuses on the design and evaluation of resilient and efficient data storage and dissemination solutions for pervasive edge computing environments, operating with or without access to the network infrastructure. In line with this dichotomy, our goal can be divided into two specific scenarios. The first one is related to the absence of network infrastructure and the provision of a transient data storage and dissemination system for networks of co-located mobile devices. The second one relates to the existence of network infrastructure access and the corresponding edge computing capabilities. First, the thesis presents time-aware reactive storage (TARS), a reactive data storage and dissemination model with intrinsic time-awareness that exploits synergies between the storage substrate and the publish/subscribe paradigm, and allows queries within a specific time scope. 
Next, it describes in more detail: i) Thyme, a data storage and dissemination system for wireless edge environments, implementing TARS; ii) Parsley, a flexible and resilient group-based distributed hash table with preemptive peer relocation and a dynamic data sharding mechanism; and iii) Thyme GardenBed, a framework for data storage and dissemination across multi-region edge networks that makes use of both device-to-device and edge interactions. The developed solutions present low overheads, while providing adequate response times for interactive usage and low energy consumption, proving to be practical in a variety of situations. They also display good load balancing and fault tolerance properties.
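
    The TARS idea summarized in this abstract, a store that doubles as a publish/subscribe substrate and accepts queries bounded by a time scope, can be illustrated with a minimal single-process sketch. It is not Thyme's API: the class names, the tag-based matching, and the in-memory list are hypothetical simplifications, and distribution, replication and mobility are left out entirely.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

@dataclass
class Item:
    tags: Set[str]
    timestamp: float
    payload: str

@dataclass
class Subscription:
    tag: str
    start: float  # beginning of the time scope (inclusive)
    end: float    # end of the time scope (inclusive)
    callback: Callable[[Item], None]

    def matches(self, item: Item) -> bool:
        in_scope = self.start <= item.timestamp <= self.end
        return self.tag in item.tags and in_scope

class TimeAwareStore:
    """Hypothetical store: a subscription sees matching items published
    in the past (from storage) and in the future (as notifications)."""

    def __init__(self) -> None:
        self.items: List[Item] = []
        self.subscriptions: List[Subscription] = []

    def publish(self, item: Item) -> None:
        self.items.append(item)
        for sub in self.subscriptions:
            if sub.matches(item):
                sub.callback(item)

    def subscribe(self, sub: Subscription) -> None:
        # Replay stored items that fall inside the time scope first.
        for item in self.items:
            if sub.matches(item):
                sub.callback(item)
        self.subscriptions.append(sub)

received: List[str] = []
store = TimeAwareStore()
store.publish(Item({"concert"}, 10.0, "old photo"))
store.publish(Item({"concert"}, 50.0, "too late"))
store.subscribe(
    Subscription("concert", 0.0, 20.0, lambda i: received.append(i.payload))
)
store.publish(Item({"concert"}, 15.0, "new photo"))
store.publish(Item({"concert"}, 30.0, "out of scope"))
```

    Here the subscriber receives one item that was stored before it subscribed and one published afterwards; both items outside the 0 to 20 time scope are ignored.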