
    Robust health stream processing

    2014 Fall. Includes bibliographical references. As the cost of personal health sensors decreases, along with improvements in battery life and connectivity, it becomes more feasible to let patients leave full-time care environments sooner. Such devices could lead to greater independence for the elderly, as well as for others who would normally require full-time care. They would also allow surgery patients to spend less time in the hospital, both pre- and post-operation, since all data could be gathered via remote sensors in the patient's home. While sensor technology is rapidly approaching the point where this is feasible, we still lack processing frameworks that would make such a leap not only feasible but safe. This work focuses on developing a framework that is robust both to failures of processing elements and to interference from other computations processing health sensor data. We work with three disparate data streams and accompanying computations: electroencephalogram (EEG) data gathered for a brain-computer interface (BCI) application, electrocardiogram (ECG) data gathered for arrhythmia detection, and thorax data gathered for monitoring patient sleep status.
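
    As a rough illustration of the kind of robustness the abstract describes, the sketch below runs each sensor stream in its own supervised worker process, so a failure in one computation cannot interfere with the others and a crashed processing element is restarted. The structure and names are assumptions made for illustration, not the thesis's actual framework.

```python
# Hypothetical sketch: one supervised worker process per sensor stream (EEG,
# ECG, thorax), so a failure in one computation cannot interfere with the
# others, and a crashed processing element is restarted by the supervisor.
import multiprocessing as mp
import time

def process_stream(name, queue):
    """Consume and process readings for a single sensor stream in isolation."""
    while True:
        reading = queue.get()  # blocks until the sensor delivers a reading
        # ... per-stream computation would go here (BCI, arrhythmia
        # detection, sleep staging) ...

def supervise(streams):
    """Start one worker per stream and restart any worker that dies."""
    workers = {name: mp.Process(target=process_stream, args=(name, q))
               for name, q in streams.items()}
    for w in workers.values():
        w.start()
    while True:
        for name, proc in workers.items():
            if not proc.is_alive():  # a processing element failed
                proc = mp.Process(target=process_stream,
                                  args=(name, streams[name]))
                proc.start()
                workers[name] = proc
        time.sleep(1.0)

if __name__ == "__main__":
    supervise({s: mp.Queue() for s in ("eeg", "ecg", "thorax")})
```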

    An Analysis of Distributed Systems Syllabi With a Focus on Performance-Related Topics

    We analyze a dataset of 51 current (2019-2020) Distributed Systems syllabi from top Computer Science programs, focusing on the prevalence and context in which performance-related topics are taught in these courses. We also study the scale of the infrastructure mentioned in DS courses, from small client-server systems to cloud-scale, peer-to-peer, global-scale systems. We make eight main findings, covering goals such as performance, and scalability and its variant, elasticity; activities such as performance benchmarking and monitoring; eight selected performance-enhancing techniques (replication, caching, sharding, load balancing, scheduling, streaming, migrating, and offloading); and control issues such as trade-offs that include performance and performance variability. Comment: Accepted for publication at WEPPE 2021, to be held in conjunction with ACM/SPEC ICPE 2021: https://doi.org/10.1145/3447545.3451197. This article is a follow-up of our prior ACM SIGCSE publication, arXiv:2012.0055
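
    One of the eight techniques the study tracks, sharding, is simple to illustrate. The snippet below is an illustrative sketch, not taken from the paper, that routes each key to a fixed shard with a stable hash.

```python
# Illustrative sketch of sharding: a stable hash maps each key to one of
# num_shards partitions, spreading load across servers deterministically.
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Example: distribute user records across 4 shards.
for user in ("alice", "bob", "carol"):
    print(user, "->", shard_for(user, 4))
```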

    Design and evaluation of a cloud native data analysis pipeline for cyber physical production systems

    Since the birth of the World Wide Web in 1991, the rate of data growth has been accelerating, reaching record levels in the last couple of years. Big companies tackled this growth with expensive, enormous data centres to process the data and extract value from it. Between social media, the Internet of Things (IoT), new business processes, monitoring, and multimedia, the capacity of those data centres became a problem and required continuous, expensive expansion. Thus, Big Data was something only a few were able to access. This changed quickly when Amazon launched Amazon Web Services (AWS) around 15 years ago, giving rise to the public cloud. At that time the capabilities were still new and limited, but 10 years later the cloud was a whole new business that changed the Big Data landscape forever. It not only commoditised computing power but came with a pricing model that gave medium and small players access to it. In consequence, new problems arose concerning the nature of these distributed systems and the software architectures required for proper data processing. The present work analyses typical Big Data workloads and proposes an architecture for a cloud native data analysis pipeline. Lastly, it provides a chapter on tools and services that can be used in the architecture, taking advantage of their open source nature and cloud pricing models. Fil: Ferrer Daub, Facundo Javier. Universidad Católica de Córdoba. Instituto de Ciencias de la Administración; Argentina
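
    To make the pipeline idea concrete, here is a minimal sketch assuming a simple ingest, transform, sink flow; in a cloud native deployment each stage would be backed by an open source or managed service (a message queue, a stream processor, object storage) of the kind the abstract's tooling chapter surveys. The stage names and structure are assumptions, not the thesis's design.

```python
# Minimal staged-pipeline sketch (assumed structure, not the thesis design):
# ingest -> transform -> sink, each stage swappable for a cloud service.
from typing import Iterable, Iterator

def ingest(records: Iterable[dict]) -> Iterator[dict]:
    """Source stage: in production this would read from a message queue."""
    yield from records

def transform(stream: Iterator[dict]) -> Iterator[dict]:
    """Processing stage: drop malformed events and enrich the rest."""
    for event in stream:
        if event.get("value") is not None:
            yield {**event, "doubled": event["value"] * 2}

def sink(stream: Iterator[dict]) -> None:
    """Sink stage: in production this would write to object storage."""
    for event in stream:
        print("stored:", event)

sink(transform(ingest([{"value": 2}, {"value": None}, {"value": 3}])))
```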

    Big Data and Large-scale Data Analytics: Efficiency of Sustainable Scalability and Security of Centralized Clouds and Edge Deployment Architectures

    One of the significant shifts in next-generation computing technologies will certainly be in the development of Big Data (BD) deployment architectures. Apache Hadoop, the BD landmark, evolved into a widely deployed BD operating system. Its new features include a federation structure and many associated frameworks, which give Hadoop 3.x the maturity to serve different markets. This dissertation addresses two leading issues involved in exploiting BD and the large-scale data analytics realm using the Hadoop platform: (i) scalability, which directly affects system performance and overall throughput, using portable Docker containers; and (ii) security, which spreads the adoption of data protection practices among practitioners, using access controls. An Enhanced MapReduce Environment (EME), an OPportunistic and Elastic Resource Allocation (OPERA) scheduler, a BD Federation Access Broker (BDFAB), and a Secure Intelligent Transportation System (SITS) with a multi-tier architecture for data streaming to the cloud are the main contributions of this thesis.
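
    The security contribution centres on access controls. The following is a hypothetical sketch of the broker idea, checking every request against a role-to-permission policy before it reaches the data platform; the policy layout and names are assumptions, not the dissertation's actual BDFAB design.

```python
# Hypothetical role-based access check, as a federation access broker might
# perform before forwarding a request to the underlying Hadoop cluster.
POLICY = {
    "analyst":  {"read"},
    "engineer": {"read", "write"},
    "admin":    {"read", "write", "grant"},
}

def authorize(role: str, action: str) -> bool:
    """Allow an action only if the role's permission set includes it."""
    return action in POLICY.get(role, set())

assert authorize("analyst", "read")
assert not authorize("analyst", "write")  # this request would be rejected
```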

    Dependable mapreduce in a cloud-of-clouds

    Doctoral thesis, Informatics (Informatics Engineering), Universidade de Lisboa, Faculdade de Ciências, 2017. MapReduce is a simple and elegant programming model suitable for loosely coupled parallelization problems, that is, problems that can be decomposed into subproblems. Hadoop MapReduce has become the most popular framework for performing large-scale computation on off-the-shelf clusters, and it is widely used to process such problems in a parallel and distributed fashion. The framework is highly scalable, deals efficiently with large volumes of unstructured data, and serves as a platform for many other applications. However, it has limitations concerning dependability: it is prepared only to tolerate crash faults, by re-executing tasks in case of failure, and to detect file corruption, using file checksums. Unfortunately, there is evidence that arbitrary faults do occur and can affect the correctness of MapReduce executions. Although such Byzantine faults are considered rare, some MapReduce applications are critical and intolerant of this type of fault. Furthermore, typical MapReduce implementations are constrained to a single cloud environment. This is a problem, as there is increasing evidence of outages in major cloud offerings, raising concerns about depending on a single cloud. In this thesis, I propose techniques to improve the dependability of MapReduce systems. The proposed solutions allow MapReduce to scale out computations to a multi-cloud environment, or cloud-of-clouds, to tolerate arbitrary and malicious faults as well as cloud outages. The proposals have three important properties: they increase the dependability of MapReduce by tolerating the faults mentioned above; they require minimal or no modifications to users' applications; and they achieve this increased level of fault tolerance at reasonable cost. To achieve these goals, I introduce three key ideas: minimizing the required replication; applying context-based job scheduling based on cloud and network conditions; and performing fine-grained replication. I evaluated all proposed solutions in real testbed environments running typical MapReduce applications. The results demonstrate interesting trade-offs between resilience and performance when compared to traditional methods. The fundamental conclusion is that the cost introduced by the solutions is small, and thus acceptable for many critical applications.
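
    The idea of tolerating arbitrary faults while minimizing replication can be sketched as follows, under assumptions: a task runs on replicas one at a time (possibly in different clouds), and its output is accepted once f+1 output digests match, so only f+1 executions are needed in the fault-free case and extra replicas run only on disagreement. The run_on_replica helper is a hypothetical stand-in, not the thesis's API.

```python
# Voting-based acceptance of replicated task outputs: accept a result once
# f+1 replicas agree on its digest; run extra replicas only on mismatch.
import hashlib
from collections import Counter

def digest(output: bytes) -> str:
    return hashlib.sha256(output).hexdigest()

def run_with_voting(run_on_replica, task, f: int, max_replicas: int = 7):
    """Accept a task's output once f+1 replicas produce the same digest."""
    votes = Counter()
    outputs = {}
    for replica in range(max_replicas):
        out = run_on_replica(replica, task)  # execute on one cloud/replica
        d = digest(out)
        votes[d] += 1
        outputs[d] = out
        if votes[d] >= f + 1:  # f+1 matching digests suffice
            return outputs[d]
    raise RuntimeError("no agreement: too many faulty replicas")

# Usage with a trivial deterministic stand-in "task": with f=1, two agreeing
# replicas are enough, so exactly two executions happen here.
result = run_with_voting(lambda r, t: t.encode(), "wordcount", f=1)
```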