3,338 research outputs found

    Foggy clouds and cloudy fogs: a real need for coordinated management of fog-to-cloud computing systems

    Get PDF
    The recent advances in cloud services technology are fueling a plethora of information technology innovation, including networking, storage, and computing. Today, various flavors have evolved of IoT, cloud computing, and so-called fog computing, a concept referring to capabilities of edge devices and users' clients to compute, store, and exchange data among each other and with the cloud. Although the rapid pace of this evolution was not easily foreseeable, today each piece of it facilitates and enables the deployment of what we commonly refer to as a smart scenario, including smart cities, smart transportation, and smart homes. As most current cloud, fog, and network services run simultaneously in each scenario, we observe that we are at the dawn of what may be the next big step in the cloud computing and networking evolution, whereby services might be executed at the network edge, both in parallel and in a coordinated fashion, as well as supported by the unstoppable technology evolution. As edge devices become richer in functionality and smarter, embedding capacities such as storage or processing, as well as new functionalities, such as decision making, data collection, forwarding, and sharing, a real need is emerging for coordinated management of fog-to-cloud (F2C) computing systems. This article introduces a layered F2C architecture, its benefits and strengths, as well as the arising open and research challenges, making the case for the real need for their coordinated management. Our architecture, the illustrative use case presented, and a comparative performance analysis, albeit conceptual, all clearly show the way forward toward a new IoT scenario with a set of existing and unforeseen services provided on highly distributed and dynamic compute, storage, and networking resources, bringing together heterogeneous and commodity edge devices, emerging fogs, as well as conventional clouds.Peer ReviewedPostprint (author's final draft

    Performance Models of Data Parallel DAG Workflows for Large Scale Data Analytics

    Get PDF
    Directed Acyclic Graph (DAG) workflows are widely used for large-scale data analytics in cluster-based distributed computing systems. Building an accurate performance model for a DAG on data-parallel frameworks (e.g., MapReduce) is critical to implement autonomic self-management big data systems. An accurate performance model is challenging because the allocation of pre-emptable system resources among parallel jobs may dynamically vary during execution. This resource allocation variation during execution makes it difficult to accurately estimate the execution time. In this paper, we tackle this challenge by proposing a new cost model, called Bottleneck Oriented Estimation (BOE), to estimate the allocation of preemptable resources by identifying the bottleneck to accurately predict task execution time. For a DAG workflow, we propose a state-based approach to iteratively use the resource allocation property among stages to estimate the overall execution plan. Extensive experiments were performed to validate these cost models with HiBench and TPC-H workloads. The BOE model outperforms the state-of-the-art models by a factor of five for task execution time estimation.Peer reviewe

    Computing at massive scale: Scalability and dependability challenges

    Get PDF
    Large-scale Cloud systems and big data analytics frameworks are now widely used for practical services and applications. However, with the increase of data volume, together with the heterogeneity of workloads and resources, and the dynamic nature of massive user requests, the uncertainties and complexity of resource management and service provisioning increase dramatically, often resulting in poor resource utilization, vulnerable system dependability, and user-perceived performance degradations. In this paper we report our latest understanding of the current and future challenges in this particular area, and discuss both existing and potential solutions to the problems, especially those concerned with system efficiency, scalability and dependability. We first introduce a data-driven analysis methodology for characterizing the resource and workload patterns and tracing performance bottlenecks in a massive-scale distributed computing environment. We then examine and analyze several fundamental challenges and the solutions we are developing to tackle them, including for example incremental but decentralized resource scheduling, incremental messaging communication, rapid system failover, and request handling parallelism. We integrate these solutions with our data analysis methodology in order to establish an engineering approach that facilitates the optimization, tuning and verification of massive-scale distributed systems. We aim to develop and offer innovative methods and mechanisms for future computing platforms that will provide strong support for new big data and IoE (Internet of Everything) applications

    Orchestration of machine learning workflows on Internet of Things data

    Get PDF
    Applications empowered by machine learning (ML) and the Internet of Things (IoT) are changing the way people live and impacting a broad range of industries. However, creating and automating ML workflows at scale using real-world IoT data often leads to complex systems integration and production issues. Examples of challenges faced during the development of these ML applications include glue code, hidden dependencies, and data pipeline jungles. This research proposes the Machine Learning Framework for IoT data (ML4IoT), which is designed to orchestrate ML workflows to perform training and enable inference by ML models on IoT data. In the proposed framework, containerized microservices are used to automate the execution of tasks specified in ML workflows, which are defined through REST APIs. To address the problem of integrating big data tools and machine learning into a unified platform, the proposed framework enables the definition and execution of end-to-end ML workflows on large volumes of IoT data. In addition, to address the challenges of running multiple ML workflows in parallel, the ML4IoT has been designed to use container-based components that provide a convenient mechanism to enable the training and deployment of numerous ML models in parallel. Finally, to address the common production issues faced during the development of ML applications, the proposed framework used microservices architecture to bring flexibility, reusability, and extensibility to the framework. Through the experiments, we demonstrated the feasibility of the (ML4IoT), which managed to train and deploy predictive ML models in two types of IoT data. The obtained results suggested that the proposed framework can manage real-world IoT data, by providing elasticity to execute 32 ML workflows in parallel, which were used to train 128 ML models simultaneously. Also, results demonstrated that in the ML4IoT, the performance of rendering online predictions is not affected when 64 ML models are deployed concurrently to infer new information using online IoT data
    • …
    corecore