1,031 research outputs found

    Toward High-Performance Computing and Big Data Analytics Convergence: The Case of Spark-DIY

    Convergence between high-performance computing (HPC) and big data analytics (BDA) is currently an established research area that has spawned new opportunities for unifying the platform layer and data abstractions in these ecosystems. This work presents an architectural model that enables the interoperability of established BDA and HPC execution models, reflecting the key design features that interest both the HPC and BDA communities, and including an abstract data collection and operational model that generates a unified interface for hybrid applications. This architecture can be implemented in different ways depending on the process- and data-centric platforms of choice and the mechanisms put in place to effectively meet the requirements of the architecture. The Spark-DIY platform is introduced in the paper as a prototype implementation of the proposed architecture. It preserves the interfaces and execution environment of the popular BDA platform Apache Spark, making it compatible with any Spark-based application and tool, while providing efficient communication and kernel execution via DIY, a powerful communication pattern library built on top of MPI. Later, Spark-DIY is analyzed in terms of performance by building a representative use case from the hydrogeology domain, EnKF-HGS. This application is a clear example of how current HPC simulations are evolving toward hybrid HPC-BDA applications, integrating HPC simulations within a BDA environment. This work was supported in part by the Spanish Ministry of Economy, Industry and Competitiveness under Grant TIN2016-79637-P (toward Unification of HPC and Big Data Paradigms), in part by the Spanish Ministry of Education under Grant FPU15/00422 Training Program for Academic and Teaching Staff Grant, in part by the Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357, and in part by the DOE under Agreement DE-DC000122495, Program Manager Laura Biven.

    Trustworthy Knowledge Planes For Federated Distributed Systems

    In federated distributed systems, such as the Internet and the public cloud, the constituent systems can differ in their configuration and provisioning, resulting in significant impacts on the performance, robustness, and security of applications. Yet these systems lack support for distinguishing such characteristics, resulting in uninformed service selection and poor inter-operator coordination. This thesis presents the design and implementation of a trustworthy knowledge plane that can determine such characteristics about autonomous networks on the Internet. A knowledge plane collects the state of network devices and participants. Using this state, applications infer whether a network possesses some characteristic of interest. The knowledge plane uses attestation to attribute state descriptions to the principals that generated them, thereby making the results of inference more trustworthy. Trustworthy knowledge planes enable applications to establish stronger assumptions about their network operating environment, resulting in improved robustness and reduced deployment barriers. We have prototyped the knowledge plane and associated devices. Experience with deploying analyses over production networks demonstrates that knowledge planes impose low cost and can scale to support Internet-scale networks.

    Prism: Revealing Hidden Functional Clusters from Massive Instances in Cloud Systems

    Ensuring the reliability of cloud systems is critical for both cloud vendors and customers. Cloud systems often rely on virtualization techniques to create instances of hardware resources, such as virtual machines. However, virtualization hinders the observability of cloud systems, making it challenging to diagnose platform-level issues. To improve system observability, we propose to infer functional clusters of instances, i.e., groups of instances having similar functionalities. We first conduct a pilot study on a large-scale cloud system, i.e., Huawei Cloud, demonstrating that instances having similar functionalities share similar communication and resource usage patterns. Motivated by these findings, we formulate the identification of functional clusters as a clustering problem and propose a non-intrusive solution called Prism. Prism adopts a coarse-to-fine clustering strategy. It first partitions instances into coarse-grained chunks based on communication patterns. Within each chunk, Prism further groups instances with similar resource usage patterns to produce fine-grained functional clusters. Such a design reduces noise in the data and allows Prism to process massive numbers of instances efficiently. We evaluate Prism on two datasets collected from the real-world production environment of Huawei Cloud. Our experiments show that Prism achieves a v-measure of ~0.95, surpassing existing state-of-the-art solutions. Additionally, we illustrate the integration of Prism within monitoring systems for enhanced cloud reliability through two real-world use cases. Comment: The paper was accepted by the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE 2023).
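
The coarse-to-fine idea described in the Prism abstract can be illustrated with a small sketch. This is not Prism's actual algorithm, which the abstract does not specify. It assumes, purely for illustration, that the coarse step takes connected components of a communication graph and that the fine step greedily groups instances whose resource-usage vectors lie within a Euclidean distance threshold. All names here (`coarse_chunks`, `fine_clusters`, `coarse_to_fine`, `threshold`) are hypothetical.

```python
from collections import defaultdict


def coarse_chunks(instances, links):
    """Coarse step (assumed): connected components of the communication graph."""
    parent = {i: i for i in instances}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    chunks = defaultdict(list)
    for i in instances:
        chunks[find(i)].append(i)
    return list(chunks.values())


def distance(u, v):
    """Euclidean distance between two resource-usage vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5


def fine_clusters(chunk, usage, threshold):
    """Fine step (assumed): join the first cluster whose first member is close enough."""
    clusters = []
    for inst in chunk:
        for cluster in clusters:
            if distance(usage[inst], usage[cluster[0]]) <= threshold:
                cluster.append(inst)
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append([inst])
    return clusters


def coarse_to_fine(instances, links, usage, threshold):
    """Run the fine step independently inside each coarse chunk."""
    result = []
    for chunk in coarse_chunks(instances, links):
        result.extend(fine_clusters(chunk, usage, threshold))
    return result
```

Because the fine step only compares instances inside the same chunk, the number of pairwise comparisons is bounded by the chunk sizes rather than by the total number of instances, which is one plausible reading of why such a design scales.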

    New Waves of IoT Technologies Research – Transcending Intelligence and Senses at the Edge to Create Multi Experience Environments

    The next wave of Internet of Things (IoT) and Industrial Internet of Things (IIoT) brings new technological developments that incorporate radical advances in Artificial Intelligence (AI), edge computing processing, new sensing capabilities, more security protection and autonomous functions, accelerating progress towards the ability for IoT systems to self-develop, self-maintain and self-optimise. The emergence of hyper autonomous IoT applications with enhanced sensing, distributed intelligence, edge processing and connectivity, combined with human augmentation, has the potential to power the transformation and optimisation of industrial sectors and to change the innovation landscape. This chapter reviews the most recent advances in the next wave of the IoT by looking not only at the technology enabling the IoT but also at the platforms and smart data aspects that will bring intelligence, sustainability, dependability, and autonomy, and will support human-centric solutions.

    Universal Mobile Service Execution Framework for Device-To-Device Collaborations

    There is high demand for effective, high-performance collaboration between mobile devices in places where traditional Internet connections are unavailable, unreliable, or significantly overburdened, such as battlefields, disaster zones, isolated rural areas, or crowded public venues. To enable collaboration among devices in opportunistic networks, code offloading and Remote Method Invocation are the two major mechanisms for ensuring that code portions of applications are successfully transmitted to and executed on remote platforms. Although these areas have been actively researched for a decade, limitations in multi-device connectivity, system error handling, and cross-platform compatibility have prevented these technologies from being broadly applied in the mobile industry. To address these problems, we designed and developed UMSEF, a Universal Mobile Service Execution Framework, which is an innovative and radical approach to mobile computing in opportunistic networks. Our solution is built as a component-based mobile middleware architecture that adapts flexibly to multiple network topologies, tolerates network errors, and is compatible with multiple platforms. We provide an effective algorithm that estimates the resource availability of a device to improve performance and energy consumption, and a novel platform for mobile remote method invocation based on declarative annotations over multi-group device networks. Real-world experiments show that our approach not only achieves better performance and energy consumption, but can also be extended to large-scale ubiquitous or IoT systems.

    Monitoring in Hybrid Cloud-Edge Environments

    The increasing number of mobile and IoT (Internet of Things) devices accessing cloud services contributes to a surge of requests towards the Cloud and, consequently, higher latencies. This is aggravated by the possible congestion of the communication networks connecting the end devices and remote cloud datacenters, due to the large data volume generated at the Edge (e.g. in the domains of smart cities, smart cars, etc.). One solution to this problem is the creation of hybrid Cloud/Edge execution platforms composed of computational nodes located in the periphery of the system, near data producers and consumers, as a way to complement the cloud resources. These edge nodes offer computation and data storage resources to accommodate local services in order to ensure rapid responses to clients (enhancing the perceived quality of service) and to filter data, reducing the traffic volume towards the Cloud. Usually these nodes (e.g. ISP access points and on-premises servers) are heterogeneous, geographically distributed, and resource-restricted (including in communication networks), which increases the complexity of managing them. At the application level, the microservices paradigm, represented by applications composed of small, loosely coupled services, offers an adequate and flexible solution for designing applications that can exploit the limited computational resources in the Edge. Nevertheless, the inherently difficult management of microservices within such a complex infrastructure demands an agile and lightweight monitoring system that takes into account the Edge's limitations and goes beyond traditional monitoring solutions at the Cloud. Monitoring in these new domains is not a simple process, since it requires supporting the elasticity of the monitored system and the dynamic deployment of services and, moreover, doing so without overloading the infrastructure's resources with its own computational requirements and generated data.
Towards this goal, this dissertation presents a hybrid monitoring architecture where the heavier (resource-wise) components reside in the Cloud while the lighter (computationally less demanding) components reside in the Edge. The architecture provides relevant monitoring functionalities such as metrics acquisition, their analysis, and mechanisms for real-time alerting. The objective is the efficient use of computational resources in the infrastructure while guaranteeing an agile delivery of monitoring data where and when it is needed.
There has been a significant increase in mobile and IoT (Internet of Things) devices in emerging areas such as Smart Cities, Smart Cars, etc., which issue requests to services normally located in the Cloud, often from remote locations. As a consequence, the latency in processing these requests is expected to increase, which may be aggravated by congestion of the communication channels from the periphery to the data centers. One way to solve this problem is to create hybrid Cloud/Edge systems, composed of computational nodes located at the periphery of the system, close to data producers and consumers, thereby complementing the Cloud's computational resources. Edge nodes not only host data and computations, ensuring a faster response to clients and better quality of service, but also filter some of the data, thus avoiding unnecessary data transfers to the core of the system. However, many of these nodes (e.g. access points, proprietary servers) have limited capacity, are quite heterogeneous, and/or are geographically dispersed, which makes resource management difficult.
The microservices paradigm, represented by applications composed of small services that are functionally decoupled and communicate through messages, provides an adequate solution for exploiting computational resources at the periphery. However, the proper mapping of microservices onto the infrastructure, besides being complex, is difficult to manage and requires a lightweight and agile monitoring system that takes into account the limited capabilities of the supporting infrastructure at the periphery. Monitoring is not a simple process, since it must support the elasticity of the system, taking deployment adaptations into account, without overloading computational or network resources. This work presents a hybrid monitoring architecture, with more complex components in the Cloud and simpler components at the Edge. The architecture provides important monitoring functionalities, such as the collection of a variety of metrics, their analysis, and real-time alerts. The goal is to make the most of computational resources while ensuring the delivery of the most relevant data when needed.

    The 6G Architecture Landscape: European Perspective
