202 research outputs found

    Single-Board-Computer Clusters for Cloudlet Computing in Internet of Things

    Get PDF
    The number of connected sensors and devices is expected to increase to billions in the near future. However, centralised cloud-computing data centres present various challenges to meet the requirements inherent to Internet of Things (IoT) workloads, such as low latency, high throughput and bandwidth constraints. Edge computing is becoming the standard computing paradigm for latency-sensitive real-time IoT workloads, since it addresses the aforementioned limitations related to centralised cloud-computing models. Such a paradigm relies on bringing computation close to the source of data, which presents serious operational challenges for large-scale cloud-computing providers. In this work, we present an architecture composed of low-cost Single-Board-Computer clusters near to data sources, and centralised cloud-computing data centres. The proposed cost-efficient model may be employed as an alternative to fog computing to meet real-time IoT workload requirements while keeping scalability. We include an extensive empirical analysis to assess the suitability of single-board-computer clusters as cost-effective edge-computing micro data centres. Additionally, we compare the proposed architecture with traditional cloudlet and cloud architectures, and evaluate them through extensive simulation. We finally show that acquisition costs can be drastically reduced while keeping performance levels in data-intensive IoT use cases.Ministerio de Economía y Competitividad TIN2017-82113-C2-1-RMinisterio de Economía y Competitividad RTI2018-098062-A-I00European Union’s Horizon 2020 No. 754489Science Foundation Ireland grant 13/RC/209

    Implications and Limitations of Securing an InfiniBand Network

    Get PDF
    The InfiniBand Architecture is one of the leading network interconnects used in high performance computing, delivering very high bandwidth and low latency. As the popularity of InfiniBand increases, the possibility for new InfiniBand applications arise outside the domain of high performance computing, thereby creating the opportunity for new security risks. In this work, new security questions are considered and addressed. The study demonstrates that many common traffic analyzing tools cannot monitor or capture InfiniBand traffic transmitted between two hosts. Due to the kernel bypass nature of InfiniBand, many host-based network security systems cannot be executed on InfiniBand applications. Those that can impose a significant performance loss for the network. The research concludes that not all network security practices used for Ethernet translate to InfiniBand as previously suggested and that an answer to meeting specific security requirements for an InfiniBand network might reside in hardware offload

    Design Space Exploration and Resource Management of Multi/Many-Core Systems

    Get PDF
    The increasing demand of processing a higher number of applications and related data on computing platforms has resulted in reliance on multi-/many-core chips as they facilitate parallel processing. However, there is a desire for these platforms to be energy-efficient and reliable, and they need to perform secure computations for the interest of the whole community. This book provides perspectives on the aforementioned aspects from leading researchers in terms of state-of-the-art contributions and upcoming trends

    Modern computing: Vision and challenges

    Get PDF
    Over the past six decades, the computing systems field has experienced significant transformations, profoundly impacting society with transformational developments, such as the Internet and the commodification of computing. Underpinned by technological advancements, computer systems, far from being static, have been continuously evolving and adapting to cover multifaceted societal niches. This has led to new paradigms such as cloud, fog, edge computing, and the Internet of Things (IoT), which offer fresh economic and creative opportunities. Nevertheless, this rapid change poses complex research challenges, especially in maximizing potential and enhancing functionality. As such, to maintain an economical level of performance that meets ever-tighter requirements, one must understand the drivers of new model emergence and expansion, and how contemporary challenges differ from past ones. To that end, this article investigates and assesses the factors influencing the evolution of computing systems, covering established systems and architectures as well as newer developments, such as serverless computing, quantum computing, and on-device AI on edge devices. Trends emerge when one traces technological trajectory, which includes the rapid obsolescence of frameworks due to business and technical constraints, a move towards specialized systems and models, and varying approaches to centralized and decentralized control. This comprehensive review of modern computing systems looks ahead to the future of research in the field, highlighting key challenges and emerging trends, and underscoring their importance in cost-effectively driving technological progress

    Design, development and evaluation of the ruggedized edge computing node (RECON)

    Get PDF
    The increased quality and quantity of sensors provide an ever-increasing capability to collect large quantities of high-quality data in the field. Research devoted to translating that data is progressing rapidly; however, translating field data into usable information can require high performance computing capabilities. While high performance computing (HPC) resources are available in centralized facilities, bandwidth, latency, security and other limitations inherent to edge location in field sensor applications may prevent HPC resources from being used in a timely fashion necessary for potential United States Army Corps of Engineers (USACE) field applications. To address these limitations, the design requirements for RECON are established and derived from a review of edge computing, in order to develop and evaluate a novel high-power, field-deployable HPC platform capable of operating in austere environments at the edge

    Methodology for malleable applications on distributed memory systems

    Get PDF
    A la portada logo BSC(English) The dominant programming approach for scientific and industrial computing on clusters is MPI+X. While there are a variety of approaches within the node, denoted by the ``X'', Message Passing interface (MPI) is the standard for programming multiple nodes with distributed memory. This thesis argues that the OmpSs-2 tasking model can be extended beyond the node to naturally support distributed memory, with three benefits: First, at small to medium scale the tasking model is a simpler and more productive alternative to MPI. It eliminates the need to distribute the data explicitly and convert all dependencies into explicit message passing. It also avoids the complexity of hybrid programming using MPI+X. Second, the ability to offload parts of the computation among the nodes enables the runtime to automatically balance the loads in a full-scale MPI+X program. This approach does not require a cost model, and it is able to transparently balance the computational loads across the whole program, on all its nodes. Third, because the runtime handles all low-level aspects of data distribution and communication, it can change the resource allocation dynamically, in a way that is transparent to the application. This thesis describes the design, development and evaluation of OmpSs-2@Cluster, a programming model and runtime system that extends the OmpSs-2 model to allow a virtually unmodified OmpSs-2 program to run across multiple distributed memory nodes. For well-balanced applications it provides similar performance to MPI+OpenMP on up to 16 nodes, and it improves performance by up to 2x for irregular and unbalanced applications like Cholesky factorization. This work also extended OmpSs-2@Cluster for interoperability with MPI and Barcelona Supercomputing Center (BSC)'s state-of-the-art Dynamic Load Balance (DLB) library in order to dynamically balance MPI+OmpSs-2 applications by transparently offloading tasks among nodes. This approach reduces the execution time of a microscale solid mechanics application by 46% on 64 nodes and on a synthetic benchmark, it is within 10% of perfect load balancing on up to 8 nodes. Finally, the runtime was extended to transparently support malleability for pure OmpSs-2@Cluster programs and interoperate with the Resources Management System (RMS). The only change to the application is to explicitly call an API function to control the addition or removal of nodes. In this regard we additionally provide the runtime with the ability to semi-transparently save and recover part of the application status to perform checkpoint and restart. Such a feature hides the complexity of data redistribution and parallel IO from the user while allowing the program to recover and continue previous executions. Our work is a starting point for future research on fault tolerance. In summary, OmpSs-2@Cluster expands the OmpSs-2 programming model to encompass distributed memory clusters. It allows an existing OmpSs-2 program, with few if any changes, to run across multiple nodes. OmpSs-2@Cluster supports transparent multi-node dynamic load balancing for MPI+OmpSs-2 programs, and enables semi-transparent malleability for OmpSs-2@Cluster programs. The runtime system has a high level of stability and performance, and it opens several avenues for future work.(Español) El modelo de programación dominante para clusters tanto en ciencia como industria es actualmente MPI+X. A pesar de que hay alguna variedad de alternativas para programar dentro de un nodo (indicado por la "X"), el estandar para programar múltiples nodos con memoria distribuida sigue siendo Message Passing Interface (MPI). Esta tesis propone la extensión del modelo de programación basado en tareas OmpSs-2 para su funcionamiento en sistemas de memoria distribuida, destacando 3 beneficios principales: En primer lugar; a pequeña y mediana escala, un modelo basado en tareas es más simple y productivo que MPI y elimina la necesidad de distribuir los datos explícitamente y convertir todas las dependencias en mensajes. Además, evita la complejidad de la programacion híbrida MPI+X. En segundo lugar; la capacidad de enviar partes del cálculo entre los nodos permite a la librería balancear la carga de trabajo en programas MPI+X a gran escala. Este enfoque no necesita un modelo de coste y permite equilibrar cargas transversalmente en todo el programa y todos los nodos. En tercer lugar; teniendo en cuenta que es la librería quien maneja todos los aspectos relacionados con distribución y transferencia de datos, es posible la modificación dinámica y transparente de los recursos que utiliza la aplicación. Esta tesis describe el diseño, desarrollo y evaluación de OmpSs-2@Cluster; un modelo de programación y librería que extiende OmpSs-2 permitiendo la ejecución de programas OmpSs-2 existentes en múltiples nodos sin prácticamente necesidad de modificarlos. Para aplicaciones balanceadas, este modelo proporciona un rendimiento similar a MPI+OpenMP hasta 16 nodos y duplica el rendimiento en aplicaciones irregulares o desbalanceadas como la factorización de Cholesky. Este trabajo incluye la extensión de OmpSs-2@Cluster para interactuar con MPI y la librería de balanceo de carga Dynamic Load Balancing (DLB) desarrollada en el Barcelona Supercomputing Center (BSC). De este modo es posible equilibrar aplicaciones MPI+OmpSs-2 mediante la transferencia transparente de tareas entre nodos. Este enfoque reduce el tiempo de ejecución de una aplicación de mecánica de sólidos a micro-escala en un 46% en 64 nodos; en algunos experimentos hasta 8 nodos se pudo equilibrar perfectamente la carga con una diferencia inferior al 10% del equilibrio perfecto. Finalmente, se implementó otra extensión de la librería para realizar operaciones de maleabilidad en programas OmpSs-2@Cluster e interactuar con el Sistema de Manejo de Recursos (RMS). El único cambio requerido en la aplicación es la llamada explicita a una función de la interfaz que controla la adición o eliminación de nodos. Además, se agregó la funcionalidad de guardar y recuperar parte del estado de la aplicación de forma semitransparente con el objetivo de realizar operaciones de salva-reinicio. Dicha funcionalidad oculta al usuario la complejidad de la redistribución de datos y las operaciones de lectura-escritura en paralelo, mientras permite al programa recuperar y continuar ejecuciones previas. Este es un punto de partida para futuras investigaciones en tolerancia a fallos. En resumen, OmpSs-2@Cluster amplía el modelo de programación de OmpSs-2 para abarcar sistemas de memoria distribuida. El modelo permite la ejecución de programas OmpSs-2 en múltiples nodos prácticamente sin necesidad de modificarlos. OmpSs-2@Cluster permite además el balanceo dinámico de carga en aplicaciones híbridas MPI+OmpSs-2 ejecutadas en varios nodos y es capaz de realizar maleabilidad semi-transparente en programas OmpSs-2@Cluster puros. La librería tiene un niveles de rendimiento y estabilidad altos y abre varios caminos para trabajos futuro.Arquitectura de computador

    The future roadmap of in-vehicle network processing: a HW-centric (R-)evolution

    Get PDF
    © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.The automotive industry is undergoing a deep revolution. With the race towards autonomous driving, the amount of technologies, sensors and actuators that need to be integrated in the vehicle increases exponentially. This imposes new great challenges in the vehicle electric/electronic (E/E) architecture and, especially, in the In-Vehicle Network (IVN). In this work, we analyze the evolution of IVNs, and focus on the main network processing platform integrated in them: the Gateway (GW). We derive the requirements of Network Processing Platforms that need to be fulfilled by future GW controllers focusing on two perspectives: functional requirements and structural requirements. Functional requirements refer to the functionalities that need to be delivered by these network processing platforms. Structural requirements refer to design aspects which ensure the feasibility, usability and future evolution of the design. By focusing on the Network Processing architecture, we review the available options in the state of the art, both in industry and academia. We evaluate the strengths and weaknesses of each architecture in terms of the coverage provided for the functional and structural requirements. In our analysis, we detect a gap in this area: there is currently no architecture fulfilling all the requirements of future automotive GW controllers. In light of the available network processing architectures and the current technology landscape, we identify Hardware (HW) accelerators and custom processor design as a key differentiation factor which boosts the devices performance. From our perspective, this points to a need - and a research opportunity - to explore network processing architectures with a strong HW focus, unleashing the potential of next-generation network processors and supporting the demanding requirements of future autonomous and connected vehicles.Peer ReviewedPostprint (published version

    Dynamic Resource Scheduling in Cloud Data Center

    Get PDF
    Cloud infrastructure provides a wide range of resources and services to companies and organizations, such as computation, storage, database, platforms, etc. These resources and services are used to power up and scale out tenants' workloads and meet their specified service level agreements (SLA). With the various kinds and characteristics of its workloads, an important problem for cloud provider is how to allocate it resource among the requests. An efficient resource scheduling scheme should be able to benefit both the cloud provider and also the cloud users. For the cloud provider, the goal of the scheduling algorithm is to improve the throughput and the job completion rate of the cloud data center under the stress condition or to use less physical machines to support all incoming jobs under the overprovisioning condition. For the cloud users, the goal of scheduling algorithm is to guarantee the SLAs and satisfy other job specified requirements. Furthermore, since in a cloud data center, jobs would arrive and leave very frequently, hence, it is critical to make the scheduling decision within a reasonable time. To improve the efficiency of the cloud provider, the scheduling algorithm needs to jointly reduce the inter-VM and intra-VM fragments, which means to consider the scheduling problem with regard to both the cloud provider and the users. This thesis address the cloud scheduling problem from both the cloud provider and the user side. Cloud data centers typically require tenants to specify the resource demands for the virtual machines (VMs) they create using a set of pre-defined, fixed configurations, to ease the resource allocation problem. However, this approach could lead to low resource utilization of cloud data centers as tenants are obligated to conservatively predict the maximum resource demand of their applications. In addition to that, users are at an inferior position of estimating the VM demands without knowing the multiplexing techniques of the cloud provider. Cloud provider, on the other hand, has a better knowledge at selecting the VM sets for the submitted applications. The scheduling problem is even severe for the mobile user who wants to use the cloud infrastructure to extend his/her computation and battery capacity, where the response and scheduling time is tight and the transmission channel between mobile users and cloudlet is highly variable. This thesis investigates into the resource scheduling problem for both wired and mobile users in the cloud environment. The proposed resource allocation problem is studied in the methodology of problem modeling, trace analysis, algorithm design and simulation approach. The first aspect this thesis addresses is the VM scheduling problem. Instead of the static VM scheduling, this thesis proposes a finer-grained dynamic resource allocation and scheduling algorithm that can substantially improve the utilization of the data center resources by increasing the number of jobs accommodated and correspondingly, the cloud data center provider's revenue. The second problem this thesis addresses is joint VM set selection and scheduling problem. The basic idea is that there may exist multiple VM sets that can support an application's resource demand, and by elaborately select an appropriate VM set, the utilization of the data center can be improved without violating the application's SLA. The third problem addressed by the thesis is the mobile cloud resource scheduling problem, where the key issue is to find the most energy and time efficient way of allocating components of the target application given the current network condition and cloud resource usage status. The main contribution of this thesis are the followings. For the dynamic real-time scheduling problem, a constraint programming solution is proposed to schedule the long jobs, and simple heuristics are used to quickly, yet quite accurately schedule the short jobs. Trace-driven simulations shows that the overall revenue for the cloud provider can be improved by 30\% over the traditional static VM resource allocation based on the coarse granularity specifications. For the joint VM selection and scheduling problem, this thesis proposes an optimal online VM set selection scheme that satisfies the user resource demand and minimizes the number of activated physical machines. Trace driven simulation shows around 18\% improvement of the overall utility of the provider compared to Bazaar-I approach and more than 25\% compared to best-fit and first-fit. For the mobile cloud scheduling problem, a reservation-based joint code partition and resource scheduling algorithm is proposed by conservatively estimating the minimal resource demand and a polynomial time code partition algorithm is proposed to obtain the corresponding partition
    corecore