1,293 research outputs found

    Modern computing: Vision and challenges

    Get PDF
    Over the past six decades, the computing systems field has experienced significant transformations, profoundly impacting society with transformative developments such as the Internet and the commodification of computing. Underpinned by technological advancements, computer systems, far from being static, have continuously evolved and adapted to cover multifaceted societal niches. This has led to new paradigms such as cloud, fog, and edge computing and the Internet of Things (IoT), which offer fresh economic and creative opportunities. Nevertheless, this rapid change poses complex research challenges, especially in maximizing potential and enhancing functionality. As such, to maintain an economical level of performance that meets ever-tighter requirements, one must understand the drivers of new model emergence and expansion, and how contemporary challenges differ from past ones. To that end, this article investigates and assesses the factors influencing the evolution of computing systems, covering established systems and architectures as well as newer developments such as serverless computing, quantum computing, and on-device AI on edge devices. Trends emerge when one traces the technological trajectory: the rapid obsolescence of frameworks due to business and technical constraints, a move towards specialized systems and models, and varying approaches to centralized and decentralized control. This comprehensive review of modern computing systems looks ahead to the future of research in the field, highlighting key challenges and emerging trends and underscoring their importance in cost-effectively driving technological progress.

    20th SC@RUG 2023 proceedings 2022-2023

    Get PDF

    ACiS: smart switches with application-level acceleration

    Full text link
    Network performance has contributed fundamentally to the growth of supercomputing over the past decades. In parallel, High Performance Computing (HPC) peak performance has depended, first, on ever faster and denser CPUs, and then on increasing density alone. As operating frequency, and now feature size, have levelled off, two new approaches are becoming central to achieving higher net performance: configurability and integration. Configurability enables hardware to map to the application, as well as vice versa. Integration enables system components that have generally been single-function (e.g., a network that transports data) to take on additional functionality (e.g., also operating on that data). More generally, integration enables compute-everywhere: not just in the CPU and accelerator, but also in the network and, more specifically, in the communication switches. In this thesis, we propose four novel methods of enhancing HPC performance through Advanced Computing in the Switch (ACiS). More specifically, we propose flexible and application-aware accelerators that can be embedded into or attached to existing communication switches to improve the performance and scalability of HPC and Machine Learning (ML) applications. We follow a modular design discipline, introducing composable plugins that successively add ACiS capabilities. In the first work, we propose an inline accelerator for communication switches that supports user-definable collective operations. MPI collective operations can often be performance killers in HPC applications; we address this bottleneck by offloading them to reconfigurable hardware within the switch itself. We also introduce a novel mechanism that enables the hardware to support MPI communicators of arbitrary shape and that is scalable to very large systems. In the second work, we propose a look-aside accelerator for communication switches that is capable of processing packets at line rate; this method handles functions requiring loops and state. The proposed in-switch accelerator is based on a RISC-V-compatible Coarse-Grained Reconfigurable Array (CGRA). To facilitate usability, we have developed a framework that compiles user-provided C/C++ code into the back-end instructions that configure the accelerator. In the third work, we extend ACiS to support fused collectives and the combining of collectives with map operations, observing an opportunity to fuse communication (collectives) with computation; since the computation varies across applications, this ACiS support is programmable. In the fourth work, we propose that switches with ACiS support can control and manage the execution of applications, i.e., that the switch be an active device with decision-making capabilities. Switches have a central view of the network; they can collect telemetry, monitor application behavior, and use this information for control, decision-making, and coordination of nodes. We evaluate the feasibility of ACiS through extensive RTL-based simulation as well as deployment in an open-access cloud infrastructure. Using this simulation framework, with a Graph Convolutional Network (GCN) application as a case study, an average speedup of 3.4x across five real-world datasets is achieved on 24 nodes compared to a CPU cluster without ACiS capabilities.
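
    As a hedged illustration of the kind of operation the first work targets, the sketch below shows a standard MPI_Allreduce in C++. With a conventional network this reduction is computed on the hosts while the switch only forwards packets; an ACiS-style inline accelerator would instead combine the operands inside the switch. The code uses only standard MPI and is not drawn from the thesis itself.

```cpp
// Illustrative only: a standard MPI collective of the kind an in-switch
// accelerator could offload. Build with an MPI wrapper compiler, e.g. mpicxx.
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Each rank contributes a block of partial results (e.g. gradients in ML training).
    std::vector<double> local(1 << 20, static_cast<double>(rank));
    std::vector<double> global(local.size(), 0.0);

    // This reduction is the communication pattern that becomes a bottleneck at
    // scale and that in-switch acceleration aims to take off the hosts.
    MPI_Allreduce(local.data(), global.data(), static_cast<int>(local.size()),
                  MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0) {
        std::printf("global[0] = %.1f (expected %.1f)\n",
                    global[0], size * (size - 1) / 2.0);
    }
    MPI_Finalize();
    return 0;
}
```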

    Optimisation for Optical Data Centre Switching and Networking with Artificial Intelligence

    Get PDF
    Cloud and cluster computing platforms have become standard across almost every domain of business, and their scale quickly approaches O(10^6) servers in a single warehouse. However, the tier-based, opto-electronically packet-switched network infrastructure that is standard across these systems gives rise to several scalability bottlenecks, including resource fragmentation and high energy requirements. Experimental results show that optical circuit-switched networks offer a promising alternative that could avoid these bottlenecks; however, optimisation challenges are encountered at realistic commercial scales. Where exhaustive optimisation techniques are not applicable to problems at the scale of Cloud computer networks, and expert-designed heuristics are performance-limited and typically biased in their design, artificial intelligence can discover more scalable and better-performing optimisation strategies. This thesis demonstrates these benefits through experimental and theoretical work spanning the component-, system- and commercial-level optimisation problems that stand in the way of practical Cloud-scale computer network systems. Firstly, optical components are optimised to gate in ≈500 ps and are demonstrated in a proof-of-concept switching architecture for optical data centres with better wavelength and component scalability than previous demonstrations. Secondly, network-aware resource allocation schemes for optically composable data centres are learnt end-to-end with deep reinforcement learning and graph neural networks, requiring 3× fewer networking resources to achieve the same resource efficiency as conventional methods. Finally, a deep-reinforcement-learning-based method for optimising PID-control parameters is presented, which generates tailored parameters for unseen devices in O(10^-3) s. This method is demonstrated on a market-leading optical switching product based on piezoelectric actuation, where switching speed is improved by >20% with no compromise in optical loss, and the manufacturing yield of the actuators is improved. The method has been licensed to and integrated within the manufacturing pipeline of this company; as such, crucial public and private infrastructure utilising these products will benefit from this work.
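
    As a rough sketch of the control loop whose parameters such a learned tuner would generate, the C++ snippet below implements a textbook discrete PID update. The gains, time step, and toy actuator model are placeholders for illustration only and are not taken from the thesis or the commercial product.

```cpp
// Illustrative only: the discrete PID update whose gains (Kp, Ki, Kd) a learned
// tuner would generate per device. Gains and plant dynamics here are invented.
#include <cstdio>

struct PID {
    double kp, ki, kd;        // controller gains (tuned per device)
    double integral = 0.0;    // accumulated error
    double prev_error = 0.0;  // error at the previous step

    double step(double setpoint, double measurement, double dt) {
        double error = setpoint - measurement;
        integral += error * dt;
        double derivative = (error - prev_error) / dt;
        prev_error = error;
        return kp * error + ki * integral + kd * derivative;
    }
};

int main() {
    PID pid{8.0, 2.0, 0.1};            // hypothetical gains
    double position = 0.0, velocity = 0.0;
    const double dt = 1e-4;            // 100-microsecond control period

    // Toy damped actuator driven toward a setpoint of 1.0.
    for (int i = 0; i < 5000; ++i) {
        double drive = pid.step(1.0, position, dt);
        velocity += (drive - 5.0 * velocity) * dt;
        position += velocity * dt;
    }
    std::printf("position after %.3f s: %f\n", 5000 * dt, position);
    return 0;
}
```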

    Reshaping Higher Education for a Post-COVID-19 World: Lessons Learned and Moving Forward

    Get PDF
    No abstract available

    The Role of a Microservice Architecture on cybersecurity and operational resilience in critical systems

    Get PDF
    Critical systems are characterized by their high intolerance to threats, in other words, by the high level of resilience they require: depending on the context in which a system operates, the slightest failure can cause significant damage, whether economic loss or loss of reputation, information, infrastructure, the environment, or human life. The security of such systems is traditionally associated with monolithic legacy infrastructures and data centers, which translates into ever greater challenges for evolution and protection. In the current context of rapid transformation, where the variety of threats to systems has been consistently increasing, this dissertation carries out a compatibility study of the microservice architecture, which is characterized by resilience, scalability, modifiability and technological heterogeneity, and by its flexibility in structural adaptation and in rapidly evolving, highly complex settings, making it well suited to agile environments. It also explores the response that artificial intelligence, more specifically machine learning, can provide for security and monitorability when combined with a simple banking system that adopts a microservice architecture.