327 research outputs found

    20th SC@RUG 2023 proceedings 2022-2023

    Get PDF

    20th SC@RUG 2023 proceedings 2022-2023

    Get PDF

    Efficient Intra-Rack Resource Disaggregation for HPC Using Co-Packaged DWDM Photonics

    Full text link
    The diversity of workload requirements and increasing hardware heterogeneity in emerging high performance computing (HPC) systems motivate resource disaggregation. Resource disaggregation allows compute and memory resources to be allocated individually as required to each workload. However, it is unclear how to efficiently realize this capability and cost-effectively meet the stringent bandwidth and latency requirements of HPC applications. To that end, we describe how modern photonics can be co-designed with modern HPC racks to implement flexible intra-rack resource disaggregation and fully meet the bit error rate (BER) and high escape bandwidth of all chip types in modern HPC racks. Our photonic-based disaggregated rack provides an average application speedup of 11% (46% maximum) for 25 CPU and 61% for 24 GPU benchmarks compared to a similar system that instead uses modern electronic switches for disaggregation. Using observed resource usage from a production system, we estimate that an iso-performance intra-rack disaggregated HPC system using photonics would require 4x fewer memory modules and 2x fewer NICs than a non-disaggregated baseline.Comment: 15 pages, 12 figures, 4 tables. Published in IEEE Cluster 202

    Farming out : a study.

    Get PDF
    Farming is one of severals ways of arranging for a group of individuals to perform work simultaneously. Farming is attractive. It is a simple concept, and yet it allocates work dynamically, balancing the load automatically. This gives rise to potentially great efficiency; yet the range of applications that can be farmed efficiently and which implementation strategies are the most effective has not been classified. This research has investigated the types of application, design and implementation that farm efficiently on computer systems constructed from a network of communicating parallel processors. This research shows that all applications can be farmed and identifies those concerns that dictate efficiency. For the first generation of transputer hardware, extensive experiments have been performed using Occam, independent of any specific application. This study identified the boundary conditions that dictate which design parameters farm efficiently. These boundary conditions are expressed in a general form that is directly amenable to other architectures. The specific quantitative results are of direct use to others who wish to implement farms on this architecture. Because of farming’s simplicity and potential for high efficiency, this work concludes that architects of parallel hardware should consider binding this paradigm into future systems so as to enable the dynamic allocation of processes to processors to take place automatically. As well as resulting in high levels of machine utilisation for all programs, this would also permanently remove the burden of allocation from the programmer

    Architecture-based Evolution of Dependable Software-intensive Systems

    Get PDF
    This cumulative habilitation thesis, proposes concepts for (i) modelling and analysing dependability based on architectural models of software-intensive systems early in development, (ii) decomposition and composition of modelling languages and analysis techniques to enable more flexibility in evolution, and (iii) bridging the divergent levels of abstraction between data of the operation phase, architectural models and source code of the development phase

    Modelling, Dimensioning and Optimization of 5G Communication Networks, Resources and Services

    Get PDF
    This reprint aims to collect state-of-the-art research contributions that address challenges in the emerging 5G networks design, dimensioning and optimization. Designing, dimensioning and optimization of communication networks resources and services have been an inseparable part of telecom network development. The latter must convey a large volume of traffic, providing service to traffic streams with highly differentiated requirements in terms of bit-rate and service time, required quality of service and quality of experience parameters. Such a communication infrastructure presents many important challenges, such as the study of necessary multi-layer cooperation, new protocols, performance evaluation of different network parts, low layer network design, network management and security issues, and new technologies in general, which will be discussed in this book

    LightRW: FPGA Accelerated Graph Dynamic Random Walks

    Full text link
    Graph dynamic random walks (GDRWs) have recently emerged as a powerful paradigm for graph analytics and learning applications, including graph embedding and graph neural networks. Despite the fact that many existing studies optimize the performance of GDRWs on multi-core CPUs, massive random memory accesses and costly synchronizations cause severe resource underutilization, and the processing of GDRWs is usually the key performance bottleneck in many graph applications. This paper studies an alternative architecture, FPGA, to address these issues in GDRWs, as FPGA has the ability of hardware customization so that we are able to explore fine-grained pipeline execution and specialized memory access optimizations. Specifically, we propose {LightRW}, a novel FPGA-based accelerator for GDRWs. LightRW embraces a series of optimizations to enable fine-grained pipeline execution on the chip and to exploit the massive parallelism of FPGA while significantly reducing memory accesses. As current commonly used sampling methods in GDRWs do not efficiently support fine-grained pipeline execution, we develop a parallelized reservoir sampling method to sample multiple vertices per cycle for efficient pipeline execution. To address the random memory access issues, we propose a degree-aware configurable caching method that buffers hot vertices on-chip to alleviate random memory accesses and a dynamic burst access engine that efficiently retrieves neighbors. Experimental results show that our optimization techniques are able to improve the performance of GDRWs on FPGA significantly. Moreover, LightRW delivers up to 9.55x and 9.10x speedup over the state-of-the-art CPU-based MetaPath and Node2vec random walks, respectively. This work is open-sourced on GitHub at https://github.com/Xtra-Computing/LightRW.Comment: Accepted to SIGMOD 202

    20th SC@RUG 2023 proceedings 2022-2023

    Get PDF

    WiFi Sensing at the Edge Towards Scalable On-Device Wireless Sensing Systems

    Get PDF
    WiFi sensing offers a powerful method for tracking physical activities using the radio-frequency signals already found throughout our homes and offices. This novel sensing modality offers continuous and non-intrusive activity tracking since sensing can be performed (i) without requiring wearable sensors, (ii) outside the line-of-sight, and even (iii) through the wall. Furthermore, WiFi has become a ubiquitous technology in our computers, our smartphones, and even in low-cost Internet of Things devices. In this work, we consider how the ubiquity of these low-cost WiFi devices offer an unparalleled opportunity for improving the scalability of wireless sensing systems. Thus far, WiFi sensing research assumes costly offline computing resources and hardware for training machine learning models and for performing model inference. To improve the scalability of WiFi sensing systems, this dissertation introduces techniques for improving machine learning at the edge by thoroughly surveying and evaluating signal preprocessing and edge machine learning techniques. Additionally, we introduce the use of federated learning for collaboratively training machine learning models with WiFi data only available on edge devices. We then consider privacy and security concerns of WiFi sensing by demonstrating possible adversarial surveillance attacks. To combat these attacks, we propose a method for leveraging spatially distributed antennas to prevent eavesdroppers from performing adversarial surveillance while still enabling and even improving the sensing capabilities of allowed WiFi sensing devices within our environments. The overall goal throughout this work is to demonstrate that WiFi sensing can become a ubiquitous and secure sensing option through the use of on-device computation on low-cost edge devices
    corecore