830 research outputs found

    Speculative Segmented Sum for Sparse Matrix-Vector Multiplication on Heterogeneous Processors

    Full text link
    Sparse matrix-vector multiplication (SpMV) is a central building block for scientific software and graph applications. Recently, heterogeneous processors composed of different types of cores attracted much attention because of their flexible core configuration and high energy efficiency. In this paper, we propose a compressed sparse row (CSR) format based SpMV algorithm utilizing both types of cores in a CPU-GPU heterogeneous processor. We first speculatively execute segmented sum operations on the GPU part of a heterogeneous processor and generate a possibly incorrect results. Then the CPU part of the same chip is triggered to re-arrange the predicted partial sums for a correct resulting vector. On three heterogeneous processors from Intel, AMD and nVidia, using 20 sparse matrices as a benchmark suite, the experimental results show that our method obtains significant performance improvement over the best existing CSR-based SpMV algorithms. The source code of this work is downloadable at https://github.com/bhSPARSE/Benchmark_SpMV_using_CSRComment: 22 pages, 8 figures, Published at Parallel Computing (PARCO

    Efficient Proactive Caching for Supporting Seamless Mobility

    Full text link
    We present a distributed proactive caching approach that exploits user mobility information to decide where to proactively cache data to support seamless mobility, while efficiently utilizing cache storage using a congestion pricing scheme. The proposed approach is applicable to the case where objects have different sizes and to a two-level cache hierarchy, for both of which the proactive caching problem is hard. Additionally, our modeling framework considers the case where the delay is independent of the requested data object size and the case where the delay is a function of the object size. Our evaluation results show how various system parameters influence the delay gains of the proposed approach, which achieves robust and good performance relative to an oracle and an optimal scheme for a flat cache structure.Comment: 10 pages, 9 figure

    Timing Predictability in Future Multi-Core Avionics Systems

    Full text link

    Compilation Optimizations to Enhance Resilience of Big Data Programs and Quantum Processors

    Get PDF
    Modern computers can experience a variety of transient errors due to the surrounding environment, known as soft faults. Although the frequency of these faults is low enough to not be noticeable on personal computers, they become a considerable concern during large-scale distributed computations or systems in more vulnerable environments like satellites. These faults occur as a bit flip of some value in a register, operation, or memory during execution. They surface as either program crashes, hangs, or silent data corruption (SDC), each of which can waste time, money, and resources. Hardware methods, such as shielding or error correcting memory (ECM), exist, though they can be difficult to implement, expensive, and may be limited to only protecting against errors in specific locations. Researchers have been exploring software detection and correction methods as an alternative, commonly trading either overhead in execution time or memory usage to protect against faults. Quantum computers, a relatively recent advancement in computing technology, experience similar errors on a much more severe scale. The errors are more frequent, costly, and difficult to detect and correct. Error correction algorithms like Shor’s code promise to completely remove errors, but they cannot be implemented on current noisy intermediate-scale quantum (NISQ) systems due to the low number of available qubits. Until the physical systems become large enough to support error correction, researchers instead have been studying other methods to reduce and compensate for errors. In this work, we present two methods for improving the resilience of classical processes, both single- and multi-threaded. We then introduce quantum computing and compare the nature of errors and correction methods to previous classical methods. We further discuss two designs for improving compilation of quantum circuits. One method, focused on quantum neural networks (QNNs), takes advantage of partial compilation to avoid recompiling the entire circuit each time. The other method is a new approach to compiling quantum circuits using graph neural networks (GNNs) to improve the resilience of quantum circuits and increase fidelity. By using GNNs with reinforcement learning, we can train a compiler to provide improved qubit allocation that improves the success rate of quantum circuits

    System-Scenario Methodology to Design a Highly Reliable Radiation-Hardened Memory for Space Applications

    Get PDF
    Cache memory circuits are one of the concerns of computing systems, especially in terms of power consumption, reliability, and high performance. Voltage-scaling techniques can be used to reduce the total power consumption of the caches. However, aggressive voltage scaling significantly increases the probability of memory failure, especially in environments with high radiation levels, such as space. It is, therefore, important to deploy techniques to deal with reliability issues along with voltage scaling. In this chapter, we present a system-scenario methodology for radiation-hardened memory design to keep the reliability during voltage scaling. Although any SRAM array can benefit from the design, we frame our study on the recently proposed radiation-hardened cell, Nwise, which provides high level of tolerance against single event and multi event upsets in memories. To reduce the power consumption while upholding reliability, we leverage the system-scenario-based design methodology to optimize the energy consumption in applications, where system requirements vary dynamically at run time. We demonstrate the use of the methodology with a use case related to satellite systems and solar activity. Our simulations show that we achieve up to 49.3% power consumption saving compared to using a cache design with a fixed nominal power supply level

    Resource-aware scheduling for 2D/3D multi-/many-core processor-memory systems

    Get PDF
    This dissertation addresses the complexities of 2D/3D multi-/many-core processor-memory systems, focusing on two key areas: enhancing timing predictability in real-time multi-core processors and optimizing performance within thermal constraints. The integration of an increasing number of transistors into compact chip designs, while boosting computational capacity, presents challenges in resource contention and thermal management. The first part of the thesis improves timing predictability. We enhance shared cache interference analysis for set-associative caches, advancing the calculation of Worst-Case Execution Time (WCET). This development enables accurate assessment of cache interference and the effectiveness of partitioned schedulers in real-world scenarios. We introduce TCPS, a novel task and cache-aware partitioned scheduler that optimizes cache partitioning based on task-specific WCET sensitivity, leading to improved schedulability and predictability. Our research explores various cache and scheduling configurations, providing insights into their performance trade-offs. The second part focuses on thermal management in 2D/3D many-core systems. Recognizing the limitations of Dynamic Voltage and Frequency Scaling (DVFS) in S-NUCA many-core processors, we propose synchronous thread migrations as a thermal management strategy. This approach culminates in the HotPotato scheduler, which balances performance and thermal safety. We also introduce 3D-TTP, a transient temperature-aware power budgeting strategy for 3D-stacked systems, reducing the need for Dynamic Thermal Management (DTM) activation. Finally, we present 3QUTM, a novel method for 3D-stacked systems that combines core DVFS and memory bank Low Power Modes with a learning algorithm, optimizing response times within thermal limits. This research contributes significantly to enhancing performance and thermal management in advanced processor-memory systems

    Data Storage and Dissemination in Pervasive Edge Computing Environments

    Get PDF
    Nowadays, smart mobile devices generate huge amounts of data in all sorts of gatherings. Much of that data has localized and ephemeral interest, but can be of great use if shared among co-located devices. However, mobile devices often experience poor connectivity, leading to availability issues if application storage and logic are fully delegated to a remote cloud infrastructure. In turn, the edge computing paradigm pushes computations and storage beyond the data center, closer to end-user devices where data is generated and consumed. Hence, enabling the execution of certain components of edge-enabled systems directly and cooperatively on edge devices. This thesis focuses on the design and evaluation of resilient and efficient data storage and dissemination solutions for pervasive edge computing environments, operating with or without access to the network infrastructure. In line with this dichotomy, our goal can be divided into two specific scenarios. The first one is related to the absence of network infrastructure and the provision of a transient data storage and dissemination system for networks of co-located mobile devices. The second one relates with the existence of network infrastructure access and the corresponding edge computing capabilities. First, the thesis presents time-aware reactive storage (TARS), a reactive data storage and dissemination model with intrinsic time-awareness, that exploits synergies between the storage substrate and the publish/subscribe paradigm, and allows queries within a specific time scope. Next, it describes in more detail: i) Thyme, a data storage and dis- semination system for wireless edge environments, implementing TARS; ii) Parsley, a flexible and resilient group-based distributed hash table with preemptive peer relocation and a dynamic data sharding mechanism; and iii) Thyme GardenBed, a framework for data storage and dissemination across multi-region edge networks, that makes use of both device-to-device and edge interactions. The developed solutions present low overheads, while providing adequate response times for interactive usage and low energy consumption, proving to be practical in a variety of situations. They also display good load balancing and fault tolerance properties.Resumo Hoje em dia, os dispositivos móveis inteligentes geram grandes quantidades de dados em todos os tipos de aglomerações de pessoas. Muitos desses dados têm interesse loca- lizado e efêmero, mas podem ser de grande utilidade se partilhados entre dispositivos co-localizados. No entanto, os dispositivos móveis muitas vezes experienciam fraca co- nectividade, levando a problemas de disponibilidade se o armazenamento e a lógica das aplicações forem totalmente delegados numa infraestrutura remota na nuvem. Por sua vez, o paradigma de computação na periferia da rede leva as computações e o armazena- mento para além dos centros de dados, para mais perto dos dispositivos dos utilizadores finais onde os dados são gerados e consumidos. Assim, permitindo a execução de certos componentes de sistemas direta e cooperativamente em dispositivos na periferia da rede. Esta tese foca-se no desenho e avaliação de soluções resilientes e eficientes para arma- zenamento e disseminação de dados em ambientes pervasivos de computação na periferia da rede, operando com ou sem acesso à infraestrutura de rede. Em linha com esta dico- tomia, o nosso objetivo pode ser dividido em dois cenários específicos. O primeiro está relacionado com a ausência de infraestrutura de rede e o fornecimento de um sistema efêmero de armazenamento e disseminação de dados para redes de dispositivos móveis co-localizados. O segundo diz respeito à existência de acesso à infraestrutura de rede e aos recursos de computação na periferia da rede correspondentes. Primeiramente, a tese apresenta armazenamento reativo ciente do tempo (ARCT), um modelo reativo de armazenamento e disseminação de dados com percepção intrínseca do tempo, que explora sinergias entre o substrato de armazenamento e o paradigma pu- blicação/subscrição, e permite consultas num escopo de tempo específico. De seguida, descreve em mais detalhe: i) Thyme, um sistema de armazenamento e disseminação de dados para ambientes sem fios na periferia da rede, que implementa ARCT; ii) Pars- ley, uma tabela de dispersão distribuída flexível e resiliente baseada em grupos, com realocação preventiva de nós e um mecanismo de particionamento dinâmico de dados; e iii) Thyme GardenBed, um sistema para armazenamento e disseminação de dados em redes multi-regionais na periferia da rede, que faz uso de interações entre dispositivos e com a periferia da rede. As soluções desenvolvidas apresentam baixos custos, proporcionando tempos de res- posta adequados para uso interativo e baixo consumo de energia, demonstrando serem práticas nas mais diversas situações. Estas soluções também exibem boas propriedades de balanceamento de carga e tolerância a faltas

    Building Internet caching systems for streaming media delivery

    Get PDF
    The proxy has been widely and successfully used to cache the static Web objects fetched by a client so that the subsequent clients requesting the same Web objects can be served directly from the proxy instead of other sources faraway, thus reducing the server\u27s load, the network traffic and the client response time. However, with the dramatic increase of streaming media objects emerging on the Internet, the existing proxy cannot efficiently deliver them due to their large sizes and client real time requirements.;In this dissertation, we design, implement, and evaluate cost-effective and high performance proxy-based Internet caching systems for streaming media delivery. Addressing the conflicting performance objectives for streaming media delivery, we first propose an efficient segment-based streaming media proxy system model. This model has guided us to design a practical streaming proxy, called Hyper-Proxy, aiming at delivering the streaming media data to clients with minimum playback jitter and a small startup latency, while achieving high caching performance. Second, we have implemented Hyper-Proxy by leveraging the existing Internet infrastructure. Hyper-Proxy enables the streaming service on the common Web servers. The evaluation of Hyper-Proxy on the global Internet environment and the local network environment shows it can provide satisfying streaming performance to clients while maintaining a good cache performance. Finally, to further improve the streaming delivery efficiency, we propose a group of the Shared Running Buffers (SRB) based proxy caching techniques to effectively utilize proxy\u27s memory. SRB algorithms can significantly reduce the media server/proxy\u27s load and network traffic and relieve the bottlenecks of the disk bandwidth and the network bandwidth.;The contributions of this dissertation are threefold: (1) we have studied several critical performance trade-offs and provided insights into Internet media content caching and delivery. Our understanding further leads us to establish an effective streaming system optimization model; (2) we have designed and evaluated several efficient algorithms to support Internet streaming content delivery, including segment caching, segment prefetching, and memory locality exploitation for streaming; (3) having addressed several system challenges, we have successfully implemented a real streaming proxy system and deployed it in a large industrial enterprise
    corecore