153 research outputs found

    Scheduling Irregular Workloads on GPUs

    This doctoral research aims to understand the nature of the overhead for data-irregular GPU workloads, propose a solution, and examine the consequences of the result. We propose a novel, retry-free GPU workload scheduler for irregular workloads. When used in a Breadth First Search (BFS) algorithm, the proposed simple, monolithic concurrent queue scales to within 10% of ideal scalability on AMD’s Fiji GPU with 14,336 active threads. The dissertation presents an important finding: the retry overhead associated with Compare and Swap (CAS) operations is the principal reason why concurrent queues do not scale well as the number of clients increases in a massively multi-threaded environment.
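To make the CAS retry overhead concrete, here is a minimal CPU-side sketch that contrasts two ways for many threads to reserve a slot index in a shared queue. The abstract does not describe the dissertation's actual mechanism, so everything here is an assumption for illustration: the names `SlotCounter`, `reserve_cas`, `reserve_faa` and `all_slots_unique` are hypothetical, and fetch-and-add merely stands in for some retry-free primitive.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical illustration only: a shared tail index that threads bump to claim a slot.
struct SlotCounter {
    std::atomic<std::size_t> tail{0};
    std::atomic<std::size_t> retries{0};  // failed CAS attempts (spurious failures included)

    // CAS-based reservation: a thread loses the race whenever another thread
    // moved `tail` first, and must then try again.
    std::size_t reserve_cas() {
        std::size_t seen = tail.load(std::memory_order_relaxed);
        // On failure compare_exchange_weak reloads `seen`, so the loop simply retries.
        while (!tail.compare_exchange_weak(seen, seen + 1, std::memory_order_relaxed)) {
            retries.fetch_add(1, std::memory_order_relaxed);
        }
        return seen;
    }

    // Fetch-and-add reservation: every call completes in one atomic operation,
    // so there is no retry loop regardless of contention.
    std::size_t reserve_faa() {
        return tail.fetch_add(1, std::memory_order_relaxed);
    }
};

// Runs `threads` workers that each reserve `per_thread` slots, then checks that
// every slot index was handed out exactly once. Reports the CAS retry count.
bool all_slots_unique(bool use_cas, unsigned threads, std::size_t per_thread,
                      std::size_t* retries_out) {
    assert(threads > 0 && per_thread > 0);
    SlotCounter counter;
    std::vector<std::vector<std::size_t>> got(threads);  // one private list per worker
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, &got, t, use_cas, per_thread] {
            got[t].reserve(per_thread);
            for (std::size_t i = 0; i < per_thread; ++i) {
                got[t].push_back(use_cas ? counter.reserve_cas() : counter.reserve_faa());
            }
        });
    }
    for (auto& w : workers) w.join();

    std::vector<char> seen(threads * per_thread, 0);
    for (const auto& slots : got) {
        for (std::size_t s : slots) {
            if (s >= seen.size() || seen[s]) return false;  // out of range or duplicate
            seen[s] = 1;
        }
    }
    if (retries_out) *retries_out = counter.retries.load();
    return true;
}
```

Both variants hand out each index exactly once. The difference is that the CAS variant's retry count depends on contention and can grow with the number of threads, while the fetch-and-add variant never retries.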

    Traffic Profiles and Performance Modelling of Heterogeneous Networks

    This thesis considers the analysis and study of short and long-term traffic patterns of heterogeneous networks. A large number of traffic profiles from different locations and network environments have been determined. The analysis of these patterns has led to a new parameter, namely the 'application signature'. It was found that these signatures manifest themselves in various granularities over time, and are usually unique to an application, permanent virtual circuit (PVC), user or service. The differentiation of the application signatures into different categories creates a foundation for short and long-term management of networks. The thesis therefore looks at traffic management from both the micro and the macro perspective. The long-term traffic patterns have been used to develop a novel methodology for network planning and design, motivated by the steadily growing size and complexity of interconnected systems, which usually cover different time zones and geographical and political areas. A part of the methodology is a new overbooking mechanism, which stands in contrast to existing overbooking methods created by companies like Bell Labs. The new overbooking provides companies with cheaper network design and higher average throughput. In addition, new requirements like risk factors, which historically lay outside the design process, have been incorporated into the methodology. A large network service provider has implemented the overbooking mechanism into their network planning process, enabling practical evaluation. The other aspect of the thesis looks at short-term traffic patterns, to analyse how congestion can be controlled. Recurring short-term traffic patterns, the application signatures, have been used for this research to develop the "packet train model" further. Through this research a new congestion control mechanism was created to investigate how the application signatures and the "extended packet train model" could be used. To validate the results, a software simulation has been written that executes the proprietary congestion mechanism and the new mechanism for comparison. Application signatures for the TCP/IP protocols have been applied in the simulation and the results are displayed and discussed in the thesis. The findings show the effects that frame relay congestion control mechanisms have on TCP/IP, where the re-sending of segments, buffer allocation, delay and throughput are compared. The results prove that application signatures can be used effectively to enhance existing congestion control mechanisms. AT&T (UK) Ltd, Englan

    Distributed-Memory Load Balancing With Cyclic Token-Based Work-Stealing Applied to Reverse Time Migration

    No full text

    RA-LPEL: A Resource-Aware Light-Weight Parallel Execution Layer for Reactive Stream Processing Networks on The SCC Many-core Tiled Architecture

    In computing, the available computing power has continuously fallen short of the demanded computing performance. As a consequence, performance improvement has been the main focus of processor design. However, due to the phenomenon called the “Power Wall”, it has become infeasible to build faster processors by just increasing the processor’s clock speed. One of the resulting trends in hardware design is to integrate several simple and power-efficient cores on the same chip. This design shift poses challenges of its own. In the past, programs automatically became faster as clock frequency increased, without modification. This is no longer true with many-core architectures. To achieve maximum performance, programs have to run concurrently on more than one core, which forces the general computing paradigm to become increasingly parallel to leverage maximum processing power. In this thesis, we focus on the Reactive Stream Program (RSP). In stream processing, the system consists of computing nodes, which are connected via communication streams. These streams simplify concurrency management on modern many-core architectures due to their implicit synchronisation. An RSP is a stream processing system that implements a reactive system. RSPs work in tandem with their environment, and the load imposed by the environment may vary over time. This provides a unique opportunity to increase performance per watt. The research contribution of this thesis focuses on the design of the execution layer to run RSPs on tiled many-core architectures, using Intel’s Single-chip Cloud Computer (SCC) processor as a concrete experimentation platform. Further, we have developed a Dynamic Voltage and Frequency Scaling (DVFS) technique for RSPs deployed on many-core architectures. In contrast to many other approaches, our DVFS technique does not require the capability of controlling the power settings of individual computing elements, thus making it applicable to modern many-core architectures, in which power can be changed only for whole power islands. The experimental results confirm that the proposed DVFS technique can effectively improve the energy efficiency, i.e. increase the performance per watt, for RSPs.

    An OS-Based Alternative to Full Hardware Coherence on Tiled Chip-Multiprocessors

    Institute for Computing Systems Architecture. The interconnect mechanisms (shared bus or crossbar) used in current chip-multiprocessors (CMPs) are expected to become a bottleneck that prevents these architectures from scaling to a larger number of cores. Tiled CMPs offer better scalability by integrating relatively simple cores with a lightweight point-to-point interconnect. However, such interconnects make snooping impractical and, thus, require alternative solutions to cache coherence. This thesis proposes a novel, cost-effective hardware mechanism to support shared-memory parallel applications that forgoes hardware-maintained cache coherence. The proposed mechanism is based on the key ideas that mapping of lines to physical caches is done at the page level with OS support and that hardware supports remote cache accesses. It allows only some controlled migration and replication of data and provides a sufficient degree of flexibility in the mapping through an extra level of indirection between virtual pages and physical tiles. The proposed tiled CMP architecture is evaluated on the SPLASH-2 scientific benchmarks and ALPBench multimedia benchmarks against one with private caches and a distributed directory cache coherence mechanism. Experimental results show that the performance degradation is as little as 0%, and 16% on average, compared to the cache coherent architecture across all benchmarks for 16 and 32 processors.
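The page-level mapping with an extra level of indirection can be pictured with a small software model. This is a sketch under stated assumptions, not the thesis's hardware design: the class `PageToTileMap`, the 4 KiB page size, the 16-tile count and the default interleaving policy are all hypothetical choices made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Assumed parameters (not taken from the abstract): 4 KiB pages, 16 tiles.
constexpr std::uint64_t kPageBits = 12;
constexpr unsigned kTiles = 16;

// Software model of an OS-managed table from virtual page number to home tile.
class PageToTileMap {
public:
    // Home tile of an address: an explicit OS mapping if one exists, otherwise
    // a default interleaving by page number (an assumed fallback policy).
    unsigned home_tile(std::uint64_t vaddr) const {
        const std::uint64_t vpn = vaddr >> kPageBits;
        const auto it = map_.find(vpn);
        return it != map_.end() ? it->second : static_cast<unsigned>(vpn % kTiles);
    }

    // The OS re-homes a whole page; every line in that page follows it.
    void remap(std::uint64_t vaddr, unsigned tile) {
        assert(tile < kTiles);
        map_[vaddr >> kPageBits] = tile;
    }

    // An access is remote when the requesting tile is not the page's home tile,
    // in which case the model assumes a remote cache access over the interconnect.
    bool is_remote(std::uint64_t vaddr, unsigned requesting_tile) const {
        return home_tile(vaddr) != requesting_tile;
    }

private:
    std::unordered_map<std::uint64_t, unsigned> map_;  // the indirection table
};
```

Because each page has one home tile at a time in this model, a line lives in at most one cache, which is why no coherence protocol is needed for it. The controlled replication mentioned in the abstract is not modelled here.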

    Demystifying Internet of Things Security

    Break down the misconceptions of the Internet of Things by examining the different security building blocks available in Intel Architecture (IA) based IoT platforms. This open access book reviews the threat pyramid, secure boot, chain of trust, and the SW stack leading up to defense-in-depth. The IoT presents unique challenges in implementing security, and Intel has both CPU and Isolated Security Engine capabilities to simplify it. This book explores the challenges of securing these devices to make them immune to different threats originating from within and outside the network. The requirements and robustness rules to protect the assets vary greatly, and there is no single blanket solution approach to implement security. Demystifying Internet of Things Security provides clarity to industry professionals and provides an overview of different security solutions. What You'll Learn: secure devices, immunizing them against different threats originating from inside and outside the network; gather an overview of the different security building blocks available in Intel Architecture (IA) based IoT platforms; understand the threat pyramid, secure boot, chain of trust, and the software stack leading up to defense-in-depth. Who This Book Is For: strategists, developers, architects, and managers in the embedded and Internet of Things (IoT) space trying to understand and implement security in IoT devices and platforms.

    Automatic and Explicit Parallelization Approaches for Mathematical Simulation Models
