Scheduling Irregular Workloads on GPUs
This doctoral research aims to understand the nature of the overhead in data-irregular GPU workloads, propose a solution, and examine the consequences of the result. We propose a novel, retry-free GPU workload scheduler for irregular workloads. When used in a Breadth First Search (BFS) algorithm, the proposed simple, monolithic concurrent queue scales to within 10% of ideal scalability on AMD’s Fiji GPU with 14,336 active threads. A key finding of the dissertation is that the retry overhead associated with Compare and Swap (CAS) operations is the principal reason why concurrent queues do not scale well as the number of clients increases in a massively multi-threaded environment.
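Why CAS retries can dominate is easier to see with a deliberately simplified model. Suppose n threads race on one shared queue tail and, in each round, every pending thread issues one atomic operation. A CAS-based enqueue lets only one thread win per round, so the total number of attempts grows as n(n+1)/2. Fetch-and-add, one common retry-free alternative, always succeeds, so exactly n operations are needed. The Python sketch below models operation counts only and is not GPU code. The function names are hypothetical, and this summary does not say whether the dissertation's queue relies on fetch-and-add.

```python
# Simplified, deterministic model of contention on a shared queue tail.
# Assumption (not the dissertation's scheduler): all n threads try to enqueue
# at the same moment, and in every round each still-pending thread issues
# exactly one atomic operation against the shared tail index.


def cas_attempts(n_threads: int) -> int:
    """Total CAS attempts when n threads race to advance a shared tail.

    In each round every pending thread reads the same tail value and issues
    a CAS; only one succeeds, and the rest fail and must retry.
    """
    pending = n_threads
    attempts = 0
    while pending > 0:
        attempts += pending  # every pending thread issues one CAS
        pending -= 1         # exactly one CAS wins this round
    return attempts


def faa_attempts(n_threads: int) -> int:
    """Total atomic operations when each thread reserves a slot with fetch-and-add.

    Fetch-and-add never fails, so each thread needs exactly one operation.
    """
    return n_threads


if __name__ == "__main__":
    for n in (64, 1024, 14336):
        print(n, cas_attempts(n), faa_attempts(n))
```

Under this model the CAS cost is quadratic in the number of contending threads, while the fetch-and-add cost stays linear.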
Traffic Profiles and Performance Modelling of Heterogeneous Networks
This thesis analyses short- and long-term traffic patterns in heterogeneous networks. A
large number of traffic profiles were obtained from different locations and network
environments. Analysis of these patterns led to a new parameter, the 'application
signature'. These signatures were found to manifest themselves at various granularities
over time and are usually unique to an application, permanent virtual circuit (PVC), user
or service. Classifying application signatures into different categories creates a
foundation for both short- and long-term network management. The thesis therefore
examines traffic management from both the micro and the macro perspective.
The long-term traffic patterns were used to develop a novel methodology for network
planning and design, motivated by the steady growth in size and complexity of
interconnected systems, which usually span different time zones and geographical and
political areas. One component of the methodology is a new overbooking mechanism that
differs from existing overbooking methods, such as those developed by Bell Labs. The new
mechanism enables cheaper network designs with higher average throughput. The
methodology also incorporates requirements, such as risk factors, that historically lay
outside the design process. A large network service provider has adopted the
overbooking mechanism in its network planning process, enabling practical evaluation.
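A toy calculation can show one intuition behind profile-based overbooking. When circuits whose traffic peaks at different times (for example, in different time zones) share a link, the peak of their combined load is lower than the sum of their individual peaks, so the link can be provisioned below that sum. The Python sketch below is a hypothetical illustration only. The hourly profiles are made up, and `overbooking_factor` is not the overbooking mechanism developed in the thesis.

```python
from typing import Sequence


def overbooking_factor(profiles: Sequence[Sequence[float]]) -> float:
    """Ratio of the sum of individual peaks to the peak of the aggregate load.

    Each profile is a list of load samples (e.g. Mbit/s per hour) covering the
    same time slots. A value above 1.0 means the link could be provisioned
    below the sum of peak rates without exceeding the observed aggregate load.
    """
    if not profiles:
        raise ValueError("need at least one traffic profile")
    n_slots = len(profiles[0])
    if any(len(p) != n_slots for p in profiles):
        raise ValueError("all profiles must cover the same time slots")

    sum_of_peaks = sum(max(p) for p in profiles)
    # Aggregate load per time slot across all circuits sharing the link.
    aggregate = [sum(p[t] for p in profiles) for t in range(n_slots)]
    peak_of_sum = max(aggregate)
    if peak_of_sum == 0:
        raise ValueError("aggregate load is zero; factor is undefined")
    return sum_of_peaks / peak_of_sum


# Hypothetical 4-slot profiles for two circuits whose busy hours do not overlap.
europe = [10.0, 8.0, 2.0, 1.0]
us = [1.0, 2.0, 8.0, 10.0]
print(overbooking_factor([europe, us]))
```

For these made-up profiles the factor is 20/11, or about 1.82. Identical, fully overlapping profiles would give exactly 1.0, which means no overbooking headroom.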
The other part of the thesis examines short-term traffic patterns to analyse how
congestion can be controlled. Recurring short-term traffic patterns, the application
signatures, were used to extend the "packet train model". This research produced a new
congestion control mechanism for investigating how the application signatures and the
"extended packet train model" could be used. To validate the results, a software
simulation was written that runs both the proprietary congestion control mechanism and
the new mechanism for comparison. Application signatures for the TCP/IP protocols were
applied in the simulation, and the results are presented and discussed in the thesis. The
findings show the effects of frame relay congestion control mechanisms on TCP/IP,
comparing segment retransmissions, buffer allocation, delay and throughput. The results
show that application signatures can be used effectively to enhance existing congestion
control mechanisms.
AT&T (UK) Ltd, England
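The basic packet train model groups consecutive packets between the same source and destination into one "train" as long as the gap between them stays below a maximum allowed inter-car gap. The Python sketch below covers only that basic grouping step for a single source-destination pair. The threshold, the function name and the timestamps are hypothetical, and the thesis's "extended packet train model" is not reproduced here.

```python
from typing import List


def split_into_trains(arrivals: List[float], max_gap: float) -> List[List[float]]:
    """Group sorted packet arrival times (in seconds) into packet trains.

    A new train starts whenever the inter-arrival time exceeds max_gap,
    the maximum allowed inter-car gap.
    """
    trains: List[List[float]] = []
    for t in arrivals:
        if trains and t - trains[-1][-1] <= max_gap:
            trains[-1].append(t)  # gap small enough: same train
        else:
            trains.append([t])    # first packet or gap too large: new train
    return trains


# Hypothetical arrival times for one source-destination pair.
times = [0.00, 0.01, 0.02, 0.50, 0.51, 2.00]
for train in split_into_trains(times, max_gap=0.1):
    print(len(train), train)
```

How large the inter-car gap threshold should be depends on the traffic being studied. The value used here is arbitrary.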
RA-LPEL: A Resource-Aware Light-Weight Parallel Execution Layer for Reactive Stream Processing Networks on The SCC Many-core Tiled Architecture
In computing, available processing power has continually fallen short of the performance demanded of it. As a consequence, performance improvement has been the main focus of processor design. However, because of the phenomenon known as the “power wall”, it is no longer feasible to build faster processors simply by increasing their clock speed. One resulting trend in hardware design is to integrate several simple, power-efficient cores on the same chip. This design shift poses challenges of its own. In the past, programs became faster automatically, without modification, as clock frequencies increased. This is no longer true on many-core architectures: to achieve maximum performance, programs have to run concurrently on more than one core, which pushes the general computing paradigm to become increasingly parallel.
This thesis focuses on Reactive Stream Programs (RSPs). In stream processing, a system consists of computing nodes connected via communication streams. These streams simplify concurrency management on modern many-core architectures because of their implicit synchronisation. An RSP is a stream processing system that implements a reactive system: it works in tandem with its environment, and the load the environment imposes may vary over time. This provides a unique opportunity to increase performance per watt. The research contribution of this thesis focuses on the design of the execution layer for running RSPs on tiled many-core architectures, using Intel’s Single-chip Cloud Computer (SCC) processor as a concrete experimentation platform. Further, we have developed a Dynamic Voltage and Frequency Scaling (DVFS) technique for RSPs deployed on many-core architectures. In contrast to many other approaches, our DVFS technique does not require the ability to control the power settings of individual computing elements. This makes it applicable to modern many-core architectures on which power can be changed only per power island. The experimental results confirm that the proposed DVFS technique can effectively improve energy efficiency, i.e. increase the performance per watt, for RSPs.
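When frequency can only be set per power island, a controller has to pick one level that still serves the most heavily loaded core on that island. The Python sketch below illustrates that constraint and is not the thesis's DVFS technique. The per-core utilisation inputs, the frequency levels, the 0.8 utilisation target and the assumption that utilisation scales with f_max / f are all hypothetical.

```python
from typing import Dict, Sequence


def island_frequencies(
    islands: Dict[str, Sequence[float]],
    levels_mhz: Sequence[int],
    target_util: float = 0.8,
) -> Dict[str, int]:
    """Pick one frequency per power island (hypothetical policy).

    islands maps an island id to the utilisation of each of its cores,
    measured at the highest available frequency (0.0 to 1.0).
    The chosen level is the lowest one at which the busiest core on the
    island is expected to stay at or below target_util.
    """
    f_max = max(levels_mhz)
    chosen: Dict[str, int] = {}
    for island, utils in islands.items():
        busiest = max(utils, default=0.0)
        # Assumption: for a fixed workload, utilisation scales as f_max / f.
        needed = busiest * f_max / target_util
        candidates = [f for f in sorted(levels_mhz) if f >= needed]
        # Fall back to the top level if even f_max cannot meet the target.
        chosen[island] = candidates[0] if candidates else f_max
    return chosen


# Hypothetical two-island chip with three frequency levels.
load = {"island0": [0.10, 0.20, 0.15], "island1": [0.70, 0.30, 0.05]}
print(island_frequencies(load, levels_mhz=[200, 400, 800]))
```

In this example, one busy core forces all of island1 to the top level, while the lightly loaded island0 can drop to the lowest level. That is the per-island trade-off such a technique has to manage.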
An OS-Based Alternative to Full Hardware Coherence on Tiled Chip-Multiprocessors
Institute for Computing Systems Architecture
The interconnect mechanisms (shared bus or crossbar) used in current chip-multiprocessors
(CMPs) are expected to become a bottleneck that prevents these architectures from scaling to a
larger number of cores. Tiled CMPs offer better scalability by integrating relatively simple cores
with a lightweight point-to-point interconnect. However, such interconnects make snooping
impractical and, thus, require alternative solutions to cache coherence.
This thesis proposes a novel, cost-effective hardware mechanism to support shared-memory
parallel applications that forgoes hardware-maintained cache coherence. The proposed
mechanism rests on two key ideas: lines are mapped to physical caches at page level with
OS support, and the hardware supports remote cache accesses. It allows only limited,
controlled migration and replication of data, and it provides sufficient flexibility in the
mapping through an extra level of indirection between virtual pages and physical tiles.
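The extra level of indirection can be pictured as an OS-managed table that maps each virtual page to the tile whose cache holds that page's lines. An access is local when that tile is the requester's own tile and a remote cache access otherwise. The Python sketch below is a hypothetical software model, not the proposed hardware. The page size, the first-touch placement policy, and the class and method names are all assumptions.

```python
from typing import Dict, Tuple

PAGE_SIZE = 4096  # assumed page size in bytes


class PageTileMap:
    """OS-managed mapping from virtual pages to home tiles (hypothetical model)."""

    def __init__(self, num_tiles: int) -> None:
        self.num_tiles = num_tiles
        self.home: Dict[int, int] = {}  # virtual page number -> tile id

    def home_tile(self, vaddr: int, requester: int) -> int:
        """Return the tile caching vaddr's page, assigning one on first touch."""
        page = vaddr // PAGE_SIZE
        if page not in self.home:
            # First-touch policy (an assumption): home the page on the
            # tile that accesses it first.
            self.home[page] = requester
        return self.home[page]

    def migrate(self, vaddr: int, new_tile: int) -> None:
        """Controlled migration: the OS remaps the whole page to another tile.

        A real system would also have to flush or move the cached lines.
        """
        if not 0 <= new_tile < self.num_tiles:
            raise ValueError("no such tile")
        self.home[vaddr // PAGE_SIZE] = new_tile

    def access(self, vaddr: int, requester: int) -> Tuple[int, bool]:
        """Return (tile serving the access, whether the access is remote)."""
        tile = self.home_tile(vaddr, requester)
        return tile, tile != requester


m = PageTileMap(num_tiles=16)
print(m.access(0x1000, requester=3))  # first touch: page homed on tile 3, local
print(m.access(0x1040, requester=7))  # same page, so tile 7 makes a remote access
```

Because the mapping works at page rather than line granularity, one table entry covers every line in the page. This keeps the indirection cheap at the cost of coarser placement.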
The proposed tiled CMP architecture is evaluated on the SPLASH-2 scientific benchmarks
and ALPBench multimedia benchmarks against an architecture with private caches and a
distributed directory cache coherence mechanism. Across all benchmarks on 16 and 32
processors, experimental results show a performance degradation relative to the
cache-coherent architecture of 16% on average and as little as 0%.
Demystifying Internet of Things Security
Break down the misconceptions of the Internet of Things by examining the different security building blocks available in Intel Architecture (IA) based IoT platforms. This open access book reviews the threat pyramid, secure boot, chain of trust, and the software stack leading up to defense-in-depth. The IoT presents unique challenges in implementing security, and Intel has both CPU and Isolated Security Engine capabilities to simplify it. This book explores the challenges of securing these devices and making them immune to threats originating both inside and outside the network. The requirements and robustness rules for protecting assets vary greatly, and there is no single blanket approach to implementing security. Demystifying Internet of Things Security provides clarity to industry professionals and an overview of different security solutions.
What You'll Learn
- Secure devices, immunizing them against different threats originating from inside and outside the network
- Gain an overview of the different security building blocks available in Intel Architecture (IA) based IoT platforms
- Understand the threat pyramid, secure boot, chain of trust, and the software stack leading up to defense-in-depth
Who This Book Is For
Strategists, developers, architects, and managers in the embedded and Internet of Things (IoT) space trying to understand and implement security in IoT devices and platforms.
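At its simplest, a chain of trust means that each boot stage is measured and only receives control if the measurement matches a trusted reference. The Python sketch below illustrates that idea with SHA-256 digests from the standard `hashlib` module. The stage names, image contents and reference digests are hypothetical. As a simplification, all references are kept in one list rather than embedded in each preceding stage. Real platforms anchor the root of trust in hardware and typically verify cryptographic signatures, so this is not Intel's implementation.

```python
import hashlib
from typing import List, Sequence, Tuple


def sha256_hex(data: bytes) -> str:
    """Measure an image by hashing it with SHA-256."""
    return hashlib.sha256(data).hexdigest()


def verify_chain(
    stages: Sequence[Tuple[str, bytes]], expected: Sequence[str]
) -> List[str]:
    """Walk the boot stages in order, refusing to continue on a mismatch.

    stages: (name, image bytes) for each stage after the root of trust.
    expected: trusted SHA-256 digests, one per stage, in the same order.
    Returns the names of the stages that were verified and "launched".
    """
    if len(stages) != len(expected):
        raise ValueError("one expected digest per stage is required")
    launched: List[str] = []
    for (name, image), digest in zip(stages, expected):
        if sha256_hex(image) != digest:
            # Halt: an unverified stage must never receive control.
            break
        launched.append(name)
    return launched


# Hypothetical images and reference digests.
images = [("bootloader", b"bl-v1"), ("kernel", b"kernel-v1"), ("app", b"app-v1")]
trusted = [sha256_hex(img) for _, img in images]

print(verify_chain(images, trusted))    # all three stages launch
tampered = [images[0], ("kernel", b"kernel-evil"), images[2]]
print(verify_chain(tampered, trusted))  # stops before the kernel
```

Stopping at the first mismatch is the essential property here. Once one link fails, nothing later in the chain can be trusted, even if its own digest would match.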
- …