Search CORE

266 research outputs found

Deriving Efficient Data Movement from Decoupled Access/Execute Specifications

Author: A. Lokhmotov
A. Solar-Lezama
B.R. Gaster
D.L. Lau
H.P. Hofstee
H.S. Warren
I. Watson
J.E. Smith
J.H. Saltz
R. Allen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009
Field of study

High Performance Computing using Infiniband-based clusters

Author: Hemmatpour Masoud
Publication venue: Politecnico di Torino
Publication date
Field of study

L'abstract è presente nell'allegato / the abstract is in the attachmen

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Rethinking Cache Hierarchy And Interconnect Design For Next-Generation Gpus

Author: Ibrahim Mohamed Assem Abd ElMohsen
Publication venue: W&M ScholarWorks
Publication date: 01/01/2021
Field of study

To match the increasing computational demands of GPGPU applications and to improve peak compute throughput, the core counts in GPUs have been increasing with every generation. However, the famous memory wall is a major performance determinant in GPUs. In other words, in most cases, peak throughput in GPUs is ultimately dictated by memory bandwidth. Therefore, to serve the memory demands of thousands of concurrently executing threads, GPUs are equipped with several sources of bandwidth such as on-chip private/shared caching resources and off-chip high bandwidth memories. However, the existing sources of bandwidth are often not sufficient for achieving optimal GPU performance. Therefore, it is important to conserve and improve memory bandwidth utilization. To achieve the aforementioned goal, this dissertation focuses on improving on-chip cache bandwidth by managing cache line (data) replication across L1 caches via rethinking the cache hierarchy and the interconnect design. Such data replication stems from the private nature of the L1 caches and inter-core locality. Specifically, each GPU core can independently request and store a given cache line (in its local L1 cache) while being oblivious to the previous requests of other cores. This dissertation treats inter-core locality (i.e., data replication) as a double-edged sword, and proposes the following. First, this dissertation shows that efficient inter-core communication can exploit data replication across the L1 caches to unlock an additional potential source of on-chip bandwidth, which we call as remote-core bandwidth. We propose to efficiently coordinate the data movement across GPU cores to exploit this remote-core bandwidth by investigating: a) which data is replicated across cores, b) which cores have the replicated data, and c) how to fetch the replicated data as soon as possible. Second, this dissertation shows that if data replication is eliminated (or reduced), then the L1 caches can effectively cache more data, leading to higher hit rates and more on-chip bandwidth. We propose designing a shared L1 cache organization, which restricts each core to cache only a unique slice of the address range, eliminating data replication. We develop lightweight mechanisms to: a) reduce the inter-core communication overheads and b) to identify applications that prefer the private L1 organization and hence execute them accordingly. Third, to improve the performance, area, and energy efficiency of the shared L1 organization, this dissertation proposes DC-L1 (DeCoupled-L1) cache, an L1 cache separated from the GPU core. We show how the decoupled nature of the DC-L1 caches provides an opportunity to aggregate the L1 caches, and enables low-overhead efficient data placement designs. These optimizations reduce data replication across the L1s and increase their bandwidth utilization. Altogether, this dissertation develops several innovative techniques to improve the efficiency of the GPU on-chip memory system, which are necessary to address the memory wall problem. The future work will explore other designs and techniques to improve on-chip bandwidth utilization by considering other bandwidth sources (e.g., scratchpad and L2 cache)

College of William & Mary: W&M Publish

Using Blockchain to support Data & Service Monetization

Author: Odiete Obaro O 1988-
Publication venue: 'University of Saskatchewan Library'
Publication date: 05/03/2018
Field of study

Two required features of a data monetization platform are query and retrieval of the metadata of the resources to be monetized. Centralized platforms rely on the maturity of traditional NoSQL database systems to support these features. These databases, for example, MongoDB allows for very efficient query and retrieval of data it stores. However, centralized platforms come with a bag of security and privacy concerns, making them not the ideal approach for a data monetization platform. On the other hand, most existing decentralized platforms are only partially decentralized. In this research, I developed Cowry, a platform for publishing metadata describing available resources (data or services), discovery of published metadata including fast search and filtering. My main contribution is a fully decentralized architecture that combines blockchain and traditional distributed database to gain additional features such as efficient query and retrieval of metadata stored on the blockchain

eCommons@USASK

University of Saskatchewan Research Archive