    Mascar: Speeding up GPU Warps by Reducing Memory Pitstops

    Abstract: With the prevalence of GPUs as throughput engines for data-parallel workloads, the landscape of GPU computing is changing significantly. Non-graphics workloads with high memory intensity and irregular access patterns are frequently targeted for acceleration on GPUs. While GPUs provide large numbers of compute resources, the resources needed for memory-intensive workloads are scarcer. Therefore, managing access to these limited memory resources is a challenge for GPUs. We propose a novel Memory Aware Scheduling and Cache Access Re-execution (Mascar) system on GPUs tailored for better performance of memory-intensive workloads. This scheme detects memory saturation and prioritizes memory requests among warps to enable better overlapping of compute and memory accesses. Furthermore, it enables limited re-execution of memory instructions to eliminate structural hazards in the memory subsystem and to take advantage of cache locality in cases where requests cannot be sent to memory due to saturation. Our results show that Mascar provides a 34% speedup over the baseline round-robin scheduler and a 10% speedup over state-of-the-art warp schedulers for memory-intensive workloads. Mascar also achieves an average of 12% energy savings for such workloads.
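    The scheduling policy described in the abstract can be illustrated with a small simulation: while the memory subsystem has free slots, warps issue in plain round-robin order; once it saturates, only one "owner" warp may add new memory requests (so the other warps keep making compute progress), and accesses that cannot be issued are parked in a re-execution queue and retried later. The Python sketch below is only an illustration of that idea, not the Mascar hardware design; the warp model, the saturation threshold, and the latency constant are assumptions.

        from collections import deque

        MAX_INFLIGHT = 4    # assumed capacity of the memory subsystem (e.g. MSHRs)
        MEM_LATENCY = 10    # assumed cycles before an in-flight request completes

        class Warp:
            def __init__(self, wid, instrs):
                self.wid = wid
                self.instrs = deque(instrs)   # each entry is "mem" or "compute"

        class MemoryAwareScheduler:
            def __init__(self, warps):
                self.warps = warps
                self.inflight = deque()       # (request, issue_cycle) in the memory system
                self.reexec = deque()         # accesses waiting to be retried
                self.owner = None             # warp allowed to issue memory ops when saturated

            def saturated(self):
                return len(self.inflight) >= MAX_INFLIGHT

            def step(self, cycle):
                # Retire the oldest request once its (toy) latency has elapsed.
                if self.inflight and cycle - self.inflight[0][1] >= MEM_LATENCY:
                    self.inflight.popleft()

                # Retry a parked access first; after saturation clears it may
                # also hit in the cache instead of going out to memory.
                if self.reexec and not self.saturated():
                    self.inflight.append((self.reexec.popleft(), cycle))
                    return

                # Otherwise pick a warp, starting from a rotating round-robin point.
                n = len(self.warps)
                for i in range(n):
                    warp = self.warps[(cycle + i) % n]
                    if not warp.instrs:
                        continue
                    if warp.instrs[0] == "compute":
                        warp.instrs.popleft()          # compute always makes progress
                        return
                    if self.saturated():
                        # Memory warp prioritization: only the owner warp may
                        # queue new accesses while the memory system is full.
                        if self.owner is None:
                            self.owner = warp.wid
                        if warp.wid != self.owner:
                            continue
                        self.reexec.append(warp.instrs.popleft())
                        return
                    self.owner = None
                    self.inflight.append((warp.instrs.popleft(), cycle))
                    return

        # Example: two memory-heavy warps and two compute-heavy warps.
        warps = [Warp(0, ["mem"] * 8), Warp(1, ["mem"] * 8),
                 Warp(2, ["compute"] * 8), Warp(3, ["compute"] * 8)]
        sched = MemoryAwareScheduler(warps)
        for cycle in range(200):
            sched.step(cycle)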

    An integrated adoption model for Islamic credit card: PLS-SEM based approach

    Purpose: One challenge when launching new banking services is to overcome resistance to change so as to accelerate market acceptance. This is the case for the Islamic credit card (ICC). In response, this study aims to develop a conceptual framework that combines the innovation diffusion theory (IDT) and the theory of reasoned action (TRA) with religious obligation and customer awareness to explain behavioral intention and usage behavior of ICC. Design/methodology/approach: To test the conceptual model, data are collected from 649 bank customers in Malaysia, and the structural equation modeling technique is used to test the forecasting model. Findings: The results support several relationships from IDT and TRA, such as those of relative advantage, compatibility, trialability, observability and attitude, and show customer awareness to be a stronger predictor of intention to use ICC. Originality/value: To the author's knowledge, there has been no previous attempt to develop a conceptual model identifying the factors that influence ICC adoption by integrating IDT and TRA with other added measures. This study therefore adds to the body of knowledge on ICC adoption by extending IDT in this domain and integrating it with TRA, two well-established models in the area of acceptance and usage behavior studies.
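    The structural part of such a model can be sketched in a few lines. Real PLS-SEM iteratively re-weights the indicators of each latent construct; the sketch below is a deliberate simplification that uses plain item means as composites and ordinary least squares for the paths, on synthetic data, purely to show the shape of the two structural equations (constructs -> intention, intention -> usage). The construct names follow the abstract; everything else is assumed for illustration.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 649                                   # sample size reported in the abstract

        constructs = ["relative_advantage", "compatibility", "trialability",
                      "observability", "attitude", "religious_obligation",
                      "customer_awareness"]

        # Synthetic 5-point Likert items, three indicators per construct.
        items = {c: rng.integers(1, 6, size=(n, 3)).astype(float) for c in constructs}
        composites = {c: items[c].mean(axis=1) for c in constructs}

        X = np.column_stack([composites[c] for c in constructs])
        X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardized, as in PLS path models

        # Structural equation 1: behavioral intention <- exogenous constructs.
        intention = X @ rng.normal(size=len(constructs)) + rng.normal(size=n)
        beta_intention, *_ = np.linalg.lstsq(X, intention, rcond=None)

        # Structural equation 2: usage behavior <- behavioral intention.
        usage = 0.6 * intention + rng.normal(size=n)
        beta_usage, *_ = np.linalg.lstsq(intention.reshape(-1, 1), usage, rcond=None)

        for name, b in zip(constructs, beta_intention):
            print(f"{name:>20s} -> intention: {b:+.2f}")
        print(f"           intention -> usage: {beta_usage[0]:+.2f}")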

    COMET: Code Offload by Migrating Execution Transparently

    In this paper, we introduce a runtime system that allows unmodified multi-threaded applications to use multiple machines. The system allows threads to migrate freely between machines depending on the workload. Our prototype, COMET (Code Offload by Migrating Execution Transparently), is a realization of this design built on top of the Dalvik Virtual Machine. COMET leverages the underlying memory model of our runtime to implement distributed shared memory (DSM) with as few interactions between machines as possible. Making use of a new VM-synchronization primitive, COMET imposes little restriction on when migration can occur. Additionally, enough information is maintained so that one machine can resume computation after a network failure. We target our efforts towards augmenting smartphones or tablets with machines available in the network. We demonstrate the effectiveness of COMET on several real applications available on Google Play. These applications include image editors, turn-based games, a trip planner, and math tools. Utilizing a server-class machine, COMET can offer significant speed-ups on these real applications when run on a modern smartphone. With WiFi and 3G networks, we observe geometric mean speed-ups of 2.88X and 1.27X relative to the Dalvik interpreter across the set of applications, with speed-ups as high as 15X on some applications.
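    The core trade-off behind any such offloading runtime is whether shipping a thread's state over the network and running it remotely beats simply running it locally, which is also why the observed speed-ups differ so much between WiFi and 3G. The Python sketch below illustrates that decision; it is not COMET's actual policy or measurements, and the bandwidths, operation counts, and field names are assumptions chosen only to make the example concrete.

        from dataclasses import dataclass

        @dataclass
        class Network:
            bandwidth_bps: float      # uplink bandwidth, e.g. WiFi vs 3G
            rtt_s: float              # round-trip latency in seconds

        @dataclass
        class ThreadState:
            remaining_work_ops: float # estimated operations left in the current region
            dirty_bytes: int          # heap state that would have to be synchronized

        def should_offload(t: ThreadState, net: Network,
                           local_ops_per_s: float, remote_ops_per_s: float) -> bool:
            # Offload only if remote execution plus state transfer is faster.
            local_time = t.remaining_work_ops / local_ops_per_s
            transfer_time = net.rtt_s + t.dirty_bytes * 8 / net.bandwidth_bps
            remote_time = transfer_time + t.remaining_work_ops / remote_ops_per_s
            return remote_time < local_time

        # Example: the same thread is worth offloading over WiFi but not over 3G.
        wifi = Network(bandwidth_bps=20e6, rtt_s=0.01)
        cell = Network(bandwidth_bps=1e6, rtt_s=0.15)
        t = ThreadState(remaining_work_ops=5e9, dirty_bytes=2_000_000)
        print(should_offload(t, wifi, local_ops_per_s=1e9, remote_ops_per_s=10e9))  # True
        print(should_offload(t, cell, local_ops_per_s=1e9, remote_ops_per_s=10e9))  # False with these numbers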