    Mascar: Speeding up GPU Warps by Reducing Memory Pitstops

    Abstract: With the prevalence of GPUs as throughput engines for data-parallel workloads, the landscape of GPU computing is changing significantly. Non-graphics workloads with high memory intensity and irregular access patterns are frequently targeted for acceleration on GPUs. While GPUs provide large numbers of compute resources, the resources needed for memory-intensive workloads are scarcer. Therefore, managing access to these limited memory resources is a challenge for GPUs. We propose a novel Memory Aware Scheduling and Cache Access Re-execution (Mascar) system on GPUs tailored for better performance of memory-intensive workloads. This scheme detects memory saturation and prioritizes memory requests among warps to enable better overlapping of compute and memory accesses. Furthermore, it enables limited re-execution of memory instructions to eliminate structural hazards in the memory subsystem and to take advantage of cache locality in cases where requests cannot be sent to memory due to saturation. Our results show that Mascar provides a 34% speedup over the baseline round-robin scheduler and a 10% speedup over state-of-the-art warp schedulers for memory-intensive workloads. Mascar also achieves an average of 12% energy savings for such workloads.
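    The scheduling policy described in the abstract can be illustrated with a small simulation: while the memory subsystem has free slots, warps issue in plain round-robin order; once it saturates, only one "owner" warp may add new memory requests (so the other warps keep making compute progress), and accesses that cannot be issued are parked in a re-execution queue and retried later. The Python sketch below is only an illustration of that idea, not the Mascar hardware design; the warp model, the saturation threshold, and the latency constant are assumptions.

        from collections import deque

        MAX_INFLIGHT = 4    # assumed capacity of the memory subsystem (e.g. MSHRs)
        MEM_LATENCY = 10    # assumed cycles before an in-flight request completes

        class Warp:
            def __init__(self, wid, instrs):
                self.wid = wid
                self.instrs = deque(instrs)   # each entry is "mem" or "compute"

        class MemoryAwareScheduler:
            def __init__(self, warps):
                self.warps = warps
                self.inflight = deque()       # (request, issue_cycle) in the memory system
                self.reexec = deque()         # accesses waiting to be retried
                self.owner = None             # warp allowed to issue memory ops when saturated

            def saturated(self):
                return len(self.inflight) >= MAX_INFLIGHT

            def step(self, cycle):
                # Retire the oldest request once its (toy) latency has elapsed.
                if self.inflight and cycle - self.inflight[0][1] >= MEM_LATENCY:
                    self.inflight.popleft()

                # Retry a parked access first; after saturation clears it may
                # also hit in the cache instead of going out to memory.
                if self.reexec and not self.saturated():
                    self.inflight.append((self.reexec.popleft(), cycle))
                    return

                # Otherwise pick a warp, starting from a rotating round-robin point.
                n = len(self.warps)
                for i in range(n):
                    warp = self.warps[(cycle + i) % n]
                    if not warp.instrs:
                        continue
                    if warp.instrs[0] == "compute":
                        warp.instrs.popleft()          # compute always makes progress
                        return
                    if self.saturated():
                        # Memory warp prioritization: only the owner warp may
                        # queue new accesses while the memory system is full.
                        if self.owner is None:
                            self.owner = warp.wid
                        if warp.wid != self.owner:
                            continue
                        self.reexec.append(warp.instrs.popleft())
                        return
                    self.owner = None
                    self.inflight.append((warp.instrs.popleft(), cycle))
                    return

        # Example: two memory-heavy warps and two compute-heavy warps.
        warps = [Warp(0, ["mem"] * 8), Warp(1, ["mem"] * 8),
                 Warp(2, ["compute"] * 8), Warp(3, ["compute"] * 8)]
        sched = MemoryAwareScheduler(warps)
        for cycle in range(200):
            sched.step(cycle)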

    An integrated adoption model for Islamic credit card: PLS-SEM based approach

    Purpose: One challenge when launching new banking services is to overcome resistance to change so as to accelerate market acceptance. This is the case for the Islamic credit card (ICC). In response, this study aims to develop a conceptual framework that combines the innovation diffusion theory (IDT) and the theory of reasoned action (TRA) with religious obligation and customer awareness to explain behavioral intention and usage behavior of ICC. Design/methodology/approach: To test the conceptual model, data are collected from 649 bank customers in Malaysia, and the structural equation modeling technique is used to test the forecasting model. Findings: The results support several relationships from IDT and TRA, such as those of relative advantage, compatibility, trialability, observability and attitude, and show customer awareness to be a stronger predictor of intention to use ICC. Originality/value: To the author's knowledge, there has been no previous attempt to develop a conceptual model identifying the factors that influence ICC adoption by integrating IDT and TRA with other added measures. This study therefore adds to the body of knowledge on ICC adoption by extending IDT in this domain and integrating it with TRA, two well-established models in the area of acceptance and usage behavior studies.
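    The structural part of such a model can be sketched in a few lines. Real PLS-SEM iteratively re-weights the indicators of each latent construct; the sketch below is a deliberate simplification that uses plain item means as composites and ordinary least squares for the paths, on synthetic data, purely to show the shape of the two structural equations (constructs -> intention, intention -> usage). The construct names follow the abstract; everything else is assumed for illustration.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 649                                   # sample size reported in the abstract

        constructs = ["relative_advantage", "compatibility", "trialability",
                      "observability", "attitude", "religious_obligation",
                      "customer_awareness"]

        # Synthetic 5-point Likert items, three indicators per construct.
        items = {c: rng.integers(1, 6, size=(n, 3)).astype(float) for c in constructs}
        composites = {c: items[c].mean(axis=1) for c in constructs}

        X = np.column_stack([composites[c] for c in constructs])
        X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardized, as in PLS path models

        # Structural equation 1: behavioral intention <- exogenous constructs.
        intention = X @ rng.normal(size=len(constructs)) + rng.normal(size=n)
        beta_intention, *_ = np.linalg.lstsq(X, intention, rcond=None)

        # Structural equation 2: usage behavior <- behavioral intention.
        usage = 0.6 * intention + rng.normal(size=n)
        beta_usage, *_ = np.linalg.lstsq(intention.reshape(-1, 1), usage, rcond=None)

        for name, b in zip(constructs, beta_intention):
            print(f"{name:>20s} -> intention: {b:+.2f}")
        print(f"           intention -> usage: {beta_usage[0]:+.2f}")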

    COMET: Code Offload by Migrating Execution Transparently

    In this paper, we introduce a runtime system that allows unmodified multi-threaded applications to use multiple machines. The system allows threads to migrate freely between machines depending on the workload. Our prototype, COMET (Code Offload by Migrating Execution Transparently), is a realization of this design built on top of the Dalvik Virtual Machine. COMET leverages the underlying memory model of our runtime to implement distributed shared memory (DSM) with as few interactions between machines as possible. Making use of a new VM-synchronization primitive, COMET imposes little restriction on when migration can occur. Additionally, enough information is maintained so that one machine can resume computation after a network failure. We target our efforts towards augmenting smartphones or tablets with machines available in the network. We demonstrate the effectiveness of COMET on several real applications available on Google Play. These applications include image editors, turn-based games, a trip planner, and math tools. Utilizing a server-class machine, COMET can offer significant speed-ups on these real applications when run on a modern smartphone. With WiFi and 3G networks, we observe geometric mean speed-ups of 2.88X and 1.27X relative to the Dalvik interpreter across the set of applications, with speed-ups as high as 15X on some applications.
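    The core trade-off behind any such offloading runtime is whether shipping a thread's state over the network and running it remotely beats simply running it locally, which is also why the observed speed-ups differ so much between WiFi and 3G. The Python sketch below illustrates that decision; it is not COMET's actual policy or measurements, and the bandwidths, operation counts, and field names are assumptions chosen only to make the example concrete.

        from dataclasses import dataclass

        @dataclass
        class Network:
            bandwidth_bps: float      # uplink bandwidth, e.g. WiFi vs 3G
            rtt_s: float              # round-trip latency in seconds

        @dataclass
        class ThreadState:
            remaining_work_ops: float # estimated operations left in the current region
            dirty_bytes: int          # heap state that would have to be synchronized

        def should_offload(t: ThreadState, net: Network,
                           local_ops_per_s: float, remote_ops_per_s: float) -> bool:
            # Offload only if remote execution plus state transfer is faster.
            local_time = t.remaining_work_ops / local_ops_per_s
            transfer_time = net.rtt_s + t.dirty_bytes * 8 / net.bandwidth_bps
            remote_time = transfer_time + t.remaining_work_ops / remote_ops_per_s
            return remote_time < local_time

        # Example: the same thread is worth offloading over WiFi but not over 3G.
        wifi = Network(bandwidth_bps=20e6, rtt_s=0.01)
        cell = Network(bandwidth_bps=1e6, rtt_s=0.15)
        t = ThreadState(remaining_work_ops=5e9, dirty_bytes=2_000_000)
        print(should_offload(t, wifi, local_ops_per_s=1e9, remote_ops_per_s=10e9))  # True
        print(should_offload(t, cell, local_ops_per_s=1e9, remote_ops_per_s=10e9))  # False with these numbers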