Search CORE

1,526 research outputs found

HeteroCore GPU to exploit TLP-resource diversity

Author: Eeckhout Lieven
Wang Zhiying
Zhao Xia
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019
Field of study

A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems

Author: Mittal Sparsh
Publication venue
Publication date: 01/01/2014
Field of study

Recent technological advances have greatly improved the performance and features of embedded systems. With the number of just mobile devices now reaching nearly equal to the population of earth, embedded systems have truly become ubiquitous. These trends, however, have also made the task of managing their power consumption extremely challenging. In recent years, several techniques have been proposed to address this issue. In this paper, we survey the techniques for managing power consumption of embedded systems. We discuss the need of power management and provide a classification of the techniques on several important parameters to highlight their similarities and differences. This paper is intended to help the researchers and application-developers in gaining insights into the working of power management techniques and designing even more efficient high-performance embedded systems of tomorrow

arXiv.org e-Print Archive

Crossref

Classification-driven search for effective sm partitioning in multitasking GPUs

Author: Awatramani M.
Grauer-Gray S.
Jadidi A.
Li X.
Suzuki Y.
Vijaykumar N.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018
Field of study

Graphics processing units (GPUs) feature an increasing number of streaming multiprocessors (SMs) with each successive generation. At the same time, GPUs are increasingly widely adopted in cloud services and data centers to accelerate general-purpose workloads. Running multiple applications on a GPU in such environments requires effective multitasking support. Spatial multitasking in which independent applications co-execute on different sets of SMs is a promising solution to share GPU resources. Unfortunately, how to effectively partition SMs is an open problem. In this paper, we observe that compared to widely-used even partitioning, dynamic SM partitioning based on the characteristics of the co-executing applications can significantly improve performance and power efficiency. Unfortunately, finding an effective SM partition is challenging because the number of possible combinations increases exponentially with the number of SMs and co-executing applications. Through offline analysis, we find that first classifying workloads, and then searching an effective SM partition based on the workload characteristics can significantly reduce the search space, making dynamic SM partitioning tractable. Based on these insights, we propose Classification-Driven search (CD-search) for low-overhead dynamic SM partitioning in multitasking GPUs. CD-search first classifies workloads using a novel off-SM bandwidth model, after which it enters the performance mode or power mode depending on the workload's characteristics. Both modes follow a specific search strategy to quickly determine the optimum SM partition. Our evaluation shows that CD-search improves system throughput by 10.4% on average (and up to 62.9%) over even partitioning for workloads that are classified for the performance mode. For workloads classified for the power mode, CD-search reduces power consumption by 25% on average (and up to 41.2%). CD-search incurs limited runtime overhead

Crossref

Ghent University Academic Bibliography

A Survey on Cache Management Mechanisms for Real-Time Embedded Systems

Author: ALHAMMAD AHMED
FRÖHLICH ANTÔNIO
GRACIOLI GIOVANI
MANCUSO RENATO
Pellizzoni Rodolfo
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/11/2015
Field of study

© ACM, 2015. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Computing Surveys, {48, 2, (November 2015)} http://doi.acm.org/10.1145/2830555Multicore processors are being extensively used by real-time systems, mainly because of their demand for increased computing power. However, multicore processors have shared resources that affect the predictability of real-time systems, which is the key to correctly estimate the worst-case execution time of tasks. One of the main factors for unpredictability in a multicore processor is the cache memory hierarchy. Recently, many research works have proposed different techniques to deal with caches in multicore processors in the context of real-time systems. Nevertheless, a review and categorization of these techniques is still an open topic and would be very useful for the real-time community. In this article, we present a survey of cache management techniques for real-time embedded systems, from the first studies of the field in 1990 up to the latest research published in 2014. We categorize the main research works and provide a detailed comparison in terms of similarities and differences. We also identify key challenges and discuss future research directions.King Saud University NSER

University of Waterloo's Institutional Repository

Crossref