Search CORE

1,440 research outputs found

Synthesis of application specific processor architectures for ultra-low energy consumption

Author: Kazmierski T J
Leech Charles
Publication venue
Publication date: 12/02/2014
Field of study

In this paper we suggest that further energy savings can be achieved by a new approach to synthesis of embedded processor cores, where the architecture is tailored to the algorithms that the core executes. In the context of embedded processor synthesis, both single-core and many-core, the types of algorithms and demands on the execution efficiency are usually known at the chip design time. This knowledge can be utilised at the design stage to synthesise architectures optimised for energy consumption. Firstly, we present an overview of both traditional energy saving techniques and new developments in architectural approaches to energy-efficient processing. Secondly, we propose a picoMIPS architecture that serves as an architectural template for energy-efficient synthesis. As a case study, we show how the picoMIPS architecture can be tailored to an energy efficient execution of the DCT algorithm

Southampton (e-Prints Soton)

Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors

Author: Catalán Sandra
Igual Francisco D.
Mayo Rafael
Quintana-Ortí Enrique S.
Rodríguez-Sánchez Rafael
Publication venue
Publication date: 30/06/2015
Field of study

Asymmetric multicore processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. In addition, given the growing interest for low-power high performance computing, this type of architectures is also being investigated as a means to improve the throughput-per-Watt of complex scientific applications. In this paper, we design and embed several architecture-aware optimizations into a multi-threaded general matrix multiplication (gemm), a key operation of the BLAS, in order to obtain a high performance implementation for ARM big.LITTLE AMPs. Our solution is based on the reference implementation of gemm in the BLIS library, and integrates a cache-aware configuration as well as asymmetric--static and dynamic scheduling strategies that carefully tune and distribute the operation's micro-kernels among the big and LITTLE cores of the target processor. The experimental results on a Samsung Exynos 5422, a system-on-chip with ARM Cortex-A15 and Cortex-A7 clusters that implements the big.LITTLE model, expose that our cache-aware versions of gemm with asymmetric scheduling attain important gains in performance with respect to its architecture-oblivious counterparts while exploiting all the resources of the AMP to deliver considerable energy efficiency

arXiv.org e-Print Archive

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositori Institucional de la Universitat Jaume I

Using the High Productivity Language Chapel to Target GPGPU Architectures

Author: Chamberlain Bradford L.
Garzaran Maria J.
Padua David
Sidelnik Albert
Publication venue
Publication date: 25/04/2011
Field of study

It has been widely shown that GPGPU architectures offer large performance gains compared to their traditional CPU counterparts for many applications. The downside to these architectures is that the current programming models present numerous challenges to the programmer: lower-level languages, explicit data movement, loss of portability, and challenges in performance optimization. In this paper, we present novel methods and compiler transformations that increase productivity by enabling users to easily program GPGPU architectures using the high productivity programming language Chapel. Rather than resorting to different parallel libraries or annotations for a given parallel platform, we leverage a language that has been designed from first principles to address the challenge of programming for parallelism and locality. This also has the advantage of being portable across distinct classes of parallel architectures, including desktop multicores, distributed memory clusters, large-scale shared memory, and now CPU-GPU hybrids. We present experimental results from the Parboil benchmark suite which demonstrate that codes written in Chapel achieve performance comparable to the original versions implemented in CUDA.NSF CCF 0702260Cray Inc. Cray-SRA-2010-016962010-2011 Nvidia Research Fellowshipunpublishednot peer reviewe

Illinois Digital Environment for Access to Learning and Scholarship Repository

A Survey of Prediction and Classification Techniques in Multicore Processor Systems

Author: Ababei Cristinel
Moghaddam Milad Ghorbani
Publication venue: e-Publications@Marquette
Publication date: 01/05/2019
Field of study

In multicore processor systems, being able to accurately predict the future provides new optimization opportunities, which otherwise could not be exploited. For example, an oracle able to predict a certain application\u27s behavior running on a smart phone could direct the power manager to switch to appropriate dynamic voltage and frequency scaling modes that would guarantee minimum levels of desired performance while saving energy consumption and thereby prolonging battery life. Using predictions enables systems to become proactive rather than continue to operate in a reactive manner. This prediction-based proactive approach has become increasingly popular in the design and optimization of integrated circuits and of multicore processor systems. Prediction transforms from simple forecasting to sophisticated machine learning based prediction and classification that learns from existing data, employs data mining, and predicts future behavior. This can be exploited by novel optimization techniques that can span across all layers of the computing stack. In this survey paper, we present a discussion of the most popular techniques on prediction and classification in the general context of computing systems with emphasis on multicore processors. The paper is far from comprehensive, but, it will help the reader interested in employing prediction in optimization of multicore processor systems

epublications@Marquette

Recommended from our members

A Dynamic Reconfiguration Framework to Maximize Performance/Power in Asymmetric Multicore Processors

Author: Annamalai Arunachalam
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2013
Field of study

Recent trends in technology scaling have shifted the processing paradigm to multicores. Depending on the characteristics of the cores, the multicores can be either symmetric or asymmetric. Prior research has shown that Asymmetric Multicore Processors (AMPs) outperform their symmetric (SMP) counterparts within a given resource and power budget. But, due to the heterogeneity in core-types and time-varying workload behavior, thread-to-core assignment is always a challenge in AMPs. As the computational requirements vary significantly across different applications and with time, there is a need to dynamically allocate appropriate computational resources on demand to suit the applications’ current needs, in order to maximize the performance and minimize the energy consumption. Performance/power of the applications could be further increased by dynamically adapting the voltage and frequency of the cores to better fit the changing characteristics of the workloads. Not only can a core be forced to a low power mode when its activity level is low, but the power saved by doing so could be opportunistically re-budgeted to the other cores to boost the overall system throughput. To this end, we propose a novel solution that seamlessly combines heterogeneity with a Dynamic Reconfiguration Framework (DRF). The proposed dynamic reconfiguration framework is equipped with Dynamic Resource Allocation (DRA) and Voltage/Frequency Adaptation (DVFA) capabilities to adapt the core resources and operating conditions at runtime to the changing demands of the applications. As a proof of concept, we illustrate our proposed approach using a dual-core AMP and demonstrate significant performance/power benefits over various baselines

ScholarWorks@UMass Amherst

RePP-C: runtime estimation of performance-power with workload consolidation in CMPs

Author: Martorell Bofill Xavier
Nishtala Rajiv
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

Configuration of hardware knobs in multicore environments for meeting performance-power demands constitutes a desirable feature in modern data centers. At the same time, high energy efficiency (performance per watt) requires optimal thread-to-core assignment. In this paper, we present the runtime estimator (RePP-C) for performance-power, characterized by processor frequency states (P-states), a wide range of sleep intervals (Cl-states) and workload consolidation. We also present a schema for frequency and contention-aware thread-to-core assignment (FACTS) which considers various thread demands. The proposed solution (RePP-C) selects a given hardware configuration for each active core to ensure that the performance-power demands are satisfied while using the scheduling schema (FACTS) for mapping threads-to-cores. Our results show that FACTS improves over other state-of-the-art schedulers like Distributed Intensity Online (DIO) and native Linux scheduler by 8.25% and 37.56% in performance, with simultaneous improvement in energy efficiency by 6.2% and 14.17%, respectively. Moreover, we prove the usability of RePP-C by predicting performance and power for 7 different types of workloads and 10 different QoS targets. The results show an average error of 7.55% and 8.96% (with 95% confidence interval) when predicting energy and performance respectively.This work has been partially supported by the European Union FP7 program through the Mont-Blanc-2 project (FP7-ICT-610402), by the Ministerio de Economia y Competitividad under contract Computacion de Altas Prestaciones VII (TIN2015-65316-P), and the Departament d’Innovacio, Universitats i Empresa de la Generalitat de Catalunya, under project MPEXPAR: Models de Programacio i Entorns d’Execucio Paral.lels (2014-SGR-1051).Peer ReviewedPostprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC