
583 research outputs found

TANGO: Transparent heterogeneous hardware Architecture deployment for eNergy Gain in Operation

Author: Armstrong Django
Badia Rosa
Deprez Jean-Christophe
Djemame Karim
Ejarque Jorge
Ferrer Ana Juan
Georgiou Yiannis
Kavanagh Richard
Perez David Garcia
Sirvent Raul
Publication date: 04/03/2016

The paper is concerned with how software systems actually use Heterogeneous Parallel Architectures (HPAs), with the goal of optimizing power consumption on these resources. It argues the need for novel methods and tools to support software developers aiming to optimise the power consumption that results from designing, developing, deploying and running software on HPAs, while maintaining other quality aspects of software at adequate and agreed levels. To do so, a reference architecture to support energy efficiency at application construction, deployment, and operation is discussed, as well as its implementation and evaluation plans. Comment: Part of the Program Transformation for Programmability in Heterogeneous Architectures (PROHA) workshop, Barcelona, Spain, 12th March 2016, 7 pages, LaTeX, 3 PNG figures

arXiv.org e-Print Archive

UPCommons. Portal del coneixement obert de la UPC

A Survey of Phase Classification Techniques for Characterizing Variable Application Behavior

Author: Adegbija Tosiron
Criswell Keeley
Publication date: 16/07/2019

Adaptable computing is an increasingly important paradigm that specializes system resources to variable application requirements, environmental conditions, or user requirements. Adapting computing resources to variable application requirements (or application phases) is otherwise known as phase-based optimization. Phase-based optimization takes advantage of application phases, or execution intervals of an application that behave similarly, to enable effective and beneficial adaptability. For phase-based optimization to be effective, the phases must first be classified to determine when application phases begin and end, and to ensure that system resources are accurately specialized. In this paper, we present a survey of phase classification techniques that have been proposed to exploit the advantages of adaptable computing through phase-based optimization. We focus on recent techniques and classify them with respect to several factors in order to highlight their similarities and differences. We divide the techniques by their major defining characteristics: online/offline and serial/parallel. In addition, we discuss other characteristics such as prediction and detection techniques, the characteristics used for prediction, interval type, etc. We also identify gaps in the state of the art and discuss future research directions to enable and fully exploit the benefits of adaptable computing. Comment: To appear in IEEE Transactions on Parallel and Distributed Systems (TPDS)

arXiv.org e-Print Archive

Android Application Development for the Intel Platform

Author: Cohen Ryan
Wang Tao
Publication venue: 'Springer Science and Business Media LLC'

Computer science

OAPEN Library

Arbre jeràrquic i gràfic de tesis doctorals dirigides pel Professor Eduard Ayguadé Parra

Author: Biblioteca Rector Gabriel Ferraté
Publication venue: Universitat Politècnica de Catalunya
Publication date: 21/03/2023

This report shows the 30 doctoral theses supervised by Eduard Ayguadé Parra, as well as the doctoral theses supervised by researchers who had Eduard Ayguadé Parra as their thesis supervisor. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

Cache Equalizer: A Cache Pressure Aware Block Placement Scheme for Large-Scale Chip Multiprocessors

Author: Cho Sangyeun
Hammoud Mohammad
Melhem Rami
Publication venue: Department of Computer Science, University of Pittsburgh
Publication date: 01/01/2009

This paper describes Cache Equalizer (CE), a novel distributed cache management scheme for large-scale chip multiprocessors (CMPs). Our work is motivated by the large asymmetry in cache set usage. CE decouples the physical locations of cache blocks from their addresses in order to reduce misses caused by destructive interference. Temporal pressure at the on-chip last-level cache is continuously collected at group granularity (a group comprises several cache sets) and periodically recorded at the memory controller to guide the placement process. An incoming block is then placed in the cache group that exhibits the minimum pressure. CE provides Quality of Service (QoS) by robustly offering better performance than the baseline shared NUCA cache. Simulation results using a full-system simulator demonstrate that CE outperforms shared NUCA caches by an average of 15.5% and by as much as 28.5% for the benchmark programs we examined. Furthermore, evaluations show that CE outperforms related CMP cache designs.
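
The placement rule described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `PressureTable`, its methods, and the use of a plain integer counter as the "pressure" measure are all assumptions made for illustration, since the abstract does not define how pressure is computed.

```python
class PressureTable:
    """Hypothetical model of per-group pressure tracking (illustration only)."""

    def __init__(self, num_groups):
        # Counters updated continuously while the cache is in use.
        self.live = [0] * num_groups
        # Snapshot taken periodically; placement decisions read this copy.
        self.recorded = [0] * num_groups

    def note_pressure(self, group, amount=1):
        # Assumption: pressure is an abstract event count for the group.
        self.live[group] += amount

    def record_epoch(self):
        # Periodic recording step: publish the live counters and start over.
        self.recorded = self.live[:]
        self.live = [0] * len(self.live)

    def place_block(self):
        # An incoming block goes to the group with minimum recorded pressure
        # (ties resolve to the lowest group index).
        return min(range(len(self.recorded)), key=self.recorded.__getitem__)


table = PressureTable(4)
for group, amount in [(0, 5), (1, 2), (2, 9), (3, 4)]:
    table.note_pressure(group, amount)
table.record_epoch()
chosen_group = table.place_block()
```

Separating the live counters from the recorded snapshot mirrors the abstract's distinction between continuous collection and periodic recording; everything else about the real scheme (group size, epoch length, hardware cost) is outside this sketch.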

D-Scholarship@Pitt

Influence of shortest path algorithms on energy consumption of multi-core processors

Author: Karasik O. N.
Prihozhy A. A.
Publication venue: БНТУ

Modern multi-core processors, operating systems and application software are being designed for energy efficiency, which significantly reduces energy consumption. The energy efficiency of software depends on the algorithms it implements and on the way it exploits hardware resources. In the paper, we consider sequential and parallel implementations of four algorithms for shortest-path search in dense weighted graphs, and we measure and analyze their runtime, energy consumption, performance states and operating frequency on the Intel Core i7-10700 8-core processor. Our goal is to find out how each of the algorithms influences the processor's energy consumption, how the processor and operating system analyze the workload and take actions to increase or reduce operating frequency and to disable cores, and which algorithms are preferable for use in sequential and parallel modes. The graph extension-based algorithm (GEA) appeared to be the most energy efficient among the algorithms implemented sequentially. The classical Floyd-Warshall algorithm (FW) consumed up to twice as much energy, and the blocked homogeneous (BFW) and heterogeneous (HBFW) algorithms consumed up to 52.2 % and 21.2 % more energy than GEA. Parallel implementations of BFW and HBFW are up to 4.41 times faster and up to 3.23 times more energy efficient than the parallel implementation of FW, and consume up to 2.22 times less energy than their sequential counterparts. The sequential GEA algorithm consumes less energy than the parallel FW, although it is slower than FW in runtime. The multi-core processor runs FW at an average frequency of 4235 MHz and runs BFW and HBFW at lower frequencies of 4059 MHz and 4035 MHz respectively.
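
For orientation, the classical Floyd-Warshall algorithm (FW) named in this abstract can be sketched in a few lines of Python. This is the textbook sequential form only, not the paper's code; the blocked variants (BFW, HBFW) and GEA are not shown, and the function name and the sample graph are assumptions chosen for illustration.

```python
import math


def floyd_warshall(weights):
    """All-pairs shortest-path distances for a dense weight matrix.

    weights[i][j] is the weight of edge i -> j, or math.inf if absent.
    Assumes the graph has no negative-weight cycles.
    """
    n = len(weights)
    dist = [row[:] for row in weights]  # work on a copy of the input
    for i in range(n):
        dist[i][i] = min(dist[i][i], 0)  # a vertex reaches itself at cost 0
    for k in range(n):                   # allow vertex k as an intermediate
        dist_k = dist[k]
        for i in range(n):
            d_ik = dist[i][k]
            if d_ik == math.inf:
                continue                 # no path i -> k, nothing to relax
            row = dist[i]
            for j in range(n):
                candidate = d_ik + dist_k[j]
                if candidate < row[j]:
                    row[j] = candidate
    return dist


INF = math.inf
sample_graph = [
    [0, 3, INF, 7],
    [8, 0, 2, INF],
    [5, INF, 0, 1],
    [2, INF, INF, 0],
]
distances = floyd_warshall(sample_graph)
```

The triple loop performs on the order of n³ relaxations over the whole matrix, which is why dense-graph variants differ mainly in how they order and block these memory accesses.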

Repository of Belarusian National Technical University (BNTU)

Android on x86

Author: Cummings Darren
Krajci Iggy
Publication venue: 'Springer Science and Business Media LLC'

Computer science

OAPEN Library

Revisiting Actor Programming in C++

Author: Charousset Dominik
Hiesgen Raphael
Schmidt Thomas C.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015

The actor model of computation has gained significant popularity over the last decade. Its high level of abstraction makes it appealing for concurrent applications in parallel and distributed systems. However, designing a real-world actor framework that combines full scalability, strong reliability, and high resource efficiency requires many conceptual and algorithmic additions to the original model. In this paper, we report on designing and building CAF, the "C++ Actor Framework". CAF aims to provide a concurrent and distributed native environment for scaling up to very large, high-performance applications, and equally well down to small constrained systems. We present the key specifications and design concepts (in particular a message-transparent architecture, type-safe message interfaces, and pattern matching facilities) that make native actors a viable approach for many robust, elastic, and highly distributed developments. We demonstrate the feasibility of CAF in three scenarios: first for elastic, upscaling environments, second for including heterogeneous hardware like GPGPUs, and third for distributed runtime systems. Extensive performance evaluations indicate ideal runtime behaviour for up to 64 cores at very low memory footprint, or in the presence of GPUs. In these tests, CAF continuously outperforms the competing actor environments Erlang, Charm++, SalsaLite, Scala, ActorFoundry, and even OpenMPI. Comment: 33 pages

arXiv.org e-Print Archive

REPOSIT

An integrated soft- and hard-programmable multithreaded architecture

Author: Zhong Shi
Publication venue: The University of Edinburgh
Publication date: 01/01/2007

Edinburgh Research Archive

Coordinating Resource Use in Open Distributed Systems

Author: Zhao Xinghui
Publication venue: 'University of Saskatchewan Library'

In an open distributed system, computational resources are peer-owned, and distributed over time and space. The system is open to interactions with its environment, and the resources can dynamically join or leave the system, or can be discovered at runtime. This dynamicity leads to opportunities to carry out computations without statically owned resources, harnessing the collective compute power of the resources connected by the Internet. However, realizing this potential requires efficient and scalable resource discovery, coordination, and control, which present challenges in a dynamic, open environment. In this thesis, I present an approach to address these challenges by separating the functionality concerns of concurrent computations from those of coordinating their resource use, with the purpose of reducing programming complexity, and aiding development of correct, efficient, and resource-aware concurrent programs. As a first step towards effectively coordinating distributed resources, I developed DREAM, a Distributed Resource Estimation and Allocation Model, which enables computations to reason about future availability of resources. I then developed a fine-grained resource coordination scheme for distributed computations. The coordination scheme integrates DREAM-based resource reasoning into a distributed scheduler, for deciding and enforcing fine-grained resource-use schedules for distributed computations. To control the overhead caused by the coordination, a tuner is implemented which explicitly balances the overhead of the control mechanisms against the extent of control exercised. The effectiveness and performance of the resource coordination approach have been evaluated using a number of case studies. Experimental results show that the approach can effectively schedule computations for supporting various types of coordination objectives, such as ensuring Quality-of-Service, power-efficient execution, and dynamic load balancing. The overhead caused by the coordination mechanism is relatively modest, and adjustable through the tuner. In addition, the coordination mechanism does not add extra programming complexity to computations.

eCommons@USASK

University of Saskatchewan Research Archive