15 research outputs found

On the Performance and Isolation of Asymmetric Microkernel Design for Lightweight Manycores

Author: Broquedis François
Castro Márcio
Freitas Henrique
Lima Davidson
Mehaut Jean-François
Penna Pedro Henrique
Souto João
Publication venue: HAL CCSD
Publication date: 19/11/2019

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Development of Energy Models for Design Space Exploration of Embedded Many-Core Systems

Author: Ax Johannes
Flasskamp Martin
Jungeblut Thorsten
Kelly Wayne
Klarhorst Christian
Porrmann Mario
Rückert Ulrich
Publication venue
Publication date: 01/01/2018

This paper introduces a methodology to develop energy models for the design space exploration of embedded many-core systems. The design process of such systems can benefit from sophisticated models. Software and hardware can be specifically optimized based on comprehensive knowledge about application scenario and hardware behavior. The contribution of our work is an automated framework to estimate the energy consumption at an arbitrary abstraction level without the need to provide further information about the system. We validated our framework with the configurable many-core system CoreVA-MPSoC. Compared to a simulation of the CoreVA-MPSoC on gate level in a 28 nm FD-SOI standard cell technology, our framework shows an average estimation error of about 4%. Comment: Presented at HIP3ES, 201

arXiv.org e-Print Archive

Queensland University of Technology ePrints Archive

Energy Efficient Parallel K-Means Clustering for an Intel Hybrid Multi-Chip Package

Author: Freitas Henrique
Maciel Lucas
Penna Pedro Henrique
Souza Matheus
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 24/09/2018

FPGA devices have proven to be good candidates for accelerating applications from different research topics. For instance, machine learning applications such as K-Means clustering usually rely on large amounts of data to be processed, and, despite the performance offered by other architectures, FPGAs can offer better energy efficiency. With that in mind, Intel® has launched a platform that integrates a multicore and an FPGA in the same package, enabling low-latency and coherent fine-grained data offload. In this paper, we present a parallel implementation of the K-Means clustering algorithm for this novel platform, using the OpenCL language, and compare it against other platforms. We found that the CPU+FPGA platform was between 70.71% and 85.92% more energy efficient than the CPU-only approach, with Standard and Tiny input sizes respectively, and up to 68.21% of performance improvement was obtained with the Tiny input size. Furthermore, it was up to 7.2× more energy efficient than an Intel® Xeon Phi™, 21.5× more than a cluster of Raspberry Pi boards, and 3.8× more than the low-power MPPA-256 architecture, when the Standard input size was used.

Crossref

Hal - Université Grenoble Alpes

CAP Bench: a benchmark suite for performance and energy evaluation of low-power many-core processors

Author: Bailey
Bergman
Binkert
Bjerregaard
Francesquini
Henning
Ho
Jeffers
Kanungo
Mottin
Padoin
Shalf
Simon
Woo
Publication venue: 'Wiley'
Publication date: 01/01/2016

The constant need for faster and more energy-efficient processors has been stimulating the development of new architectures, such as low-power many-core architectures. Researchers aiming to study these architectures are challenged by peculiar characteristics of some components such as Networks-on-Chip and lack of specific tools to evaluate their performance. In this context, the goal of this paper is to present a benchmark suite to evaluate state-of-the-art low-power many-core architectures such as the Kalray MPPA-256 low-power processor, which features 256 compute cores in a single chip. The benchmark was designed and used to highlight important aspects and details that need to be considered when developing parallel applications for emerging low-power many-core architectures. As a result, this paper demonstrates that the benchmark offers a diverse suite of programs with regard to parallel patterns, job types, communication intensity and task load strategies, suitable for a broad understanding of performance and energy consumption of MPPA-256 and upcoming many-core architectures.

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Many-Core Scheduling of Data Parallel Applications Using SMT Solvers

Author: Ioannis Galanommatis
Oded Maler
Peter Poplavko
Pranav Tendulkar
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014

To program recently developed many-core systems-on-chip, two traditionally separate performance optimization problems have to be solved together. The first is parallel scheduling on a shared-memory multi-core system. The second is the co-scheduling of network communication and processor computation. This is because many-core systems are networks of multi-core clusters. In this paper, we demonstrate the applicability of modern constraint solvers to efficiently schedule parallel applications on many-cores and validate the results by running benchmarks on a real many-core platform. Index Terms: task graph, scheduling, multiprocessor, DMA

CiteSeerX

Crossref

Bridging Theory and Practice in Cache Replacement

Author: Beckmann Nathan
Sanchez Daniel
Publication venue
Publication date: 21/12/2015

Much prior work has studied processor cache replacement policies, but a large gap remains between theory and practice. The optimal policy (MIN) requires unobtainable knowledge of the future, and prior theoretically-grounded policies use reference models that do not match real programs. Meanwhile, practical policies are designed empirically. Lacking a strong theoretical foundation, they do not make the best use of the information available to them. This paper bridges theory and practice. We propose that practical policies should replace lines based on their economic value added (EVA), the difference of their expected hits from the average. We use Markov decision processes to show that EVA is optimal under some reasonable simplifications. We present an inexpensive, practical implementation of EVA and evaluate it exhaustively over many cache sizes. EVA outperforms prior practical policies and saves area at iso-performance. These results show that formalizing cache replacement yields practical benefits.

DSpace@MIT

VThreads: A novel VLIW chip multiprocessor with hardware-assisted PThreads

Author: David Stevens
Vassilios Chouliaras
Vincent Dwyer
Publication venue
Publication date: 01/01/2016

We discuss VThreads, a novel VLIW CMP with hardware-assisted shared-memory thread support. VThreads supports Instruction Level Parallelism via static multiple-issue and Thread Level Parallelism via hardware-assisted POSIX Threads, along with extensive customization. It allows the instantiation of tightly coupled streaming accelerators and supports up to 7-address Multiple-Input, Multiple-Output instruction extensions. VThreads is designed in technology-independent Register-Transfer-Level VHDL and prototyped on 40 nm and 28 nm Field-Programmable Gate Arrays. It was evaluated against a PThreads-based multiprocessor based on the Sparc-V8 ISA. On a 65 nm ASIC implementation, VThreads achieves up to a 7.2× performance increase on synthetic benchmarks, 5× on a parallel Mandelbrot implementation, 66% better performance on a threaded JPEG implementation, 79% better on an edge-detection benchmark and ~13% improvement on DES compared to the Leon3MP CMP. In the range of 2 to 8 cores, VThreads demonstrates a post-route (statistical) power reduction of between 65% and 57% at an area increase of 1.2%-10% for 1-8 cores, compared to a similarly-configured Leon3MP CMP. This combination of micro-architectural features, scalability, extensibility, hardware support for low-latency PThreads, power efficiency and area makes the processor an attractive proposition for low-power, deeply-embedded applications requiring minimum OS support.

Loughborough University Institutional Repository

VThreads: A novel VLIW chip multiprocessor with hardware-assisted PThreads

Author: Agron
Andrews
Arvind
Brodersen
Chouliaras
Chouliaras
Chouliaras
Chouliaras
Colwell
Cong
D. Stevens
de Dinechin
De Micheli
Faraboschi
Gupta
Hubener
Kathail
Lin
Lin
Lübbers
Mandelbrot
Milward
Muck
Oliveira
Owaida
Papakonstantinou
Robert Thomson
Rooholamin
Schlansker
Stevens
Stevens
Thomson
Tullsen
V.A. Chouliaras
V.M. Dwyer
Villarreal
Watson
Windh
Ziavras
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016

This paper was accepted for publication in the journal Microprocessors and Microsystems and the definitive published version is available at http://dx.doi.org/10.1016/j.micpro.2016.07.010. We discuss VThreads, a novel VLIW CMP with hardware-assisted shared-memory thread support. VThreads supports Instruction Level Parallelism via static multiple-issue and Thread Level Parallelism via hardware-assisted POSIX Threads, along with extensive customization. It allows the instantiation of tightly coupled streaming accelerators and supports up to 7-address Multiple-Input, Multiple-Output instruction extensions. VThreads is designed in technology-independent Register-Transfer-Level VHDL and prototyped on 40 nm and 28 nm Field-Programmable Gate Arrays. It was evaluated against a PThreads-based multiprocessor based on the Sparc-V8 ISA. On a 65 nm ASIC implementation, VThreads achieves up to a 7.2× performance increase on synthetic benchmarks, 5× on a parallel Mandelbrot implementation, 66% better performance on a threaded JPEG implementation, 79% better on an edge-detection benchmark and ~13% improvement on DES compared to the Leon3MP CMP. In the range of 2 to 8 cores, VThreads demonstrates a post-route (statistical) power reduction of between 65% and 57% at an area increase of 1.2%-10% for 1-8 cores, compared to a similarly-configured Leon3MP CMP. This combination of micro-architectural features, scalability, extensibility, hardware support for low-latency PThreads, power efficiency and area makes the processor an attractive proposition for low-power, deeply-embedded applications requiring minimum OS support.

Crossref

Loughborough University Institutional Repository