Search CORE

574 research outputs found

A trace-driven approach for fast and accurate simulation of manycore architectures

Author: Adeniyi-Jones Chris
Butko Anastasiia
Gamatié Abdoulaye
Garibotti Rafael
Lapotre Vianney
Ost Luciano
Sassatelli Gilles
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 19/01/2015
Field of study

International audienceThe evolution of manycore sytems, forecasted to feature hundreds of cores by the end of the decade calls for efficient solutions for design space exploration and debugging. Among the relevant existing solutions the well-known gem5 simu-lator provides a rich architecture description framework. However , these features come at the price of prohibitive simulation time that limits the scope of possible explorations to configurations made of tens of cores. To address this limitation, this paper proposes a novel trace-driven simulation approach for efficient exploration of manycore architectures

Crossref

HAL-Université de Bretagne Occidentale

Exploiting memory allocations in clusterized many-core architectures

Author: Abdoulaye Gamatie (7214183)
Anastasiia Butko (7214180)
Gilles Sassatelli (7214186)
Ost Copello-Ost (4910230)
Rafael Garibotti (7214177)
Ricardo Reis (181591)
Publication venue
Publication date: 22/01/2019
Field of study

Power-efficient architectures have become the most important feature required for future embedded systems. Modern designs, like those released on mobile devices, reveal that clusterization is the way to improve energy efficiency. However, such architectures are still limited by the memory subsystem (i.e., memory latency problems). This work investigates an alternative approach that exploits on-chip data locality to a large extent, through distributed shared memory systems that permit efficient reuse of on-chip mapped data in clusterized many-core architectures. First, this work reviews the current literature on memory allocations and explore the limitations of cluster-based many-core architectures. Then, several memory allocations are introduced and benchmarked scalability, performance and energy-wise, compared to the conventional centralized shared memory solution to reveal which memory allocation is the most appropriate for future mobile architectures. Our results show that distributed shared memory allocations bring performance gains and opportunities to reduce energy consumption

Loughborough University Institutional Repository

An Overview of Chip Multi-Processors Simulators Technology

Author: A.-M. M. C. Z. “A Survey of Computer System Architecture Simulators, Case Study: Sniper " in APCASE
D Sanchez
J Chen
JH Ahn
M Rosenblum
PS Magnusson
SS Mukherjee
T Austin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015
Field of study

Computer System Architecture (CSA) simulators are generally used to develop and validate new CSA designs and developments. The goal of this paper is to provide an insight into the importance of CSA simulation and the possible criteria that differentiate between various CSA simulators. Multi-dimensional aspects determine the taxonomy of CSA simulators including their accuracy, performance, functionality and flexibility. The Sniper simulator has been selected for a closer look and testing. The Sniper proofs its ability to scale to hundred cores with a wide range of functionality and performance. © Springer International Publishing Switzerland 2015

Crossref

OPUS - University of Technology Sydney

RELEASE: A High-level Paradigm for Reliable Large-scale Server Software

Author: Chechina Natalia
Trinder Phil
Publication venue
Publication date: 01/01/2012
Field of study

Erlang is a functional language with a much-emulated model for building reliable distributed systems. This paper outlines the RELEASE project, and describes the progress in the rst six months. The project aim is to scale the Erlang's radical concurrency-oriented programming paradigm to build reliable general-purpose software, such as server-based systems, on massively parallel machines. Currently Erlang has inherently scalable computation and reliability models, but in practice scalability is constrained by aspects of the language and virtual machine. We are working at three levels to address these challenges: evolving the Erlang virtual machine so that it can work effectively on large scale multicore systems; evolving the language to Scalable Distributed (SD) Erlang; developing a scalable Erlang infrastructure to integrate multiple, heterogeneous clusters. We are also developing state of the art tools that allow programmers to understand the behaviour of massively parallel SD Erlang programs. We will demonstrate the e ectiveness of the RELEASE approach using demonstrators and two large case studies on a Blue Gene

Enlighten

ElasticSimMATE: a Fast and Accurate gem5 Trace-Driven Simulator for Multicore Systems

Author: Bruguier Florent
Gamatié Abdoulaye
Nocua Alejandro
Sassatelli Gilles
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 12/07/2017
Field of study

International audienceMulticore system analysis requires efficient solutions for architectural parameter and scalability exploration. Long simulation time is the main drawback of current simulation approaches. In order to reduce the simulation time while keeping the accuracy levels, trace-driven simulation approaches have been developed. However, existing approaches do not allow multicore exploration or do not capture the behavior of multi-threaded programs. Based on the gem5 simulator, we developed a novel synchronization mechanism for multicore analysis based on the trace collection of synchronization events, instruction and dependencies. It allows efficient architectural parameter and scalability exploration with acceptable simulation speed and accuracy

Crossref

HAL Descartes

Hal-Diderot

Scalability of parallel video decoding on heterogeneous manycore architectures

Author: Cabarcas Jaramillo Felipe
Juurlink Ben
Meenderinck Cor
Ramírez Bellido Alejandro
Valero Cortés Mateo
Álvarez Mesa Mauricio
Publication venue
Publication date: 01/01/2011
Field of study

This paper presents an analysis of the scalability of the parallel video decoding on heterogeneous many core architectures. As benchmark, we use a highly parallel H.264/AVC video decoder that generates a large number of independent tasks. In order to translate task-level parallelism into performance gains both the video decoder and the architecture have been optimized. The video decoder was modified for exploiting coarse-grain frame-level parallelism in the entropy decoding kernel which has been considered the main bottleneck. Second, a heterogeneous combination of cores is evaluated for executing different type of tasks. Finally, an evaluation of the memory requirements of the whole system has been carried out. Experiments conducted using a trace-driven simulation methodology shows that the evaluated system exhibits a good parallel scalability up to 68 cores. At this point the parallel video decoder is able to decode more than 200 HD frames per second using simple low power processors.Postprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

A trace-driven methodology to evaluate memory management services of distributed operating systems for lightweight manycores

Author: Podestá Junior Emmanuel
Publication venue
Publication date: 01/01/2022
Field of study

Dissertação (mestrado) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação, Florianópolis, 2022.Os lightweight manycores pertencem a uma nova classe de processadores emergentes de baixa potência para a era Exascale. Esses processadores apresentam vários desafios para o desenvolvimento de aplicações, como arquitetura de memória distribuída, quantidade limitada de memória no chip e nenhuma coerência de cache. Recentemente, Sistemas Operacionais distribuídos foram propostos para enfrentar esses desafios de forma transparente. Nesses sistemas, diferentes serviços do Sistema Operacional são implantados nos núcleos do processador, sendo o serviço de gerenciamento de memória um dos mais importantes. No entanto, os desafios citados anteriormente sobre lightweight manycores trazem vários obstáculos para o design, implementação e otimizações futuras de serviços de gerenciamento de memória. Esta dissertação propõe uma metodologia baseada em traces para avaliar e otimizar recursos do serviço de gerenciamento de memória em Sistemas Operacionais distribuídos para lightweight manycores. Usando uma representação compacta do padrão de acesso às páginas das aplicações, a metodologia consegue imitar o padrão de acesso à memória das aplicações originais no Sistema Operacional distribuído rodando em um lightweight manycore. A metodologia foi integrada em um Sistema Operacional distribuído (Nanvix) e validada usando cinco aplicações de um benchmark específico para lightweight manycores (Capbench). Em seguida, a metodologia foi aplicada para realizar um estudo de caso usando uma implementação de cache gerenciada por software disponível no Nanvix. A metodologia permitiu avaliar várias configurações e diferentes políticas de substituição de páginas no processador MPPA, mesmo sem o suporte necessário da arquitetura para implementá-los.Abstract: Lightweight manycores belong to a new class of emerging low-power processors for the Exascale era. These processors present several challenges for the development of applications, such as distributed memory architecture, limited amount of on-chip memory and no cache coherence. Recently, distributed Operating Systems have been proposed to address these challenges in a transparent way. In these systems, different Operating Systems services are deployed across the processor cores, being the memory management service one of the most important. However, the aforementioned challenges of lightweight manycores bring several demands to the design, implementation and future optimizations of memory management services. This dissertation proposes a trace-driven methodology to evaluate and optimize features of a memory management service of distributed Operating Systems for lightweight manycores. By using a compact representation of the page access pattern of applications, our methodology is capable of mimicking the memory access pattern of the original applications on the target distributed Operating System running on a lightweight manycore. The methodology was integrated in a distributed Operating System (Nanvix) and validated using five applications from a specific benchmark for lightweight manycores (Capbench). Then, the methodology was applied to carry out a case study using a software-managed cache implementation available in Nanvix. The methodology enables evaluation of several configurations and different page replacement policies on MPPA processor, even without the support from the architecture to implement them

Repositório Institucional da UFSC

Component Assemblies in the Context of Manycore

Author: Basu Ananda
Bensalem Saddek
Bourgos Paraskevas
Bozga Marius
Maheshwari Mayur
Sifakis Joseph
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 03/10/2011
Field of study

International audienceWe present a component-based software design flow for building parallel applications running on top of manycore platforms. The flow is based on the BIP - Behaviour, Interaction, Priority - component frameworkand its associated toolbox. It provides full support for modeling of application software, validation of its functional correctness, modeling and performance analysis on system-level models, code generation and deployment on target manycore platforms. The paper details some of the steps of the design flow. The design flow is illustrated through the modeling and deployment of two applications, the Cholesky factorization and the MJPEG decoding on MPARM, an ARM-based manycore platform. We emphasize the merits of the design flow, notably fast performance analysis as well as code generation and effi cient deployment on manycore platforms

Hal - Université Grenoble Alpes

HETSIM: Simulating Large-Scale Heterogeneous Systems using a Trace-driven, Synchronization and Dependency-Aware Framework

Author: Cole Murray
Dreslinski Ronald
Kaszyk Kuba
O'Boyle Michael F P
Pal Subhankar
Publication venue
Publication date: 12/08/2020
Field of study

Edinburgh Research Explorer