14,406 research outputs found
Exploring Application Performance on Emerging Hybrid-Memory Supercomputers
Next-generation supercomputers will feature more hierarchical and
heterogeneous memory systems with different memory technologies working
side-by-side. A critical question is whether at large scale existing HPC
applications and emerging data-analytics workloads will have performance
improvement or degradation on these systems. We propose a systematic and fair
methodology to identify the trend of application performance on emerging
hybrid-memory systems. We model the memory system of next-generation
supercomputers as a combination of "fast" and "slow" memories. We then analyze
performance and dynamic execution characteristics of a variety of workloads,
from traditional scientific applications to emerging data analytics to compare
traditional and hybrid-memory systems. Our results show that data analytics
applications can clearly benefit from the new system design, especially at
large scale. Moreover, hybrid-memory systems do not penalize traditional
scientific applications, which may also show performance improvement.Comment: 18th International Conference on High Performance Computing and
Communications, IEEE, 201
Adjacent LSTM-Based Page Scheduling for Hybrid DRAM/NVM Memory Systems
Recent advances in memory technologies have led to the rapid growth of hybrid systems that combine traditional DRAM and Non Volatile Memory (NVM) technologies, as the latter provide lower cost per byte, low leakage power and larger capacities than DRAM, while they can guarantee comparable access latency. Such kind of heterogeneous memory systems impose new challenges in terms of page placement and migration among the alternative technologies of the heterogeneous memory system. In this paper, we present a novel approach for efficient page placement on heterogeneous DRAM/NVM systems. We design an adjacent LSTM-based approach for page placement, which strongly relies on page accesses prediction, while sharing knowledge among pages with behavioral similarity. The proposed approach leads up to 65.5% optimized performance compared to existing approaches, while achieving near-optimal results and saving 20.2% energy consumption on average. Moreover, we propose a new page replacement policy, namely clustered-LRU, achieving up to 8.1% optimized performance, compared to the default Least Recently Used (LRU) policy
HERO: Heterogeneous Embedded Research Platform for Exploring RISC-V Manycore Accelerators on FPGA
Heterogeneous embedded systems on chip (HESoCs) co-integrate a standard host
processor with programmable manycore accelerators (PMCAs) to combine
general-purpose computing with domain-specific, efficient processing
capabilities. While leading companies successfully advance their HESoC
products, research lags behind due to the challenges of building a prototyping
platform that unites an industry-standard host processor with an open research
PMCA architecture. In this work we introduce HERO, an FPGA-based research
platform that combines a PMCA composed of clusters of RISC-V cores, implemented
as soft cores on an FPGA fabric, with a hard ARM Cortex-A multicore host
processor. The PMCA architecture mapped on the FPGA is silicon-proven,
scalable, configurable, and fully modifiable. HERO includes a complete software
stack that consists of a heterogeneous cross-compilation toolchain with support
for OpenMP accelerator programming, a Linux driver, and runtime libraries for
both host and PMCA. HERO is designed to facilitate rapid exploration on all
software and hardware layers: run-time behavior can be accurately analyzed by
tracing events, and modifications can be validated through fully automated hard
ware and software builds and executed tests. We demonstrate the usefulness of
HERO by means of case studies from our research
Placement of dynamic data objects over heterogeneous memory organizations in embedded systems
Tesis inédita de la Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadoras y Automática, leÃda el 24-11-2015Depto. de Arquitectura de Computadores y AutomáticaFac. de InformáticaTRUEunpu
WCET-Directed Dynamic Scratchpad Memory Allocation of Data
Many embedded systems feature processors coupled with a small and fast scratchpad memory. To the difference with caches, allocation of data to scratchpad memory must be handled by software. The major gain is to enhance the pre-dictability of memory accesses latencies. A compile-time dy-namic allocation approach enables eviction and placement of data to the scratchpad memory at runtime. Previous dynamic scratchpad memory allocation ap-proaches aimed to reduce average-case program execution time or the energy consumption due to memory accesses. For real-time systems, worst-case execution time is the main metric to optimize. In this paper, we propose a WCET-directed algorithm to dynamically allocate static data and stack data of a pro-gram to scratchpad memory. The granularity of placement of memory transfers (e.g. on function, basic block bound-aries) is discussed from the perspective of its computation complexity and the quality of allocation. 1
Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip 2010 - ReCoSoC\u2710 - May 17-19, 2010 Karlsruhe, Germany. (KIT Scientific Reports ; 7551)
ReCoSoC is intended to be a periodic annual meeting to expose and discuss gathered expertise as well as state of the art research around SoC related topics through plenary invited papers and posters. The workshop aims to provide a prospective view of tomorrow\u27s challenges in the multibillion transistor era, taking into account the emerging techniques and architectures exploring the synergy between flexible on-chip communication and system reconfigurability
- …