Search CORE

86 research outputs found

A Run-Time System for Partially Reconfigurable FPGAs: The case of STMicroelectronics SPEAr board

Author: Charitopoulos George
Papadimitriou Kyprianos
Pau Danillo
Pnevmatikatos Dionisios
Santambrogio MARCO DOMENICO
Publication venue: 'IOS Press'
Publication date: 01/01/2016
Field of study

Archivio istituzionale della ricerca - Politecnico di Milano

The FASTER vision for designing dynamically reconfigurable systems

Author: Papadimitriou Kyprianos
Pilato Christiano
Pnevmatikatos Dionisios
Santambrogio Marco D
Sciuto Donatella
Stroobandt Dirk
Publication venue
Publication date: 01/01/2013
Field of study

Extending product functionality and lifetime requires constant addition of new features to satisfy the growing customer needs and the evolving market and technology trends. software component adaptivity is straightforward but not enough: recent products include hardware accelerators for reasons of performance and power efficiency that also need to adapt to new requirements. Reconfigurable logic allows the definition of new functions to be implemented in dynamically instantiated hardware units, combining adaptivity with hardware speed and efficiency. For the Intrusion Detection System example, new rules can be hardcoded into the reconfigurable logic, achieving high performance, while providing the necessary adaptivity to new threats. The FASTER (Facilitating Analysis and Synthesis Technologies for Effective Reconfiguration) project aims at introducing a complete methodology to allow designers to easily implement a system specification on a platform combining a general purpose processor with multiple accelerators running on an FPGA, taking as input a high-level description and fully exploiting, both at design- and run-time, the capabilities of partial dynamic reconfiguration. The FASTER project will facilitate the use of reconfigurable hardware by providing a complete methodology that enables designers to easily implement and verify applications on platforms with general-purpose processors and acceleration modules implemented in the latest reconfigurable technology

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

Ghent University Academic Bibliography

Design Guidelines for High-Performance SCM Hierarchies

Author: Bugnion Edouard
Daglis Alexandros
Falsafi Babak
Picorel Javier
Pnevmatikatos Dionisios
Sutherland Mark
Ustiugov Dmitrii
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 07/03/2019
Field of study

With emerging storage-class memory (SCM) nearing commercialization, there is evidence that it will deliver the much-anticipated high density and access latencies within only a few factors of DRAM. Nevertheless, the latency-sensitive nature of memory-resident services makes seamless integration of SCM in servers questionable. In this paper, we ask the question of how best to introduce SCM for such servers to improve overall performance/cost over existing DRAM-only architectures. We first show that even with the most optimistic latency projections for SCM, the higher memory access latency results in prohibitive performance degradation. However, we find that deployment of a modestly sized high-bandwidth 3D stacked DRAM cache makes the performance of an SCM-mostly memory system competitive. The high degree of spatial locality that memory-resident services exhibit not only simplifies the DRAM cache's design as page-based, but also enables the amortization of increased SCM access latencies and the mitigation of SCM's read/write latency disparity. We identify the set of memory hierarchy design parameters that plays a key role in the performance and cost of a memory system combining an SCM technology and a 3D stacked DRAM cache. We then introduce a methodology to drive provisioning for each of these design parameters under a target performance/cost goal. Finally, we use our methodology to derive concrete results for specific SCM technologies. With PCM as a case study, we show that a two bits/cell technology hits the performance/cost sweet spot, reducing the memory subsystem cost by 40% while keeping performance within 3% of the best performing DRAM-only system, whereas single-level and triple-level cell organizations are impractical for use as memory replacements.Comment: Published at MEMSYS'1

arXiv.org e-Print Archive

Crossref

Modeling the Scalability of the EuroExa Reconfigurable Accelerators - Preliminary Results

Author: Goumas Georgios
Miliadis Panagiotis
Mpakos Panagiotis
Papadopoulou Nikela
Pnevmatikatos Dionisios
Publication venue: Springer International Publishing
Publication date: 27/04/2022
Field of study

Current technology and application trends push for both performance and power efficiency. EuroEXA is a project that tries to achieve these goals and push its performance to exascale performance. Towards this objective, EuroEXA node integrate reconfigurable (FPGA) accelerators to offload computational intensive workloads. To fully utilize the FPGA’s resource pool, multiple accelerators must be instantiated. System design and dimensioning requires an early performance estimation to evaluate different design options, including using larger FPGA devices, instantiating larger number of accelerator instances, etc. In this paper, we present the preliminary results of modeling the scalability of EuroEXA reconfigurable accelerators in the FPGA fabric. We start by using simple equations to bound the total number of kernels that can work in parallel depending on the available memory channels and reconfigurable resources. Then, we use a 2nd degree polynomial model to predict the performance benefits of instantiating multiple replicated kernels in a FPGA. The model suggests whether the switching to another larger FPGA is advantageous choice in terms of performance. We verify our results using micro-benchmarks on two state-of-the-art FPGAs; AlveoU50 and AlveoU280

Enlighten

Harnessing Hardware Acceleration with RISC-V and the EU Processor

Author: Fumero Alfonso Juan
Goli Mehdi
Kotselidis Christos-Efthymios
Koziris Nectarios
Nikas Konstantinos
Pnevmatikatos Dionisios
Reyes Ruyman
Stratikopoulos Athanasios
Publication venue
Publication date: 06/06/2023
Field of study

The University of Manchester - Institutional Repository

Streamlining data cache access with fast address calculation

Author: Aho Alfred V.
Dionisios N. Pnevmatikatos
Golden Michael
Gurindar S. Sohi
John
Steven
Todd M. Austin
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date
Field of study

Crossref

Smart technologies for effective reconfiguration: the FASTER approach

Author: Becker Tobias
Bonetto A
Cazzaniga A
Davidson Tom
Durelli GC
Gaydadjiev Georgi
Luk Wayne
Papadimitriou Kyprianos
Pilato Christiano
Pnevmatikatos Dionisios
Santambrogio Marco D
Sciuto Donatella
Stroobandt Dirk
Todman Tim
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012
Field of study

Current and future computing systems increasingly require that their functionality stays flexible after the system is operational, in order to cope with changing user requirements and improvements in system features, i.e. changing protocols and data-coding standards, evolving demands for support of different user applications, and newly emerging applications in communication, computing and consumer electronics. Therefore, extending the functionality and the lifetime of products requires the addition of new functionality to track and satisfy the customers needs and market and technology trends. Many contemporary products along with the software part incorporate hardware accelerators for reasons of performance and power efficiency. While adaptivity of software is straightforward, adaptation of the hardware to changing requirements constitutes a challenging problem requiring delicate solutions. The FASTER (Facilitating Analysis and Synthesis Technologies for Effective Reconfiguration) project aims at introducing a complete methodology to allow designers to easily implement a system specification on a platform which includes a general purpose processor combined with multiple accelerators running on an FPGA, taking as input a high-level description and fully exploiting, both at design time and at run time, the capabilities of partial dynamic reconfiguration. The goal is that for selected application domains, the FASTER toolchain will be able to reduce the design and verification time of complex reconfigurable systems providing additional novel verification features that are not available in existing tool flows

Ghent University Academic Bibliography

The Mondrian Data Engine

Author: Daglis Alexandros
Drumond Mario
Falsafi Babak
Grot Boris
Mirzadeh Nooshin S.
Picorel Javier
Pnevmatikatos Dionisios N.
Ustiugov Dmitrii
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/05/2017
Field of study

The increasing demand for extracting value out of ever-growing data poses an ongoing challenge to system designers, a task only made trickier by the end of Dennard scaling. As the performance density of traditional CPU-centric architectures stagnates, advancing compute capabilities necessitates novel architectural approaches. Near-memory processing (NMP) architectures are reemerging as promising candidates to improve computing efficiency through tight coupling of logic and memory. NMP architectures are especially fitting for data analytics, as they provide immense bandwidth to memory-resident data and dramatically reduce data movement, the main source of energy consumption. Modern data analytics operators are optimized for CPU execution and hence rely on large caches and employ random memory accesses. In the context of NMP, such random accesses result in wasteful DRAM row buffer activations that account for a significant fraction of the total memory access energy. In addition, utilizing NMP’s ample bandwidth with fine-grained random accesses requires complex hardware that cannot be accommodated under NMP’s tight area and power constraints. Our thesis is that efficient NMP calls for an algorithm-hardware co-design that favors algorithms with sequential accesses to enable simple hardware that accesses memory in streams. We introduce an instance of such a co-designed NMP architecture for data analytics, the Mondrian Data Engine. Compared to a CPU-centric and a baseline NMP system, the Mondrian Data Engine improves the performance of basic data analytics operators by up to 49× and 5×, and efficiency by up to 28× and 5×, respectively

Infoscience - École polytechnique fédérale de Lausanne

Edinburgh Research Explorer

DeSyRe: on-Demand System Reliability

Author: Armato Antonino
Bouganis Christos-Savvas
Falsafi Babak
Gaydadjiev Georgi
Isaza Sebastian
Malek Alirad
Mariani Riccardo
Pnevmatikatos Dionisios N
Pradhan Dhiraj K
Rauwerda Gerard
Seepers Robert
Shafik Rishad Ahmed
Sourdis Ioannis
Strydis Christos
Sunesen Kim
Theodoropoulos Dimitris
Tzilis Stavros
Vavouras Michail
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013
Field of study

The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints

Southampton (e-Prints Soton)

Crossref

EUR Research Repository

Chalmers Research

Chalmers Publication Library

Explore Bristol Research