51 research outputs found

    RVSDG: An Intermediate Representation for Optimizing Compilers

    Intermediate Representations (IRs) are central to optimizing compilers, as the way a program is represented may enhance or limit analyses and transformations. Suitable IRs focus on exposing the most relevant information and establish invariants that different compiler passes can rely on. While control-flow-centric IRs appear to be a natural fit for imperative programming languages, the analyses required by compilers have increasingly shifted toward understanding data dependencies and working at multiple abstraction layers at the same time. This is partially evidenced in recent developments such as MLIR, proposed by Google. However, rigorous use of data-flow-centric IRs in general-purpose compilers has not been evaluated for feasibility and usability, as previous works provide no practical implementations. We present the Regionalized Value State Dependence Graph (RVSDG) IR for optimizing compilers. The RVSDG is a data-flow-centric IR where nodes represent computations, edges represent computational dependencies, and regions capture the hierarchical structure of programs. It represents programs in demand-dependence form, implicitly supports structured control flow, and models entire programs within a single IR. We provide a complete specification of the RVSDG and its construction and destruction methods, and exemplify its utility by presenting Dead Node and Common Node Elimination optimizations. We implemented a prototype compiler and evaluated it in terms of performance, code size, compilation time, and representational overhead. Our results indicate that the RVSDG can serve as a competitive IR in optimizing compilers while reducing complexity.
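    The demand-dependence idea behind the Dead Node Elimination pass mentioned above can be sketched in a few lines: in a data-flow IR, a node is live only if it is (transitively) demanded by a region's results. The class and function names below are illustrative, not the paper's actual API.

```python
# A minimal sketch of a data-flow IR in the spirit of the RVSDG:
# nodes are computations, edges are value dependencies, and dead
# node elimination keeps only nodes reachable from region results.

class Node:
    def __init__(self, op, *inputs):
        self.op = op          # operation name, e.g. "add"
        self.inputs = inputs  # nodes this computation depends on

def dead_node_elimination(results):
    """Return the set of live nodes: those demanded by the results."""
    live, stack = set(), list(results)
    while stack:
        node = stack.pop()
        if node not in live:
            live.add(node)
            stack.extend(node.inputs)
    return live

# x + y is demanded by the region's result; the unused multiply is dead.
x, y = Node("x"), Node("y")
add = Node("add", x, y)
mul = Node("mul", x, y)   # never demanded -> eliminated
live = dead_node_elimination([add])
```

    Because liveness is a pure reachability question over value edges, the pass needs no control-flow analysis, which is one appeal of demand-dependence form.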

    Adapt or Become Extinct!:The Case for a Unified Framework for Deployment-Time Optimization

    The High-Performance Computing ecosystem consists of a large variety of execution platforms that demonstrate a wide diversity in hardware characteristics such as CPU architecture, memory organization, interconnection network, accelerators, etc. This environment also presents a number of hard boundaries (walls) for applications, which limit software development (parallel programming wall), performance (memory wall, communication wall), and viability (power wall). The only way to survive in such a demanding environment is by adaptation. In this paper we discuss how dynamic information collected during the execution of an application can be utilized to adapt the execution context and may lead to performance gains beyond those provided by static information and compile-time adaptation. We consider specialization based on dynamic information such as user input, architectural characteristics such as the memory hierarchy organization, and the execution profile of the application as obtained from the execution platform's performance monitoring units. One of the challenges of future execution platforms is to allow the seamless integration of these various kinds of information with information obtained from static analysis (either during ahead-of-time or just-in-time compilation). We extend the notion of information-driven adaptation and outline the architecture of an infrastructure designed to enable information flow and adaptation throughout the life-cycle of an application.
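    A toy example of the deployment-time specialization described above: tune a code parameter from a hardware characteristic measured on the target platform. The blocking kernel and the cache-size value are hypothetical stand-ins, not part of the paper's infrastructure.

```python
# An illustrative sketch of information-driven adaptation: pick a
# tuning parameter (here, a blocking factor) from platform
# information discovered at deployment time rather than compile time.

def blocked_sum(data, block):
    """Sum a list in cache-friendly chunks of `block` elements."""
    total = 0
    for i in range(0, len(data), block):
        total += sum(data[i:i + block])
    return total

def choose_block_size(l1_bytes, elem_bytes=8):
    # Adapt the blocking factor so a working set of two blocks
    # fits in the detected L1 cache (sizes are example values).
    return max(1, l1_bytes // (2 * elem_bytes))

block = choose_block_size(32 * 1024)   # e.g. a 32 KiB L1 cache
result = blocked_sum(list(range(100)), block)
```

    The same decision could instead be driven by performance-monitoring-unit counters gathered from earlier runs, which is the profile-based variant of adaptation the paper considers.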

    Viterbi Accelerator for Embedded Processor Datapaths

    We present a novel architecture for a lightweight Viterbi accelerator that can be tightly integrated inside an embedded processor. We investigate the accelerator’s impact on processor performance by using the EEMBC Viterbi benchmark and the in-house Viterbi Branch Metric kernel. Our evaluation based on the EEMBC benchmark shows that an accelerated 65-nm 2.7-ns processor datapath is 20% larger but 90% more cycle efficient than a datapath lacking the Viterbi accelerator, leading to an 87% overall energy reduction and a data throughput of 3.52 Mbit/s.
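    The branch-metric computation that such an accelerator offloads is small and regular, which is what makes it attractive to fold into a datapath. A minimal software model of the two core Viterbi steps (the parameters are illustrative, not those of the EEMBC benchmark):

```python
# Software sketch of the Viterbi inner loop a branch-metric
# accelerator targets: for a hard-decision decoder, the branch
# metric is the Hamming distance between the received code bits
# and the bits a candidate trellis transition would have emitted.

def branch_metric(received, expected):
    """Hamming distance between two equal-length bit tuples."""
    return sum(r != e for r, e in zip(received, expected))

def add_compare_select(pm0, bm0, pm1, bm1):
    """Survivor selection: keep the smaller accumulated path metric.

    Returns (metric, index) of the winning predecessor.
    """
    cand0, cand1 = pm0 + bm0, pm1 + bm1
    return (cand0, 0) if cand0 <= cand1 else (cand1, 1)

# For a rate-1/2 code, each trellis transition emits two bits.
bm = branch_metric((1, 0), (0, 1))   # both bits differ
```

    In hardware, the compare and select collapse into a single add-compare-select (ACS) unit per trellis state, which dominates the decoder's cycle count in a software-only implementation.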

    Efficient and Flexible Embedded Systems and Datapath Components

    The comfort of our daily lives has come to rely on a vast number of embedded systems, such as mobile phones, anti-spin systems for cars, and high-definition video. Improving the end-user experience under often stringent requirements, in terms of high performance, low power dissipation, and low cost, makes these systems complex and nontrivial to design. This thesis addresses design challenges in three different areas of embedded systems. The presented FlexCore processor intends to improve the programmability of heterogeneous embedded systems while maintaining the performance of application-specific accelerators. This is achieved by integrating accelerators into the datapath of a general-purpose processor, in combination with a wide control word consisting of all control signals in a FlexCore's datapath. Furthermore, a FlexCore processor utilizes a flexible interconnect, which together with the expressiveness of the wide control word improves its performance. When designing new embedded systems it is important to have efficient components to build from. Arithmetic circuits are especially important, since they are extensively used in all applications. In particular, integer multipliers present significant design challenges. The proposed twin-precision technique makes it possible to improve both the throughput and power of conventional integer multipliers when computing narrow-width multiplications. The thesis also shows that the Baugh-Wooley algorithm is more suitable for hardware implementations of signed integer multipliers than the commonly used modified-Booth algorithm. A multi-core architecture is a common design choice when a single-core architecture cannot deliver sufficient performance. However, multi-core architectures introduce their own design challenges, such as scheduling applications onto several cores. This thesis presents a novel task management unit, which offloads task scheduling from the conventional cores of a multi-core system, thus improving both the performance and power efficiency of the system. In summary, this thesis proposes novel solutions to a number of relevant issues that need to be addressed when designing embedded systems.
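    The twin-precision idea mentioned above can be modeled in software: a shift-and-add multiplier computes partial-product rows, and gating the high-order rows and columns lets the same structure compute a narrow multiplication at lower switching activity. This sketch models the gating functionally; it is not the thesis's RTL, and the 8-bit width is just an example.

```python
# Functional model of a twin-precision shift-and-add multiplier:
# in narrow mode, operands are masked to the low N/2 bits and only
# the corresponding partial-product rows are generated, mimicking
# the gated partial products of the hardware technique.

N = 8  # full multiplier width (example value)

def twin_precision_multiply(a, b, narrow=False):
    """Multiply via partial products; gate the high half when narrow."""
    width = N // 2 if narrow else N
    mask = (1 << width) - 1
    a, b = a & mask, b & mask
    result = 0
    for i in range(width):
        if (b >> i) & 1:          # partial-product row i
            result += a << i      # shifted addend for this row
    return result

full = twin_precision_multiply(200, 100)          # full 8-bit multiply
half = twin_precision_multiply(12, 9, narrow=True)  # gated 4-bit multiply
```

    In hardware, the unused rows are held at zero rather than skipped, so the power saving comes from eliminated switching in the gated region, not from fewer clock cycles.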

    WaFFLe: Gated Cache-Ways with Per-Core Fine-Grained DVFS for Reduced On-Chip Temperature and Leakage Consumption

    Managing thermal imbalance in contemporary chip multi-processors (CMPs) is crucial for assuring the functional correctness of modern mobile as well as server systems. Localized regions with high activity, e.g., register files, ALUs, and FPUs, experience higher temperatures than the average across the chip and are commonly referred to as hotspots. Hotspots threaten the functional correctness of the underlying circuitry and cause a noticeable increase in leakage power, which in turn generates heat in a self-reinforcing cycle. Techniques that reduce the severity of, or completely eliminate, hotspots can maintain functional correctness along with improving the performance of CMPs. Conventional dynamic thermal management targets the cores to reduce hotspots but often ignores caches, which are known for their high leakage power consumption. This article presents WaFFLe, an approach that targets the leakage power of the last-level cache (LLC) and hotspots occurring at the cores. WaFFLe turns off LLC ways to reduce leakage power and to create on-chip thermal buffers. In addition, fine-grained DVFS is applied during long LLC-miss-induced stalls to reduce core temperature. Our results show that WaFFLe reduces the peak and average temperature of a 16-core homogeneous tiled CMP by up to 8.4 °C and 6.2 °C, respectively, with an average performance degradation of only 2.5%. We also show that WaFFLe outperforms a state-of-the-art cache-based technique and a greedy DVFS policy.
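    The two levers the abstract describes compose naturally into a simple policy model: leakage scales with the number of powered cache ways, and frequency is dropped while a core sits in a long miss-induced stall. The numbers and thresholds below are made up for the sketch, not measurements from the article.

```python
# Illustrative model of WaFFLe's two mechanisms: gating LLC ways
# to cut leakage (and open thermal buffers), and per-core DVFS
# during long LLC-miss stalls to cool the core.

def llc_leakage(total_ways, active_ways, leakage_per_way=1.0):
    """Leakage power scales with the number of powered cache ways."""
    assert 0 <= active_ways <= total_ways
    return active_ways * leakage_per_way

def dvfs_level(stall_cycles, threshold=100):
    """Drop a core to a low-power level during long miss stalls."""
    return "low" if stall_cycles > threshold else "nominal"

# Gating half the ways halves the LLC's leakage in this model;
# a 500-cycle stall triggers the low-power DVFS level.
leak = llc_leakage(16, 8)
level = dvfs_level(500)
```

    The real policy must additionally bound the performance cost of gating (fewer ways means more misses) and of running slower, which is why the article reports temperature savings alongside the 2.5% average slowdown.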