Search CORE

32 research outputs found

CROSS-LAYER CUSTOMIZATION PLATFORM FOR LOW-POWER AND REAL-TIME EMBEDDED APPLICATIONS

Author: Zhou Xiangrong
Publication venue
Publication date: 01/07/2008
Field of study

Modern embedded applications have become increasingly complex and diverse in their functionalities and requirements. Data processing, communication and multimedia signal processing, real-time control and various other functionalities can often need to be implemented on the same System-on-Chip(SOC) platform. The significant power constraints and real-time guarantee requirements of these applications have become significant obstacles for the traditional embedded system design methodologies. The general-purpose computing microarchitectures of these platforms are designed to achieve good performance on average, which is far from optimal for any particular application. The system must always assume worst-case scenarios, which results in significant power inefficiencies and resource under-utilization. This dissertation introduces a cross-layer application-customizable embedded platform, which dynamically exploits application information and fine-tunes system components at system software and hardware layers. This is achieved with the close cooperation and seamless integration of the compiler, the operating system, and the hardware architecture. The compiler is responsible for extracting application regularities through static and profile-based analysis. The relevant application knowledge is propagated and utilized at run-time across the system layers through the judiciously introduced reconfigurability at both OS and hardware layers. The introduced framework comprehensively covers the fundamental subsystems of memory management and multi-tasking execution control

Digital Repository at the University of Maryland

Associative Instruction Reordering to Alleviate Register Pressure

Author: Pouchet Louis-Noël
Rastello Fabrice
Rountev Atanas
Sadayappan Ponnuswamy
Singh Rawat Prashant
Sukumaran-Rajam Aravind
Publication venue: HAL CCSD
Publication date: 11/11/2018
Field of study

International audienceRegister allocation is generally considered a practically solved problem. For most applications, the register allocation strategies in production compilers are very effective in controlling the number of loads/stores and register spills. However, existing register allocation strategies are not effective and result in excessive register spilling for computation patterns with a high degree of many-to-many data reuse, e.g., high-order stencils and tensor contractions. We develop a source-to-source instruction reordering strategy that exploits the flexibility of reordering associative operations to alleviate register pressure. The developed transformation module implements an adaptable strategy that can appropriately control the degree of instruction-level parallelism, while relieving register pressure. The effectiveness of the approach is demonstrated through experimental results using multiple production compilers (GCC, Clang/LLVM) and target platforms (Intel Xeon Phi, and Intel x86 multi-core)

INRIA a CCSD electronic archive server

High Performance Datacenter Networks: Architectures, Algorithms, and Opportunities

Author: Arimilli Baba
Barroso Luiz André
Clos C
Dally W. J.
Dennis Abts
John Kim
Leiserson Charles E.
Scott S.
Singh Arjun
Sterling Thomas L.
Publication venue: 'Morgan & Claypool Publishers LLC'
Publication date
Field of study

Crossref

Dependable Embedded Systems

Author: Dutt Nikil
Henkel Jörg
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. This book introduces the most prominent reliability concerns from today’s points of view and roughly recapitulates the progress in the community so far. Unlike other books that focus on a single abstraction level such circuit level or system level alone, the focus of this book is to deal with the different reliability challenges across different levels starting from the physical level all the way to the system level (cross-layer approaches). The book aims at demonstrating how new hardware/software co-design solution can be proposed to ef-fectively mitigate reliability degradation such as transistor aging, processor variation, temperature effects, soft errors, etc. Provides readers with latest insights into novel, cross-layer methods and models with respect to dependability of embedded systems; Describes cross-layer approaches that can leverage reliability through techniques that are pro-actively designed with respect to techniques at other layers; Explains run-time adaptation and concepts/means of self-organization, in order to achieve error resiliency in complex, future many core systems

OAPEN Library

EClass: An execution classification approach to improving the energy-efficiency of software via machine learning

Author: Chan WK
Kan EYY
Tse TH
Publication venue: 'Elsevier BV'
Publication date: 01/01/2012
Field of study

Energy efficiency at the software level has gained much attention in the past decade. This paper presents a performance-aware frequency assignment algorithm for reducing processor energy consumption using Dynamic Voltage and Frequency Scaling (DVFS). Existing energy-saving techniques often rely on simplified predictions or domain knowledge to extract energy savings for specialized software (such as multimedia or mobile applications) or hardware (such as NPU or sensor nodes). We present an innovative framework, known as EClass, for general-purpose DVFS processors by recognizing short and repetitive utilization patterns efficiently using machine learning. Our algorithm is lightweight and can save up to 52.9% of the energy consumption compared with the classical PAST algorithm. It achieves an average savings of 9.1% when compared with an existing online learning algorithm that also utilizes the statistics from the current execution only. We have simulated the algorithms on a cycle-accurate power simulator. Experimental results show that EClass can effectively save energy for real life applications that exhibit mixed CPU utilization patterns during executions. Our research challenges an assumption among previous work in the research community that a simple and efficient heuristic should be used to adjust the processor frequency online. Our empirical result shows that the use of an advanced algorithm such as machine learning can not only compensate for the energy needed to run such an algorithm, but also outperforms prior techniques based on the above assumption. © 2011 Elsevier Inc. All rights reserved.postprin

HKU Scholars Hub

SAFA: Stack and frame architecture

Author: SOO YUEN JIEN
Publication venue
Publication date: 19/05/2006
Field of study

Ph.DDOCTOR OF PHILOSOPH

ScholarBank@NUS

Recommended from our members

The dynamic simultaneous multithreaded processor

Author: Ortiz-Arroyo Daniel
Publication venue: 'Oregon State University'
Publication date
Field of study

This dissertation investigates diverse techniques to support multithreading in modern high performance processors. The mechanisms studied expand the architecture of a high performance superscalar processor to control efficiently the interaction between software-controlled and hardware-controlled multithreading. Additionally, dynamic speculative mechanisms are proposed to exploit thread-level-parallelism (TLP) and instruction-level-parallelism (ILP) on a Simultaneous Multithreading (SMT) architecture. First, the hybrid multithreaded execution model is discussed. This model combines software-controlled multithreading with hardware support for efficient context switching and thread scheduling. A thread scheduling technique called set scheduling is introduced and its impact on the overall performance is described. An analytical model of the hybrid multithreaded execution is developed and validated by simulation. Through stochastic simulation, we find that the application of the hybrid multithreaded execution model results in higher processor utilization than traditional software-controlled multithreading. Next, in the main part of this dissertation, a new architecture is proposed: the Dynamic Simultaneous Multithreading (DSMT) processor. In this architecture, multiple threads are identified and created speculatively at runtime without compiler help. Subsequently, a SMT processor core executes those threads. The performance of a DSMT processor was evaluated with a new execution-driven simulator developed specifically for the purpose. Our experimental results based on simulation show that DSMT architecture has very good potential to improve SMT processor's performance when there is only a single task available for execution

ScholarsArchive@OSU

Optimization Techniques for Parallel Programming of Embedded Many-Core Computing Platforms

Author: Tagliavini Giuseppe <1980>
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 08/05/2017
Field of study

Nowadays many-core computing platforms are widely adopted as a viable solution to accelerate compute-intensive workloads at different scales, from low-cost devices to HPC nodes. It is well established that heterogeneous platforms including a general-purpose host processor and a parallel programmable accelerator have the potential to dramatically increase the peak performance/Watt of computing architectures. However the adoption of these platforms further complicates application development, whereas it is widely acknowledged that software development is a critical activity for the platform design. The introduction of parallel architectures raises the need for programming paradigms capable of effectively leveraging an increasing number of processors, from two to thousands. In this scenario the study of optimization techniques to program parallel accelerators is paramount for two main objectives: first, improving performance and energy efficiency of the platform, which are key metrics for both embedded and HPC systems; second, enforcing software engineering practices with the aim to guarantee code quality and reduce software costs. This thesis presents a set of techniques that have been studied and designed to achieve these objectives overcoming the current state-of-the-art. As a first contribution, we discuss the use of OpenMP tasking as a general-purpose programming model to support the execution of diverse workloads, and we introduce a set of runtime-level techniques to support fine-grain tasks on high-end many-core accelerators (devices with a power consumption greater than 10W). Then we focus our attention on embedded computer vision (CV), with the aim to show how to achieve best performance by exploiting the characteristics of a specific application domain. To further reduce the power consumption of parallel accelerators beyond the current technological limits, we describe an approach based on the principles of approximate computing, which implies modification to the program semantics and proper hardware support at the architectural level

AMS Tesi di Dottorato

PROGRESS white papers 2006:embedded systems design, networks and connected systems, verification and validation, networks on chip

Author: Corporaal H.
Niemegeers I.G.M.M.
Vaandrager F.W.
Publication venue: STW Technology Foundation
Publication date: 01/01/2006
Field of study

Repository TU/e

Pure OAI Repository

Adaptive Knobs for Resource Efficient Computing

Author: Kanduri Anil
Publication venue: fi=Turku Centre for Computer Science|en=Turku Centre for Computer Science|
Publication date: 13/12/2018
Field of study

Performance demands of emerging domains such as artificial intelligence, machine learning and vision, Internet-of-things etc., continue to grow. Meeting such requirements on modern multi/many core systems with higher power densities, fixed power and energy budgets, and thermal constraints exacerbates the run-time management challenge. This leaves an open problem on extracting the required performance within the power and energy limits, while also ensuring thermal safety. Existing architectural solutions including asymmetric and heterogeneous cores and custom acceleration improve performance-per-watt in specific design time and static scenarios. However, satisfying applications’ performance requirements under dynamic and unknown workload scenarios subject to varying system dynamics of power, temperature and energy requires intelligent run-time management. Adaptive strategies are necessary for maximizing resource efficiency, considering i) diverse requirements and characteristics of concurrent applications, ii) dynamic workload variation, iii) core-level heterogeneity and iv) power, thermal and energy constraints. This dissertation proposes such adaptive techniques for efficient run-time resource management to maximize performance within fixed budgets under unknown and dynamic workload scenarios. Resource management strategies proposed in this dissertation comprehensively consider application and workload characteristics and variable effect of power actuation on performance for pro-active and appropriate allocation decisions. Specific contributions include i) run-time mapping approach to improve power budgets for higher throughput, ii) thermal aware performance boosting for efficient utilization of power budget and higher performance, iii) approximation as a run-time knob exploiting accuracy performance trade-offs for maximizing performance under power caps at minimal loss of accuracy and iv) co-ordinated approximation for heterogeneous systems through joint actuation of dynamic approximation and power knobs for performance guarantees with minimal power consumption. The approaches presented in this dissertation focus on adapting existing mapping techniques, performance boosting strategies, software and dynamic approximations to meet the performance requirements, simultaneously considering system constraints. The proposed strategies are compared against relevant state-of-the-art run-time management frameworks to qualitatively evaluate their efficacy

UTUPub