742 research outputs found
A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems
Recent technological advances have greatly improved the performance and
features of embedded systems. With the number of just mobile devices now
reaching nearly equal to the population of earth, embedded systems have truly
become ubiquitous. These trends, however, have also made the task of managing
their power consumption extremely challenging. In recent years, several
techniques have been proposed to address this issue. In this paper, we survey
the techniques for managing power consumption of embedded systems. We discuss
the need of power management and provide a classification of the techniques on
several important parameters to highlight their similarities and differences.
This paper is intended to help the researchers and application-developers in
gaining insights into the working of power management techniques and designing
even more efficient high-performance embedded systems of tomorrow
Configurable Strategies for Work-stealing
Work-stealing systems are typically oblivious to the nature of the tasks they
are scheduling. For instance, they do not know or take into account how long a
task will take to execute or how many subtasks it will spawn. Moreover, the
actual task execution order is typically determined by the underlying task
storage data structure, and cannot be changed. There are thus possibilities for
optimizing task parallel executions by providing information on specific tasks
and their preferred execution order to the scheduling system.
We introduce scheduling strategies to enable applications to dynamically
provide hints to the task-scheduling system on the nature of specific tasks.
Scheduling strategies can be used to independently control both local task
execution order as well as steal order. In contrast to conventional scheduling
policies that are normally global in scope, strategies allow the scheduler to
apply optimizations on individual tasks. This flexibility greatly improves
composability as it allows the scheduler to apply different, specific
scheduling choices for different parts of applications simultaneously. We
present a number of benchmarks that highlight diverse, beneficial effects that
can be achieved with scheduling strategies. Some benchmarks (branch-and-bound,
single-source shortest path) show that prioritization of tasks can reduce the
total amount of work compared to standard work-stealing execution order. For
other benchmarks (triangle strip generation) qualitatively better results can
be achieved in shorter time. Other optimizations, such as dynamic merging of
tasks or stealing of half the work, instead of half the tasks, are also shown
to improve performance. Composability is demonstrated by examples that combine
different strategies, both within the same kernel (prefix sum) as well as when
scheduling multiple kernels (prefix sum and unbalanced tree search)
A Survey on Compiler Autotuning using Machine Learning
Since the mid-1990s, researchers have been trying to use machine-learning
based approaches to solve a number of different compiler optimization problems.
These techniques primarily enhance the quality of the obtained results and,
more importantly, make it feasible to tackle two main compiler optimization
problems: optimization selection (choosing which optimizations to apply) and
phase-ordering (choosing the order of applying optimizations). The compiler
optimization space continues to grow due to the advancement of applications,
increasing number of compiler optimizations, and new target architectures.
Generic optimization passes in compilers cannot fully leverage newly introduced
optimizations and, therefore, cannot keep up with the pace of increasing
options. This survey summarizes and classifies the recent advances in using
machine learning for the compiler optimization field, particularly on the two
major problems of (1) selecting the best optimizations and (2) the
phase-ordering of optimizations. The survey highlights the approaches taken so
far, the obtained results, the fine-grain classification among different
approaches and finally, the influential papers of the field.Comment: version 5.0 (updated on September 2018)- Preprint Version For our
Accepted Journal @ ACM CSUR 2018 (42 pages) - This survey will be updated
quarterly here (Send me your new published papers to be added in the
subsequent version) History: Received November 2016; Revised August 2017;
Revised February 2018; Accepted March 2018
Resource-aware scheduling for 2D/3D multi-/many-core processor-memory systems
This dissertation addresses the complexities of 2D/3D multi-/many-core processor-memory systems, focusing on two key areas: enhancing timing predictability in real-time multi-core processors and optimizing performance within thermal constraints. The integration of an increasing number of transistors into compact chip designs, while boosting computational capacity, presents challenges in resource contention and thermal management. The first part of the thesis improves timing predictability. We enhance shared cache interference analysis for set-associative caches, advancing the calculation of Worst-Case Execution Time (WCET). This development enables accurate assessment of cache interference and the effectiveness of partitioned schedulers in real-world scenarios. We introduce TCPS, a novel task and cache-aware partitioned scheduler that optimizes cache partitioning based on task-specific WCET sensitivity, leading to improved schedulability and predictability. Our research explores various cache and scheduling configurations, providing insights into their performance trade-offs. The second part focuses on thermal management in 2D/3D many-core systems. Recognizing the limitations of Dynamic Voltage and Frequency Scaling (DVFS) in S-NUCA many-core processors, we propose synchronous thread migrations as a thermal management strategy. This approach culminates in the HotPotato scheduler, which balances performance and thermal safety. We also introduce 3D-TTP, a transient temperature-aware power budgeting strategy for 3D-stacked systems, reducing the need for Dynamic Thermal Management (DTM) activation. Finally, we present 3QUTM, a novel method for 3D-stacked systems that combines core DVFS and memory bank Low Power Modes with a learning algorithm, optimizing response times within thermal limits. This research contributes significantly to enhancing performance and thermal management in advanced processor-memory systems
A survey on scheduling and mapping techniques in 3D Network-on-chip
Network-on-Chips (NoCs) have been widely employed in the design of
multiprocessor system-on-chips (MPSoCs) as a scalable communication solution.
NoCs enable communications between on-chip Intellectual Property (IP) cores and
allow those cores to achieve higher performance by outsourcing their
communication tasks. Mapping and Scheduling methodologies are key elements in
assigning application tasks, allocating the tasks to the IPs, and organising
communication among them to achieve some specified objectives. The goal of this
paper is to present a detailed state-of-the-art of research in the field of
mapping and scheduling of applications on 3D NoC, classifying the works based
on several dimensions and giving some potential research directions
Analysis of Various Decentralized Load Balancing Techniques with Node Duplication
Experience in parallel computing is an increasingly necessary skill for today’s upcoming computer scientists as processors are hitting a serial execution performance barrier and turning to parallel execution for continued gains. The uniprocessor system has now reached its maximum speed limit and, there is very less scope to improve the speed of such type of system. To solve this problem multiprocessor system is used, which have more than one processor. Multiprocessor system improves the speed of the system but it again faces some problems like data dependency, control dependency, resource dependency and improper load balancing. So this paper presents a detailed analysis of various decentralized load balancing techniques with node duplication to reduce the proper execution time
The Thermal-Constrained Real-Time Systems Design on Multi-Core Platforms -- An Analytical Approach
Over the past decades, the shrinking transistor size enabled more transistors to be integrated into an IC chip, to achieve higher and higher computing performances. However, the semiconductor industry is now reaching a saturation point of Moore’s Law largely due to soaring power consumption and heat dissipation, among other factors. High chip temperature not only significantly increases packing/cooling cost, degrades system performance and reliability, but also increases the energy consumption and even damages the chip permanently. Although designing 2D and even 3D multi-core processors helps to lower the power/thermal barrier for single-core architectures by exploring the thread/process level parallelism, the higher power density and longer heat removal path has made the thermal problem substantially more challenging, surpassing the heat dissipation capability of traditional cooling mechanisms such as cooling fan, heat sink, heat spread, etc., in the design of new generations of computing systems. As a result, dynamic thermal management (DTM), i.e. to control the thermal behavior by dynamically varying computing performance and workload allocation on an IC chip, has been well-recognized as an effective strategy to deal with the thermal challenges.
Over the past decades, the shrinking transistor size, benefited from the advancement of IC technology, enabled more transistors to be integrated into an IC chip, to achieve higher and higher computing performances. However, the semiconductor industry is now reaching a saturation point of Moore’s Law largely due to soaring power consumption and heat dissipation, among other factors. High chip temperature not only significantly increases packing/cooling cost, degrades system performance and reliability, but also increases the energy consumption and even damages the chip permanently. Although designing 2D and even 3D multi-core processors helps to lower the power/thermal barrier for single-core architectures by exploring the thread/process level parallelism, the higher power density and longer heat removal path has made the thermal problem substantially more challenging, surpassing the heat dissipation capability of traditional cooling mechanisms such as cooling fan, heat sink, heat spread, etc., in the design of new generations of computing systems. As a result, dynamic thermal management (DTM), i.e. to control the thermal behavior by dynamically varying computing performance and workload allocation on an IC chip, has been well-recognized as an effective strategy to deal with the thermal challenges.
Different from many existing DTM heuristics that are based on simple intuitions, we seek to address the thermal problems through a rigorous analytical approach, to achieve the high predictability requirement in real-time system design. In this regard, we have made a number of important contributions. First, we develop a series of lemmas and theorems that are general enough to uncover the fundamental principles and characteristics with regard to the thermal model, peak temperature identification and peak temperature reduction, which are key to thermal-constrained real-time computer system design. Second, we develop a design-time frequency and voltage oscillating approach on multi-core platforms, which can greatly enhance the system throughput and its service capacity. Third, different from the traditional workload balancing approach, we develop a thermal-balancing approach that can substantially improve the energy efficiency and task partitioning feasibility, especially when the system utilization is high or with a tight temperature constraint. The significance of our research is that, not only can our proposed algorithms on throughput maximization and energy conservation outperform existing work significantly as demonstrated in our extensive experimental results, the theoretical results in our research are very general and can greatly benefit other thermal-related research
A Survey and Comparative Study of Hard and Soft Real-time Dynamic Resource Allocation Strategies for Multi/Many-core Systems
Multi-/many-core systems are envisioned to satisfy the ever-increasing performance requirements of complex applications in various domains such as embedded and high-performance computing. Such systems need to cater to increasingly dynamic workloads, requiring efficient dynamic resource allocation strategies to satisfy hard or soft real-time constraints. This article provides an extensive survey of hard and soft real-time dynamic resource allocation strategies proposed since the mid-1990s and highlights the emerging trends for multi-/many-core systems. The survey covers a taxonomy of the resource allocation strategies and considers their various optimization objectives, which have been used to provide comprehensive comparison. The strategies employ various principles, such as market and biological concepts, to perform the optimizations. The trend followed by the resource allocation strategies, open research challenges, and likely emerging research directions have also been provided
- …