10 research outputs found
STTAR: A Traffic- and Thermal-Aware Adaptive Routing for 3D Network-on-Chip Systems
Since the three-dimensional Network on Chip (3D NoC) uses through-silicon via technology to connect the chips, each silicon layer is conducted through heterogeneous thermal, and 3D NoC system suffers from thermal problems. To alleviate the seriousness of the thermal problem, the distribution of data packets usually relies on traffic information or historical temperature information. However, thermal problems in 3D NoC cannot be solved only based on traffic or temperature information. Therefore, we propose a Score-Based Traffic- and Thermal-Aware Adaptive Routing (STTAR) that applies traffic load and temperature information to routing. First, the STTAR dynamically adjusts the input and output buffer lengths of each router with traffic load information to limit routing resources in overheated areas and control the rate of temperature rise. Second, STTAR adopts a scoring strategy based on temperature and the number of free slots in the buffer to avoid data packets being transmitted to high-temperature areas and congested areas and to improve the rationality of selecting routing output nodes. In our experiments, the proposed scoring Score-Based Traffic- and Thermal-Aware Adaptive Routing (STTAR) scheme can increase the throughput by about 14.98% to 47.90% and reduce the delay by about 10.80% to 35.36% compared with the previous works
Embedded dynamic programming networks for networks-on-chip
PhD ThesisRelentless technology downscaling and recent technological advancements
in three dimensional integrated circuit (3D-IC) provide a promising
prospect to realize heterogeneous system-on-chip (SoC) and homogeneous
chip multiprocessor (CMP) based on the networks-onchip
(NoCs) paradigm with augmented scalability, modularity and
performance. In many cases in such systems, scheduling and managing
communication resources are the major design and implementation
challenges instead of the computing resources. Past research
efforts were mainly focused on complex design-time or simple heuristic
run-time approaches to deal with the on-chip network resource
management with only local or partial information about the network.
This could yield poor communication resource utilizations and amortize
the benefits of the emerging technologies and design methods.
Thus, the provision for efficient run-time resource management in
large-scale on-chip systems becomes critical. This thesis proposes a
design methodology for a novel run-time resource management infrastructure
that can be realized efficiently using a distributed architecture,
which closely couples with the distributed NoC infrastructure. The
proposed infrastructure exploits the global information and status
of the network to optimize and manage the on-chip communication
resources at run-time.
There are four major contributions in this thesis. First, it presents a
novel deadlock detection method that utilizes run-time transitive closure
(TC) computation to discover the existence of deadlock-equivalence
sets, which imply loops of requests in NoCs. This detection scheme,
TC-network, guarantees the discovery of all true-deadlocks without
false alarms in contrast to state-of-the-art approximation and heuristic
approaches. Second, it investigates the advantages of implementing
future on-chip systems using three dimensional (3D) integration and
presents the design, fabrication and testing results of a TC-network
implemented in a fully stacked three-layer 3D architecture using a
through-silicon via (TSV) complementary metal-oxide semiconductor
(CMOS) technology. Testing results demonstrate the effectiveness
of such a TC-network for deadlock detection with minimal computational
delay in a large-scale network. Third, it introduces an adaptive
strategy to effectively diffuse heat throughout the three dimensional
network-on-chip (3D-NoC) geometry. This strategy employs a dynamic
programming technique to select and optimize the direction of data
manoeuvre in NoC. It leads to a tool, which is based on the accurate
HotSpot thermal model and SystemC cycle accurate model, to simulate
the thermal system and evaluate the proposed approach. Fourth, it
presents a new dynamic programming-based run-time thermal management
(DPRTM) system, including reactive and proactive schemes, to
effectively diffuse heat throughout NoC-based CMPs by routing packets
through the coolest paths, when the temperature does not exceed
chip’s thermal limit. When the thermal limit is exceeded, throttling is
employed to mitigate heat in the chip and DPRTM changes its course
to avoid throttled paths and to minimize the impact of throttling on
chip performance.
This thesis enables a new avenue to explore a novel run-time resource
management infrastructure for NoCs, in which new methodologies
and concepts are proposed to enhance the on-chip networks for
future large-scale 3D integration.Iraqi Ministry of Higher Education and Scientific Research (MOHESR)
Recommended from our members
On thermal sensor calibration and software techniques for many-core thermal management
The high power density of a many-core processor results in increased temperature which negatively impacts system reliability and performance. Dynamic thermal management applies thermal-aware techniques at run time to avoid overheating using temperature information collected from on-chip thermal sensors. Temperature sensing and thermal control schemes are two critical technologies for successfully maintaining thermal safety. In this dissertation, on-line thermal sensor calibration schemes are developed to provide accurate temperature information.
Software-based dynamic thermal management techniques are proposed using calibrated thermal sensors. Due to process variation and silicon aging, on-chip thermal sensors require periodic calibration before use in DTM. However, the calibration cost for thermal sensors can be prohibitively high as the number of on-chip sensors increases. Linear models which are suitable for on-line calculation are employed to estimate temperatures at multiple sensor locations using performance counters. The estimated temperature and the actual sensor thermal profile show a very high similarity with correlation coefficient ~0.9 for SPLASH2 and SPEC2000 benchmarks.
A calibration approach is proposed to combine potentially inaccurate temperature values obtained from two sources: thermal sensor readings and temperature estimations. A data fusion strategy based on Bayesian inference, which combines information from these two sources, is demonstrated. The result shows the strategy can effectively recalibrate sensor readings in response to inaccuracies caused by process variation and environmental noise. The average absolute error of the corrected sensor temperature readings is
A dynamic task allocation strategy is proposed to address localized overheating in many-core systems. Our approach employs reinforcement learning, a dynamic machine learning algorithm that performs task allocation based on current temperatures and a prediction regarding which assignment will minimize the peak temperature. Our results show that the proposed technique is fast (scheduling performed in \u3c1 \u3ems) and can efficiently reduce peak temperature by up to 8 degree C in a 49-core processor (6% on average) versus a leading competing task allocation approach for a series of SPLASH-2 benchmarks. Reinforcement learning has also been applied to 3D integrated circuits to allocate tasks with thermal awareness
Machine Learning for Resource-Constrained Computing Systems
Die verfügbaren Ressourcen in Informationsverarbeitungssystemen wie Prozessoren sind in der Regel eingeschränkt.
Das umfasst z. B. die elektrische Leistungsaufnahme, den Energieverbrauch, die Wärmeabgabe oder die Chipfläche.
Daher ist die Optimierung der Verwaltung der verfügbaren Ressourcen von größter Bedeutung, um Ziele wie maximale Performanz zu erreichen.
Insbesondere die Ressourcenverwaltung auf der Systemebene hat über die (dynamische) Zuweisung von Anwendungen zu Prozessorkernen und über die Skalierung der Spannung und Frequenz (dynamic voltage and frequency scaling, DVFS) einen großen Einfluss auf die Performanz, die elektrische Leistung und die Temperatur während der Ausführung von Anwendungen.
Die wichtigsten Herausforderungen bei der Ressourcenverwaltung sind die hohe Komplexität von Anwendungen und Plattformen, unvorhergesehene (zur Entwurfszeit nicht bekannte) Anwendungen oder Plattformkonfigurationen, proaktive Optimierung und die Minimierung des Laufzeit-Overheads.
Bestehende Techniken, die auf einfachen Heuristiken oder analytischen Modellen basieren, gehen diese Herausforderungen nur unzureichend an.
Aus diesem Grund ist der Hauptbeitrag dieser Dissertation der Einsatz maschinellen Lernens (ML) fĂĽr Ressourcenverwaltung.
ML-basierte Lösungen ermöglichen die Bewältigung dieser Herausforderungen durch die Vorhersage der Auswirkungen potenzieller Entscheidungen in der Ressourcenverwaltung, durch Schätzung verborgener (unbeobachtbarer) Eigenschaften von Anwendungen oder durch direktes Lernen einer Ressourcenverwaltungs-Strategie.
Diese Dissertation entwickelt mehrere neuartige ML-basierte Ressourcenverwaltung-Techniken fĂĽr verschiedene Plattformen, Ziele und Randbedingungen.
Zunächst wird eine auf Vorhersagen basierende Technik zur Maximierung der Performanz von Mehrkernprozessoren mit verteiltem Last-Level Cache und limitierter Maximaltemperatur vorgestellt.
Diese verwendet ein neuronales Netzwerk (NN) zur Vorhersage der Auswirkungen potenzieller Migrationen von Anwendungen zwischen Prozessorkernen auf die Performanz.
Diese Vorhersagen erlauben die Bestimmung der bestmöglichen Migration und ermöglichen eine proaktive Verwaltung.
Das NN ist so trainiert, dass es mit unbekannten Anwendungen und verschiedenen Temperaturlimits zurechtkommt.
Zweitens wird ein Boosting-Verfahren zur Maximierung der Performanz homogener Mehrkernprozessoren mit limitierter Maximaltemperatur mithilfe von DVFS vorgestellt.
Dieses basiert auf einer neuartigen {Boostability}-Metrik, die die Abhängigkeiten von Performanz, elektrischer Leistung und Temperatur auf Spannungs/Frequenz-Änderungen in einer Metrik vereint. % ignorerepeated
Die Abhängigkeiten von Performanz und elektrischer Leistung hängen von der Anwendung ab und können zur Laufzeit nicht direkt beobachtet (gemessen) werden.
Daher wird ein NN verwendet, um diese Werte für unbekannte Anwendungen zu schätzen und so die Komplexität der Boosting-Optimierung zu bewältigen.
Drittens wird eine Technik zur Temperaturminimierung von heterogenen Mehrkernprozessoren mit Quality of Service-Zielen vorgestellt.
Diese verwendet Imitationslernen, um eine Migrationsstrategie von Anwendungen aus optimalen Orakel-Demonstrationen zu lernen.
Dafür wird ein NN eingesetzt, um die Komplexität der Plattform und des Anwendungsverhaltens zu bewältigen.
Die Inferenz des NNs wird mit Hilfe eines vorhandenen generischen Beschleunigers, einer Neural Processing Unit (NPU), beschleunigt.
Auch die ML Algorithmen selbst mĂĽssen auch mit begrenzten Ressourcen ausgefĂĽhrt werden.
Zuletzt wird eine Technik für ressourcenorientiertes Training auf verteilten Geräten vorgestellt, um einen konstanten Trainingsdurchsatz bei sich schnell ändernder Verfügbarkeit von Rechenressourcen aufrechtzuerhalten, wie es z.~B.~aufgrund von Konflikten bei gemeinsam genutzten Ressourcen der Fall ist.
Diese Technik verwendet Structured Dropout, welches beim Training zufällige Teile des NNs auslässt.
Dadurch können die erforderlichen Ressourcen für das Training dynamisch angepasst werden -- mit vernachlässigbarem Overhead, aber auf Kosten einer langsameren Trainingskonvergenz.
Die Pareto-optimalen Dropout-Parameter pro Schicht des NNs werden durch eine Design Space Exploration bestimmt.
Evaluierungen dieser Techniken werden sowohl in Simulationen als auch auf realer Hardware durchgeführt und zeigen signifikante Verbesserungen gegenüber dem Stand der Technik, bei vernachlässigbarem Laufzeit-Overhead.
Zusammenfassend zeigt diese Dissertation, dass ML eine SchlĂĽsseltechnologie zur Optimierung der Verwaltung der limitierten Ressourcen auf Systemebene ist, indem die damit verbundenen Herausforderungen angegangen werden
Adaptive Knobs for Resource Efficient Computing
Performance demands of emerging domains such as artificial intelligence, machine learning and vision, Internet-of-things etc., continue to grow. Meeting such requirements on modern multi/many core systems with higher power densities, fixed power and energy budgets, and thermal constraints exacerbates the run-time management challenge. This leaves an open problem on extracting the required performance within the power and energy limits, while also ensuring thermal safety. Existing architectural solutions including asymmetric and heterogeneous cores and custom acceleration improve performance-per-watt in specific design time and static scenarios. However, satisfying applications’ performance requirements under dynamic and unknown workload scenarios subject to varying system dynamics of power, temperature and energy requires intelligent run-time management.
Adaptive strategies are necessary for maximizing resource efficiency, considering i) diverse requirements and characteristics of concurrent applications, ii) dynamic workload variation, iii) core-level heterogeneity and iv) power, thermal and energy constraints. This dissertation proposes such adaptive techniques for efficient run-time resource management to maximize performance within fixed budgets under unknown and dynamic workload scenarios. Resource management strategies proposed in this dissertation comprehensively consider application and workload characteristics and variable effect of power actuation on performance for pro-active and appropriate allocation decisions. Specific contributions include i) run-time mapping approach to improve power budgets for higher throughput, ii) thermal aware performance boosting for efficient utilization of power budget and higher performance, iii) approximation as a run-time knob exploiting accuracy performance trade-offs for maximizing performance under power caps at minimal loss of accuracy and iv) co-ordinated approximation for heterogeneous systems
through joint actuation of dynamic approximation and power knobs for performance guarantees with minimal power consumption.
The approaches presented in this dissertation focus on adapting existing mapping techniques, performance boosting strategies, software and dynamic approximations to meet the performance requirements, simultaneously considering system constraints. The proposed strategies are compared against relevant state-of-the-art run-time management frameworks to qualitatively evaluate their efficacy
Towards Computational Efficiency of Next Generation Multimedia Systems
To address throughput demands of complex applications (like Multimedia), a next-generation system designer needs to co-design and co-optimize the hardware and software layers. Hardware/software knobs must be tuned in synergy to increase the throughput efficiency. This thesis provides such algorithmic and architectural solutions, while considering the new technology challenges (power-cap and memory aging). The goal is to maximize the throughput efficiency, under timing- and hardware-constraints
Vehicle and Traffic Safety
The book is devoted to contemporary issues regarding the safety of motor vehicles and road traffic. It presents the achievements of scientists, specialists, and industry representatives in the following selected areas of road transport safety and automotive engineering: active and passive vehicle safety, vehicle dynamics and stability, testing of vehicles (and their assemblies), including electric cars as well as autonomous vehicles. Selected issues from the area of accident analysis and reconstruction are discussed. The impact on road safety of aspects such as traffic control systems, road infrastructure, and human factors is also considered