1,597 research outputs found
A Survey on IEEE 1588 Implementation for RISC-V Low-Power Embedded Devices
IEEE 1588, also known as the Precision Time Protocol (PTP), is a standard protocol for clock synchronization in distributed systems. While it is not architecture-specific, implementing IEEE 1588 on Reduced Instruction Set Computer-V (RISC-V) low-power embedded devices demands considering the system requirements and available resources. This paper explores various approaches and techniques to achieve accurate time synchronization in such instruments. The analysis covers software and hardware implementations, discussing each method’s challenges, benefits, and trade-offs. By examining the state-of-the-art in this field, this paper provides valuable insights and guidance for researchers and engineers working on time-critical applications in RISC-V-based embedded systems, aiding in selecting the most-suitable stack for their designs.This work was partially supported by the ECSEL Joint Undertaking in the H2020 project IMOCO4.E, grant agreement No.10100731, and by the Basque Government within the fund for research groups of the Basque University System IT1440-22 and KK-2023/00015
Resource-aware scheduling for 2D/3D multi-/many-core processor-memory systems
This dissertation addresses the complexities of 2D/3D multi-/many-core processor-memory systems, focusing on two key areas: enhancing timing predictability in real-time multi-core processors and optimizing performance within thermal constraints. The integration of an increasing number of transistors into compact chip designs, while boosting computational capacity, presents challenges in resource contention and thermal management. The first part of the thesis improves timing predictability. We enhance shared cache interference analysis for set-associative caches, advancing the calculation of Worst-Case Execution Time (WCET). This development enables accurate assessment of cache interference and the effectiveness of partitioned schedulers in real-world scenarios. We introduce TCPS, a novel task and cache-aware partitioned scheduler that optimizes cache partitioning based on task-specific WCET sensitivity, leading to improved schedulability and predictability. Our research explores various cache and scheduling configurations, providing insights into their performance trade-offs. The second part focuses on thermal management in 2D/3D many-core systems. Recognizing the limitations of Dynamic Voltage and Frequency Scaling (DVFS) in S-NUCA many-core processors, we propose synchronous thread migrations as a thermal management strategy. This approach culminates in the HotPotato scheduler, which balances performance and thermal safety. We also introduce 3D-TTP, a transient temperature-aware power budgeting strategy for 3D-stacked systems, reducing the need for Dynamic Thermal Management (DTM) activation. Finally, we present 3QUTM, a novel method for 3D-stacked systems that combines core DVFS and memory bank Low Power Modes with a learning algorithm, optimizing response times within thermal limits. This research contributes significantly to enhancing performance and thermal management in advanced processor-memory systems
ACiS: smart switches with application-level acceleration
Network performance has contributed fundamentally to the growth of supercomputing over the past decades. In parallel, High Performance Computing (HPC) peak performance has depended, first, on ever faster/denser CPUs, and then, just on increasing density alone. As operating frequency, and now feature size, have levelled off, two new approaches are becoming central to achieving higher net performance: configurability and integration. Configurability enables hardware to map to the application, as well as vice versa. Integration enables system components that have generally been single function-e.g., a network to transport data—to have additional functionality, e.g., also to operate on that data. More generally, integration enables compute-everywhere: not just in CPU and accelerator, but also in network and, more specifically, the communication switches.
In this thesis, we propose four novel methods of enhancing HPC performance through Advanced Computing in the Switch (ACiS). More specifically, we propose various flexible and application-aware accelerators that can be embedded into or attached to existing communication switches to improve the performance and scalability of HPC and Machine Learning (ML) applications. We follow a modular design discipline through introducing composable plugins to successively add ACiS capabilities.
In the first work, we propose an inline accelerator to communication switches for user-definable collective operations. MPI collective operations can often be performance killers in HPC applications; we seek to solve this bottleneck by offloading them to reconfigurable hardware within the switch itself. We also introduce a novel mechanism that enables the hardware to support MPI communicators of arbitrary shape and that is scalable to very large systems.
In the second work, we propose a look-aside accelerator for communication switches that is capable of processing packets at line-rate. Functions requiring loops and states are addressed in this method. The proposed in-switch accelerator is based on a RISC-V compatible Coarse Grained Reconfigurable Arrays (CGRAs).
To facilitate usability, we have developed a framework to compile user-provided C/C++ codes to appropriate back-end instructions for configuring the accelerator.
In the third work, we extend ACiS to support fused collectives and the combining of collectives with map operations. We observe that there is an opportunity of fusing communication (collectives) with computation. Since the computation can vary for different applications, ACiS support should be programmable in this method.
In the fourth work, we propose that switches with ACiS support can control and manage the execution of applications, i.e., that the switch be an active device with decision-making capabilities. Switches have a central view of the network; they can collect telemetry information and monitor application behavior and then use this information for control, decision-making, and coordination of nodes.
We evaluate the feasibility of ACiS through extensive RTL-based simulation as well as deployment in an open-access cloud infrastructure. Using this simulation framework, when considering a Graph Convolutional Network (GCN) application as a case study, a speedup of on average 3.4x across five real-world datasets is achieved on 24 nodes compared to a CPU cluster without ACiS capabilities
Integration of energy and urban planning dynamics for cities' climate-neutrality
145 p.La sociedad se enfrenta a la emergencia climática, donde las ciudades juegan un papel crucial como concentradoras de población, productoras de PIB, consumidoras de energía, generadoras de residuos y emisoras de GEI. Con la premisa de mejorar una respuesta eficaz de las administraciones locales a esta crisis, este doctorado se centra en dar soporte a aquellas ciudades que están dispuestas a iniciar su viaje hacia la neutralidad climática (cero emisiones netas de CO2) y en cómo orquestar esta transición desde un enfoque local. En términos de planificación del proceso hacia una meta tan ambiciosa, esta investigación ha identificado la falta de integración entre las dinámicas de planificación energética y urbana como una barrera clave en esta transición. En consecuencia, el objetivo de esta investigación es mejorar ese nivel de integración por diferentes medios. Primero, a través de un liderazgo integrado de procesos estratégicos municipales; en segundo lugar, mediante el uso de herramientas y procedimientos de gestión de datos adecuados que permitan la integración de elementos energéticos y espaciales en la toma de decisiones; y finalmente, a través de una involucración adecuada de los agentes locales en el proceso de planificación hacia la neutralidad climática. Todas estas hipótesis son analizadas por 4 estudios de investigación conectados entre sí, y validados posteriormente a través de su aplicación en el proceso de coordinación del plan de neutralidad climática de Vitoria-Gasteiz
Anpassen verteilter eingebetteter Anwendungen im laufenden Betrieb
The availability of third-party apps is among the key success factors for software ecosystems: The users benefit from more features and innovation speed, while third-party solution vendors can leverage the platform to create successful offerings.
However, this requires a certain decoupling of engineering activities of the different parties not achieved for distributed control systems, yet.
While late and dynamic integration of third-party components would be required, resulting control systems must provide high reliability regarding real-time requirements, which leads to integration complexity.
Closing this gap would particularly contribute to the vision of software-defined manufacturing, where an ecosystem of modern IT-based control system components could lead to faster innovations due to their higher abstraction and availability of various frameworks.
Therefore, this thesis addresses the research question:
How we can use modern IT technologies and enable independent evolution and easy third-party integration of software components in distributed control systems, where deterministic end-to-end reactivity is required, and especially, how can we apply distributed changes to such systems consistently and reactively during operation?
This thesis describes the challenges and related approaches in detail and points out that existing approaches do not fully address our research question.
To tackle this gap, a formal specification of a runtime platform concept is presented in conjunction with a model-based engineering approach.
The engineering approach decouples the engineering steps of component definition, integration, and deployment.
The runtime platform supports this approach by isolating the components, while still offering predictable end-to-end real-time behavior.
Independent evolution of software components is supported through a concept for synchronous reconfiguration during full operation, i.e., dynamic orchestration of components.
Time-critical state transfer is supported, too, and can lead to bounded quality degradation, at most.
The reconfiguration planning is supported by analysis concepts, including simulation of a formally specified system and reconfiguration, and analyzing potential quality degradation with the evolving dataflow graph (EDFG) method.
A platform-specific realization of the concepts, the real-time container architecture, is described as a reference implementation.
The model and the prototype are evaluated regarding their feasibility and applicability of the concepts by two case studies.
The first case study is a minimalistic distributed control system used in different setups with different component variants and reconfiguration plans to compare the model and the prototype and to gather runtime statistics.
The second case study is a smart factory showcase system with more challenging application components and interface technologies.
The conclusion is that the concepts are feasible and applicable, even though the concepts and the prototype still need to be worked on in future -- for example, to reach shorter cycle times.Eine große Auswahl von Drittanbieter-Lösungen ist einer der Schlüsselfaktoren für Software Ecosystems:
Nutzer profitieren vom breiten Angebot und schnellen Innovationen, während Drittanbieter über die Plattform erfolgreiche Lösungen anbieten können.
Das jedoch setzt eine gewisse Entkopplung von Entwicklungsschritten der Beteiligten voraus, welche für verteilte Steuerungssysteme noch nicht erreicht wurde.
Während Drittanbieter-Komponenten möglichst spät -- sogar Laufzeit -- integriert werden müssten, müssen Steuerungssysteme jedoch eine hohe Zuverlässigkeit gegenüber Echtzeitanforderungen aufweisen, was zu Integrationskomplexität führt.
Dies zu lösen würde insbesondere zur Vision von Software-definierter Produktion beitragen, da ein Ecosystem für moderne IT-basierte Steuerungskomponenten wegen deren höherem Abstraktionsgrad und der Vielzahl verfügbarer Frameworks zu schnellerer Innovation führen würde.
Daher behandelt diese Dissertation folgende Forschungsfrage:
Wie können wir moderne IT-Technologien verwenden und unabhängige Entwicklung und einfache Integration von Software-Komponenten in verteilten Steuerungssystemen ermöglichen, wo Ende-zu-Ende-Echtzeitverhalten gefordert ist, und wie können wir insbesondere verteilte Änderungen an solchen Systemen konsistent und im Vollbetrieb vornehmen?
Diese Dissertation beschreibt Herausforderungen und verwandte Ansätze im Detail und zeigt auf, dass existierende Ansätze diese Frage nicht vollständig behandeln.
Um diese Lücke zu schließen, beschreiben wir eine formale Spezifikation einer Laufzeit-Plattform und einen zugehörigen Modell-basierten Engineering-Ansatz.
Dieser Ansatz entkoppelt die Design-Schritte der Entwicklung, Integration und des Deployments von Komponenten.
Die Laufzeit-Plattform unterstützt den Ansatz durch Isolation von Komponenten und zugleich Zeit-deterministischem Ende-zu-Ende-Verhalten.
Unabhängige Entwicklung und Integration werden durch Konzepte für synchrone Rekonfiguration im Vollbetrieb unterstützt, also durch dynamische Orchestrierung.
Dies beinhaltet auch Zeit-kritische Zustands-Transfers mit höchstens begrenzter Qualitätsminderung, wenn überhaupt.
Rekonfigurationsplanung wird durch Analysekonzepte unterstützt, einschließlich der Simulation formal spezifizierter Systeme und Rekonfigurationen und der Analyse der etwaigen Qualitätsminderung mit dem Evolving Dataflow Graph (EDFG).
Die Real-Time Container Architecture wird als Referenzimplementierung und Evaluationsplattform beschrieben.
Zwei Fallstudien untersuchen Machbarkeit und Nützlichkeit der Konzepte.
Die erste verwendet verschiedene Varianten und Rekonfigurationen eines minimalistischen verteilten Steuerungssystems, um Modell und Prototyp zu vergleichen sowie Laufzeitstatistiken zu erheben.
Die zweite Fallstudie ist ein Smart-Factory-Demonstrator, welcher herausforderndere Applikationskomponenten und Schnittstellentechnologien verwendet.
Die Konzepte sind den Studien nach machbar und nützlich, auch wenn sowohl die Konzepte als auch der Prototyp noch weitere Arbeit benötigen -- zum Beispiel, um kürzere Zyklen zu erreichen
FPGA Implementation of Kalman Filter for Visible Light
Visible Light (VL) has received significant attention due to its benefits in energy efficiency, wide bandwidth availability, and resistance to electromagnetic interference. This final project discusses VL utilizing the Kalman Filter (KF) to predict and estimate the position of related data. The development of the VL method is carried out using Xilinx FPGA Arty A7 hardware, and the KF implementation is carried out in a two-dimensional framework with the Linear KF approach. The main objective of this Final Project is to implement VL using Photodetectors (Photodiode and Photoresistor LM393) on FPGA. The use of Xilinx FPGA Arty A7 hardware and Xilinx SDK software provides the flexibility and reliability required for system implementation. The results indicate that the implementation of Xilinx FPGA Arty A7-35T with KF and the use of 16 LED and 8 LED configurations yield relatively accurate estimations. While the Photodiode LM393 (PD LM393) sensor does not exhibit superior results compared to the Photoresistor LM393 (PR LM393) sensor, this research effectively optimizes light measurements by utilizing the sensor and KF algorithm. The Root Mean Squared Error (RMSE) results show that for the system with 16 LEDs, KF with PR LM393 has an RMSE of approximately ). This RMSE value indicates that KF with PR LM393 can provide relatively more accurate estimations. Similarly, for the system with 8 LEDs, KF with PR LM393 has an RMSE of around ). In this case, KF with PR LM393 again provides relatively more accurate estimations. Meanwhile, the RMSE result for 2D KF in this system is approximately ), indicating that the KF estimation has a relatively small error value compared to the actual measurement value. This demonstrates that KF effectively reduces noise and measurement data fluctuations in the LM393 Photodetector system with 16 LEDs
Consensual Resilient Control: Stateless Recovery of Stateful Controllers
Safety-critical systems have to absorb accidental and malicious faults to obtain high mean-times-to-failures (MTTFs). Traditionally, this is achieved through re-execution or replication. However, both techniques come with significant overheads, in particular when cold-start effects are considered. Such effects occur after replicas resume from checkpoints or from their initial state. This work aims at improving on the performance of control-task replication by leveraging an inherent stability of many plants to tolerate occasional control-task deadline misses and suggests masking faults just with a detection quorum. To make this possible, we have to eliminate cold-start effects to allow replicas to rejuvenate during each control cycle. We do so, by systematically turning stateful controllers into instants that can be recovered in a stateless manner. We highlight the mechanisms behind this transformation, how it achieves consensual resilient control, and demonstrate on the example of an inverted pendulum how accidental and maliciously-induced faults can be absorbed, even if control tasks run in less predictable environments
- …