293 research outputs found

    Stress Injection Study on Hard Real-Time Operating Systems

    Automotive software complexity has increased exponentially over the last 30 years. Today, automotive applications are built on top of hard real-time operating systems in which many tasks are executed. Because of the high integration levels and time-to-market pressure in the automotive domain, software integration and robustness tests must be performed effectively and efficiently. Infineon Technologies has integrated a novel hardware architecture into the AURIX 2G microcontroller to support the Resource Usage Test and the Stress Test. This hardware support, however, has never been used before. It is therefore critical to propose a method that uses this structure efficiently and allows the performance and reliability of the chips to be evaluated. This thesis develops a method and a tool that use stress injection to analyze the performance, robustness values and boundaries of hard real-time systems under different scenarios. The designer is able: i) to configure the embedded debugging hardware architecture to efficiently explore different stress scenarios; ii) to gather information; and iii) to quantify different types of performance and robustness metrics. The method is automated and fully parameterizable. The tool developed in this thesis, called Galenus, is integrated into the existing internal debugging environment of Infineon Technologies for the AURIX microcontroller. The stress injection is based on reducing the effective performance of a SoC component (e.g., the TriCore within AURIX), which makes it possible to assess the sensitivity of the SoC under different stress scenarios. These scenarios are defined in the offline initial state using formal methods of scheduling theory. Using the stress injection method, the SoC designer can identify possible risk scenarios by testing the performance and robustness of the system at runtime. This thesis is based on stress injection by CPU suspension within two types of software application: RTOS and bare-metal.
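
The abstract does not describe the offline analysis itself. As a rough illustration only, the sketch below assumes that suspending the CPU for a fraction of the time stretches every task's execution time by the same factor, and it checks the resulting utilization against the Liu and Layland rate-monotonic bound. The function names, the task set and the uniform-stretch model are all assumptions made for this example; they are not taken from the thesis or from Galenus.

```python
from typing import Sequence, Tuple

# Each task is (worst-case execution time, period), in the same time unit.
Task = Tuple[float, float]


def rm_utilization_bound(n: int) -> float:
    """Liu and Layland sufficient utilization bound for n periodic tasks."""
    if n <= 0:
        raise ValueError("need at least one task")
    return n * (2 ** (1.0 / n) - 1)


def stressed_utilization(tasks: Sequence[Task], suspended_fraction: float) -> float:
    """Utilization when the CPU is unavailable for `suspended_fraction` of the time.

    Assumption: suspension stretches all execution times uniformly.
    """
    if not 0.0 <= suspended_fraction < 1.0:
        raise ValueError("suspended_fraction must be in [0, 1)")
    base = sum(wcet / period for wcet, period in tasks)
    return base / (1.0 - suspended_fraction)


def passes_rm_bound(tasks: Sequence[Task], suspended_fraction: float) -> bool:
    """True if the stressed task set still satisfies the sufficient bound."""
    return stressed_utilization(tasks, suspended_fraction) <= rm_utilization_bound(len(tasks))


if __name__ == "__main__":
    tasks = [(1.0, 4.0), (2.0, 8.0)]  # hypothetical task set
    for fraction in (0.0, 0.25, 0.5):
        print(fraction, passes_rm_bound(tasks, fraction))
```

The bound is sufficient but not necessary, so a failed check marks a scenario as worth testing at runtime rather than proving a deadline miss.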

    LEGaTO: first steps towards energy-efficient toolset for heterogeneous computing

    LEGaTO is a three-year EU H2020 project which started in December 2017. The LEGaTO project will leverage task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC. Peer reviewed. Postprint (author's final draft).

    Federated learning on embedded devices

    TinyML has gained a lot of popularity in recent years, bringing ML to devices constrained by memory, computation capacity and power. Training models on powerful computers with large datasets and exporting the resulting compressed model for inference only on microcontrollers has been studied extensively, but this method does not allow an edge device to keep learning from new data. In an era where data privacy is essential, storing and managing the datasets used to train these models can be a problem. Moving the training of the neural network (NN) to the edge device can eliminate the need to store or transmit any sensitive data, since this data is captured, used once to train the model and discarded afterwards. In addition, using federated learning (FL) enables multiple devices to train and share their models without transmitting any collected data. In this work, we explore how TinyML can be used with on-device training in combination with FL, what issues this raises, and possible solutions.
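
The abstract does not say how the shared models are combined. A common choice is federated averaging (FedAvg), in which a coordinator averages the clients' weights in proportion to how many samples each client trained on. The sketch below assumes that scheme and flat lists of floats for the weights. The function name and data layout are hypothetical and do not come from the work itself.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_samples: Sequence[int]) -> List[float]:
    """Sample-weighted average of per-client weight vectors (FedAvg-style).

    Only model weights and sample counts are exchanged; raw data stays
    on each device.
    """
    if not client_weights or len(client_weights) != len(client_samples):
        raise ValueError("need one sample count per client")
    total = sum(client_samples)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, n in zip(client_weights, client_samples):
        if len(weights) != dim:
            raise ValueError("all clients must share the same model shape")
        for i, w in enumerate(weights):
            averaged[i] += w * n / total
    return averaged


if __name__ == "__main__":
    # Two hypothetical devices: the second saw three times as much data.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```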

    CSP channels for CAN-bus connected embedded control systems

    A closed-loop control system typically contains a multitude of sensors and actuators operated simultaneously, so such systems are parallel and distributed in essence. When this parallelism is mapped to software, however, many obstacles arise concerning multithreaded communication and synchronization. To overcome this problem, the CT kernel/library, based on CSP algebra, has been developed. This project (TES.5410) develops a communication extension to the CT library to make it applicable in distributed systems. Since the library is tailored for control systems, the properties and requirements of control systems are given special consideration. The applicability of existing middleware solutions is examined. Applicable fieldbus protocols are compared in order to determine the most suitable ones, and CAN is chosen as the first fieldbus to be used. A brief overview of CSP and existing CSP-based libraries is given. A middleware architecture is proposed along with a few novel ideas.

    A Course On Advanced Real-Time Embedded Systems

    This thesis discusses the development of an advanced real-time embedded systems course offered at California Polytechnic State University, San Luis Obispo, which aims to prepare students to design modern, complex real-time embedded systems. It describes the goals of the real-time embedded systems curriculum, which includes an introductory and an advanced course. Finally, it discusses the challenges of creating a successful advanced real-time embedded systems course and proposes changes to the current course in response to those challenges.

    A Design in interfacing the MC68HC11 to the AMD AM29F010 flash memory chips

    In many environments, motion, vibration and contamination can cause data on secondary storage devices such as hard drives to become unreadable or even lost. Replacing these magnetic drives with a solid-state memory storage device would provide an invaluable solution for such environments. If a solid-state secondary storage device, such as flash memory integrated circuits, replaced these electro-mechanical disk drive systems, read times would drop from milliseconds to nanoseconds and the secondary storage would become more robust. In addition, the rapid increase in the sophistication of software has placed more pressure on the microcontroller to increase its memory capacity, especially user RAM. In response to this need, this thesis shows the steps in designing an interface to the MC68HC11 microcontroller that increases the user RAM. The design incorporates four Am29F010 flash memory chips as the peripheral secondary storage device.

    Distributed Control Architecture

    This document describes the development and testing of a novel Distributed Control Architecture (DCA). The DCA developed during the study is an attempt to turn the components used to construct unmanned vehicles into a network of intelligent devices, connected using standard networking protocols. The architecture exists at both a hardware and software level and provides a communication channel between control modules, actuators and sensors. A single unified mechanism for connecting sensors and actuators to the control software will reduce the technical knowledge required by platform integrators and allow control systems to be rapidly constructed in a Plug and Play manner. DCA uses standard networking hardware to connect components, removing the need for custom communication channels between individual sensors and actuators. The use of a common architecture for the communication between components should make it easier for software to dynamically determine the vehicle's current capabilities and increase the range of processing platforms that can be utilised. Implementations of the architecture currently exist for Microsoft Windows, Windows Mobile 5, Linux and Microchip dsPIC30 microcontrollers. Conceptually, DCA exposes the functionality of each networked device as objects with interfaces and associated methods. Allowing each object to expose multiple interfaces allows for future upgrades without breaking existing code. In addition, the use of common interfaces should help facilitate component reuse, unit testing and make it easier to write generic reusable software.

    A hardware scheduler based on task queues for FPGA-based embedded real-time systems

    A hardware scheduler is developed to improve the real-time performance of soft-core processor based computing systems. A hardware scheduler typically accelerates system performance at the cost of increased hardware resources, inflexibility and integration difficulty. However, the reprogrammability of FPGA-based systems removes the problems of inflexibility and integration difficulty. This paper introduces a new task-queue architecture to better support practical task controls and maintain good resource scaling. The scheduler can be configured to support various algorithms such as time-sliced priority scheduling, Earliest Deadline First and Least Slack Time. The hardware scheduler reduces scheduling overhead by more than 1,000 clock cycles and raises the system utilization bound by up to 19.2 percent. Scheduling jitter is reduced from hundreds of clock cycles in software to just two or three cycles for most operations. The additional resource cost is no more than 17 percent of a typical soft-core system for a small-scale embedded application.
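
Two of the policies named in this abstract differ only in the key used to pick the next task: Earliest Deadline First takes the smallest absolute deadline, while Least Slack Time takes the smallest slack (deadline minus current time minus remaining execution time). The software sketch below illustrates that difference under those textbook definitions. It is not the paper's hardware task-queue architecture, and the `Task` fields and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Task:
    name: str
    deadline: int   # absolute deadline, in clock ticks
    remaining: int  # remaining execution time, in clock ticks


def pick_edf(ready: Sequence[Task]) -> Optional[Task]:
    """Earliest Deadline First: choose the task with the nearest deadline."""
    return min(ready, key=lambda t: t.deadline, default=None)


def pick_lst(ready: Sequence[Task], now: int) -> Optional[Task]:
    """Least Slack Time: choose the task with the least spare time left."""
    return min(ready, key=lambda t: t.deadline - now - t.remaining, default=None)


if __name__ == "__main__":
    ready = [Task("A", deadline=10, remaining=2),
             Task("B", deadline=12, remaining=8)]
    # A has the earlier deadline, but B has less slack (4 ticks versus 8).
    print(pick_edf(ready).name, pick_lst(ready, now=0).name)
```

In this example the two policies disagree, which is why a configurable scheduler needs to change only the comparison key and not the queue structure.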

    Remote Attacks on FPGA Hardware

    More and more computer systems worldwide are interconnected and accessible over the Internet, which also raises the security requirements placed on them. A more recent technology that is increasingly used as a compute accelerator, both for embedded systems and in the cloud, is the Field-Programmable Gate Array (FPGA). FPGAs are very flexible microchips that can be configured and programmed by software to implement arbitrary digital circuits. Like other integrated circuits, FPGAs are based on modern semiconductor technologies that are affected by manufacturing tolerances and various runtime variations. It is already known that these variations influence the reliability of a system, but their effects on security have not been studied comprehensively. This doctoral thesis addresses a cross-section of these topics: security problems that arise when FPGAs are used by multiple users or are accessible over the Internet, in combination with physical variations in modern semiconductor technologies. The first contribution of this thesis identifies transient voltage fluctuations as one of the strongest influences on FPGA performance and analyzes experimentally how different FPGA workloads affect them. The remainder of the thesis then investigates the effects of these voltage fluctuations on security. The thesis shows that various attacks are possible that were previously assumed to require physical access to the chip and the use of specialized, expensive test and measurement equipment. This shows that known isolation measures within FPGAs can be circumvented by malicious users in order to attack other users in the same FPGA or even the entire system.
Using circuits that influence the voltage within an FPGA, this thesis demonstrates active attacks that can cause faults in other parts of the system. In this way, denial-of-service attacks are possible, as are fault attacks that extract secret key information from the system. In addition, passive attacks are shown that indirectly measure the voltage fluctuations on the chip. These measurements are sufficient to extract secret key information through power analysis side-channel attacks. In a further escalation step, these attacks can also affect other chips that are connected to the same power supply as the FPGA. To prove that comparable attacks are possible not only within FPGAs, it is shown that small IoT devices are also vulnerable to attacks that exploit the shared power supply within a chip. Overall, this thesis shows that fundamental physical variations in integrated circuits can undermine the security of an entire system, even when the attacker has no direct access to the device. For FPGAs in their current form, these problems must be solved before they can be used securely with multiple users or with third-party access. Some initial countermeasures have already been investigated in publications that are not part of this thesis.

    Reduced precision floating-point optimization for Deep Neural Network On-Device Learning on microcontrollers

    Enabling On-Device Learning (ODL) for Ultra-Low-Power Micro-Controller Units (MCUs) is a key step for post-deployment adaptation and fine-tuning of Deep Neural Network (DNN) models in future TinyML applications. This paper tackles this challenge by introducing a novel reduced precision optimization technique for ODL primitives on MCU-class devices, leveraging state-of-the-art advancements in RISC-V RV32 architectures with support for vectorized 16-bit floating-point (FP16) Single-Instruction Multiple-Data (SIMD) operations. Our approach for the Forward and Backward steps of the Back Propagation training algorithm is composed of specialized shape transform operators and Matrix Multiplication (MM) kernels, accelerated with parallelization and loop unrolling. When evaluated on a single training step of a 2D Convolution layer, the SIMD-optimized FP16 primitives are up to 1.72x faster than the FP32 baseline on a RISC-V-based 8+1-core MCU. An average computing efficiency of 3.11 Multiply and Accumulate operations per clock cycle (MAC/clk) and 0.81 MAC/clk is measured for the end-to-end training tasks of a ResNet8 and a DS-CNN for Image Classification and Keyword Spotting, respectively, requiring 17.1 ms and 6.4 ms on the target platform to compute a training step on a single sample. Overall, our approach is more than two orders of magnitude faster than existing ODL software frameworks for single-core MCUs and outperforms previous FP32 parallel implementations by 1.6x on a Continual Learning setup. © 2023 Elsevier B.V. All rights reserved.