Search CORE

128 research outputs found

병렬 및 분산 임베디드 시스템을 위한 모델 기반 코드 생성 프레임워크

Author: 정은진
Publication venue: 서울대학교 대학원
Publication date: 01/02/2020
Field of study

학위논문(박사)--서울대학교 대학원 :공과대학 컴퓨터공학부,2020. 2. 하순회.소프트웨어 설계 생산성 및 유지보수성을 향상시키기 위해 다양한 소프트웨어 개발 방법론이 제안되었지만, 대부분의 연구는 응용 소프트웨어를 하나의 프로세서에서 동작시키는 데에 초점을 맞추고 있다. 또한, 임베디드 시스템을 개발하는 데에 필요한 지연이나 자원 요구 사항에 대한 비기능적 요구 사항을 고려하지 않고 있기 때문에 일반적인 소프트웨어 개발 방법론을 임베디드 소프트웨어를 개발하는 데에 적용하는 것은 적합하지 않다. 이 논문에서는 병렬 및 분산 임베디드 시스템을 대상으로 하는 소프트웨어를 모델로 표현하고, 이를 소프트웨어 분석이나 개발에 활용하는 개발 방법론을 소개한다. 우리의 모델에서 응용 소프트웨어는 계층적으로 표현할 수 있는 여러 개의 태스크로 이루어져 있으며, 하드웨어 플랫폼과 독립적으로 명세한다. 태스크 간의 통신 및 동기화는 모델이 정의한 규약이 정해져 있고, 이러한 규약을 통해 실제 프로그램을 실행하기 전에 소프트웨어 에러를 정적 분석을 통해 확인할 수 있고, 이는 응용의 검증 복잡도를 줄이는 데에 기여한다. 지정한 하드웨어 플랫폼에서 동작하는 프로그램은 태스크들을 프로세서에 매핑한 이후에 자동적으로 합성할 수 있다. 위의 모델 기반 소프트웨어 개발 방법론에서 사용하는 프로그램 합성기를 본 논문에서 제안하였는데, 명세한 플랫폼 요구 사항을 바탕으로 병렬 및 분산 임베디드 시스템을에서 동작하는 코드를 생성한다. 여러 개의 정형적 모델들을 계층적으로 표현하여 응용의 동적 행태를 나타고, 합성기는 여러 모델로 구성된 계층적인 모델로부터 병렬성을 고려하여 태스크를 실행할 수 있다. 또한, 프로그램 합성기에서 다양한 플랫폼이나 네트워크를 지원할 수 있도록 코드를 관리하는 방법도 보여주고 있다. 본 논문에서 제시하는 소프트웨어 개발 방법론은 6개의 하드웨어 플랫폼과 3 종류의 네트워크로 구성되어 있는 실제 감시 소프트웨어 시스템 응용 예제와 이종 멀티 프로세서를 활용하는 원격 딥 러닝 예제를 수행하여 개발 방법론의 적용 가능성을 시험하였다. 또한, 프로그램 합성기가 새로운 플랫폼이나 네트워크를 지원하기 위해 필요로 하는 개발 비용도 실제 측정 및 예측하여 상대적으로 적은 노력으로 새로운 플랫폼을 지원할 수 있음을 확인하였다. 많은 임베디드 시스템에서 예상치 못한 하드웨어 에러에 대해 결함을 감내하는 것을 필요로 하기 때문에 결함 감내에 대한 코드를 자동으로 생성하는 연구도 진행하였다. 본 기법에서 결함 감내 설정에 따라 태스크 그래프를 수정하는 방식을 활용하였으며, 결함 감내의 비기능적 요구 사항을 응용 개발자가 쉽게 적용할 수 있도록 하였다. 또한, 결함 감내 지원하는 것과 관련하여 실제 수동으로 구현했을 경우와 비교하였고, 결함 주입 도구를 이용하여 결함 발생 시나리오를 재현하거나, 임의로 결함을 주입하는 실험을 수행하였다. 마지막으로 결함 감내를 실험할 때에 활용한 결함 주입 도구는 본 논문의 또 다른 기여 사항 중 하나로 리눅스 환경으로 대상으로 응용 영역 및 커널 영역에 결함을 주입하는 도구를 개발하였다. 시스템의 견고성을 검증하기 위해 결함을 주입하여 결함 시나리오를 재현하는 것은 널리 사용되는 방법으로, 본 논문에서 개발된 결함 주입 도구는 시스템이 동작하는 도중에 재현 가능한 결함을 주입할 수 있는 도구이다. 커널 영역에서의 결함 주입을 위해 두 종류의 결함 주입 방법을 제공하며, 하나는 커널 GNU 디버거를 이용한 방법이고, 다른 하나는 ARM 하드웨어 브레이크포인트를 활용한 방법이다. 응용 영역에서 결함을 주입하기 위해 GDB 기반 결함 주입 방법을 이용하여 동일 시스템 혹은 원격 시스템의 응용에 결함을 주입할 수 있다. 결함 주입 도구에 대한 실험은 ODROID-XU4 보드에서 진행하였다.While various software development methodologies have been proposed to increase the design productivity and maintainability of software, they usually focus on the development of application software running on a single processing element, without concern about the non-functional requirements of an embedded system such as latency and resource requirements. In this thesis, we present a model-based software development method for parallel and distributed embedded systems. An application is specified as a set of tasks that follow a set of given rules for communication and synchronization in a hierarchical fashion, independently of the hardware platform. Having such rules enables us to perform static analysis to check some software errors at compile time to reduce the verification difficulty. Platform-specific program is synthesized automatically after mapping of tasks onto processing elements is determined. The program synthesizer is also proposed to generate codes which satisfies platform requirements for parallel and distributed embedded systems. As multiple models which can express dynamic behaviors can be depicted hierarchically, the synthesizer supports to manage multiple task graphs with a different hierarchy to run tasks with parallelism. Also, the synthesizer shows methods of managing codes for heterogeneous platforms and generating various communication methods. The viability of the proposed software development method is verified with a real-life surveillance application that runs on six processing elements with three remote communication methods, and remote deep learning example is conducted to use heterogeneous multiprocessing components on distributed systems. Also, supporting a new platform and network requires a small effort by measuring and estimating development costs. Since tolerance to unexpected errors is a required feature of many embedded systems, we also support an automatic fault-tolerant code generation. Fault tolerance can be applied by modifying the task graph based on the selected fault tolerance configurations, so the non-functional requirement of fault tolerance can be easily adopted by an application developer. To compare the effort of supporting fault tolerance, manual implementation of fault tolerance is performed. Also, the fault tolerance method is tested with the fault injection tool to emulate fault scenarios and inject faults randomly. Our fault injection tool, which has used for testing our fault-tolerance method, is another work of this thesis. Emulating fault scenarios by intentionally injecting faults is commonly used to test and verify the robustness of a system. To emulate faults on an embedded system, we present a run-time fault injection framework that can inject a fault on both a kernel and application layer of Linux-based systems. For injecting faults on a kernel layer, two complementary fault injection techniques are used. One is based on Kernel GNU Debugger, and the other is using a hardware breakpoint supported by the ARM architecture. For application-level fault injection, the GDB-based fault injection method is used to inject a fault on a remote application. The viability of the proposed fault injection tool is proved by real-life experiments with an ODROID-XU4 system.Chapter 1 Introduction 1 1.1 Motivation 1 1.2 Contribution 6 1.3 Dissertation Organization 8 Chapter 2 Background 9 2.1 HOPES: Hope of Parallel Embedded Software 9 2.1.1 Software Development Procedure 9 2.1.2 Components of HOPES 12 2.2 Universal Execution Model 13 2.2.1 Task Graph Specification 13 2.2.2 Dataflow specification of an Application 15 2.2.3 Task Code Specification and Generic APIs 21 2.2.4 Meta-data Specification 23 Chapter 3 Program Synthesis for Parallel and Distributed Embedded Systems 24 3.1 Motivational Example 24 3.2 Program Synthesis Overview 26 3.3 Program Synthesis from Hierarchically-mixed Models 30 3.4 Platform Code Synthesis 33 3.5 Communication Code Synthesis 36 3.6 Experiments 40 3.6.1 Development Cost of Supporting New Platforms and Networks 40 3.6.2 Program Synthesis for the Surveillance System Example 44 3.6.3 Remote GPU-accelerated Deep Learning Example 46 3.7 Document Generation 48 3.8 Related Works 49 Chapter 4 Model Transformation for Fault-tolerant Code Synthesis 56 4.1 Fault-tolerant Code Synthesis Techniques 56 4.2 Applying Fault Tolerance Techniques in HOPES 61 4.3 Experiments 62 4.3.1 Development Cost of Applying Fault Tolerance 62 4.3.2 Fault Tolerance Experiments 62 4.4 Random Fault Injection Experiments 65 4.5 Related Works 68 Chapter 5 Fault Injection Framework for Linux-based Embedded Systems 70 5.1 Background 70 5.1.1 Fault Injection Techniques 70 5.1.2 Kernel GNU Debugger 71 5.1.3 ARM Hardware Breakpoint 72 5.2 Fault Injection Framework 74 5.2.1 Overview 74 5.2.2 Architecture 75 5.2.3 Fault Injection Techniques 79 5.2.4 Implementation 83 5.3 Experiments 90 5.3.1 Experiment Setup 90 5.3.2 Performance Comparison of Two Fault Injection Methods 90 5.3.3 Bit-flip Fault Experiments 92 5.3.4 eMMC Controller Fault Experiments 94 Chapter 6 Conclusion 97 Bibliography 99 요 약 108Docto

SNU Open Repository and Archive

Design and Implementation of HD Wireless Video Transmission System Based on Millimeter Wave

Author: Gao Weijun
Publication venue
Publication date: 29/05/2023
Field of study

With the improvement of optical fiber communication network construction and the improvement of camera technology, the video that the terminal can receive becomes clearer, with resolution up to 4K. Although optical fiber communication has high bandwidth and fast transmission speed, it is not the best solution for indoor short-distance video transmission in terms of cost, laying difficulty and speed. In this context, this thesis proposes to design and implement a multi-channel wireless HD video transmission system with high transmission performance by using the 60GHz millimeter wave technology, aiming to improve the bandwidth from optical nodes to wireless terminals and improve the quality of video transmission. This thesis mainly covers the following parts: (1) This thesis implements wireless video transmission algorithm, which is divided into wireless transmission algorithm and video transmission algorithm, such as 64QAM modulation and demodulation algorithm, H.264 video algorithm and YUV420P algorithm. (2) This thesis designs the hardware of wireless HD video transmission system, including network processing unit (NPU) and millimeter wave module. Millimeter wave module uses RWM6050 baseband chip and TRX-BF01 rf chip. This thesis will design the corresponding hardware circuit based on the above chip, such as 10Gb/s network port, PCIE. (3) This thesis realizes the software design of wireless HD video transmission system, selects FFmpeg and Nginx to build the sending platform of video transmission system on NPU, and realizes video multiplex transmission with Docker. On the receiving platform of video transmission, FFmpeg and Qt are selected to realize video decoding, and OpenGL is combined to realize video playback. (4) Finally, the thesis completed the wireless HD video transmission system test, including pressure test, Web test and application scenario test. It has been verified that its HD video wireless transmission system can transmit HD VR video with three-channel bit rate of 1.2GB /s, and its rate can reach up to 3.7GB /s, which meets the research goal

UTUPub

MaldOS: a Moderately Abstracted Layer for Developing Operating Systems

Author: Maldini Mattia
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 14/03/2019
Field of study

Anche se pochi studenti affronteranno la sfida di sviluppare software al di sotto del sistema operativo, la comprensione dei suoi principi di funzionamento è essenziale. In sè, la teoria dietro ai sistemi operativi non è particolarmente complessa: concetti come scheduling, livelli di esecuzione e semafori sono intuitivamente comprensibili; tuttavia appropriarsi pienamente di queste nozioni soltanto tramite lo studio teorico è quasi impossibile: serve un esempio pratico per assimilare i dettagli. Sviluppare un sistema operativo come progetto accademico è però diversi ordini di grandezza più difficile che creare un software in ambiente di lavoro già esistente. La complessità aggiunta dell'hardware va spesso oltre a quello che ci si aspetta dagli studenti, il che rende difficile anche soltanto la ricerca di un'architettura su cui lavorare. Questo studio è fortemente ispirato da precedenti soluzioni a questo problema come uMPS, un emulatore per il processore MIPS. Lavorando su una virtualizzazione semplificata gli studenti si possono concentrare sui concetti chiave dello sviluppo di un SO. Anche se ispirato a un'architettura reale, uMPS rimane comunque un ambiente astratto, e nel corso del lavoro potrebbe sorgere una sensazione di distacco dalla realtà. In questo studio si sostiene che un progetto simile possa essere sviluppato su hardware reale senza che questo diventi troppo complicato. L'architettura scelta è ARMv8, più moderna e diffusa rispetto a MIPS, nella forma della board educativa Raspberry Pi. Il risultato del lavoro è duplice: da una parte è stato portato avanti uno studio dettagliato su come sviluppare un sistema operativo minimale sul Raspberry Pi, dall'altra è stato creato un layer di astrazione che si occupa di semplificare l'approccio alle periferiche, permettendo agli utenti di costruirci sopra un piccolo sistema operativo. Pur facendo riferimento a un dispositivo reale, la possibilità di lavorare su un emulatore rimane grazie al supporto di Qemu

AMS Tesi di Laurea

Redesigning Transaction Processing Systems for Non-Volatile Memory

Author: Kim Wookhee
Publication venue: Graduate School of UNIST
Publication date: 01/02/2019
Field of study

Department of Computer Science and EngineeringTransaction Processing Systems are widely used because they make the user be able to manage their data more efficiently. However, they suffer performance bottleneck due to the redundant I/O for guaranteeing data consistency. In addition to the redundant I/O, slow storage device makes the performance more degraded. Leveraging non-volatile memory is one of the promising solutions the performance bottleneck in Transaction Processing Systems. However, since the I/O granularity of legacy storage devices and non-volatile memory is not equal, traditional Transaction Processing System cannot fully exploit the performance of persistent memory. The goal of this dissertation is to fully exploit non-volatile memory for improving the performance of Transaction Processing Systems. Write amplification between Transaction Processing System is pointed out as a performance bottleneck. As first approach, we redesigned Transaction Processing Systems to minimize the redundant I/O between the Transaction Processing Systems. We present LS-MVBT that integrates recovery information into the main database file to remove temporary files for recovery. The LS-MVBT also employs five optimizations to reduce the write traffics in single fsync() calls. We also exploit the persistent memory to reduce the performance bottleneck from slow storage devices. However, since the traditional recovery method is for slow storage devices, we develop byte-addressable differential logging, user-level heap manager, and transaction-aware persistence to fully exploit the persistent memory. To minimize the redundant I/O for guarantee data consistency, we present the failure-atomic slotted paging with persistent buffer cache. Redesigning indexing structure is the second approach to exploit the non-volatile memory fully. Since the B+-tree is originally designed for block granularity, It generates excessive I/O traffics in persistent memory. To mitigate this traffic, we develop cache line friendly B+-tree which aligns its node size to cache line size. It can minimize the write traffic. Moreover, with hardware transactional memory, it can update its single node atomically without any additional redundant I/O for guaranteeing data consistency. It can also adapt Failure-Atomic Shift and Failure-Atomic In-place Rebalancing to eliminate unnecessary I/O. Furthermore, We improved the persistent memory manager that exploit traditional memory heap structure with free-list instead of segregated lists for small memory allocations to minimize the memory allocation overhead. Our performance evaluation shows that our improved version that consider I/O granularity of non-volatile memory can efficiently reduce the redundant I/O traffic and improve the performance by large of a margin.ope

ScholarWorks@UNIST

Embedded Firmware Solutions

Author: Jones Marc
Reinauer Stefan
Sun Jiming
Zimmer Vincent
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/04/2020
Field of study

Computer scienc

Directory of Open Access Books (DOAB)

A real-time capable dynamic partial reconfiguration system for an applicationspecific soft-core processor

Author: Fengler Wolfgang
Kerling Philipp
Kirchhoff Michael
Streitferdt Detlef
Publication venue: 'Hindawi Limited'
Publication date: 31/08/2019
Field of study

Modern FPGAs (Field Programmable Gate Arrays) are becoming increasingly important when it comes to embedded system development. Within these FPGAs, soft-core processors are often used to solve a wide range of different tasks. Soft-core processors are a cost-effective and time-efficient way to realize embedded systems. When using the full potential of FPGAs, it is possible to dynamically reconfigure parts of them during run time without the need to stop the device. This feature is called dynamic partial reconfiguration (DPR). If the DPR approach is to be applied in a real-time application-specific soft-core processor, an architecture must be created that ensures strict compliance with the real-time constraint at all times. In this paper, a novel method that addresses this problem is introduced, and its realization is described. In the first step, an application-specializable soft-core processor is presented that is capable of solving problems while adhering to hard real-time deadlines. This is achieved by the full design time analyzability of the soft-core processor. Its special architecture and other necessary features are discussed. Furthermore, a method for the optimized generation of partial bitstreams for the DPR as well as its practical implementation in a tool is presented. This tool is able to minimize given bitstreams with the help of a differential frame bitmap. Experiments that realize the DPR within the soft-core framework are presented, with respect to the need for hard real-time capability. Those experiments show a significant resource reduction of about 40% compared to a functionally equivalent non-DPR design

Digitale Bibliothek Thüringen

A General Methodology to Optimize and Benchmark Edge Devices

Author: Smathers Kyle J.
Publication venue: AFIT Scholar
Publication date: 26/03/2020
Field of study

The explosion of Internet Of Things (IoT), embedded and “smart” devices has also seen the addition of “general purpose” single board computers also referred to as “edge devices.” Determining if one of these generic devices meets the need of a new given task however can be challenging. Software generically written to be portable or plug and play may be too bloated to work properly without significant modification due to much tighter hardware resources. Previous work in this area has been focused on micro or chip-level benchmarking which is mainly useful for chip designers or low level system integrators. A higher or macro level method is needed to not only observe the behavior of these devices under a load but ensure they are appropriately configured for the new task, especially as they begin being integrated on platforms with higher cost of failure like self driving cars or drones. In this research we propose a macro level methodology that iteratively benchmarks and optimizes specific workloads on edge devices. With automation provided by Ansible, a multi stage 2k full factorial experiment and robust analysis process ensures the test workload is maximizing the use of available resources before establishing a final benchmark score. By framing the validation tests with a family of network security monitoring applications an end to end scenario fully exercises and validates the developed process. This also provides an additional vector for future research in the realm of network security. The analysis of the results show the developed process met its original design goals and intentions, with the added fact that the latest edge devices like the XAVIER, TX2 and RPi4 can easily perform as an edge network sensor

AFTI Scholar (Air Force Institute of Technology)

Cost effective technology applied to domotics and smart home energy management systems

Author: Gutiérrez Peña Javier Alberto
Publication venue: 'Universidad de Cordoba'
Publication date: 01/01/2022
Field of study

Premio extraordinario de Trabajo Fin de Máster curso 2019/2020. Máster en Energías Renovables DistribuidasIn this document is presented the state of art for domotics cost effective technologies available on market nowadays, and how to apply them in Smart Home Energy Management Systems (SHEMS) allowing peaks shaving, renewable management and home appliance controls, always in cost effective context in order to be massively applied. Additionally, beyond of SHEMS context, it will be also analysed how to apply this technology in order to increase homes energy efficiency and monitoring of home appliances. Energy management is one of the milestones for distributed renewable energy spread; since renewable energy sources are not time-schedulable, are required control systems capable of the management for exchanging energy between conventional sources (power grid), renewable sources and energy storage sources. With the proposed approach, there is a first block dedicated to show an overview of Smart Home Energy Management Systems (SMHEMS) classical architecture and functional modules of SHEMS; next step is to analyse principles which has allowed some devices to become a cost-effective technology. Once the technology has been analysed, it will be reviewed some specific resources (hardware and software) available on marked for allowing low cost SHEMS. Knowing the “tools” available; it will be shown how to adapt classical SHEMS to cost effective technology. Such way, this document will show some specific applications of SHEMS. Firstly, in a general point of view, comparing the proposed low-cost technology with one of the main existing commercial proposals; and secondly, developing the solution for a specific real case.En este documento se aborda el estado actual de la domótica de bajo coste disponible en el mercado actualmente y cómo aplicarlo en los sistemas inteligentes de gestión energética en la vivienda (SHEMS) permitiendo el recorte de las puntas de demanda, gestión de energías renovables y control de electrodomésticos, siempre en el contexto del bajo coste, con el objetivo de lograr la máxima difusión de los SHEMS. Adicionalmente, más allá del contexto de la tecnología SHEMS, se analizará cómo aplicar esta tecnología para aumentar la eficiencia energética de los hogares y para la supervisión de los electrodomésticos. La gestión energética es uno de los factores principales para lograr la difusión de las energías renovables distribuidas; debido a que las fuentes de energía renovable no pueden ser planificadas, se requieren sistemas de control capaces de gestionar el intercambio de energía entre las fuentes convencionales (red eléctrica de distribución), energías renovables y dispositivos de almacenamiento energético. Bajo esta perspectiva, este documento presenta un primer bloque en el que se exponen las bases de la arquitectura y módulos funcionales de los sistemas inteligentes de gestión energética en la vivienda (SHEMS); el siguiente paso será analizar los principios que han permitido a ciertos dispositivos convertirse en dispositivos de bajo coste. Una vez analizada la tecnología, nos centraremos en los recursos (hardware y software) existentes que permitirán la realización de un SHEMS a bajo coste. Conocidas las “herramientas” a nuestra disposición, se mostrará como adaptar un esquema SHEMS clásico a la tecnología de bajo coste. Primeramente, comparando de modo genérico la tecnología de bajo coste con una de las principales propuestas comerciales de SHEMS, para seguidamente desarrollar la solución de bajo coste a un caso específico real

Repositorio Institucional de la Universidad de Córdoba

Conceptual design and realization of a dynamic partial reconfiguration extension of an existing soft-core processor

Author: Kerling Philipp
Publication venue
Publication date: 01/01/2019
Field of study

Viele aktuelle Field Programmable Gate Arrays (FPGAs) unterstützen die Technik der partiellen Rekonfiguration (PR), durch die dynamisch zur Laufzeit ein Hardware-Design auch nur teilweise ausgetauscht werden kann. Die vorliegende Arbeit integriert PR-Funktionalität in die an der Technischen Universität Ilmenau für harte Echtzeitaufgaben mit hochpräzisen Fließkommaberechnungen entwickelte VHDL Integrated Softcore Architecture for Reconfigurable Devices (ViSARD). Zu diesem Zweck wird die arithmetisch-logische Einheit angepasst, um das Auswechseln von Fließkomma-Ausführungseinheiten zu ermöglichen. Ziele der Entwicklung des PR-Systems sind hohe Geschwindigkeit, niedrige Latenz, niedrige Ressourcenkosten und harte Echtzeitfähigkeit. Erreicht werden diese durch die Umsetzung einer eigenen Steuereinheit (partial reconfiguration controller), die partielle Bitströme aus externem RAM über einen standardmäßigen AXI-Bus lädt sowie die entsprechende Erweiterung der ViSARD. In einem Testdesign, das zwischen drei verschiedenen Konfigurationen mit je zwischen einer und drei Ausführungseinheiten wechselt, hat das entwickelte PR-System den maximal spezifierten Bitstromdurchsatz auf dem Ziel-FPGA erreicht und den Verbrauch an Lookup-Tabellen um etwa 40 % verringert.Many modern field-programmable gate arrays (FPGAs) support partial reconfiguration, which allows to dynamically replace only a part of a design at run time. In this thesis, partial reconfiguration capability is integrated with the VHDL Integrated Softcore Architecture for Reconfigurable Devices (ViSARD) developed at Technische Universität Ilmenau and conceived for hard real-time tasks requiring floating-point calculations with high precision. Specifically, its arithmetic logic unit is modified to allow exchanging floating-point arithmetic execution units. Design goals of the partial reconfiguration system are high speed, low latency, low resource overhead, and hard real-time capability. They are reached by implementing a custom partial reconfiguration controller loading partial bitstreams from external RAM over a standard AXI bus and extending the ViSARD appropriately. In a test design that switched between 3 different configurations each containing between 1 and 3 execution units, the proposed partial reconfiguration system achieved the maximum specified bitstream throughput on the target FPGA and allowed for roughly 40 % reduced look-up table usage

Digitale Bibliothek Thüringen