788 research outputs found

    Secure storage systems for untrusted cloud environments

    Get PDF
    The cloud has become established for applications that need to be scalable and highly available. However, moving data to data centers owned and operated by a third party, i.e., the cloud provider, raises security concerns because a cloud provider could easily access and manipulate the data or program flow, preventing the cloud from being used for certain applications, like medical or financial. Hardware vendors are addressing these concerns by developing Trusted Execution Environments (TEEs) that make the CPU state and parts of memory inaccessible from the host software. While TEEs protect the current execution state, they do not provide security guarantees for data which does not fit nor reside in the protected memory area, like network and persistent storage. In this work, we aim to address TEEs’ limitations in three different ways, first we provide the trust of TEEs to persistent storage, second we extend the trust to multiple nodes in a network, and third we propose a compiler-based solution for accessing heterogeneous memory regions. More specifically, • SPEICHER extends the trust provided by TEEs to persistent storage. SPEICHER implements a key-value interface. Its design is based on LSM data structures, but extends them to provide confidentiality, integrity, and freshness for the stored data. Thus, SPEICHER can prove to the client that the data has not been tampered with by an attacker. • AVOCADO is a distributed in-memory key-value store (KVS) that extends the trust that TEEs provide across the network to multiple nodes, allowing KVSs to scale beyond the boundaries of a single node. On each node, AVOCADO carefully divides data between trusted memory and untrusted host memory, to maximize the amount of data that can be stored on each node. AVOCADO leverages the fact that we can model network attacks as crash-faults to trust other nodes with a hardened ABD replication protocol. • TOAST is based on the observation that modern high-performance systems often use several different heterogeneous memory regions that are not easily distinguishable by the programmer. The number of regions is increased by the fact that TEEs divide memory into trusted and untrusted regions. TOAST is a compiler-based approach to unify access to different heterogeneous memory regions and provides programmability and portability. TOAST uses a load/store interface to abstract most library interfaces for different memory regions

    Digital agriculture: research, development and innovation in production chains.

    Get PDF
    Digital transformation in the field towards sustainable and smart agriculture. Digital agriculture: definitions and technologies. Agroenvironmental modeling and the digital transformation of agriculture. Geotechnologies in digital agriculture. Scientific computing in agriculture. Computer vision applied to agriculture. Technologies developed in precision agriculture. Information engineering: contributions to digital agriculture. DIPN: a dictionary of the internal proteins nanoenvironments and their potential for transformation into agricultural assets. Applications of bioinformatics in agriculture. Genomics applied to climate change: biotechnology for digital agriculture. Innovation ecosystem in agriculture: Embrapa?s evolution and contributions. The law related to the digitization of agriculture. Innovating communication in the age of digital agriculture. Driving forces for Brazilian agriculture in the next decade: implications for digital agriculture. Challenges, trends and opportunities in digital agriculture in Brazil

    Anpassen verteilter eingebetteter Anwendungen im laufenden Betrieb

    Get PDF
    The availability of third-party apps is among the key success factors for software ecosystems: The users benefit from more features and innovation speed, while third-party solution vendors can leverage the platform to create successful offerings. However, this requires a certain decoupling of engineering activities of the different parties not achieved for distributed control systems, yet. While late and dynamic integration of third-party components would be required, resulting control systems must provide high reliability regarding real-time requirements, which leads to integration complexity. Closing this gap would particularly contribute to the vision of software-defined manufacturing, where an ecosystem of modern IT-based control system components could lead to faster innovations due to their higher abstraction and availability of various frameworks. Therefore, this thesis addresses the research question: How we can use modern IT technologies and enable independent evolution and easy third-party integration of software components in distributed control systems, where deterministic end-to-end reactivity is required, and especially, how can we apply distributed changes to such systems consistently and reactively during operation? This thesis describes the challenges and related approaches in detail and points out that existing approaches do not fully address our research question. To tackle this gap, a formal specification of a runtime platform concept is presented in conjunction with a model-based engineering approach. The engineering approach decouples the engineering steps of component definition, integration, and deployment. The runtime platform supports this approach by isolating the components, while still offering predictable end-to-end real-time behavior. Independent evolution of software components is supported through a concept for synchronous reconfiguration during full operation, i.e., dynamic orchestration of components. Time-critical state transfer is supported, too, and can lead to bounded quality degradation, at most. The reconfiguration planning is supported by analysis concepts, including simulation of a formally specified system and reconfiguration, and analyzing potential quality degradation with the evolving dataflow graph (EDFG) method. A platform-specific realization of the concepts, the real-time container architecture, is described as a reference implementation. The model and the prototype are evaluated regarding their feasibility and applicability of the concepts by two case studies. The first case study is a minimalistic distributed control system used in different setups with different component variants and reconfiguration plans to compare the model and the prototype and to gather runtime statistics. The second case study is a smart factory showcase system with more challenging application components and interface technologies. The conclusion is that the concepts are feasible and applicable, even though the concepts and the prototype still need to be worked on in future -- for example, to reach shorter cycle times.Eine große Auswahl von Drittanbieter-Lösungen ist einer der Schlüsselfaktoren für Software Ecosystems: Nutzer profitieren vom breiten Angebot und schnellen Innovationen, während Drittanbieter über die Plattform erfolgreiche Lösungen anbieten können. Das jedoch setzt eine gewisse Entkopplung von Entwicklungsschritten der Beteiligten voraus, welche für verteilte Steuerungssysteme noch nicht erreicht wurde. Während Drittanbieter-Komponenten möglichst spät -- sogar Laufzeit -- integriert werden müssten, müssen Steuerungssysteme jedoch eine hohe Zuverlässigkeit gegenüber Echtzeitanforderungen aufweisen, was zu Integrationskomplexität führt. Dies zu lösen würde insbesondere zur Vision von Software-definierter Produktion beitragen, da ein Ecosystem für moderne IT-basierte Steuerungskomponenten wegen deren höherem Abstraktionsgrad und der Vielzahl verfügbarer Frameworks zu schnellerer Innovation führen würde. Daher behandelt diese Dissertation folgende Forschungsfrage: Wie können wir moderne IT-Technologien verwenden und unabhängige Entwicklung und einfache Integration von Software-Komponenten in verteilten Steuerungssystemen ermöglichen, wo Ende-zu-Ende-Echtzeitverhalten gefordert ist, und wie können wir insbesondere verteilte Änderungen an solchen Systemen konsistent und im Vollbetrieb vornehmen? Diese Dissertation beschreibt Herausforderungen und verwandte Ansätze im Detail und zeigt auf, dass existierende Ansätze diese Frage nicht vollständig behandeln. Um diese Lücke zu schließen, beschreiben wir eine formale Spezifikation einer Laufzeit-Plattform und einen zugehörigen Modell-basierten Engineering-Ansatz. Dieser Ansatz entkoppelt die Design-Schritte der Entwicklung, Integration und des Deployments von Komponenten. Die Laufzeit-Plattform unterstützt den Ansatz durch Isolation von Komponenten und zugleich Zeit-deterministischem Ende-zu-Ende-Verhalten. Unabhängige Entwicklung und Integration werden durch Konzepte für synchrone Rekonfiguration im Vollbetrieb unterstützt, also durch dynamische Orchestrierung. Dies beinhaltet auch Zeit-kritische Zustands-Transfers mit höchstens begrenzter Qualitätsminderung, wenn überhaupt. Rekonfigurationsplanung wird durch Analysekonzepte unterstützt, einschließlich der Simulation formal spezifizierter Systeme und Rekonfigurationen und der Analyse der etwaigen Qualitätsminderung mit dem Evolving Dataflow Graph (EDFG). Die Real-Time Container Architecture wird als Referenzimplementierung und Evaluationsplattform beschrieben. Zwei Fallstudien untersuchen Machbarkeit und Nützlichkeit der Konzepte. Die erste verwendet verschiedene Varianten und Rekonfigurationen eines minimalistischen verteilten Steuerungssystems, um Modell und Prototyp zu vergleichen sowie Laufzeitstatistiken zu erheben. Die zweite Fallstudie ist ein Smart-Factory-Demonstrator, welcher herausforderndere Applikationskomponenten und Schnittstellentechnologien verwendet. Die Konzepte sind den Studien nach machbar und nützlich, auch wenn sowohl die Konzepte als auch der Prototyp noch weitere Arbeit benötigen -- zum Beispiel, um kürzere Zyklen zu erreichen

    Resilient and Scalable Forwarding for Software-Defined Networks with P4-Programmable Switches

    Get PDF
    Traditional networking devices support only fixed features and limited configurability. Network softwarization leverages programmable software and hardware platforms to remove those limitations. In this context the concept of programmable data planes allows directly to program the packet processing pipeline of networking devices and create custom control plane algorithms. This flexibility enables the design of novel networking mechanisms where the status quo struggles to meet high demands of next-generation networks like 5G, Internet of Things, cloud computing, and industry 4.0. P4 is the most popular technology to implement programmable data planes. However, programmable data planes, and in particular, the P4 technology, emerged only recently. Thus, P4 support for some well-established networking concepts is still lacking and several issues remain unsolved due to the different characteristics of programmable data planes in comparison to traditional networking. The research of this thesis focuses on two open issues of programmable data planes. First, it develops resilient and efficient forwarding mechanisms for the P4 data plane as there are no satisfying state of the art best practices yet. Second, it enables BIER in high-performance P4 data planes. BIER is a novel, scalable, and efficient transport mechanism for IP multicast traffic which has only very limited support of high-performance forwarding platforms yet. The main results of this thesis are published as 8 peer-reviewed and one post-publication peer-reviewed publication. The results cover the development of suitable resilience mechanisms for P4 data planes, the development and implementation of resilient BIER forwarding in P4, and the extensive evaluations of all developed and implemented mechanisms. Furthermore, the results contain a comprehensive P4 literature study. Two more peer-reviewed papers contain additional content that is not directly related to the main results. They implement congestion avoidance mechanisms in P4 and develop a scheduling concept to find cost-optimized load schedules based on day-ahead forecasts

    SoC-based FPGA architecture for image analysis and other highly demanding applications

    Get PDF
    Al giorno d'oggi, lo sviluppo di algoritmi si concentra su calcoli efficienti in termini di prestazioni ed efficienza energetica. Tecnologie come il field programmable gate array (FPGA) e il system on chip (SoC) basato su FPGA (FPGA/SoC) hanno dimostrato la loro capacità di accelerare applicazioni di calcolo intensive risparmiando al contempo il consumo energetico, grazie alla loro capacità di elevato parallelismo e riconfigurazione dell'architettura. Attualmente, i cicli di progettazione esistenti per FPGA/SoC sono lunghi, a causa della complessità dell'architettura. Pertanto, per colmare il divario tra le applicazioni e le architetture FPGA/SoC e ottenere un design hardware efficiente per l'analisi delle immagini e altri applicazioni altamente demandanti utilizzando lo strumento di sintesi di alto livello, vengono prese in considerazione due strategie complementari: tecniche ad hoc e stima delle prestazioni. Per quanto riguarda le tecniche ad-hoc, tre applicazioni molto impegnative sono state accelerate attraverso gli strumenti HLS: discriminatore di forme di impulso per i raggi cosmici, classificazione automatica degli insetti e re-ranking per il recupero delle informazioni, sottolineando i vantaggi quando questo tipo di applicazioni viene attraversato da tecniche di compressione durante il targeting dispositivi FPGA/SoC. Inoltre, in questa tesi viene proposto uno stimatore delle prestazioni per l'accelerazione hardware per prevedere efficacemente l'utilizzo delle risorse e la latenza per FPGA/SoC, costruendo un ponte tra l'applicazione e i domini architetturali. Lo strumento integra modelli analitici per la previsione delle prestazioni e un motore design space explorer (DSE) per fornire approfondimenti di alto livello agli sviluppatori di hardware, composto da due motori indipendenti: DSE basato sull'ottimizzazione a singolo obiettivo e DSE basato sull'ottimizzazione evolutiva multiobiettivo.Nowadays, the development of algorithms focuses on performance-efficient and energy-efficient computations. Technologies such as field programmable gate array (FPGA) and system on chip (SoC) based on FPGA (FPGA/SoC) have shown their ability to accelerate intensive computing applications while saving power consumption, owing to their capability of high parallelism and reconfiguration of the architecture. Currently, the existing design cycles for FPGA/SoC are time-consuming, owing to the complexity of the architecture. Therefore, to address the gap between applications and FPGA/SoC architectures and to obtain an efficient hardware design for image analysis and highly demanding applications using the high-level synthesis tool, two complementary strategies are considered: ad-hoc techniques and performance estimator. Regarding ad-hoc techniques, three highly demanding applications were accelerated through HLS tools: pulse shape discriminator for cosmic rays, automatic pest classification, and re-ranking for information retrieval, emphasizing the benefits when this type of applications are traversed by compression techniques when targeting FPGA/SoC devices. Furthermore, a comprehensive performance estimator for hardware acceleration is proposed in this thesis to effectively predict the resource utilization and latency for FPGA/SoC, building a bridge between the application and architectural domains. The tool integrates analytical models for performance prediction, and a design space explorer (DSE) engine for providing high-level insights to hardware developers, composed of two independent sub-engines: DSE based on single-objective optimization and DSE based on evolutionary multi-objective optimization

    Digital agriculture: research, development and innovation in production chains.

    Get PDF
    Digital transformation in the field towards sustainable and smart agriculture. Digital agriculture: definitions and technologies. Agroenvironmental modeling and the digital transformation of agriculture. Geotechnologies in digital agriculture. Scientific computing in agriculture. Computer vision applied to agriculture. Technologies developed in precision agriculture. Information engineering: contributions to digital agriculture. DIPN: a dictionary of the internal proteins nanoenvironments and their potential for transformation into agricultural assets. Applications of bioinformatics in agriculture. Genomics applied to climate change: biotechnology for digital agriculture. Innovation ecosystem in agriculture: Embrapa?s evolution and contributions. The law related to the digitization of agriculture. Innovating communication in the age of digital agriculture. Driving forces for Brazilian agriculture in the next decade: implications for digital agriculture. Challenges, trends and opportunities in digital agriculture in Brazil.Translated by Beverly Victoria Young and Karl Stephan Mokross

    On benchmarking of deep learning systems: software engineering issues and reproducibility challenges

    Get PDF
    Since AlexNet won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012, Deep Learning (and Machine Learning/AI in general) gained an exponential interest. Nowadays, their adoption spreads over numerous sectors, like automotive, robotics, healthcare and finance. The ML advancement goes in pair with the quality improvement delivered by those solutions. However, those ameliorations are not for free: ML algorithms always require an increasing computational power, which pushes computer engineers to develop new devices capable of coping with this demand for performance. To foster the evolution of DSAs, and thus ML research, it is key to make it easy to experiment and compare them. This may be challenging since, even if the software built around these devices simplifies their usage, obtaining the best performance is not always straightforward. The situation gets even worse when the experiments are not conducted in a reproducible way. Even though the importance of reproducibility for the research is evident, it does not directly translate into reproducible experiments. In fact, as already shown by previous studies regarding other research fields, also ML is facing a reproducibility crisis. Our work addresses the topic of reproducibility of ML applications. Reproducibility in this context has two aspects: results reproducibility and performance reproducibility. While the reproducibility of the results is mandatory, performance reproducibility cannot be neglected because high-performance device usage causes cost. To understand how the ML situation is regarding reproducibility of performance, we reproduce results published for the MLPerf suite, which seems to be the most used machine learning benchmark. Because of the wide range of devices and frameworks used in different benchmark submissions, we focus on a subset of accuracy and performance results submitted to the MLPerf Inference benchmark, presenting a detailed analysis of the difficulties a scientist may find when trying to reproduce such a benchmark and a possible solution using our workflow tool for experiment reproducibility: PROVA!. We designed PROVA! to support the reproducibility in traditional HPC experiments, but we will show how we extended it to be used as a 'driver' for MLPerf benchmark applications. The PROVA! driver mode allows us to experiment with different versions of the MLPerf Inference benchmark switching among different hardware and software combinations and compare them in a reproducible way. In the last part, we will present the results of our reproducibility study, demonstrating the importance of having a support tool to reproduce and extend original experiments getting deeper knowledge about performance behaviours

    Towards Efficient Resource Allocation for Embedded Systems

    Get PDF
    Das Hauptthema ist die dynamische Ressourcenverwaltung in eingebetteten Systemen, insbesondere die Verwaltung von Rechenzeit und Netzwerkverkehr auf einem MPSoC. Die Idee besteht darin, eine Pipeline für die Verarbeitung von Mobiler Kommunikation auf dem Chip dynamisch zu schedulen, um die Effizienz der Hardwareressourcen zu verbessern, ohne den Ressourcenverbrauch des dynamischen Schedulings dramatisch zu erhöhen. Sowohl Software- als auch Hardwaremodule werden auf Hotspots im Ressourcenverbrauch untersucht und optimiert, um diese zu entfernen. Da Applikationen im Bereich der Signalverarbeitung normalerweise mit Hilfe von SDF-Diagrammen beschrieben werden können, wird deren dynamisches Scheduling optimiert, um den Ressourcenverbrauch gegenüber dem üblicherweise verwendeten statischen Scheduling zu verbessern. Es wird ein hybrider dynamischer Scheduler vorgestellt, der die Vorteile von Processing-Networks und der Planung von Task-Graphen kombiniert. Es ermöglicht dem Scheduler, ein Gleichgewicht zwischen der Parallelisierung der Berechnung und der Zunahme des dynamischen Scheduling-Aufands optimal abzuwägen. Der resultierende dynamisch erstellte Schedule reduziert den Ressourcenverbrauch um etwa 50%, wobei die Laufzeit im Vergleich zu einem statischen Schedule nur um 20% erhöht wird. Zusätzlich wird ein verteilter dynamischer SDF-Scheduler vorgeschlagen, der das Scheduling in verschiedene Teile zerlegt, die dann zu einer Pipeline verbunden werden, um mehrere parallele Prozessoren einzubeziehen. Jeder Scheduling-Teil wird zu einem Cluster mit Load-Balancing erweitert, um die Anzahl der parallel laufenden Scheduling-Jobs weiter zu erhöhen. Auf diese Weise wird dem vorhandene Engpass bei dem dynamischen Scheduling eines zentralisierten Schedulers entgegengewirkt, sodass 7x mehr Prozessoren mit dem Pipelined-Clustered-Dynamic-Scheduler für eine typische Signalverarbeitungsanwendung verwendet werden können. Das neue dynamische Scheduling-System setzt das Vorhandensein von drei verschiedenen Kommunikationsmodi zwischen den Verarbeitungskernen voraus. Bei der Emulation auf Basis des häufig verwendeten RDMA-Protokolls treten Leistungsprobleme auf. Sehr gut kann RDMA für einmalige Punkt-zu-Punkt-Datenübertragungen verwendet werden, wie sie bei der Ausführung von Task-Graphen verwendet werden. Process-Networks verwenden normalerweise Datenströme mit hohem Volumen und hoher Bandbreite. Es wird eine FIFO-basierte Kommunikationslösung vorgestellt, die einen zyklischen Puffer sowohl im Sender als auch im Empfänger implementiert, um diesen Bedarf zu decken. Die Pufferbehandlung und die Datenübertragung zwischen ihnen erfolgen ausschließlich in Hardware, um den Software-Overhead aus der Anwendung zu entfernen. Die Implementierung verbessert die Zugriffsverwaltung mehrerer Nutzer auf flächen-effiziente Single-Port Speichermodule. Es werden 0,8 der theoretisch möglichen Bandbreite, die normalerweise nur mit flächenmäßig teureren Dual-Port-Speichern erreicht wird. Der dritte Kommunikationsmodus definiert eine einfache Message-Passing-Implementierung, die ohne einen Verbindungszustand auskommt. Dieser Modus wird für eine effiziente prozessübergreifende Kommunikation des verteilten Scheduling-Systems und der engen Ansteuerung der restlichen Prozessoren benötigt. Eine Flusskontrolle in Hardware stellt sicher, dass eine große Anzahl von Sendern Nachrichten an denselben Empfänger senden kann. Dabei wird garantiert, dass alle Nachrichten korrekt empfangen werden, ohne dass eine Verbindung hergestellt werden muss und die Nachrichtenlaufzeit gering bleibt. Die Arbeit konzentriert sich auf die Optimierung des Codesigns von Hardware und Software, um die kompromisslose Ressourceneffizienz der dynamischen SDF-Graphen-Planung zu erhöhen. Besonderes Augenmerk wird auf die Abhängigkeiten zwischen den Ebenen eines verteilten Scheduling-Systems gelegt, das auf der Verfügbarkeit spezifischer hardwarebeschleunigter Kommunikationsmethoden beruht.:1 Introduction 1.1 Motivation 1.2 The Multiprocessor System on Chip Architecture 1.3 Concrete MPSoC Architecture 1.4 Representing LTE/5G baseband processing as Static Data Flow 1.5 Compuation Stack 1.6 Performance Hotspots Addressed 1.7 State of the Art 1.8 Overview of the Work 2 Hybrid SDF Execution 2.1 Addressed Performance Hotspot 2.2 State of the Art 2.3 Static Data Flow Graphs 2.4 Runtime Environment 2.5 Overhead of Deloying Tasks to a MPSoC 2.6 Interpretation of SDF Graphs as Task Graphs 2.7 Interpreting SDF Graphs as Process Networks 2.8 Hybrid Interpretation 2.9 Graph Topology Considerations 2.10 Theoretic Impact of Hybrid Interpretation 2.11 Simulating Hybrid Execution 2.12 Pipeline SDF Graph Example 2.13 Random SDF Graphs 2.14 LTE-like SDF Graph 2.15 Key Lernings 3 Distribution of Management 3.1 Addressed Performance Hotspot 3.2 State of the Art 3.3 Revising Deployment Overhead 3.4 Distribution of Overhead 3.5 Impact of Management Distribution to Resource Utilization 3.6 Reconfigurability 3.7 Key Lernings 4 Sliced FIFO Hardware 4.1 Addressed Performance Hotspot 4.2 State of the Art 4.3 System Environment 4.4 Sliced Windowed FIFO buffer 4.5 Single FIFO Evaluation 4.6 Multiple FIFO Evalutaion 4.7 Hardware Implementation 4.8 Key Lernings 5 Message Passing Hardware 5.1 Addressed Performance Hotspot 5.2 State of the Art 5.3 Message Passing Regarded as Queueing 5.4 A Remote Direct Memory Access Based Implementation 5.5 Hardware Implementation Concept 5.6 Evalutation of Performance 5.7 Key Lernings 6 SummaryThe main topic is the dynamic resource allocation in embedded systems, especially the allocation of computing time and network traffic on an multi processor system on chip (MPSoC). The idea is to dynamically schedule a mobile communication signal processing pipeline on the chip to improve hardware resource efficiency while not dramatically improve resource consumption because of dynamic scheduling overhead. Both software and hardware modules are examined for resource consumption hotspots and optimized to remove them. Since signal processing can usually be described with the help of static data flow (SDF) graphs, the dynamic handling of those is optimized to improve resource consumption over the commonly used static scheduling approach. A hybrid dynamic scheduler is presented that combines benefits from both processing networks and task graph scheduling. It allows the scheduler to optimally balance parallelization of computation and addition of dynamic scheduling overhead. The resulting dynamically created schedule reduces resource consumption by about 50%, with a runtime increase of only 20% compared to a static schedule. Additionally, a distributed dynamic SDF scheduler is proposed that splits the scheduling into different parts, which are then connected to a scheduling pipeli ne to incorporate multiple parallel working processors. Each scheduling stage is reworked into a load-balanced cluster to increase the number of parallel scheduling jobs further. This way, the still existing dynamic scheduling bottleneck of a centralized scheduler is widened, allowing handling 7x more processors with the pipelined, clustered dynamic scheduler for a typical signal processing application. The presented dynamic scheduling system assumes the presence of three different communication modes between the processing cores. When emulated on top of the commonly used remote direct memory access (RDMA) protocol, performance issues are encountered. Firstly, RDMA can neatly be used for single-shot point-to-point data transfers, like used in task graph scheduling. Process networks usually make use of high-volume and high-bandwidth data streams. A first in first out (FIFO) communication solution is presented that implements a cyclic buffer on both sender and receiver to serve this need. The buffer handling and data transfer between them are done purely in hardware to remove software overhead from the application. The implementation improves the multi-user access to area-efficient single port on-chip memory modules. It achieves 0.8 of the theoretically possible bandwidth, usually only achieved with area expensive dual-port memories. The third communication mode defines a lightweight message passing (MP) implementation that is truly connectionless. It is needed for efficient inter-process communication of the distributed and clustered scheduling system and the worker processing units’ tight coupling. A hardware flow control assures that an arbitrary number of senders can spontaneously start sending messages to the same receiver. Yet, all messages are guaranteed to be correctly received while eliminating the need for connection establishment and keeping a low message delay. The work focuses on the hardware-software codesign optimization to increase the uncompromised resource efficiency of dynamic SDF graph scheduling. Special attention is paid to the inter-level dependencies in developing a distributed scheduling system, which relies on the availability of specific hardwareaccelerated communication methods.:1 Introduction 1.1 Motivation 1.2 The Multiprocessor System on Chip Architecture 1.3 Concrete MPSoC Architecture 1.4 Representing LTE/5G baseband processing as Static Data Flow 1.5 Compuation Stack 1.6 Performance Hotspots Addressed 1.7 State of the Art 1.8 Overview of the Work 2 Hybrid SDF Execution 2.1 Addressed Performance Hotspot 2.2 State of the Art 2.3 Static Data Flow Graphs 2.4 Runtime Environment 2.5 Overhead of Deloying Tasks to a MPSoC 2.6 Interpretation of SDF Graphs as Task Graphs 2.7 Interpreting SDF Graphs as Process Networks 2.8 Hybrid Interpretation 2.9 Graph Topology Considerations 2.10 Theoretic Impact of Hybrid Interpretation 2.11 Simulating Hybrid Execution 2.12 Pipeline SDF Graph Example 2.13 Random SDF Graphs 2.14 LTE-like SDF Graph 2.15 Key Lernings 3 Distribution of Management 3.1 Addressed Performance Hotspot 3.2 State of the Art 3.3 Revising Deployment Overhead 3.4 Distribution of Overhead 3.5 Impact of Management Distribution to Resource Utilization 3.6 Reconfigurability 3.7 Key Lernings 4 Sliced FIFO Hardware 4.1 Addressed Performance Hotspot 4.2 State of the Art 4.3 System Environment 4.4 Sliced Windowed FIFO buffer 4.5 Single FIFO Evaluation 4.6 Multiple FIFO Evalutaion 4.7 Hardware Implementation 4.8 Key Lernings 5 Message Passing Hardware 5.1 Addressed Performance Hotspot 5.2 State of the Art 5.3 Message Passing Regarded as Queueing 5.4 A Remote Direct Memory Access Based Implementation 5.5 Hardware Implementation Concept 5.6 Evalutation of Performance 5.7 Key Lernings 6 Summar
    corecore