Design and characterisation of monolithic CMOS detectors for high energy particle physics and SEU radiation tests for ATLAS Inner Tracker Upgrade readout chip
This thesis covers the characterisation results and the design of monolithic CMOS detectors designed in TowerJazz 180 nm CMOS technology for high-energy particle physics applications. Three different detectors have been studied: the MALTA, the Mini-MALTA, and the MALTA2. The MALTA sensor showed some efficiency losses at the corners of the pixels after irradiation, which meant that it was not suitable for the radiation environments in which it was supposed to be installed. Therefore, the front-end electronics and the fabrication process were modified to overcome this issue. The Mini-MALTA prototype was designed with the above-mentioned improvements, fabricated, and fully characterised. Finally, building on the knowledge acquired during these years of development, another large-scale sensor, the MALTA2, has been produced, which should be radiation tolerant and have very good time resolution. The description and studies of the different architectures used in this family of detectors are covered, and a simulation to estimate the bandwidth capabilities is reported.
Furthermore, this work presents the characterisation of single event effects in the ITkPixV1, the prototype version of the ATLAS Inner Tracker Upgrade chip for the High-Luminosity LHC. Measurements were made in test-beam campaigns with high-energy ions and protons to evaluate the level of single event effects in the chip.
Fictional Practices of Spirituality I: Interactive Media
"Fictional Practices of Spirituality" provides critical insight into the implementation of belief, mysticism, religion, and spirituality into worlds of fiction, be it interactive or non-interactive. This first volume focuses on interactive, virtual worlds - may that be the digital realms of video games and VR applications or the imaginary spaces of life action role-playing and soul-searching practices. It features analyses of spirituality as gameplay facilitator, sacred spaces and architecture in video game geography, religion in video games and spiritual acts and their dramaturgic function in video games, tabletop, or LARP, among other topics. The contributors offer a first-time ever comprehensive overview of play-rites as spiritual incentives and playful spirituality in various medial incarnations
Towards Efficient Resource Allocation for Embedded Systems
The main topic is dynamic resource allocation in embedded systems, especially the allocation of computing time and network traffic on a multiprocessor system on chip (MPSoC). The idea is to dynamically schedule a mobile communication signal processing pipeline on the chip to improve hardware resource efficiency without dramatically increasing resource consumption through dynamic scheduling overhead. Both software and hardware modules are examined for resource consumption hotspots and optimized to remove them. Since signal processing can usually be described with the help of static data flow (SDF) graphs, their dynamic handling is optimized to improve resource consumption over the commonly used static scheduling approach. A hybrid dynamic scheduler is presented that combines benefits from both process networks and task graph scheduling. It allows the scheduler to optimally balance the parallelization of computation against the additional dynamic scheduling overhead. The resulting dynamically created schedule reduces resource consumption by about 50%, with a runtime increase of only 20% compared to a static schedule. Additionally, a distributed dynamic SDF scheduler is proposed that splits the scheduling into different parts, which are then connected into a scheduling pipeline to incorporate multiple parallel working processors. Each scheduling stage is reworked into a load-balanced cluster to further increase the number of parallel scheduling jobs. This way, the remaining dynamic scheduling bottleneck of a centralized scheduler is widened, allowing the pipelined, clustered dynamic scheduler to handle 7x more processors for a typical signal processing application.
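The per-actor trade-off behind the hybrid interpretation, running firings either as individually scheduled tasks or inside one persistent process, can be sketched with a toy cost model. This is purely illustrative and not the thesis implementation; the cost formula, parameter names, and numbers are assumptions invented for the example.

```python
# Toy cost model for the hybrid SDF interpretation: per actor, compare
# "task" interpretation (one scheduling job per firing: maximal
# parallelism, per-firing overhead) against "process" interpretation
# (one persistent job executing all firings: one-time overhead, serial).
# All parameters and the cost formula are invented for illustration.

def hybrid_cost(firings, exec_time, overhead, workers, as_process):
    """Rough completion cost of one actor under either interpretation."""
    if as_process:
        # Single scheduling decision; all firings run sequentially.
        return overhead + firings * exec_time
    # One scheduling decision per firing; firings spread over the workers.
    batches = -(-firings // workers)  # ceiling division
    return firings * overhead / workers + batches * exec_time

def choose_interpretation(firings, exec_time, overhead, workers):
    """Pick the cheaper interpretation for a single actor."""
    task = hybrid_cost(firings, exec_time, overhead, workers, as_process=False)
    proc = hybrid_cost(firings, exec_time, overhead, workers, as_process=True)
    return "process" if proc <= task else "task"

# Many short firings: scheduling overhead dominates -> persistent process.
print(choose_interpretation(firings=64, exec_time=1.0, overhead=5.0, workers=2))   # process
# Few long firings: parallel speedup outweighs the overhead -> tasks.
print(choose_interpretation(firings=8, exec_time=100.0, overhead=5.0, workers=2))  # task
```

A real scheduler would make this decision from profiled firing times and measured scheduling overhead rather than fixed constants.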
The presented dynamic scheduling system assumes the presence of three different communication modes between the processing cores. When emulated on top of the commonly used remote direct memory access (RDMA) protocol, performance issues are encountered. RDMA works well for single-shot point-to-point data transfers, as used in task graph scheduling; process networks, however, usually rely on high-volume, high-bandwidth data streams. A first-in first-out (FIFO) communication solution is presented that implements a cyclic buffer on both sender and receiver to serve this need. The buffer handling and the data transfer between them are done purely in hardware to remove software overhead from the application. The implementation improves multi-user access to area-efficient single-port on-chip memory modules. It achieves 0.8 of the theoretically possible bandwidth, which is usually only achieved with area-expensive dual-port memories. The third communication mode defines a lightweight message passing (MP) implementation that is truly connectionless. It is needed for efficient inter-process communication of the distributed and clustered scheduling system and for the tight coupling of the worker processing units. A hardware flow control assures that an arbitrary number of senders can spontaneously start sending messages to the same receiver, while all messages are guaranteed to be correctly received, eliminating the need for connection establishment and keeping message delay low.
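The cyclic buffer underlying the FIFO communication mode can be modeled in a few lines. This is only a software sketch of the ring-index arithmetic; the thesis realizes the buffers, their synchronization, and the sender-to-receiver transfer entirely in hardware.

```python
class CyclicFifo:
    """Software model of one end of the cyclic-buffer FIFO channel.

    The thesis implements the buffers and the transfer between sender
    and receiver purely in hardware; this sketch only illustrates the
    ring-index arithmetic and the full/empty conditions that the
    hardware flow control enforces.
    """

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0    # next slot to read
        self.tail = 0    # next slot to write
        self.count = 0   # occupied slots

    def push(self, item):
        if self.count == len(self.buf):
            return False  # full: flow control stalls the sender
        self.buf[self.tail] = item
        self.tail = (self.tail + 1) % len(self.buf)
        self.count += 1
        return True

    def pop(self):
        if self.count == 0:
            return None   # empty: the receiver waits
        item = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        self.count -= 1
        return item

fifo = CyclicFifo(2)
assert fifo.push("a") and fifo.push("b")
assert not fifo.push("c")     # buffer full
assert fifo.pop() == "a"      # FIFO order preserved
assert fifo.push("c")         # slot freed, sender may continue
```

In the hardware version both endpoints keep such a buffer, and the head/tail positions are exchanged between them so that transfers proceed without software involvement.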
The work focuses on hardware-software codesign optimization to increase the uncompromised resource efficiency of dynamic SDF graph scheduling. Special attention is paid to the inter-level dependencies in developing a distributed scheduling system, which relies on the availability of specific hardware-accelerated communication methods.
1 Introduction
1.1 Motivation
1.2 The Multiprocessor System on Chip Architecture
1.3 Concrete MPSoC Architecture
1.4 Representing LTE/5G baseband processing as Static Data Flow
1.5 Computation Stack
1.6 Performance Hotspots Addressed
1.7 State of the Art
1.8 Overview of the Work
2 Hybrid SDF Execution
2.1 Addressed Performance Hotspot
2.2 State of the Art
2.3 Static Data Flow Graphs
2.4 Runtime Environment
2.5 Overhead of Deploying Tasks to an MPSoC
2.6 Interpretation of SDF Graphs as Task Graphs
2.7 Interpreting SDF Graphs as Process Networks
2.8 Hybrid Interpretation
2.9 Graph Topology Considerations
2.10 Theoretic Impact of Hybrid Interpretation
2.11 Simulating Hybrid Execution
2.12 Pipeline SDF Graph Example
2.13 Random SDF Graphs
2.14 LTE-like SDF Graph
2.15 Key Learnings
3 Distribution of Management
3.1 Addressed Performance Hotspot
3.2 State of the Art
3.3 Revising Deployment Overhead
3.4 Distribution of Overhead
3.5 Impact of Management Distribution on Resource Utilization
3.6 Reconfigurability
3.7 Key Learnings
4 Sliced FIFO Hardware
4.1 Addressed Performance Hotspot
4.2 State of the Art
4.3 System Environment
4.4 Sliced Windowed FIFO buffer
4.5 Single FIFO Evaluation
4.6 Multiple FIFO Evaluation
4.7 Hardware Implementation
4.8 Key Learnings
5 Message Passing Hardware
5.1 Addressed Performance Hotspot
5.2 State of the Art
5.3 Message Passing Regarded as Queueing
5.4 A Remote Direct Memory Access Based Implementation
5.5 Hardware Implementation Concept
5.6 Evaluation of Performance
5.7 Key Learnings
6 Summary
A Modular Platform for Adaptive Heterogeneous Many-Core Architectures
Multi-/many-core heterogeneous architectures are shaping current and upcoming generations of compute-centric platforms, which are widely used in everything from mobile and wearable devices to high-performance cloud computing servers. Heterogeneous many-core architectures seek to achieve an order of magnitude higher energy efficiency as well as computing performance scaling by replacing homogeneous, power-hungry general-purpose processors with multiple heterogeneous compute units supporting multiple core types and domain-specific accelerators. The drift from homogeneous architectures to complex heterogeneous systems has been heavily adopted by chip designers and the silicon industry for more than a decade. Recent silicon chips are based on a heterogeneous SoC which combines a scalable number of heterogeneous processing units of different types (e.g. CPU, GPU, custom accelerator).
This shift in computing paradigm is associated with several system-level design challenges related to the integration of, and communication between, a highly scalable number of heterogeneous compute units as well as SoC peripherals and storage units. Moreover, the increasing design complexity makes the production of heterogeneous SoC chips a monopoly of the big market players, owing to rising development and design costs. Accordingly, recent initiatives towards agile hardware development, open-source tools, and open microarchitectures aim to democratize silicon chip production for academic and commercial usage.
Agile hardware development aims to reduce development costs by providing an ecosystem of open-source hardware microarchitectures and hardware design processes. As a result, heterogeneous many-core development and customization become less complex and less time-consuming than with conventional design methods.
In order to provide a modular and agile many-core development approach, this dissertation proposes a development platform for heterogeneous and self-adaptive many-core architectures consisting of a scalable number of heterogeneous tiles that maintain design regularity while supporting heterogeneity. The proposed platform hides the integration complexities by supporting modular tile architectures for general-purpose processing cores with multiple instruction set architectures (multi-ISAs) and custom hardware accelerators. By leveraging field-programmable gate arrays (FPGAs), the self-adaptive feature of the many-core platform is achieved using dynamic and partial reconfiguration (DPR) techniques.
This dissertation realizes the proposed modular and adaptive heterogeneous many-core platform through three main contributions. The first contribution proposes and realizes a many-core architecture for heterogeneous ISAs. It provides a modular and reusable tile-based architecture for several heterogeneous ISAs based on the open-source RISC-V ISA. The modular tile-based architecture features a configurable number of processing cores with different RISC-V ISAs and different memory hierarchies.
To increase the level of heterogeneity to support the integration of custom hardware accelerators, a novel hybrid memory/accelerator tile architecture is developed and realized as the second contribution. The hybrid tile is a modular and reusable tile that can be configured at run-time to operate as a scratchpad shared memory between compute tiles or as an accelerator tile hosting a local hardware accelerator logic. The hybrid tile is designed and implemented to be seamlessly integrated into the proposed tile-based platform.
The third contribution addresses the self-adaptation features by providing a reconfiguration management approach to internally control the DPR process through the (RISC-V-based) processing cores. The internal reconfiguration process relies on a novel DPR controller targeting the FPGA design flow for RISC-V-based SoCs to change the types and functionalities of compute tiles at run-time.
Load Based Dynamic Priority Arbiter for NoC Architecture
The evolution of Very Large Scale Integration (VLSI) and the semiconductor industry has led to a focus on multicore architectures. Network on Chip (NoC) is one such arrangement: an interconnection framework comprising cores, routers, and links. The output port for each request from an input port must be computed, and the output channel must be reserved towards the next router. However, the same output port can be requested by more than one input port, while only one request can be granted at a time. Multiple requests for a single output channel lead to congestion of packets, thereby increasing the network latency and causing packet losses. The arbiter selects one of the input ports and grants it permission to use the requested output port while putting the other input port requests on hold. For congestion-free traversal of packets and to avoid packet drops, a Load based Dynamic Priority Arbiter (LDPA) is proposed whose priorities change dynamically at run time based on the input port load. The proposed customized arbiter works on the updates made to the reservations of each input port. The priority of each input port is assigned according to its average load: more weight is allotted to the highly loaded input ports, while randomization gives the lower-priority input ports a chance, reducing starvation and hence latency. With the proposed LDPA, the average network latency is reduced by about 15.98% compared to that of a baseline FIFO arbiter, without any compromise in power or throughput.
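The load-based priority scheme with randomized grants described above can be approximated by a load-weighted random draw, as in this hypothetical sketch (the real LDPA is a hardware arbiter; the load representation and weighting here are assumptions made for illustration):

```python
import random

def ldpa_grant(loads, rng=random):
    """Grant one requesting input port by a load-weighted random draw.

    `loads[p]` is a made-up stand-in for the average load of input port p
    (0 means the port is not requesting). Heavily loaded ports receive
    proportionally more grants, while randomization still serves lightly
    loaded ports, curbing starvation.
    """
    requesting = [p for p, load in enumerate(loads) if load > 0]
    if not requesting:
        return None  # nobody is requesting this output port
    weights = [loads[p] for p in requesting]
    return rng.choices(requesting, weights=weights, k=1)[0]

# Port 0 carries most of the load, so it wins most (but not all) grants;
# ports 1 and 2 are still served occasionally.
rng = random.Random(0)
grants = [ldpa_grant([8, 1, 1], rng) for _ in range(1000)]
```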
CarRing IV - Real-time Computer Network
Whether it be automotive, avionics, or automation, advances in the respective real-time communication technologies focus on further improving preexisting solutions. For in-vehicle communication, the ever-increasing number of computer-based systems, applications, and connections, as well as the use of multiple proprietary communication standards, results in an increasingly complex wiring harness. This is in part due to those standards being incompatible with one another. In addition to cost, this also impacts weight, which in turn affects fuel consumption.
The work presented in this thesis is in part theoretical and in part applied. The former is represented by a new protocol, while the latter corresponds to the protocol's hardware implementation. In the first part of the thesis, the real-time communication protocol of CarRing IV (CR-IV) is presented. It provides isochronous and hard real-time guarantees without requiring network-wide clock synchronization. With up to 16 nodes per ring, a CR-IV network can consist of as many as 256 rings interconnected by routers. CR-IV uses a reduced OSI model (layers 1-3, 7), which is both
typical of and preferable for its application areas. Moreover, it supports both event- and time-triggered communication paradigms. The transparent mode feature allows
CR-IV to act as a backbone for existing networks, thereby addressing incompatibility
concerns and easing the transition into a more unified network solution. Using this
feature, user devices can communicate with one another via a CR-IV network without
requiring user intervention or any user device or application changes. Combined with the protocol's reliable multicast, this feature extends CR-IV's capabilities to include field bus emulation. The second part of the thesis presents the other important aspect of CR-IV. All of its OSI model layers are implemented in an FPGA using Hardware Description Languages (HDLs) without relying on or including any hard or soft processors. CR-IV's Register-Transfer Level (RTL) hardware design is created using a new
approach that can best be described as token-based data-flow. The approach is both
vertically and horizontally scalable. It uses stateless and loosely coupled Processing
Elements (PEs) as well as arbiter/memory allocation pairs. By having granular control and compartmentalizing every aspect of a solution, the approach lends itself to
being used for implementing other software-level solutions in hardware.
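The token-based data-flow idea, stateless processing elements that fire whenever an input token is available, with all state carried inside the tokens, can be illustrated with a small software model. This is a hypothetical sketch; the actual design is RTL hardware built from PEs and arbiter/memory allocation pairs.

```python
# Software model of the token-based data-flow approach: stateless
# processing elements (PEs) connected by queues. A PE fires whenever a
# token waits on its input; all state travels inside the tokens. The
# PE functions below are hypothetical stand-ins for hardware PEs.
from collections import deque

def run_pipeline(stages, tokens):
    """Push tokens through a chain of stateless PE functions."""
    queues = [deque(tokens)] + [deque() for _ in stages]
    progress = True
    while progress:                       # fire until no PE can fire
        progress = False
        for i, pe in enumerate(stages):
            if queues[i]:                 # a token is available: fire
                queues[i + 1].append(pe(queues[i].popleft()))
                progress = True
    return list(queues[-1])

# Three-stage example pipeline; token order is preserved end to end.
result = run_pipeline([lambda t: t + 1, lambda t: t * 2, lambda t: t - 3],
                      [1, 2, 3])
# result == [1, 3, 5]
```

Because each PE keeps no state between firings, stages can be duplicated or rearranged freely, which is what makes the approach vertically and horizontally scalable.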
Many test scenarios are conducted to both highlight and examine the results
achieved in CR-IV. Those scenarios range from direct performance measurements to
behavior-specific tests. Moreover, a lab-demo is created that essentially amounts to
a proof of concept. The demo represents a practical test as opposed to a scenario-specific one. Whether it be the test scenarios or the lab demo, all are carried out using the project's prototype boards, i.e. no simulation tests. The results obtained represent CR-IV's realistic real-world performance of up to 13.61 Gbps.