998 research outputs found

    Language Design for Reactive Systems: On Modal Models, Time, and Object Orientation in Lingua Franca and SCCharts

    Get PDF
    Reactive systems play a crucial role in the embedded domain. They continuously interact with their environment, handle concurrent operations, and are commonly expected to provide deterministic behavior to enable application in safety-critical systems. In this context, language design is a key aspect, since carefully tailored language constructs can aid in addressing the challenges faced in this domain, as illustrated by the various concurrency models that prevent the known pitfalls of regular threads. Today, many languages exist in this domain and often provide unique characteristics that make them specifically fit for certain use cases. This thesis revolves around two distinctive languages: the actor-oriented polyglot coordination language Lingua Franca and the synchronous statecharts dialect SCCharts. While they take different approaches in providing reactive modeling capabilities, they share clear similarities in their semantics and complement each other in design principles. This thesis analyzes and compares key design aspects in the context of these two languages. For three particularly relevant concepts, it provides and evaluates lean and seamless language extensions that are carefully aligned with the fundamental principles of the underlying language. Specifically, Lingua Franca is extended toward coordinating modal behavior, while SCCharts receives a timed automaton notation with an efficient execution model using dynamic ticks and an extension toward the object-oriented modeling paradigm.
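The dynamic-tick idea mentioned above (advance time directly to the next deadline instead of polling at a fixed rate) can be illustrated with a toy timed automaton. This is a hedged Python sketch, not Lingua Franca or SCCharts syntax; the traffic-light modes and dwell times are invented for illustration.

```python
# Toy sketch: a two-mode timed automaton executed with "dynamic ticks" --
# instead of ticking at a fixed rate, each reaction reports the delay after
# which it next needs to run, so idle time is skipped entirely.

class TrafficLight:
    """Hypothetical example: RED <-> GREEN with per-mode dwell times."""
    DWELL = {"RED": 4.0, "GREEN": 6.0}

    def __init__(self):
        self.mode = "RED"
        self.now = 0.0

    def tick(self):
        """Advance directly to the next deadline (a dynamic tick)."""
        delay = self.DWELL[self.mode]   # time until the timed transition fires
        self.now += delay               # a real runtime would sleep(delay)
        self.mode = "GREEN" if self.mode == "RED" else "RED"
        return self.now, self.mode

light = TrafficLight()
trace = [light.tick() for _ in range(3)]
print(trace)  # [(4.0, 'GREEN'), (10.0, 'RED'), (14.0, 'GREEN')]
```

The runtime only wakes up when a transition can actually fire, which is what makes dynamic ticks efficient for sparse timed behavior.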

    Backpropagation Beyond the Gradient

    Get PDF
    Automatic differentiation is a key enabler of deep learning: previously, practitioners were limited to models for which they could manually compute derivatives. Now, they can create sophisticated models with almost no restrictions and train them using first-order, i.e., gradient, information. Popular libraries like PyTorch and TensorFlow compute this gradient efficiently, automatically, and conveniently with a single line of code. Under the hood, reverse-mode automatic differentiation, or gradient backpropagation, powers the gradient computation in these libraries. Their entire design centers around gradient backpropagation. These frameworks are specialized around one specific task: computing the average gradient in a mini-batch. This specialization often complicates the extraction of other information like higher-order statistical moments of the gradient, or higher-order derivatives like the Hessian. It limits practitioners and researchers to methods that rely on the gradient. Arguably, this hampers the field from exploring the potential of higher-order information, and there is evidence that focusing solely on the gradient has not led to significant recent advances in deep learning optimization. To advance algorithmic research and inspire novel ideas, information beyond the batch-averaged gradient must be made available at the same level of computational efficiency, automation, and convenience. This thesis presents approaches to simplify experimentation with rich information beyond the gradient by making it more readily accessible. We present an implementation of these ideas as an extension to the backpropagation procedure in PyTorch. Using this newly accessible information, we demonstrate possible use cases by (i) showing how it can inform our understanding of neural network training by building a diagnostic tool, and (ii) enabling novel methods to efficiently compute and approximate curvature information.
First, we extend gradient backpropagation for sequential feedforward models to Hessian backpropagation which enables computing approximate per-layer curvature. This perspective unifies recently proposed block-diagonal curvature approximations. Like gradient backpropagation, the computation of these second-order derivatives is modular, and therefore simple to automate and extend to new operations. Based on the insight that rich information beyond the gradient can be computed efficiently and at the same time, we extend the backpropagation in PyTorch with the BackPACK library. It provides efficient and convenient access to statistical moments of the gradient and approximate curvature information, often at a small overhead compared to computing just the gradient. Next, we showcase the utility of such information to better understand neural network training. We build the Cockpit library that visualizes what is happening inside the model during training through various instruments that rely on BackPACK's statistics. We show how Cockpit provides a meaningful statistical summary report to the deep learning engineer to identify bugs in their machine learning pipeline, guide hyperparameter tuning, and study deep learning phenomena. Finally, we use BackPACK's extended automatic differentiation functionality to develop ViViT, an approach to efficiently compute curvature information, in particular curvature noise. It uses the low-rank structure of the generalized Gauss-Newton approximation to the Hessian and addresses shortcomings in existing curvature approximations. Through monitoring curvature noise, we demonstrate how ViViT's information helps in understanding challenges to make second-order optimization methods work in practice. This work develops new tools to experiment more easily with higher-order information in complex deep learning models.
These tools have impacted works on Bayesian applications with Laplace approximations, out-of-distribution generalization, differential privacy, and the design of automatic differentiation systems. They constitute one important step towards developing and establishing more efficient deep learning algorithms.
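As a hedged illustration of "information beyond the batch-averaged gradient": for a scalar linear model with squared loss, the per-sample gradients and their variance can be written out directly in numpy. This is a toy sketch of the kind of statistic BackPACK exposes during one backward pass, not its actual API.

```python
import numpy as np

# For squared loss L_i = (w.x_i - y_i)^2 / 2, each sample's gradient is
# g_i = r_i * x_i with residual r_i = w.x_i - y_i. Standard backprop only
# returns their mean; here we keep all of them and compute extra statistics.

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                # mini-batch of 8 samples, 3 features
y = rng.normal(size=8)
w = np.zeros(3)

residual = X @ w - y                       # per-sample residuals r_i
per_sample_grads = residual[:, None] * X   # g_i = r_i * x_i, shape (8, 3)

batch_grad = per_sample_grads.mean(axis=0)     # what .grad would contain
grad_variance = per_sample_grads.var(axis=0)   # extra signal: gradient noise

# Sanity check: the mean of the per-sample gradients is the batch gradient.
assert np.allclose(batch_grad, X.T @ (X @ w - y) / len(y))
print(batch_grad.shape, grad_variance.shape)   # (3,) (3,)
```

Gradient variance of this kind is exactly the sort of quantity a diagnostic tool can monitor to detect, e.g., overly noisy mini-batches.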

    Guided rewriting and constraint satisfaction for parallel GPU code generation

    Get PDF
    Graphics Processing Units (GPUs) are notoriously hard to optimise for manually due to their scheduling and memory hierarchies. What is needed are good automatic code generators and optimisers for such parallel hardware. Functional approaches such as Accelerate, Futhark and LIFT leverage a high-level algorithmic Intermediate Representation (IR) to expose parallelism and abstract the implementation details away from the user. However, producing efficient code for a given accelerator remains challenging. Existing code generators rely either on user input to choose among a set of hard-coded optimisations, or on automated exploration of the implementation search space. The former lacks extensibility, while the latter is too costly due to the size of the search space. A hybrid approach is needed, where a space of valid implementations is built automatically and explored with the aid of human expertise. This thesis presents a solution combining user-guided rewriting and automatically generated constraints to produce high-performance code. The first contribution is an automatic tuning technique to find a balance between performance and memory consumption. Leveraging its functional patterns, the LIFT compiler is empowered to infer tuning constraints and limit the search to valid tuning combinations only. Next, the thesis reframes parallelisation as a constraint satisfaction problem. Parallelisation constraints are extracted automatically from the input expression, and a solver is used to identify valid rewrites. The constraints truncate the search space to valid parallel mappings only by capturing the scheduling restrictions of the GPU in the context of a given program. A synchronisation barrier insertion technique is proposed to prevent data races and improve the efficiency of the generated parallel mappings.
The final contribution of this thesis is the guided rewriting method, where the user encodes a design space of structural transformations using high-level IR nodes called rewrite points. These strongly typed pragmas express macro rewrites and expose design choices as explorable parameters. The thesis proposes a small set of reusable rewrite points to achieve tiling, cache locality, data reuse and memory optimisation. A comparison with the handwritten kernels of the vendor-provided ARM Compute Library and with the TVM code generator demonstrates the effectiveness of this thesis' contributions. With convolution as a use case, LIFT-generated direct and GEMM-based convolution implementations are shown to perform on par with state-of-the-art solutions on a mobile GPU. Overall, this thesis demonstrates that a functional IR lends itself well to user-guided and automatic rewriting for high-performance code generation.
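The idea of treating parallelisation as a constraint satisfaction problem can be sketched in miniature: enumerate assignments of loop dimensions to GPU scheduling levels and keep only those that satisfy hardware-style constraints. The dimensions, levels, and the 256-thread workgroup limit below are illustrative assumptions, not LIFT's actual model.

```python
from itertools import product

# Toy sketch: each loop dimension of a 2D computation is mapped to one of
# three scheduling levels; constraints prune the space to valid mappings.

dims = {"i": 1024, "j": 128}
levels = ["workgroup", "local", "sequential"]

def valid(mapping):
    # Constraint 1: at least one dimension must actually be parallel.
    if all(l == "sequential" for l in mapping.values()):
        return False
    # Constraint 2: the product of 'local' extents must fit one workgroup.
    local = 1
    for d, l in mapping.items():
        if l == "local":
            local *= dims[d]
    return local <= 256

mappings = [dict(zip(dims, assignment))
            for assignment in product(levels, repeat=len(dims))
            if valid(dict(zip(dims, assignment)))]
print(len(mappings))  # 5 valid parallel mappings out of 9 candidates
```

A real solver would extract such constraints automatically from the IR expression instead of hard-coding them, but the pruning effect is the same: only schedulable mappings are ever explored.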

    Complex systems methods characterizing nonlinear processes in the near-Earth electromagnetic environment: recent advances and open challenges

    Get PDF
    Learning from successful applications of methods originating in statistical mechanics, complex systems science, or information theory in one scientific field (e.g., atmospheric physics or climatology) can provide important insights or conceptual ideas for other areas (e.g., space sciences) or even stimulate new research questions and approaches. For instance, quantification and attribution of dynamical complexity in output time series of nonlinear dynamical systems is a key challenge across scientific disciplines. Especially in the field of space physics, an early and accurate detection of characteristic dissimilarity between normal and abnormal states (e.g., pre-storm activity vs. magnetic storms) has the potential to vastly improve space weather diagnosis and, consequently, the mitigation of space weather hazards. This review provides a systematic overview of existing nonlinear dynamical systems-based methodologies along with key results of their previous applications in a space physics context, which particularly illustrates how complementary modern complex systems approaches have recently shaped our understanding of nonlinear magnetospheric variability. The rising number of corresponding studies demonstrates that the multiplicity of nonlinear time series analysis methods developed during the last decades offers great potential for uncovering relevant yet complex processes interlinking different geospace subsystems, variables and spatiotemporal scales.
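As one concrete example of the kind of complexity quantifier such reviews cover (chosen here purely for illustration; the review surveys many methods), permutation entropy measures the dynamical complexity of a time series from the distribution of ordinal patterns of length m:

```python
import math
from itertools import permutations

def permutation_entropy(series, m=3):
    """Normalised permutation entropy in [0, 1]: 0 = fully ordered series."""
    counts = {p: 0 for p in permutations(range(m))}
    for k in range(len(series) - m + 1):
        window = series[k:k + m]
        # Ordinal pattern: the ranking of values inside the window.
        pattern = tuple(sorted(range(m), key=lambda i: window[i]))
        counts[pattern] += 1
    total = sum(counts.values())
    probs = [c / total for c in counts.values() if c > 0]
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(math.factorial(m))   # normalise by max entropy

monotone = list(range(100))                      # fully ordered signal
print(round(permutation_entropy(monotone), 3))   # 0.0
```

Low values flag ordered, predictable dynamics, while values near 1 indicate noise-like behavior; comparing such indices between quiet and storm-time magnetospheric records is the sort of analysis the surveyed studies perform.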

    Tools for efficient Deep Learning

    Get PDF
    In the era of Deep Learning (DL), there is a fast-growing demand for building and deploying Deep Neural Networks (DNNs) on various platforms. This thesis proposes five tools to address the challenges of designing DNNs that are efficient in time, resources and power consumption. We first present Aegis and SPGC to address the challenges in improving the memory efficiency of DL training and inference. Aegis makes mixed precision training (MPT) more stable through layer-wise gradient scaling. Empirical experiments show that Aegis can improve MPT accuracy by up to 4%. SPGC focuses on structured pruning: replacing standard convolution with group convolution (GConv) to avoid irregular sparsity. SPGC formulates GConv pruning as a channel permutation problem and proposes a novel heuristic polynomial-time algorithm. Common DNNs pruned by SPGC achieve up to 1% higher accuracy than prior work. This thesis also addresses the gap between DNN descriptions and executables with Polygeist for software and POLSCA for hardware. Many novel techniques, e.g. statement splitting and memory partitioning, are explored and used to expand polyhedral optimisation. Polygeist speeds up sequential and parallel software execution by 2.53 and 9.47 times on Polybench/C. POLSCA achieves a 1.5 times speedup over hardware designs directly generated from high-level synthesis on Polybench/C. Moreover, this thesis presents Deacon, a framework that generates FPGA-based DNN accelerators with streaming architectures and advanced pipelining techniques to address the challenges of heterogeneous convolution and residual connections. Deacon provides fine-grained pipelining, graph-level optimisation, and heuristic exploration by graph colouring. Compared with prior designs, Deacon shows resource/power consumption efficiency improvements of 1.2x/3.5x for MobileNets and 1.0x/2.8x for SqueezeNets. All these tools are open source, and some have already gained public engagement.
We believe they can make efficient deep learning applications easier to build and deploy.
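A back-of-the-envelope sketch of why SPGC targets group convolution: with g groups, each output channel sees only C_in/g input channels, cutting the weight count by a factor of g while keeping a regular, dense structure. (The remaining problem, which SPGC formulates as a channel permutation, is deciding which channels land in which group.) The function below is an illustrative parameter count, not SPGC code.

```python
# Parameter count of a k x k convolution with and without channel groups.

def conv_params(c_in, c_out, k, groups=1):
    assert c_in % groups == 0 and c_out % groups == 0
    # Each of the c_out filters only spans c_in // groups input channels.
    return c_out * (c_in // groups) * k * k

standard = conv_params(64, 64, 3)           # 36864 weights
grouped = conv_params(64, 64, 3, groups=4)  # 9216 weights
print(standard // grouped)  # 4
```

Because the sparsity pattern is block-structured rather than irregular, grouped kernels map directly onto efficient dense kernels on real hardware.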

    Deep Multimodality Image-Guided System for Assisting Neurosurgery

    Get PDF
    Intracranial brain tumors are among the ten most common malignant cancers and are responsible for considerable morbidity and mortality. The largest histological category of primary brain tumors are the gliomas, which have an extremely heterogeneous appearance and are difficult to distinguish radiologically from other brain lesions. Neurosurgery is usually the standard treatment for newly diagnosed glioma patients and may be followed by radiotherapy and adjuvant temozolomide chemotherapy. However, brain tumor surgery faces major challenges in achieving maximal tumor removal while avoiding postoperative neurological deficits. Two of these neurosurgical challenges are addressed here. First, manual delineation of a glioma, including its subregions, is difficult due to its infiltrative nature and the presence of heterogeneous contrast enhancement. Second, the brain deforms in response to surgical manipulation, swelling caused by osmotic drugs, and anesthesia (so-called "brain shift"), which limits the utility of preoperative image data for guiding the procedure. Image-guided systems give clinicians invaluable insight into anatomical or pathological targets based on modern imaging modalities such as magnetic resonance imaging (MRI) and ultrasound (US). Image-guided instruments are mainly computer-assisted systems that use computer vision methods to facilitate perioperative surgical procedures. However, surgeons must still mentally fuse the surgical plan derived from preoperative images with real-time information while manipulating surgical instruments inside the body and monitoring whether the target is reached.
The need for image guidance during neurosurgical procedures has therefore always been a major concern of clinicians. The aim of this research is to develop a novel system for perioperative image-guided neurosurgery (IGN), named DeepIGN, that helps achieve the expected outcomes of brain tumor surgery, thereby maximizing overall survival and minimizing postoperative neurological morbidity. This thesis first proposes novel methods for the core components of the DeepIGN system: brain tumor segmentation in MRI, and multimodal registration of preoperative MRI to intraoperative ultrasound (iUS), both drawing on recent developments in deep learning. The outcome predictions of the deep learning networks employed are then further interpreted and examined by producing human-understandable, explainable maps. Finally, open-source packages were developed and integrated into widely recognized software that is responsible for integrating information from tracking systems, image visualization and fusion, and displaying real-time updates of the instruments relative to the patient space. The components of DeepIGN were validated in the laboratory and evaluated in a simulated operating room. For the segmentation module, DeepSeg, a generic decoupled deep learning framework for automatic delineation of gliomas in brain MRI, achieved an accuracy of 0.84 in terms of the Dice coefficient for the gross tumor volume. Performance improvements were observed when applying advanced deep learning approaches such as 3D convolutions across all layers, region-based training, on-the-fly data augmentation, and ensemble methods.
To compensate for brain shift, an automated, fast, and accurate deformable approach, iRegNet, is proposed for registering preoperative MRI to iUS volumes as part of the multimodal registration module. Extensive experiments were conducted on two multi-location databases: BITE and RESECT. Two experienced neurosurgeons performed an additional qualitative validation of this study by overlaying MRI-iUS pairs before and after deformable registration. The experimental results show that the proposed iRegNet is fast and achieves the best accuracies. Moreover, iRegNet delivers competitive results even on images it was not trained on, demonstrating its generality, and may therefore be useful for intraoperative neurosurgical guidance. For the explainability module, the NeuroXAI framework is proposed to increase the trust of medical experts in the application of AI techniques and deep neural networks. NeuroXAI comprises seven explanation methods that provide visualization maps to make deep learning models transparent. The experimental results show that the proposed XAI framework performs well in extracting local and global contexts and in producing explainable saliency maps for understanding the predictions of the deep network. In addition, visualization maps are generated to reveal the flow of information through the internal layers of the encoder-decoder network and to understand the contribution of each MRI modality to the final prediction. The explanation process could provide medical professionals with additional information about tumor segmentation results and thus help them understand how the deep learning model successfully processes MRI data.
Furthermore, an interactive neurosurgical display for procedure guidance was developed that supports available commercial hardware such as iUS navigation devices and instrument tracking systems. The clinical environment and the technical requirements of the integrated multimodal DeepIGN system were established with the ability to integrate (1) preoperative MRI data and the associated 3D volume reconstructions, (2) real-time iUS data, and (3) positional instrument tracking. The accuracy of this system was tested on a custom agar phantom model, and its use in a preclinical operating room was simulated. The results of the clinical simulation confirmed that system assembly is straightforward, can be completed within a clinically acceptable time of 15 minutes, and achieves clinically acceptable accuracy. In this thesis, a multimodal IGN system was developed that leverages recent advances in deep learning to guide neurosurgeons precisely and to incorporate pre- and intraoperative patient image data as well as interventional devices into the surgical procedure. DeepIGN was developed as open-source research software to accelerate research in this field, to facilitate sharing among multiple research groups, and to enable continuous development by the community. The experimental results are very promising for the application of deep learning models to support interventional procedures, a crucial step toward improving the surgical treatment of brain tumors and the corresponding long-term postoperative outcomes.
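The segmentation accuracy above is reported as a Dice coefficient; a minimal numpy sketch of that metric on binary masks (illustrative only, not DeepSeg code):

```python
import numpy as np

def dice(pred, target, eps=1e-8):
    """Dice coefficient 2|A∩B| / (|A|+|B|) for binary segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)

a = np.zeros((4, 4)); a[:2, :2] = 1   # 4 predicted voxels
b = np.zeros((4, 4)); b[:2, :3] = 1   # 6 reference voxels, 4 overlapping
print(round(dice(a, b), 3))  # 0.8
```

A value of 1 means perfect overlap with the reference delineation; the reported 0.84 for gross tumor volume is therefore a strong agreement.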

    Efficient algorithms for simulation and analysis of many-body systems

    Get PDF
    This thesis introduces methods to efficiently generate and analyze time series data of many-body systems. While we have a strong focus on biomolecular processes, the presented methods can also be applied more generally. Due to limitations of microscope resolution in both space and time, biomolecular processes are especially hard to observe experimentally. Computer models offer an opportunity to work around these limitations. However, as these models are bound by computational effort, careful selection of the model as well as its efficient implementation play a fundamental role in successful sampling and/or estimation. Especially at high levels of resolution, computer simulations can produce vast amounts of high-dimensional data that are in general not straightforward to visualize, let alone to mine for the relevant features and processes. To this end, we cover tools for projecting time series data onto important processes, finding features in observable space that remain geometrically stable over time, and identifying governing dynamics. We introduce the novel software library deeptime with two main goals: (1) making methods which were developed in different communities (such as molecular dynamics and fluid dynamics) accessible to a broad user base by implementing them in a general-purpose way, and (2) providing a library that is easy to install, extend, and maintain by employing a high degree of modularity and introducing as few hard dependencies as possible. We demonstrate and compare the capabilities of the provided methods on numerical examples. Subsequently, the particle-based reaction-diffusion simulation software package ReaDDy2 is introduced, which can simulate dynamics that are more complicated than what is usually analyzed with the methods available in deeptime. It is a significantly more efficient, feature-rich, flexible, and user-friendly version of its predecessor ReaDDy.
As such, it makes it possible, at the simulation model's resolution, to study larger systems and to cover longer timescales. In particular, ReaDDy2 is capable of modeling complex processes featuring particle crowding, space exclusion, association and dissociation events, and the dynamic formation and dissolution of particle geometries on a mesoscopic scale. The validity of the ReaDDy2 model is asserted by several numerical studies which are compared to analytically obtained results, simulations from other packages, or literature data. Finally, we present reactive SINDy, a method that can detect reaction networks from concentration curves of chemical species. It extends the SINDy method (contained in deeptime) by introducing coupling terms over a system of ordinary differential equations in an ansatz reaction space, transforming an ordinary linear regression problem into a linear tensor regression. The method employs a sparsity-promoting regularization which leads to especially simple and interpretable models. We show in biologically motivated example systems that the method is indeed capable of detecting the correct underlying reaction dynamics and that the sparsity regularization plays a key role in pruning otherwise spuriously detected reactions.
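The core of SINDy, which reactive SINDy extends, is a sequentially thresholded least-squares regression over a library of candidate terms. A minimal sketch under simplifying assumptions (scalar system, hand-built library; not the deeptime implementation):

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, iters=10):
    """Sequentially thresholded least squares: sparse xi with dxdt ~ theta @ xi."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0                  # prune small coefficients...
        big = ~small
        if big.any():                    # ...and refit the surviving terms
            xi[big] = np.linalg.lstsq(theta[:, big], dxdt, rcond=None)[0]
    return xi

# Synthetic data for x' = -2x, regressed against the library [1, x, x^2].
x = np.linspace(-1, 1, 50)
theta = np.column_stack([np.ones_like(x), x, x**2])
dxdt = -2.0 * x
xi = stlsq(theta, dxdt)
print(np.round(xi, 3))  # only the x term survives, with coefficient -2
```

The thresholding is the sparsity-promoting regularization the abstract refers to: it prunes spurious library terms so that only the true reaction dynamics remain.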

    Effects of Fibril Morphology and Interfacial Interactions on the Behavior of Polymer-Grafted Cellulose Nanofibril Reinforced Thermoplastic Composites

    Get PDF
    Mechanically refined cellulose nanofibrils (CNFs) promise to be a high-volume, sustainable, nanoscale reinforcement for thermoplastic composites. They are currently held back by poor interfacial interactions with composite matrices, energy-intensive drying, and drying-induced fibril aggregation. In this dissertation, we explored how a grafting-through polymerization scheme modified the surface of CNFs with a wide variety of commodity polymers and overcame many of these technical challenges. The first phase of the research was concerned with characterizing the unique morphology of these CNFs as a function of refinement energy. This characterization was employed to understand how the materials' morphologies affected their interfacial interactions with porous substrates. In this work, optical, scanning electron, and atomic force microscopy were used to characterize the materials, and mechanical testing was used to assess their interfacial interactions with porous model substrates. The second phase of the research explored how the grafting-through polymerization of commodity monomers occurred in the presence of methacrylated CNFs. Infrared spectroscopy measurements were used to explore the degree of grafting, and microscopic analyses were employed to understand how these modifications affected the materials' suspension morphology. The final phase of the research looked at the modifications' effects on drying behavior, surface energetics, and reinforcement ability in poly(lactic acid) (PLA). Scanning electron microscopy and inverse gas chromatography provided insights into how the grafted-polymer modifications improved the fibrillar morphology of spray-dried CNFs and increased their interfacial adhesion to PLA. Tensile testing and rheological characterization of composites made from these spray-dried materials revealed their improved dispersion and network formation in the PLA matrix.
Scale-up of bench-scale reactions to the pilot scale is demonstrated, and 3D printing trials were conducted. Dramatic improvements in mechanical properties were seen for 3D printed samples modified with poly(N-isopropyl acrylamide). These improvements were explored by dynamic mechanical analysis and tensile testing, revealing the effects of fibril alignment during printing.

    Supporting Custom Instructions with the LLVM Compiler for RISC-V Processor

    Full text link
    The rise of hardware accelerators with custom instructions necessitates custom compiler backends supporting these accelerators. This study provides detailed analyses of LLVM and its RISC-V backend, supplemented with case studies that give an end-to-end overview of the mentioned transformations. We argue that instruction design should consider both the hardware and the software design space: if extensive compiler modifications are necessary, the instruction may not be well designed and needs to be reconsidered. We also note that the RISC-V standard extensions provide exemplary instructions that can guide instruction designers. In this study, the process of adding a custom instruction to the compiler is split into two parts: assembler support and pattern matching support. Without pattern matching support, conventional software requires manual inline assembly entries for the accelerator, which is not scalable. While assembler support is trivial to add regardless of the instruction's semantics, pattern matching support is not: it requires choosing the right stage for the modification and knowledge of the compiler's internal transformations. This study delves deep into pattern matching and presents multiple ways to approach the problem of pattern matching support. It is discussed that, depending on the pattern's complexity, higher-level transformations, e.g. at the IR level, can be more maintainable than modifications in the instruction selection phase. Comment: Electronics and Communication Engineering B.Sc. Graduation Project. Source can be found in https://github.com/eymay/Senior-Design-Projec
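Pattern matching support can be pictured as rewriting a small expression tree: the toy below fuses add(mul(a, b), c) into a hypothetical "madd" custom instruction, the kind of fused operation a custom RISC-V accelerator might provide. This is an illustrative Python sketch of the idea, not LLVM's TableGen or SelectionDAG API.

```python
# Expressions are tuples: (op, operand, ...); leaves are register names.
# The rewriter walks the tree bottom-up and fuses the multiply-add pattern.

def rewrite(node):
    if isinstance(node, tuple):
        op, *args = node
        args = [rewrite(a) for a in args]   # rewrite operands first
        if op == "add" and isinstance(args[0], tuple) and args[0][0] == "mul":
            _, a, b = args[0]
            return ("madd", a, b, args[1])  # fuse into the custom op
        return (op, *args)
    return node                             # leaf: a register name

expr = ("add", ("mul", "x", "y"), "z")
print(rewrite(expr))  # ('madd', 'x', 'y', 'z')
```

Instruction selection does essentially this over a DAG, which is why pattern complexity (operand order, commutativity, intervening uses) determines how maintainable the matching code is at each compiler stage.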
