
    20th SC@RUG 2023 proceedings 2022-2023


    Tools for efficient Deep Learning

    In the era of Deep Learning (DL), there is a fast-growing demand for building and deploying Deep Neural Networks (DNNs) on various platforms. This thesis proposes five tools to address the challenges of designing DNNs that are efficient in time, resources, and power consumption. We first present Aegis and SPGC to address the challenges of improving the memory efficiency of DL training and inference. Aegis makes mixed precision training (MPT) more stable through layer-wise gradient scaling. Empirical experiments show that Aegis can improve MPT accuracy by up to 4%. SPGC focuses on structured pruning: replacing standard convolution with group convolution (GConv) to avoid irregular sparsity. SPGC formulates GConv pruning as a channel permutation problem and proposes a novel heuristic polynomial-time algorithm. Common DNNs pruned by SPGC achieve up to 1% higher accuracy than prior work. This thesis also addresses the gap between DNN descriptions and executables, with Polygeist for software and POLSCA for hardware. Several novel techniques, e.g. statement splitting and memory partitioning, are explored and used to extend polyhedral optimisation. Polygeist speeds up sequential and parallel software execution by 2.53x and 9.47x on Polybench/C. POLSCA achieves a 1.5x speedup over hardware designs generated directly by high-level synthesis on Polybench/C. Moreover, this thesis presents Deacon, a framework that generates FPGA-based DNN accelerators with streaming architectures and advanced pipelining techniques to address the challenges posed by heterogeneous convolutions and residual connections. Deacon provides fine-grained pipelining, graph-level optimisation, and heuristic exploration via graph colouring. Compared with prior designs, Deacon improves resource/power efficiency by 1.2x/3.5x for MobileNets and 1.0x/2.8x for SqueezeNets. All these tools are open source, and some have already gained public engagement. We believe they can make efficient deep learning applications easier to build and deploy.
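
    The abstract only names the layer-wise gradient scaling technique; the sketch below is a rough, non-authoritative illustration of the general idea (the scaling rule, headroom factor, and update loop are illustrative assumptions, not Aegis's actual algorithm): each layer receives its own gradient scale so that layers with small gradients are not pushed into the fp16 subnormal range by a single global scale.

```python
# Rough sketch of layer-wise gradient scaling for mixed precision training (MPT).
# Illustrative assumption of how such a scheme can work, not the actual Aegis
# algorithm: each layer gets its own scale so that its fp16 gradients neither
# overflow nor lose precision, and the scale is removed before the fp32 update.
import numpy as np

FP16_MAX = float(np.finfo(np.float16).max)   # ~65504, fp16 overflow threshold

def layerwise_scales(grads_fp32, headroom=0.5):
    """One scale per layer: map the largest |grad| to a fraction of the fp16 range."""
    return [headroom * FP16_MAX / max(float(np.abs(g).max()), 1e-30) for g in grads_fp32]

def mpt_update(weights_fp32, grads_fp32, lr=1e-2):
    scales = layerwise_scales(grads_fp32)
    updated = []
    for w, g, s in zip(weights_fp32, grads_fp32, scales):
        g16 = (g * s).astype(np.float16)      # scaled gradient, as stored/communicated in fp16
        g32 = g16.astype(np.float32) / s      # unscale in fp32 for the master-weight update
        updated.append(w - lr * g32)
    return updated

# Two "layers" with very different gradient magnitudes: a single global scale small
# enough to avoid overflow in layer 0 would push layer 1 into the fp16 subnormal range.
grads = [np.array([3.0e2, -1.5e2], dtype=np.float32),
         np.array([2.0e-7, -8.0e-8], dtype=np.float32)]
weights = [np.ones(2, dtype=np.float32), np.ones(2, dtype=np.float32)]
print(mpt_update(weights, grads))
```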

    Efficient resilience analysis and decision-making for complex engineering systems

    Modern societies around the world are increasingly dependent on the smooth functioning of progressively more complex systems, such as infrastructure systems, digital systems like the internet, and sophisticated machinery. They form the cornerstones of our technologically advanced world, and their efficiency is directly related to our well-being and the progress of society. However, these important systems are constantly exposed to a wide range of threats of natural, technological, and anthropogenic origin. The emergence of global crises such as the COVID-19 pandemic and the ongoing threat of climate change has starkly illustrated the vulnerability of these widely ramified and interdependent systems, as well as the impossibility of predicting threats entirely. The pandemic, with its widespread and unexpected impacts, demonstrated how an external shock can bring even the most advanced systems to a standstill, while ongoing climate change continues to produce unprecedented risks to system stability and performance. These global crises underscore the need for systems that can not only withstand disruptions but also recover from them efficiently and rapidly. The concept of resilience and related developments encompass these requirements: analyzing, balancing, and optimizing the reliability, robustness, redundancy, adaptability, and recoverability of systems -- from both technical and economic perspectives. This cumulative dissertation therefore focuses on developing comprehensive and efficient tools for resilience-based analysis and decision-making for complex engineering systems. The newly developed resilience decision-making procedure is at the core of these developments. It is based on an adapted systemic risk measure, a time-dependent probabilistic resilience metric, and a grid search algorithm, and represents a significant innovation: it enables decision-makers to identify an optimal balance between different types of resilience-enhancing measures while taking monetary aspects into account. Increasingly, system components have significant inherent complexity and must be modeled as systems themselves, leading to systems-of-systems with a high degree of complexity. To address this challenge, a novel methodology is derived by extending the previously introduced resilience framework to multidimensional use cases and synergistically merging it with an established concept from reliability theory, the survival signature. The new approach combines the advantages of both original components: a direct comparison of different resilience-enhancing measures from a multidimensional search space, leading to an optimal trade-off in terms of system resilience, and a significant reduction in computational effort due to the separation property of the survival signature. Once a subsystem structure has been computed -- a typically computationally expensive process -- any characterization of the probabilistic failure behavior of components can be validated without recomputing the structure. In reality, measurements, expert knowledge, and other sources of information are subject to multiple uncertainties. To address this, an efficient method based on the combination of the survival signature, fuzzy probability theory, and non-intrusive stochastic simulation (NISS) is proposed. This results in an efficient approach for quantifying the reliability of complex systems that takes the entire uncertainty spectrum into account.
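
    For reference, the separation property described above is the defining feature of the survival signature; in its standard form from the reliability literature (Coolen and Coolen-Maturi), the survival function of a system with K component types and m_k components of type k is

    \[
      P(T_S > t) \;=\; \sum_{l_1=0}^{m_1}\cdots\sum_{l_K=0}^{m_K}
      \Phi(l_1,\dots,l_K)\; P\!\left(\bigcap_{k=1}^{K}\{C_k(t)=l_k\}\right),
    \]

    where \(\Phi(l_1,\dots,l_K)\) is the probability that the system functions given that exactly \(l_k\) components of each type \(k\) function, and \(C_k(t)\) is the number of type-k components still functioning at time t. \(\Phi\) encodes only the system structure and is computed once; the second factor encodes only the component failure models, which can therefore be exchanged, e.g. under fuzzy probabilities, without recomputing \(\Phi\).
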
    The new approach, which synergizes the advantageous properties of its original components, achieves a significant decrease in computational effort due to the separation property of the survival signature. In addition, it attains a dramatic reduction in sample size thanks to the adapted NISS method: only a single stochastic simulation is required to account for the uncertainties. The novel methodology not only represents an innovation in the field of reliability analysis but can also be integrated into the resilience framework. For a resilience analysis of existing systems, the consideration of continuous component functionality is essential. This is addressed in a further novel development: by introducing the continuous survival function and the concept of the Diagonal Approximated Signature as a corresponding surrogate model, the existing resilience framework can be usefully extended without compromising its fundamental advantages. In the context of the regeneration of complex capital goods, a comprehensive analytical framework is presented to demonstrate the transferability and applicability of all developed methods to complex systems of any type. The framework integrates the previously developed resilience, reliability, and uncertainty analysis methods. It provides decision-makers with a basis for identifying resilient regeneration paths in two ways: first, in terms of regeneration paths with inherent resilience, and second, in terms of regeneration paths that lead to maximum system resilience, taking into account technical and monetary factors affecting the complex capital good under analysis. In summary, this dissertation offers innovative contributions to efficient resilience analysis and decision-making for complex engineering systems. It presents universally applicable methods and frameworks that are flexible enough to accommodate system types and performance measures of any kind. This is demonstrated in numerous case studies ranging from arbitrary flow networks and functional models of axial compressors to substructured infrastructure systems with several thousand individual components.


    Investigation Of Metal Modulated Epitaxy Grown III-Nitride High-Power Electronic And Optoelectronic Devices

    The wide-bandgap material GaN (Eg = 3.4 eV) continues to mature, driven by its achievements in high-power electronic and optoelectronic devices. Fully vertical GaN high-power devices show high performance but are very expensive. Quasi-vertical GaN devices are cost-effective but lack high performance due to low-quality films. Improving the performance of quasi-vertical devices would make this technology suitable for high-volume production. Although the market for GaN-based devices is still growing, significantly higher performance can potentially be achieved with the ultrawide-bandgap semiconductor AlN, whose bandgap is as high as 6.1 eV. The only limitation of AlN-based devices so far has been doping. An extensive study is performed to explore the 3D phase diagram of metal modulated epitaxy (MME) and to find the optimized morphological, electrical, structural, and optical growth conditions for high-performance, high-power quasi-vertical GaN p-i-n diodes. Thick, abrupt beryllium step-doped GaN i-layers were used to demonstrate these high-power devices, and novel fabrication methods and current spreading layers are introduced. Furthermore, for the first time in more than eight decades of AlN research, substantial bulk conduction, both p-type and n-type, was achieved in MME AlN, and the first known AlN p-n junction diodes are demonstrated, extending the high-power performance of nitride technology. In addition, mixed MME AlN/GaN p-i-n and junction barrier Schottky diodes are demonstrated, achieving low turn-on voltage and higher breakdown performance simultaneously. The conductive AlN films show great promise for AlN-based device applications that could potentially revolutionize deep-ultraviolet-light-based viral and bacterial sterilization, polymer curing, and high-temperature, high-voltage, and high-power electronics, among many other societal impacts.
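
    For context, a standard back-of-the-envelope relation (not part of the abstract) connects these bandgaps to band-edge emission wavelengths and explains why AlN reaches the deep-ultraviolet range relevant for sterilization:

    \[
      \lambda \approx \frac{hc}{E_g} \approx \frac{1240\ \text{eV nm}}{E_g}, \qquad
      \lambda_{\mathrm{GaN}} \approx \frac{1240}{3.4}\ \text{nm} \approx 365\ \text{nm}, \qquad
      \lambda_{\mathrm{AlN}} \approx \frac{1240}{6.1}\ \text{nm} \approx 203\ \text{nm}.
    \]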

    Harnessing Evolution in-Materio as an Unconventional Computing Resource

    This thesis illustrates the use and development of physical conductive analogue systems for unconventional computing using the Evolution in-Materio (EiM) paradigm. EiM uses an evolutionary algorithm to configure and exploit a physical material (or medium) for computation. While EiM processors show promise, fundamental questions and scaling issues remain, and their development is hindered by slow manufacturing and physical experimentation. This work addresses these issues by implementing simulated models to speed up research efforts, followed by investigations of physically implemented novel in-materio devices. Initial work leveraged simulated conductive networks as single-substrate ‘monolithic’ EiM processors, performing classification by formulating the system as an optimisation problem solved using Differential Evolution. Different material properties and algorithm parameters were isolated and investigated, which explained the capabilities of the configurable parameters and showed that the ideal nanomaterial choice depends on problem complexity. Subsequently, drawing on concepts from the wider machine learning field, several enhancements to monolithic EiM processors were proposed and investigated. These ensured more efficient use of training data, better placement of classification decision boundaries, an independently optimised readout layer, and a smoother search space. Finally, scalability and performance issues were addressed by constructing in-Materio Neural Networks (iM-NNs), in which several EiM processors are stacked in parallel and operated as physical realisations of hidden-layer neurons. Greater flexibility in system implementation was achieved by re-using a single physical substrate recursively as several virtual neurons, at the cost of the faster parallelised execution. These novel iM-NNs were first implemented using simulated in-materio neurons and trained for classification as Extreme Learning Machines, which were found to outperform artificial networks of a similar size. Physical iM-NNs were then implemented using a Raspberry Pi, a custom hardware interface, and lambda-diode-based physical in-materio neurons, and were trained successfully with neuroevolution. A more complex autoencoder structure was then proposed and implemented physically to perform dimensionality reduction on a handwritten digits dataset, outperforming both Principal Component Analysis and artificial autoencoders. This work presents an approach to exploit systems with interesting physical dynamics and leverage them as a computational resource. Such systems could become low-power, high-speed, unconventional computing assets in the future.
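
    As a minimal sketch of the EiM loop described above (a toy built on assumptions, not the thesis code: the "material" is a random surrogate network and the voltage ranges and sizes are made up), Differential Evolution can be used to search for configuration voltages under which a fixed nonlinear substrate separates two classes:

```python
# Toy sketch of an Evolution in-Materio (EiM) classifier: Differential Evolution
# searches for configuration voltages under which a fixed nonlinear "material"
# separates two classes. The material here is a random surrogate network and all
# ranges/sizes are illustrative assumptions, not the devices used in the thesis.
import numpy as np
from scipy.optimize import differential_evolution

rng = np.random.default_rng(0)

# Toy two-class dataset: two noisy clusters, two data inputs per sample.
X = np.vstack([rng.normal(-1, 0.5, (50, 2)), rng.normal(1, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Surrogate "material": a fixed random nonlinear mapping from
# (2 data inputs + 3 configuration voltages) to one output electrode.
W = rng.normal(size=(5, 16))
w_out = rng.normal(size=16)

def material_output(x, config):
    z = np.tanh(np.concatenate([x, config]) @ W)   # nonlinear network response
    return z @ w_out                               # "voltage" at the output electrode

def error(config):
    """Classification error when thresholding the output electrode at 0."""
    out = np.array([material_output(x, config) for x in X])
    pred = (out > 0).astype(int)
    return min(np.mean(pred != y), np.mean(pred == y))  # either class labelling is allowed

# Evolve 3 configuration voltages in [-5, 5] V.
result = differential_evolution(error, bounds=[(-5, 5)] * 3, seed=1, maxiter=50)
print("best config voltages:", result.x, "training error:", result.fun)
```

    In the physical setting sketched in the abstract, the surrogate material_output function would be replaced by voltage measurements on a real substrate, which is what turns such a system into a computational resource.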

    The Protocol to the African Charter on Human and Peoples’ Rights on the Rights of Women in Africa : a commentary

    Since its adoption on 11 July 2003, the Protocol to the African Charter on Human and Peoples’ Rights on the Rights of Women in Africa (the Maputo Protocol) has become a landmark on the African human rights landscape. It has steadily gained prominence as a trail-blazing instrument, responsive to the diverse realities of women on the African continent. This comprehensive Commentary on the Maputo Protocol, the first of its kind, provides a systematic analysis of each article of the Protocol, delving into the drafting history and elaborating on relevant key concepts and normative standards. The Commentary aims to be a ‘one-stop-shop’ for anyone interested in the Maputo Protocol, such as researchers, teachers, students, practitioners, policymakers, and activists. https://www.pulp.up.ac.za/pulp-commentaries/the-protocol-to-the-african-charter-on-human-and-peoples-rights-on-the-rights-of-women-in-africa-a-commentary

    Parallel and Flow-Based High Quality Hypergraph Partitioning

    Balanced hypergraph partitioning is a classic NP-hard optimization problem that is a fundamental tool in such diverse disciplines as VLSI circuit design, route planning, sharding distributed databases, optimizing communication volume in parallel computing, and accelerating the simulation of quantum circuits. Given a hypergraph and an integer k, the task is to divide the vertices into k disjoint blocks of bounded size while minimizing an objective function on the hyperedges that span multiple blocks. In this dissertation we consider the most commonly used objective, the connectivity metric, where we aim to minimize the number of different blocks connected by each hyperedge. The most successful heuristic for balanced partitioning is the multilevel approach, which consists of three phases. In the coarsening phase, vertex clusters are contracted to obtain a sequence of structurally similar but successively smaller hypergraphs. Once the hypergraph is sufficiently small, an initial partition is computed. Lastly, the contractions are successively undone in reverse order, and an iterative improvement algorithm is employed to refine the projected partition on each level. An important aspect of designing practical heuristics for optimization problems is the trade-off between solution quality and running time. The appropriate trade-off depends on the specific application, the size of the data sets, and the computational resources available. Existing algorithms are either slow and sequential but offer high solution quality, or simple, fast, and easy to parallelize but of low quality. While this trade-off cannot be avoided entirely, our goal is to close the gaps as much as possible. We achieve this by improving the state of the art in all non-trivial areas of the trade-off landscape with only a few techniques, employed in two different ways. Furthermore, most research on parallelization has focused on distributed memory, which neglects the greater flexibility of shared-memory algorithms and the wide availability of commodity multi-core machines. In this thesis, we therefore design and revisit fundamental techniques for each phase of the multilevel approach and develop highly efficient shared-memory parallel implementations thereof. We consider two iterative improvement algorithms, one based on the Fiduccia-Mattheyses (FM) heuristic and one based on label propagation. For these, we propose a variety of techniques to improve the accuracy of gains when moving vertices in parallel, as well as low-level algorithmic improvements. For coarsening, we present a parallel variant of greedy agglomerative clustering with a novel method to resolve cluster join conflicts on the fly. Combined with a preprocessing phase for coarsening based on community detection, a portfolio of from-scratch partitioning algorithms, and recursive partitioning with work stealing, we obtain our first parallel multilevel framework. It is the fastest partitioner known and achieves medium-high quality, beating all parallel partitioners and coming close to the highest-quality sequential partitioner. Our second contribution is the parallelization of an n-level approach, where only one vertex is contracted and uncontracted on each level. This extreme approach aims at high solution quality via very fine-grained, localized refinement, but seems inherently sequential.
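
    In its usual formulation from the literature (vertex weights omitted here for simplicity), the problem described above reads: given hyperedge weights \(\omega(e)\), an imbalance parameter \(\varepsilon\), and \(\lambda(e)\) denoting the number of blocks connected by hyperedge \(e\), find a partition \(\Pi\) that solves

    \[
      \min_{\Pi=\{V_1,\dots,V_k\}} \; \sum_{e \in E} \bigl(\lambda(e)-1\bigr)\,\omega(e)
      \quad \text{subject to} \quad
      |V_i| \le (1+\varepsilon)\left\lceil \tfrac{|V|}{k} \right\rceil \ \text{for all } i.
    \]
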
We devise an asynchronous n-level coarsening scheme based on a hierarchical decomposition of the contractions, as well as a batch-synchronous uncoarsening, and later a fully asynchronous uncoarsening. In addition, we adapt our refinement algorithms and reuse the preprocessing and the portfolio. This scheme is highly scalable and achieves the same quality as the highest-quality sequential partitioner (which is based on the same components), but is of course slower than our first framework due to the fine-grained uncoarsening. The last ingredient for high quality is an iterative improvement algorithm based on maximum flows. In the sequential setting, we first improve an existing idea by solving incremental maximum flow problems, which leads to smaller cuts and, thanks to engineering efforts, is faster. Subsequently, we parallelize the maximum flow algorithm and schedule refinements in parallel. Beyond striving for the highest quality, we present a deterministic parallel partitioning framework, developing deterministic versions of the preprocessing, coarsening, and label propagation refinement. Experimentally, we demonstrate that the penalties for determinism in terms of partition quality and running time are very small. All of our claims are validated through extensive experiments comparing our algorithms with state-of-the-art solvers on large and diverse benchmark sets. To foster further research, we make our contributions available in our open-source framework Mt-KaHyPar. While it seems inevitable that, with ever-increasing problem sizes, we must transition to distributed-memory algorithms, the study of shared-memory techniques is not in vain. With the multilevel approach, even inherently slow techniques have a role to play in fast systems, as they can be employed to boost quality on coarse levels at little expense. Similarly, techniques for shared-memory parallelism remain important, both as soon as a coarse graph fits into memory and as local building blocks in a distributed algorithm.
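
    To make the refinement and gain computations concrete, the following toy sketch (a sequential simplification under assumed data structures, not Mt-KaHyPar's parallel implementation) performs one round of label-propagation-style refinement for the connectivity objective, moving each vertex to the block with the best positive gain subject to a block-size bound:

```python
# Toy sketch (not Mt-KaHyPar's parallel implementation) of the connectivity objective
# and one sequential round of label-propagation-style refinement: every vertex greedily
# moves to the block that most reduces sum_e (lambda(e) - 1), subject to a block-size bound.
from collections import Counter

def connectivity(hyperedges, part):
    """(lambda - 1) metric: sum over hyperedges of (#blocks spanned - 1)."""
    return sum(len({part[v] for v in e}) - 1 for e in hyperedges)

def refine_round(hyperedges, part, k, max_block_size):
    incident = {}                                  # vertex -> incident hyperedges
    for e in hyperedges:
        for v in e:
            incident.setdefault(v, []).append(e)
    sizes = Counter(part.values())
    for v, b_from in list(part.items()):
        best_b, best_gain = b_from, 0
        for b_to in range(k):
            if b_to == b_from or sizes[b_to] + 1 > max_block_size:
                continue
            gain = 0                               # reduction in (lambda - 1) if v moves to b_to
            for e in incident.get(v, []):
                pins = Counter(part[u] for u in e)
                if pins[b_from] == 1:              # v is the only pin of e in b_from:
                    gain += 1                      #   e stops spanning b_from
                if pins[b_to] == 0:                # e does not touch b_to yet:
                    gain -= 1                      #   e starts spanning b_to
            if gain > best_gain:
                best_b, best_gain = b_to, gain
        if best_b != b_from:                       # apply the best strictly improving move
            sizes[b_from] -= 1
            sizes[best_b] += 1
            part[v] = best_b
    return part

# Tiny example: 6 vertices, 3 hyperedges, k = 2 blocks of at most 4 vertices each.
H = [[0, 1, 2], [2, 3, 4], [3, 4, 5]]
part = {0: 0, 1: 0, 2: 1, 3: 0, 4: 1, 5: 1}
print(connectivity(H, part))                                           # 3 before refinement
print(connectivity(H, refine_round(H, part, k=2, max_block_size=4)))   # improved afterwards
```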