258 research outputs found

    Tools for efficient Deep Learning

    Get PDF
    In the era of Deep Learning (DL), there is a fast-growing demand for building and deploying Deep Neural Networks (DNNs) on various platforms. This thesis proposes five tools to address the challenges for designing DNNs that are efficient in time, in resources and in power consumption. We first present Aegis and SPGC to address the challenges in improving the memory efficiency of DL training and inference. Aegis makes mixed precision training (MPT) stabler by layer-wise gradient scaling. Empirical experiments show that Aegis can improve MPT accuracy by at most 4\%. SPGC focuses on structured pruning: replacing standard convolution with group convolution (GConv) to avoid irregular sparsity. SPGC formulates GConv pruning as a channel permutation problem and proposes a novel heuristic polynomial-time algorithm. Common DNNs pruned by SPGC have maximally 1\% higher accuracy than prior work. This thesis also addresses the challenges lying in the gap between DNN descriptions and executables by Polygeist for software and POLSCA for hardware. Many novel techniques, e.g. statement splitting and memory partitioning, are explored and used to expand polyhedral optimisation. Polygeist can speed up software execution in sequential and parallel by 2.53 and 9.47 times on Polybench/C. POLSCA achieves 1.5 times speedup over hardware designs directly generated from high-level synthesis on Polybench/C. Moreover, this thesis presents Deacon, a framework that generates FPGA-based DNN accelerators of streaming architectures with advanced pipelining techniques to address the challenges from heterogeneous convolution and residual connections. Deacon provides fine-grained pipelining, graph-level optimisation, and heuristic exploration by graph colouring. Compared with prior designs, Deacon shows resource/power consumption efficiency improvement of 1.2x/3.5x for MobileNets and 1.0x/2.8x for SqueezeNets. All these tools are open source, some of which have already gained public engagement. We believe they can make efficient deep learning applications easier to build and deploy.Open Acces

    Anpassen verteilter eingebetteter Anwendungen im laufenden Betrieb

    Get PDF
    The availability of third-party apps is among the key success factors for software ecosystems: The users benefit from more features and innovation speed, while third-party solution vendors can leverage the platform to create successful offerings. However, this requires a certain decoupling of engineering activities of the different parties not achieved for distributed control systems, yet. While late and dynamic integration of third-party components would be required, resulting control systems must provide high reliability regarding real-time requirements, which leads to integration complexity. Closing this gap would particularly contribute to the vision of software-defined manufacturing, where an ecosystem of modern IT-based control system components could lead to faster innovations due to their higher abstraction and availability of various frameworks. Therefore, this thesis addresses the research question: How we can use modern IT technologies and enable independent evolution and easy third-party integration of software components in distributed control systems, where deterministic end-to-end reactivity is required, and especially, how can we apply distributed changes to such systems consistently and reactively during operation? This thesis describes the challenges and related approaches in detail and points out that existing approaches do not fully address our research question. To tackle this gap, a formal specification of a runtime platform concept is presented in conjunction with a model-based engineering approach. The engineering approach decouples the engineering steps of component definition, integration, and deployment. The runtime platform supports this approach by isolating the components, while still offering predictable end-to-end real-time behavior. Independent evolution of software components is supported through a concept for synchronous reconfiguration during full operation, i.e., dynamic orchestration of components. Time-critical state transfer is supported, too, and can lead to bounded quality degradation, at most. The reconfiguration planning is supported by analysis concepts, including simulation of a formally specified system and reconfiguration, and analyzing potential quality degradation with the evolving dataflow graph (EDFG) method. A platform-specific realization of the concepts, the real-time container architecture, is described as a reference implementation. The model and the prototype are evaluated regarding their feasibility and applicability of the concepts by two case studies. The first case study is a minimalistic distributed control system used in different setups with different component variants and reconfiguration plans to compare the model and the prototype and to gather runtime statistics. The second case study is a smart factory showcase system with more challenging application components and interface technologies. The conclusion is that the concepts are feasible and applicable, even though the concepts and the prototype still need to be worked on in future -- for example, to reach shorter cycle times.Eine große Auswahl von Drittanbieter-Lösungen ist einer der Schlüsselfaktoren für Software Ecosystems: Nutzer profitieren vom breiten Angebot und schnellen Innovationen, während Drittanbieter über die Plattform erfolgreiche Lösungen anbieten können. Das jedoch setzt eine gewisse Entkopplung von Entwicklungsschritten der Beteiligten voraus, welche für verteilte Steuerungssysteme noch nicht erreicht wurde. Während Drittanbieter-Komponenten möglichst spät -- sogar Laufzeit -- integriert werden müssten, müssen Steuerungssysteme jedoch eine hohe Zuverlässigkeit gegenüber Echtzeitanforderungen aufweisen, was zu Integrationskomplexität führt. Dies zu lösen würde insbesondere zur Vision von Software-definierter Produktion beitragen, da ein Ecosystem für moderne IT-basierte Steuerungskomponenten wegen deren höherem Abstraktionsgrad und der Vielzahl verfügbarer Frameworks zu schnellerer Innovation führen würde. Daher behandelt diese Dissertation folgende Forschungsfrage: Wie können wir moderne IT-Technologien verwenden und unabhängige Entwicklung und einfache Integration von Software-Komponenten in verteilten Steuerungssystemen ermöglichen, wo Ende-zu-Ende-Echtzeitverhalten gefordert ist, und wie können wir insbesondere verteilte Änderungen an solchen Systemen konsistent und im Vollbetrieb vornehmen? Diese Dissertation beschreibt Herausforderungen und verwandte Ansätze im Detail und zeigt auf, dass existierende Ansätze diese Frage nicht vollständig behandeln. Um diese Lücke zu schließen, beschreiben wir eine formale Spezifikation einer Laufzeit-Plattform und einen zugehörigen Modell-basierten Engineering-Ansatz. Dieser Ansatz entkoppelt die Design-Schritte der Entwicklung, Integration und des Deployments von Komponenten. Die Laufzeit-Plattform unterstützt den Ansatz durch Isolation von Komponenten und zugleich Zeit-deterministischem Ende-zu-Ende-Verhalten. Unabhängige Entwicklung und Integration werden durch Konzepte für synchrone Rekonfiguration im Vollbetrieb unterstützt, also durch dynamische Orchestrierung. Dies beinhaltet auch Zeit-kritische Zustands-Transfers mit höchstens begrenzter Qualitätsminderung, wenn überhaupt. Rekonfigurationsplanung wird durch Analysekonzepte unterstützt, einschließlich der Simulation formal spezifizierter Systeme und Rekonfigurationen und der Analyse der etwaigen Qualitätsminderung mit dem Evolving Dataflow Graph (EDFG). Die Real-Time Container Architecture wird als Referenzimplementierung und Evaluationsplattform beschrieben. Zwei Fallstudien untersuchen Machbarkeit und Nützlichkeit der Konzepte. Die erste verwendet verschiedene Varianten und Rekonfigurationen eines minimalistischen verteilten Steuerungssystems, um Modell und Prototyp zu vergleichen sowie Laufzeitstatistiken zu erheben. Die zweite Fallstudie ist ein Smart-Factory-Demonstrator, welcher herausforderndere Applikationskomponenten und Schnittstellentechnologien verwendet. Die Konzepte sind den Studien nach machbar und nützlich, auch wenn sowohl die Konzepte als auch der Prototyp noch weitere Arbeit benötigen -- zum Beispiel, um kürzere Zyklen zu erreichen

    GPT Semantic Networking: A Dream of the Semantic Web – The Time is Now

    Get PDF
    The book presents research and practical implementations related to natural language processing (NLP) technologies based on the concept of artificial intelligence, generative AI, and the concept of Complex Networks aimed at creating Semantic Networks. The main principles of NLP, training models on large volumes of text data, new universal and multi-purpose language processing systems are presented. It is shown how the combination of NLP and Semantic Networks technologies opens up new horizons for text analysis, context understanding, the formation of domain models, causal networks, etc. This book presents methods for creating Semantic Networks based on prompt engineering. Practices are presented that will help build semantic networks capable of solving complex problems and making revolutionary changes in the analytical activity. The publication is intended for those who are going to use large language models for the construction and analysis of semantic networks in order to solve applied problems, in particular, in the field of decision making.У книзі представлені дослідження та практичні реалізації технологій обробки природної мови (НЛП), заснованих на концепції штучного інтелект, генеративний ШІ та концепція складних мереж, спрямована на створення семантичних мереж. Представлено основні принципи НЛП, моделі навчання на великих обсягах текстових даних, нові універсальні та багатоцільові системи обробки мови. Показано, як поєднання технологій NLP і семантичних мереж відкриває нові горизонти для аналізу тексту, розуміння контексту, формування моделей домену, причинно-наслідкових мереж тощо. У цій книзі представлені методи створення семантичних мереж на основі оперативного проектування. Представлені практики, які допоможуть побудувати семантичні мережі, здатні вирішувати складні проблеми та вносити революційні зміни в аналітичну діяльність. Видання розраховане на тих, хто збирається використовувати велику мову моделі побудови та аналізу семантичних мереж з метою вирішення прикладних задач, зокрема, у сфері прийняття рішень

    Analysing and Reducing Costs of Deep Learning Compiler Auto-tuning

    Get PDF
    Deep Learning (DL) is significantly impacting many industries, including automotive, retail and medicine, enabling autonomous driving, recommender systems and genomics modelling, amongst other applications. At the same time, demand for complex and fast DL models is continually growing. The most capable models tend to exhibit highest operational costs, primarily due to their large computational resource footprint and inefficient utilisation of computational resources employed by DL systems. In an attempt to tackle these problems, DL compilers and auto-tuners emerged, automating the traditionally manual task of DL model performance optimisation. While auto-tuning improves model inference speed, it is a costly process, which limits its wider adoption within DL deployment pipelines. The high operational costs associated with DL auto-tuning have multiple causes. During operation, DL auto-tuners explore large search spaces consisting of billions of tensor programs, to propose potential candidates that improve DL model inference latency. Subsequently, DL auto-tuners measure candidate performance in isolation on the target-device, which constitutes the majority of auto-tuning compute-time. Suboptimal candidate proposals, combined with their serial measurement in an isolated target-device lead to prolonged optimisation time and reduced resource availability, ultimately reducing cost-efficiency of the process. In this thesis, we investigate the reasons behind prolonged DL auto-tuning and quantify their impact on the optimisation costs, revealing directions for improved DL auto-tuner design. Based on these insights, we propose two complementary systems: Trimmer and DOPpler. Trimmer improves tensor program search efficacy by filtering out poorly performing candidates, and controls end-to-end auto-tuning using cost objectives, monitoring optimisation cost. Simultaneously, DOPpler breaks long-held assumptions about the serial candidate measurements by successfully parallelising them intra-device, with minimal penalty to optimisation quality. Through extensive experimental evaluation of both systems, we demonstrate that they significantly improve cost-efficiency of autotuning (up to 50.5%) across a plethora of tensor operators, DL models, auto-tuners and target-devices

    Analyzing the Unanalyzable: an Application to Android Apps

    Get PDF
    In general, software is unreliable. Its behavior can deviate from users’ expectations because of bugs, vulnerabilities, or even malicious code. Manually vetting software is a challenging, tedious, and highly-costly task that does not scale. To alleviate excessive costs and analysts’ burdens, automated static analysis techniques have been proposed by both the research and practitioner communities making static analysis a central topic in software engineering. In the meantime, mobile apps have considerably grown in importance. Today, most humans carry software in their pockets, with the Android operating system leading the market. Millions of apps have been proposed to the public so far, targeting a wide range of activities such as games, health, banking, GPS, etc. Hence, Android apps collect and manipulate a considerable amount of sensitive information, which puts users’ security and privacy at risk. Consequently, it is paramount to ensure that apps distributed through public channels (e.g., the Google Play) are free from malicious code. Hence, the research and practitioner communities have put much effort into devising new automated techniques to vet Android apps against malicious activities over the last decade. Analyzing Android apps is, however, challenging. On the one hand, the Android framework proposes constructs that can be used to evade dynamic analysis by triggering the malicious code only under certain circumstances, e.g., if the device is not an emulator and is currently connected to power. Hence, dynamic analyses can -easily- be fooled by malicious developers by making some code fragments difficult to reach. On the other hand, static analyses are challenged by Android-specific constructs that limit the coverage of off-the-shell static analyzers. The research community has already addressed some of these constructs, including inter-component communication or lifecycle methods. However, other constructs, such as implicit calls (i.e., when the Android framework asynchronously triggers a method in the app code), make some app code fragments unreachable to the static analyzers, while these fragments are executed when the app is run. Altogether, many apps’ code parts are unanalyzable: they are either not reachable by dynamic analyses or not covered by static analyzers. In this manuscript, we describe our contributions to the research effort from two angles: ① statically detecting malicious code that is difficult to access to dynamic analyzers because they are triggered under specific circumstances; and ② statically analyzing code not accessible to existing static analyzers to improve the comprehensiveness of app analyses. More precisely, in Part I, we first present a replication study of a state-of-the-art static logic bomb detector to better show its limitations. We then introduce a novel hybrid approach for detecting suspicious hidden sensitive operations towards triaging logic bombs. We finally detail the construction of a dataset of Android apps automatically infected with logic bombs. In Part II, we present our work to improve the comprehensiveness of Android apps’ static analysis. More specifically, we first show how we contributed to account for atypical inter-component communication in Android apps. Then, we present a novel approach to unify both the bytecode and native in Android apps to account for the multi-language trend in app development. Finally, we present our work to resolve conditional implicit calls in Android apps to improve static and dynamic analyzers

    Intelligent Software Tooling For Improving Software Development

    Get PDF
    Software has eaten the world with many of the necessities and quality of life services people use requiring software. Therefore, tools that improve the software development experience can have a significant impact on the world such as generating code and test cases, detecting bugs, question and answering, etc. The success of Deep Learning (DL) over the past decade has shown huge advancements in automation across many domains, including Software Development processes. One of the main reasons behind this success is the availability of large datasets such as open-source code available through GitHub or image datasets of mobile Graphical User Interfaces (GUIs) with RICO and ReDRAW to be trained on. Therefore, the central research question my dissertation explores is: In what ways can the software development process be improved through leveraging DL techniques on the vast amounts of unstructured software engineering artifacts? We coin the approaches that leverage DL to automate or augment various software development task as Intelligent Software Tools. To guide our research of these intelligent software tools, we performed a systematic literature review to understand the current landscape of research on applying DL techniques to software tasks and any gaps that exist. From this literature review, we found code generation to be one of the most studied tasks with other tasks and artifacts such as impact analysis or tasks involving images and videos to be understudied. Therefore, we set out to explore the application of DL to these understudied tasks and artifacts as well as the limitations of DL models under the well studied task code completion, a subfield in code generation. Specifically, we developed a tool for automatically detecting duplicate mobile bug reports from user submitted videos. We used the popular Convolutional Neural Network (CNN) to learn important features from a large collection of mobile screenshots. Using this model, we could then compute similarity between a newly submitted bug report and existing ones to produce a ranked list of duplicate candidates that can be reviewed by a developer. Next, we explored impact analysis, a critical software maintenance task that identifies potential adverse effects of a given code change on the larger software system. To this end, we created Athena, a novel approach to impact analysis that integrates knowledge of a software system through its call-graph along with high-level representations of the code inside the system to improve impact analysis performance. Lastly, we explored the task of code completion, which has seen heavy interest from industry and academia. Specifically, we explored various methods that modify the positional encoding scheme of the Transformer architecture for allowing these models to incorporate longer sequences of tokens when predicting completions than seen during their training as this can significantly improve training times

    Methods to improve debug flow for intellectual property protection

    Get PDF
    Abstract. Every company wants to protect their intellectual property and limit customer visibility of confidential information. A company may protect its proprietary information by different ways. This thesis will compare different methods that try to protect intellectual property while maintaining the software debugging capability. Working with binary libraries without debug information makes customer support very difficult. When a company is developing a new product, time to market is important. Usually, the last months are very busy resolving urgent customer issues. Especially during this period, the slow process of debugging customer issues without debug information can cause delays and increase time to market. The goal of this thesis is to compare methods that protects intellectual property by making reverse engineering more difficult. Study of the upcoming GNU Compiler Collection (GCC) features related to debug data formats, such as DWARF5, is also carried out while working with the thesis. The approaches tried were split DWARF, injecting ELF files, stripping debug data, and code obfuscation. Also optimisation and their effect on disassembly was studied. The best solution was to compile the software with debug symbols and strip them to a separate file. This way the symbol data can be loaded separately into GDB. The symbol data layout and addresses are also always correct with the solution.Virheiden etsinnän työnkulun parantaminen immateriaaliomaisuudet huomioiden. Tiivistelmä. Yritykset haluavat suojella immateriaaliomaisuuksiaan ja rajoittaa asiakkaiden näkyvyyttä tietylle tasolle asti. Tämä lopputyö vertailee eri metodeja jotka koittavat suojata immateriaaliomaisuuksia, ilman että ohjelmiston virheidenkorjattavuus kärsii. Binäärikirjastot ilman virheenkorjaustietoja vaikeuttavat asiakkaan tukemista. Uutta tuotetta kehitettäessä, markkinoille tuloaika on yritykselle tärkeää. Yleensä viimeiset kuukaudet ovat kiireisiä asiakkaan ongelmien tutkimuksien kanssa ja kyseiset ongelmat tulisi olla ratkaistuna mahdollisimman nopeasti. Tämän lopputyön tavoitteena on vertailla mahdollisia metodeja, jotka suojaavat immateriaaliomaisuutta takaisinmallinnusta vastaan. Tarkoituksena on myös tutkia tulevia GNU kääntäjä-kokoelman (GCC:n) ominaisuuksia liittyen virheenkorjaustietoformaatteihin, kuten DWARF5. Ongelman ratkaisuun koitettiin pilkottuja virheenkorjaustietoja, ELFtiedoston injektointia, virheenkorjaustiedon riisumista ohjelmistosta ja koodin obfuskointia. Myös optimoinnin vaikutusta konekielestä takaisinmallinnettuun Assembly-muotoon tutkittiin. Paras ratkaisu oli kääntää ohjelmisto virheenkorjaustiedolla ja riisua ne omaan erilliseen tiedostoon. Näin ohjelmiston symbolitieto pystytään latamaan erikseen virheenjäljittemänä käytettyyn GNU Debuggeriin (GDB:hen). Näin symbolitietojen rakenne ja osoitteet ovat myös aina paikkansapitävät

    Analyzable dataflow executions with adaptive redundancy

    Get PDF
    Increasing performance requirements in the embedded systems domain have encouraged a drift from singlecore to multicore processors, and thus multicore processors are widely used in embedded systems today. Cars are an example for complex embedded systems in which the use of multicore processors is continuously increasing. A major reason for this is to consolidate different software components on one chip and thus reduce the number of electronic control units. However, the de facto standard in the automotive industry, AUTOSAR (AUTomotive Open System ARchitecture), was originally designed for singlecore processors. Although basic support for multicore processors was added, more complex architectures are currently not compatible with the software stack. Regarding the software components running on the ECUS of modern cars, requirements are diverse. On the one hand, there are safety-critical tasks, like the airbag control, anti-lock braking system, electronic stability control and emergency brake assist, and on the other hand, tasks which do not have any safety-related requirements at all, for example tasks controlling the infotainment system. Trends like autonomous driving lead to even more demanding tasks in the system since such tasks are both safety-critical and data-intensive. As embedded applications, like those in the automotive domain, become more complex, new approaches are necessary. Data-intensive tasks are usually tackled with large-scale computing frameworks. In this thesis, some major concepts of such frameworks are transferred to the high-performance embedded systems domain. For this purpose, the thesis describes a runtime environment (RTE) that is suitable for different kinds of multi- and manycore hardware architectures. The RTE follows a dataflow execution model based on directed acyclic graphs (DAGs). Graphs are divided into sections which are scheduled separately. For each section, the RTE uses a DAG scheduling heuristic to compute multiple schedules covering different redundancy configurations. This allows the RTE to dynamically change the redundancy of parts of the graph at runtime despite the use of fixed schedules. Alternatively, the RTE also provides an online scheduler. To specify suitable graphs, the RTE also provides a programming model which shares similarities with common large-scale computing frameworks, for example Apache Spark. Using this programming model, three common distributed algorithms, namely Cannon's algorithm, the Cooley-Tukey algorithm and bitonic sort, were implemented. With these three programs, the performance of the RTE was evaluated for a variety of configurations on two different hardware architectures. The results show that the proposed RTE is able to reach the performance of established parallel computation frameworks and that for suitable graphs with reasonable sectionings the negative influence on the runtime is either small or non-existent.Aufgrund steigender Anforderungen an die Leistungsfähigkeit von eingebetteten Systemen finden Mehrkernprozessoren mittlerweile auch in eingebetteten Systemen Verwendung. Autos sind ein Beispiel für eingebettete Systeme, in denen die Verbreitung von Mehrkernprozessoren kontinuierlich zunimmt. Ein Hauptgrund ist, dass es dadurch möglich wird, mehrere Applikationen, für die ursprünglich mehrere Electronic Control Units (ECUs) notwendig waren, auf ein und demselben Chip auszuführen und dadurch die Anzahl der ECUs im Gesamtsystem zu verringern. Der De-facto-Standard AUTOSAR (AUTomotive Open System ARchitecture) wurde jedoch ursprünglich nur im Hinblick auf Einkernprozessoren entworfen und, obwohl der Softwarestack um grundlegende Unterstützung für Mehrkernprozessoren erweitert wurde, sind komplexere Architekturen nicht damit kompatibel. Die Anforderungen der Softwarekomponenten von modernen Autos sind vielfältig. Einerseits gibt es hochgradig sicherheitskritische Tasks, die beispielsweise die Airbags, das Antiblockiersystem, die Fahrdynamikregelung oder den Notbremsassistenten steuern und andererseits Tasks, die keinerlei sicherheitskritische Anforderungen aufweisen, wie zum Beispiel Tasks zur Steuerung des Infotainment-Systems. Neue Trends wie autonomes Fahren führen zu weiteren anspruchsvollen Tasks, die sowohl hohe Leistungs- als auch Sicherheitsanforderungen aufweisen. Da die Komplexität eingebetteter Anwendungen, beispielsweise im Automobilbereich, stetig zunimmt, sind neue Ansätze erforderlich. Für komplexe, datenintensive Aufgaben werden in der Regel Cluster-Computing-Frameworks eingesetzt. In dieser Arbeit werden Konzepte solcher Frameworks auf den Bereich der eingebetteten Systeme übertragen. Dazu beschreibt die Arbeit eine Laufzeitumgebung (RTE) für eingebettete Mehrkernarchitekturen. Die RTE folgt einem Datenfluss-Ausführungsmodell, das auf gerichteten azyklischen Graphen basiert. Graphen können in Abschnitte eingeteilt werden, für welche separat mehrere unterschiedlich redundante Schedules mit Hilfe einer Scheduling-Heuristik berechnet werden. Dieser Ansatz erlaubt es, die Redundanz von Teilen der Anwendung zur Laufzeit zu verändern. Alternativ unterstützt die RTE auch Scheduling zur Laufzeit. Zur Erzeugung von Graphen stellt die RTE ein Programmiermodell bereit, welches sich an etablierten Frameworks, insbesondere Apache Spark, orientiert. Damit wurden drei Beispielanwendungen implementiert, die auf gängigen Algorithmen basieren. Konkret handelt es sich um Cannon's Algorithmus, den Cooley-Tukey-Algorithmus und bitonisches Sortieren. Um die Leistungsfähigkeit der RTE zu ermitteln, wurden diese drei Anwendungen mehrfach mit verschiedenen Konfigurationen auf zwei Hardware-Architekturen ausgeführt. Die Ergebnisse zeigen, dass die RTE in ihrer Leistungsfähigkeit mit etablierten Systemen vergleichbar ist und die Laufzeit bei einer sinnvollen Graphaufteilung im besten Fall nur geringfügig beeinflusst wird
    corecore