64 research outputs found

    Vectorization system for unstructured codes with a Data-parallel Compiler IR

    Get PDF
    With Dennard Scaling coming to an end, Single Instruction Multiple Data (SIMD) offers itself as a way to improve the compute throughput of CPUs. One fundamental technique in SIMD code generators is the vectorization of data-parallel code regions. This has applications in outer-loop vectorization, whole-function vectorization and vectorization of explicitly data-parallel languages. This thesis makes contributions to the reliable vectorization of data-parallel code regions with unstructured, reducible control flow. Reducibility is the case in practice where all control-flow loops have exactly one entry point. We present P-LLVM, a novel, full-featured, intermediate representation for vectorizers that provides a semantics for the code region at every stage of the vectorization pipeline. Partial control-flow linearization is a novel partial if-conversion scheme, an essential technique to vectorize divergent control flow. Different to prior techniques, partial linearization has linear running time, does not insert additional branches or blocks and gives proved guarantees on the control flow retained. Divergence of control induces value divergence at join points in the control-flow graph (CFG). We present a novel control-divergence analysis for directed acyclic graphs with optimal running time and prove that it is correct and precise under common static assumptions. We extend this technique to obtain a quadratic-time, control-divergence analysis for arbitrary reducible CFGs. For this analysis, we show on a range of realistic examples how earlier approaches are either less precise or incorrect. We present a feature-complete divergence analysis for P-LLVM programs. The analysis is the first to analyze stack-allocated objects in an unstructured control setting. Finally, we generalize single-dimensional vectorization of outer loops to multi-dimensional tensorization of loop nests. SIMD targets benefit from tensorization through more opportunities for re-use of loaded values and more efficient memory access behavior. The techniques were implemented in the Region Vectorizer (RV) for vectorization and TensorRV for loop-nest tensorization. Our evaluation validates that the general-purpose RV vectorization system matches the performance of more specialized approaches. RV performs on par with the ISPC compiler, which only supports its structured domain-specific language, on a range of tree traversal codes with complex control flow. RV is able to outperform the loop vectorizers of state-of-the-art compilers, as we show for the SPEC2017 nab_s benchmark and the XSBench proxy application.Mit dem Ausreizen des Dennard Scalings erreichen die gewohnten Zuwächse in der skalaren Rechenleistung zusehends ihr Ende. Moderne Prozessoren setzen verstärkt auf parallele Berechnung, um den Rechendurchsatz zu erhöhen. Hierbei spielen SIMD Instruktionen (Single Instruction Multiple Data), die eine Operation gleichzeitig auf mehrere Eingaben anwenden, eine zentrale Rolle. Eine fundamentale Technik, um SIMD Programmcode zu erzeugen, ist der Einsatz datenparalleler Vektorisierung. Diese unterliegt populären Verfahren, wie der Vektorisierung äußerer Schleifen, der Vektorisierung gesamter Funktionen bis hin zu explizit datenparallelen Programmiersprachen. Der Beitrag der vorliegenden Arbeit besteht darin, ein zuverlässiges Vektorisierungssystem für datenparallelen Code mit reduziblem Steuerfluss zu entwickeln. Diese Anforderung ist für alle Steuerflussgraphen erfüllt, deren Schleifen nur einen Eingang haben, was in der Praxis der Fall ist. Wir präsentieren P-LLVM, eine ausdrucksstarke Zwischendarstellung für Vektorisierer, welche dem Programm in jedem Stadium der Transformation von datenparallelem Code zu SIMD Code eine definierte Semantik verleiht. Partielle Steuerfluss-Linearisierung ist ein neuer Algorithmus zur If-Conversion, welcher Sprünge erhalten kann. Anders als existierende Verfahren hat Partielle Linearisierung eine lineare Laufzeit und fügt keine neuen Sprünge oder Blöcke ein. Wir zeigen Kriterien, unter denen der Algorithmus Steuerfluss erhält, und beweisen diese. Steuerflussdivergenz induziert Divergenz an Punkten zusammenfließenden Steuerflusses. Wir stellen eine neue Steuerflussdivergenzanalyse für azyklische Graphen mit optimaler Laufzeit vor und beweisen deren Korrektheit und Präzision. Wir verallgemeinern die Technik zu einem Algorithmus mit quadratischer Laufzeit für beliebiege, reduzible Steuerflussgraphen. Eine Studie auf realistischen Beispielgraphen zeigt, dass vergleichbare Techniken entweder weniger präsize sind oder falsche Ergebnisse liefern. Ebenfalls präsentieren wir eine Divergenzanalyse für P-LLVM Programme. Diese Analyse ist die erste Divergenzanalyse, welche Divergenz in stapelallokierten Objekten unter unstrukturiertem Steuerfluss analysiert. Schließlich generalisieren wir die eindimensionale Vektorisierung von äußeren Schleifen zur multidimensionalen Tensorisierung von Schleifennestern. Tensorisierung eröffnet für SIMD Prozessoren mehr Möglichkeiten, bereits geladene Werte wiederzuverwenden und das Speicherzugriffsverhalten des Programms zu optimieren, als dies mit Vektorisierung der Fall ist. Die vorgestellten Techniken wurden in den Region Vectorizer (RV) für Vektorisierung und TensorRV für die Tensorisierung von Schleifennestern implementiert. Wir zeigen auf einer Reihe von steuerflusslastigen Programmen für die Traversierung von Baumdatenstrukturen, dass RV das gleiche Niveau erreicht wie der ISPC Compiler, welcher nur seine strukturierte Eingabesprache verarbeiten kann. RV kann schnellere SIMD-Programme erzeugen als die Schleifenvektorisierer in aktuellen Industriecompilern. Dies demonstrieren wir mit dem nab_s benchmark aus der SPEC2017 Benchmarksuite und der XSBench Proxy-Anwendung

    Language Support for Programming High-Performance Code

    Get PDF
    Nowadays, the computing landscape is becoming increasingly heterogeneous and this trend is currently showing no signs of turning around. In particular, hardware becomes more and more specialized and exhibits different forms of parallelism. For performance-critical codes it is indispensable to address hardware-specific peculiarities. Because of the halting problem, however, it is unrealistic to assume that a program implemented in a general-purpose programming language can be fully automatically compiled to such specialized hardware while still delivering peak performance. One form of parallelism is single instruction, multiple data (SIMD). Part I of this thesis presents Sierra: an extension for C ++ that facilitates portable and effective SIMD programming. Part II discusses AnyDSL. This framework allows to embed a so-called domain-specific language (DSL) into a host language. On the one hand, a DSL offers the application developer a convenient interface; on the other hand, a DSL can perform domain-specific optimizations and effectively map DSL constructs to various architectures. In order to implement a DSL, one usually has to write or modify a compiler. With AnyDSL though, the DSL constructs are directly implemented in the host language while a partial evaluator removes any abstractions that are required in the implementation of the DSL.Die Rechnerlandschaft wird heutzutage immer heterogener und derzeit ist keine Trendwende in Sicht. Insbesondere wird die Hardware immer spezialisierter und weist verschiedene Formen der Parallelität auf. Für performante Programme ist es unabdingbar, hardwarespezifische Eigenheiten zu adressieren. Wegen des Halteproblems ist es allerdings unrealistisch anzunehmen, dass ein Programm, das in einer universell einsetzbaren Programmiersprache implementiert ist, vollautomatisch auf solche spezialisierte Hardware übersetzt werden kann und dabei noch Spitzenleistung erzielt. Eine Form der Parallelität ist „single instruction, multiple data (SIMD)“. Teil I dieser Arbeit stellt Sierra vor: eine Erweiterung für C++, die portable und effektive SIMD-Programmierung unterstützt. Teil II behandelt AnyDSL. Dieses Rahmenwerk ermöglicht es, eine sogenannte domänenspezifische Sprache (DSL) in eine Gastsprache einzubetten. Auf der einen Seite bietet eine DSL dem Anwendungsentwickler eine komfortable Schnittstelle; auf der anderen Seiten kann eine DSL domänenspezifische Optimierungen durchführen und DSL-Konstrukte effektiv auf verschiedene Architekturen abbilden. Um eine DSL zu implementieren, muss man gewöhnlich einen Compiler schreiben oder modifizieren. In AnyDSL werden die DSL-Konstrukte jedoch direkt in der Gastsprache implementiert und ein partieller Auswerter entfernt jegliche Abstraktionen, die in der Implementierung der DSL benötigt werden

    Contributions to High-Throughput Computing Based on the Peer-to-Peer Paradigm

    Get PDF
    XII, 116 p.This dissertation focuses on High Throughput Computing (HTC) systems and how to build a working HTC system using Peer-to-Peer (P2P) technologies. The traditional HTC systems, designed to process the largest possible number of tasks per unit of time, revolve around a central node that implements a queue used to store and manage submitted tasks. This central node limits the scalability and fault tolerance of the HTC system. A usual solution involves the utilization of replicas of the master node that can replace it. This solution is, however, limited by the number of replicas used. In this thesis, we propose an alternative solution that follows the P2P philosophy: a completely distributed system in which all worker nodes participate in the scheduling tasks, and with a physically distributed task queue implemented on top of a P2P storage system. The fault tolerance and scalability of this proposal is, therefore, limited only by the number of nodes in the system. The proper operation and scalability of our proposal have been validated through experimentation with a real system. The data availability provided by Cassandra, the P2P data management framework used in our proposal, is analysed by means of several stochastic models. These models can be used to make predictions about the availability of any Cassandra deployment, as well as to select the best possible con guration of any Cassandra system. In order to validate the proposed models, an experimentation with real Cassandra clusters is made, showing that our models are good descriptors of Cassandra's availability. Finally, we propose a set of scheduling policies that try to solve a common problem of HTC systems: re-execution of tasks due to a failure in the node where the task was running, without additional resource misspending. In order to reduce the number of re-executions, our proposals try to nd good ts between the reliability of nodes and the estimated length of each task. An extensive simulation-based experimentation shows that our policies are capable of reducing the number of re-executions, improving system performance and utilization of nodes

    Dependable Embedded Systems

    Get PDF
    This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. This book introduces the most prominent reliability concerns from today’s points of view and roughly recapitulates the progress in the community so far. Unlike other books that focus on a single abstraction level such circuit level or system level alone, the focus of this book is to deal with the different reliability challenges across different levels starting from the physical level all the way to the system level (cross-layer approaches). The book aims at demonstrating how new hardware/software co-design solution can be proposed to ef-fectively mitigate reliability degradation such as transistor aging, processor variation, temperature effects, soft errors, etc. Provides readers with latest insights into novel, cross-layer methods and models with respect to dependability of embedded systems; Describes cross-layer approaches that can leverage reliability through techniques that are pro-actively designed with respect to techniques at other layers; Explains run-time adaptation and concepts/means of self-organization, in order to achieve error resiliency in complex, future many core systems

    History of Construction Cultures Volume 1

    Get PDF
    History of Construction Cultures Volume 1 contains papers presented at the 7ICCH – Seventh International Congress on Construction History, held at the Lisbon School of Architecture, Portugal, from 12 to 16 July, 2021. The conference has been organized by the Lisbon School of Architecture (FAUL), NOVA School of Social Sciences and Humanities, the Portuguese Society for Construction History Studies and the University of the Azores. The contributions cover the wide interdisciplinary spectrum of Construction History and consist on the most recent advances in theory and practical case studies analysis, following themes such as: - epistemological issues; - building actors; - building materials; - building machines, tools and equipment; - construction processes; - building services and techniques ; -structural theory and analysis ; - political, social and economic aspects; - knowledge transfer and cultural translation of construction cultures. Furthermore, papers presented at thematic sessions aim at covering important problematics, historical periods and different regions of the globe, opening new directions for Construction History research. We are what we build and how we build; thus, the study of Construction History is now more than ever at the centre of current debates as to the shape of a sustainable future for humankind. Therefore, History of Construction Cultures is a critical and indispensable work to expand our understanding of the ways in which everyday building activities have been perceived and experienced in different cultures, from ancient times to our century and all over the world

    DIAS Research Report 2011

    Get PDF

    Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

    Get PDF
    On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges

    Semantic Systems. The Power of AI and Knowledge Graphs

    Get PDF
    This open access book constitutes the refereed proceedings of the 15th International Conference on Semantic Systems, SEMANTiCS 2019, held in Karlsruhe, Germany, in September 2019. The 20 full papers and 8 short papers presented in this volume were carefully reviewed and selected from 88 submissions. They cover topics such as: web semantics and linked (open) data; machine learning and deep learning techniques; semantic information management and knowledge integration; terminology, thesaurus and ontology management; data mining and knowledge discovery; semantics in blockchain and distributed ledger technologies
    corecore