794 research outputs found

    Vectorization system for unstructured codes with a Data-parallel Compiler IR

    With Dennard scaling coming to an end, Single Instruction Multiple Data (SIMD) offers itself as a way to improve the compute throughput of CPUs. One fundamental technique in SIMD code generators is the vectorization of data-parallel code regions. This has applications in outer-loop vectorization, whole-function vectorization and vectorization of explicitly data-parallel languages. This thesis makes contributions to the reliable vectorization of data-parallel code regions with unstructured, reducible control flow. Reducibility, which holds in practice, means that every control-flow loop has exactly one entry point. We present P-LLVM, a novel, full-featured intermediate representation for vectorizers that provides a semantics for the code region at every stage of the vectorization pipeline. Partial control-flow linearization is a novel partial if-conversion scheme, an essential technique to vectorize divergent control flow. Unlike prior techniques, partial linearization has linear running time, does not insert additional branches or blocks and gives proved guarantees on the control flow retained. Divergence of control induces value divergence at join points in the control-flow graph (CFG). We present a novel control-divergence analysis for directed acyclic graphs with optimal running time and prove that it is correct and precise under common static assumptions. We extend this technique to obtain a quadratic-time control-divergence analysis for arbitrary reducible CFGs. For this analysis, we show on a range of realistic examples how earlier approaches are either less precise or incorrect. We present a feature-complete divergence analysis for P-LLVM programs. The analysis is the first to analyze stack-allocated objects in an unstructured control setting. Finally, we generalize single-dimensional vectorization of outer loops to multi-dimensional tensorization of loop nests.
SIMD targets benefit from tensorization through more opportunities for re-use of loaded values and more efficient memory access behavior. The techniques were implemented in the Region Vectorizer (RV) for vectorization and TensorRV for loop-nest tensorization. Our evaluation validates that the general-purpose RV vectorization system matches the performance of more specialized approaches. RV performs on par with the ISPC compiler, which only supports its structured domain-specific language, on a range of tree traversal codes with complex control flow. RV is able to outperform the loop vectorizers of state-of-the-art compilers, as we show for the SPEC2017 nab_s benchmark and the XSBench proxy application.
As Dennard scaling is exhausted, the accustomed gains in scalar compute performance are increasingly coming to an end. Modern processors rely more and more on parallel computation to increase compute throughput. SIMD (Single Instruction Multiple Data) instructions, which apply one operation to several inputs at once, play a central role here. A fundamental technique for generating SIMD code is data-parallel vectorization. It underlies popular approaches such as outer-loop vectorization, whole-function vectorization and explicitly data-parallel programming languages. The contribution of this thesis is to develop a reliable vectorization system for data-parallel code with reducible control flow. This requirement is met by all control-flow graphs whose loops have only one entry, which is the case in practice. We present P-LLVM, an expressive intermediate representation for vectorizers that gives the program a defined semantics at every stage of the transformation from data-parallel code to SIMD code. Partial control-flow linearization is a new if-conversion algorithm that can retain branches. Unlike existing techniques, partial linearization has linear running time and does not insert new branches or blocks. We state criteria under which the algorithm retains control flow, and prove them. Control-flow divergence induces divergence at points where control flow joins. We present a new control-flow divergence analysis for acyclic graphs with optimal running time and prove its correctness and precision. We generalize the technique to an algorithm with quadratic running time for arbitrary reducible control-flow graphs. A study on realistic example graphs shows that comparable techniques are either less precise or produce incorrect results. We also present a divergence analysis for P-LLVM programs. This analysis is the first divergence analysis to analyze divergence in stack-allocated objects under unstructured control flow. Finally, we generalize the one-dimensional vectorization of outer loops to the multi-dimensional tensorization of loop nests. For SIMD processors, tensorization opens up more opportunities than vectorization to reuse already loaded values and to optimize the program's memory access behavior. The techniques presented were implemented in the Region Vectorizer (RV) for vectorization and in TensorRV for the tensorization of loop nests. On a set of control-flow-heavy programs for traversing tree data structures, we show that RV reaches the same level as the ISPC compiler, which can only process its structured input language. RV can generate faster SIMD programs than the loop vectorizers in current industrial compilers. We demonstrate this with the nab_s benchmark from the SPEC2017 benchmark suite and the XSBench proxy application.

    Nominative Case Assignment by Phase: A Predication-Based Approach (국면에 의한 주격 부여: 서술기반 접근)

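    Thesis (Master's) -- Seoul National University Graduate School: College of Humanities, Department of Linguistics, February 2021. 고희정. The aim of this thesis is twofold. One is to point out that current models explaining case assignment have problems; the other is to propose a new approach to case assignment in which predication acts as the cyclic domain of case assignment. In generative grammar, case, as a theory explaining the distribution of nominals, has been studied as a major topic in syntax since Government and Binding theory. Structural case in particular marks the structural relationships that nominals bear within a given domain (Blake 2001), and regarding its assignment two models have come to the fore in recent Minimalist grammar: the Agree model and the Dependent Case model. The two models give very different accounts of how structural case is assigned: the Agree model holds that structural case is assigned through a relationship with another functional head, whereas the Dependent Case model holds that structural case is assigned according to the structural relationships between nominals. In the Agree model (Chomsky 2000, 2001), the identity of the case-assigning functional head F0 is one of the important objects of study. Previous work has argued that this head is TFIN (Chomsky 2000, 2001), Agr (Raposo 1987, Kornfilt 2003) or C (Tanaka 2005, Fakih 2016), and Chomsky (2008) argued that the features responsible for case assignment are introduced by the phase head, defined as C or v*. In the Dependent Case model (Yip et al. 1987, Marantz 1991), by contrast, the relationship between two noun phrases can change depending on how the domain in which their structural relationship is compared is defined, so the question of which domain case is assigned in has been studied as an important topic. On this point, the early Dependent Case model (Marantz 1991) proposed the V+T complex domain as the domain of case assignment, whereas recent work (Baker 2015; Levin and Preminger 2015; Levin 2017) argues that the phase is this domain. This study argues that both models have problems. For the Agree model, it shows that the current model, which makes case assignment depend on a specific functional head, cannot accommodate cross-linguistic data; for the Dependent Case model, it points out that the current model cannot properly explain the distribution of nominative case. To this end, the thesis examines examples from languages including Turkish, Romanian and Korean, and presents Korean syntactic causative constructions and multiple nominative constructions as the key data for showing the models' problems effectively. In Korean syntactic causative constructions, nominative case is observed to be assigned in a clause smaller than CP that lacks tense or agreement. None of the various case-assigning functional heads proposed in previous studies following the Agree model can explain this. Moreover, the case-stacking phenomenon observed in this construction suggests that the nominative in the construction is not a default case but a structural case requiring a case licenser, which poses a serious problem for the Agree model. Meanwhile, in Korean multiple nominative constructions, several subjects in a c-command relationship appear in the nominative; the fact that dependent case is not assigned in this relationship and nominative, the default case, surfaces instead suggests that the current Dependent Case model must revise either its mechanism or its domain of case assignment. On the basis of these problems, the thesis argues that if, as the two models hold, the phase is indeed related to case assignment, then the phase must be newly defined to explain the problematic data. It then proposes a new model of case assignment in which the phase, defined as predication, acts as the domain of case licensing and the subject of the relevant subject-predicate relation is assigned nominative case. The model presented in this thesis can capture the connections among case licensing, phase and subject-predicate relations noted in previous research, and can successfully explain the Korean causative and multiple nominative constructions presented as problems. In addition, this analysis, which posits case licensing as a role of the phase as predication, opens up a new way of interpreting various phenomena previously attributed to properties of the phase, such as linearization and ellipsis, in connection with case licensing.
The goal of this paper is twofold: one is to point out the issues of the current models of case assignment in generative grammar; the other is to propose a novel mechanism of structural case assignment by arguing for predication as the cyclic domain of case assignment. Structural case marks the structural relationships of nominals in a given domain (Blake 2001). On the topic of its assignment, two main models are in competition: the Agree model and the Dependent Case model. The two models differ crucially in their view of what the relationship encoded by structural case is: the Agree model asserts that structural case is assigned according to its relationship to other functional heads, while the Dependent Case model argues that a nominal is assigned its case according to its structural relationship to other nominals. For the Agree model (Chomsky 2000, 2001), the identity of the case-assigning functional head F0 is a crucial topic of investigation. Previous studies have proposed TFIN (Chomsky 2000, 2001) or Agr (Raposo 1987, Kornfilt 2003) as the assigner of nominative case. Still others proposed that C, or C in addition to T (Tanaka 2005, Fakih 2016), is responsible for case assignment. Chomsky (2008) in particular proposed that C/v* as the phase head introduces all uninterpretable features, among which are features responsible for case assignment. For the Dependent Case model (Yip et al. 1987, Marantz 1991), the relative positions of nominals in a local domain are critical in calculating the assignment of case. How this domain is defined thus becomes an important topic, as different domain boundaries can change whether a nominal is c-commanded or not in the domain.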
An earlier version (Marantz 1991) proposed the V+T complex as the case assigning domain, while recent works (Baker 2015; Levin and Preminger 2015; Levin 2017) propose phase as the local domain. In this paper I present theoretical challenges to both models of case assignment. I demonstrate that the Agree model, which relies on a specific functional head for the assignment of case, is cross-linguistically untenable. As for the Dependent Case model, I show that the model cannot properly predict the distribution of nominative case assignment. Data from Turkish, Romanian, Korean and other languages are used to demonstrate these challenges. Two constructions of Korean are used to argue for the key points of my proposal, namely periphrastic causative construction (PCC) and multiple nominative construction (MNC). PCC data illustrates an instance where nominative case is assigned in a nonfinite clause smaller than a CP. The various syntactic categories proposed to be the case assigning F0 in previous research adopting the Agree model cannot adequately account for the assignment of this nominative case. Moreover, case-stacking data shows that the nominative case in PCC cannot be a default case and thus needs to be licensed by a case assigner, providing a non-trivial challenge to the Agree model. MNC data, where multiple nominative marked subjects in a c-command relationship appear in a local domain, also poses challenges to the Dependent Case model, as non-assignment of dependent case in the situation calls for either a rewrite of how case is assigned in the model, or reexamination of the case assigning domain. Based on these observations I argue that if phase is involved in case assignment as the two models assume, the definition of phase should be revised in order to account for the presented data. I propose a new theory of case assignment where phase, defined as predication, operates as the cyclic domain of nominative case assignment. 
The proposed model is able to capture the previously observed correlation between case assignment, phase and predication, as well as to account successfully for the puzzling Korean PCC and MNC data. Ascribing case assignment to phase can also open up new avenues for analyzing case assignment in connection with other phenomena in syntax that have been argued to be related to the characteristics of phase.
Contents: 1 Introduction; 2 Background; 3 Theoretical Challenges (3.1 Challenges to the Agree model; 3.2 Challenges to the Dependent Case model; 3.3 Summary); 4 Proposal (4.1 Theoretical Assumptions; 4.2 Predication as Case Assigning Domain; 4.3 Main Proposal; 4.4 Summary); 5 Analysis (5.1 Korean Periphrastic Causative Constructions; 5.2 Multiple Nominative Constructions); 6 Discussion (6.1 Cyclic Linearization; 6.2 Argument Ellipsis); 7 Remaining Issues (7.1 Accusative; 7.2 Typology); 8 Conclusion; References

    Symbolic crosschecking of data-parallel floating-point code


    The Point of Language in Heidegger’s Thinking


    Re-Imagining Text — Re-Imagining Hermeneutics

    With the advent of the digital age and new mediums of communication, it is becoming increasingly important for those interested in the interpretation of religious text to look beyond traditional ideas of text and textuality to find the sacred in unlikely places. Paul Ricoeur’s phenomenological reorientation of classical hermeneutics from romanticized notions of authorial intent and psychological divinations to a serious engagement with the “science of the text” is a hermeneutical tool that opens up an important dialogue between the interpreter, the world of the text, and the contemporary world in front of the text. This article examines three significant insights that Paul Ricoeur contributes to our expanding understanding of text. First under scrutiny will be Ricoeur’s de-regionalization of classic hermeneutics culminating in his understanding of Dasein (Being) as “being-in-the-world,” allowing meaning to transcend the physical boundaries of the text. Next, Ricoeur’s three-fold understanding of traditionality/Traditions/tradition as the “chain of interpretations” through which religious language transcends the temporal boundary of historicity will be explored. The final section will focus on Ricoeur’s understanding of the productive imagination and metaphoric truth as the under-appreciated yet key insight around which Ricoeur’s philosophical investigation into the metaphoric transfer from text to life revolves.

    An FPGA implementation of an investigative many-core processor, Fynbos : in support of a Fortran autoparallelising software pipeline

    Includes bibliographical references. In light of the power, memory, ILP, and utilisation walls facing the computing industry, this work examines the hypothetical many-core approach to finding greater compute performance and efficiency. In order to achieve greater efficiency in an environment in which Moore’s law continues but TDP has been capped, a means of deriving performance from dark and dim silicon is needed. The many-core hypothesis is one approach to exploiting these available transistors efficiently. As understood in this work, it involves trading in hardware control complexity for hundreds to thousands of parallel simple processing elements, and operating at a clock speed sufficiently low as to allow the efficiency gains of near-threshold-voltage operation. Performance is therefore dependent on exploiting a new degree of fine-grained parallelism such as is currently only found in GPGPUs, but in a manner that is not as restrictive in application domain range. While removing the complex control hardware of traditional CPUs provides space for more arithmetic hardware, a basic level of control is still required. For a number of reasons this work chooses to replace this control largely with static scheduling. This pushes the burden of control primarily to the software and specifically the compiler, rather than to the programmer or to an application-specific means of control simplification. An existing legacy tool chain capable of autoparallelising sequential Fortran code to the degree of parallelism necessary for many-core exists. This work implements a many-core architecture to match it. By prototyping the design on an FPGA, it is possible to examine the real-world performance of the compiler-architecture system to a greater degree than simulation alone would allow.
Comparing theoretical peak performance and real performance in a case study application, the system is found to be more efficient than any other reviewed, but also to underperform significantly relative to current competing architectures. This failing is attributed to taking the need for simple hardware too far, and to an inability to implement tactics that mitigate static scheduling, owing to the lack of compiler support for them.

    Using a parallel programming model for architecture synthesis (Χρήση μοντέλου παράλληλου προγραμματισμού για σύνθεση αρχιτεκτονικών)

    The problem of automatically generating hardware modules from high-level application representations has been at the forefront of EDA research during the last few years. In this dissertation we introduce a methodology to automatically synthesize hardware accelerators from OpenCL applications. OpenCL is a recent industry-supported standard for writing programs that execute on multicore platforms and accelerators such as GPUs. Our methodology maps OpenCL kernels into hardware accelerators based on architectural templates that explicitly decouple computation from memory communication whenever this is possible. The templates can be tuned to provide a wide repertoire of accelerators that meet user performance requirements and FPGA device characteristics. Furthermore, a set of high- and low-level compiler optimizations is applied to generate optimized accelerators. Our experimental evaluation shows that the generated accelerators are tuned efficiently to match the application's memory access pattern and computational complexity and to achieve user performance requirements. An important objective of our tool is to expand the FPGA development user base to software engineers, thereby expanding the scope of FPGAs beyond the realm of hardware design.
The problem of automatically generating hardware modules from high-level application representations has been at the forefront of EDA research in recent years. In this dissertation we present a methodology for the automatic synthesis of hardware accelerators from OpenCL applications. OpenCL is a recent standard for writing programs that run on multicore platforms and accelerators such as GPUs. Our methodology converts OpenCL programs into hardware accelerators based on architectural templates that explicitly decouple computation from data transfer to and from memory whenever this is possible. The templates can be tuned to provide a wide repertoire of accelerators that meet user performance requirements and the characteristics of the FPGA device. In addition, a set of high- and low-level compiler optimizations is applied to produce optimized accelerators. The experimental evaluation shows that the generated accelerators are efficiently tuned to match each application's memory access pattern and computational complexity and to achieve user performance requirements. An important goal of our tool is to extend the user base of FPGA platforms to software engineers, so that FPGA systems can be developed by software engineers without the need for hardware design experience.

    Ontological foundations of Aristotle's philosophy of science (Fundamentos ontológicos da filosofia da ciência de Aristóteles)

    Advisor: Lucas Angioni. Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Filosofia e Ciências Humanas. Abstract (from the Portuguese): This thesis is devoted to the ontological foundations of Aristotle's philosophy of science. His notion of scientific knowledge is committed to a certain kind of foundationalism, which recognizes essences as ultimate explanatory factors. The philosopher distinguishes two kinds of essence-bearers: subjects and attributes. Our analysis of this distinction involves a study of Aristotle's doctrine of the categories and his theory of predication. Furthermore, we seek to specify the roles played by the essences of subjects and by the essences of attributes in scientific explanations. As a result, Aristotle's foundationalism consists in the view that reality is composed of finite explanatory chains and entities whose essences are connected to one another in a hierarchical structure. Abstract: This dissertation focuses on the ontological underpinnings of Aristotle's philosophy of science. His notion of scientific knowledge is committed to a certain kind of foundationalism, which recognizes essences as ultimate explanatory factors. The philosopher distinguishes between two kinds of essence-bearers: subjects and attributes. Our analysis of this distinction involves a study of Aristotle's doctrine of ontological categories and his theory of predication. In addition, we specify the roles played by the essences of subjects and the essences of attributes in scientific explanations. As a result, Aristotle's foundationalism amounts to the view that reality is composed of finite chains of explanatory connections and entities whose essences are connected to one another in a hierarchical structure. Doctorate; Philosophy; Doctor in Philosophy; 2013/26386-2; 141403/2014-4; FAPESP; CNP