116 research outputs found

    Towards Performance Portable Graph Algorithms

    Get PDF
    In today's data-driven world, our computational resources have become heterogeneous, making the processing of large-scale graphs in an architecture agnostic manner crucial. Traditionally, hand-optimized high-performance computing (HPC) solutions have been studied and used to implement highly efficient and scalable graph algorithms. In recent years, several graph processing and management systems have also been proposed. Hand optimized HPC approaches require high levels of expertise and graph processing frameworks suffer from expressibility and performance. Portability is a major concern for both approaches. The main thesis of this work is that block-based graph algorithms offer a compromise between efficient parallelism and architecture agnostic algorithm design for a wide class of graph problems. This dissertation seeks to prove this thesis by focusing the work on the three pillars; data/computation partitioning, block-based algorithm design, and performance portability. In this dissertation, we first show how we can partition the computation and the data to design efficient block-based algorithms for solving graph merging and triangle counting problems. Then, generalizing from our experiences, we propose an algorithmic framework, for shared-memory, heterogeneous machines for implementing block-based graph algorithms; PGAbB. PGAbB aims to maximally leverage different architectures by implementing a task-based execution on top of a block-based programming model. In this talk we will discuss PGAbB's programming model, algorithmic optimizations for scheduling, and load-balancing strategies for graph problems on real-world and synthetic inputs.Ph.D

    Doctor of Philosophy

    Get PDF
    dissertationAbstraction plays an important role in digital design, analysis, and verification, as it allows for the refinement of functions through different levels of conceptualization. This dissertation introduces a new method to compute a symbolic, canonical, word-level abstraction of the function implemented by a combinational logic circuit. This abstraction provides a representation of the function as a polynomial Z = F(A) over the Galois field F2k , expressed over the k-bit input to the circuit, A. This representation is easily utilized for formal verification (equivalence checking) of combinational circuits. The approach to abstraction is based upon concepts from commutative algebra and algebraic geometry, notably the Grobner basis theory. It is shown that the polynomial F(A) can be derived by computing a Grobner basis of the polynomials corresponding to the circuit, using a specific elimination term order based on the circuits topology. However, computing Grobner bases using elimination term orders is infeasible for large circuits. To overcome these limitations, this work introduces an efficient symbolic computation to derive the word-level polynomial. The presented algorithms exploit i) the structure of the circuit, ii) the properties of Grobner bases, iii) characteristics of Galois fields F2k , and iv) modern algorithms from symbolic computation. A custom abstraction tool is designed to efficiently implement the abstraction procedure. While the concept is applicable to any arbitrary combinational logic circuit, it is particularly powerful in verification and equivalence checking of hierarchical, custom designed and structurally dissimilar Galois field arithmetic circuits. In most applications, the field size and the datapath size k in the circuits is very large, up to 1024 bits. The proposed abstraction procedure can exploit the hierarchy of the given Galois field arithmetic circuits. Our experiments show that, using this approach, our tool can abstract and verify Galois field arithmetic circuits up to 1024 bits in size. Contemporary techniques fail to verify these types of circuits beyond 163 bits and cannot abstract a canonical representation beyond 32 bits

    Recent Advances in Embedded Computing, Intelligence and Applications

    Get PDF
    The latest proliferation of Internet of Things deployments and edge computing combined with artificial intelligence has led to new exciting application scenarios, where embedded digital devices are essential enablers. Moreover, new powerful and efficient devices are appearing to cope with workloads formerly reserved for the cloud, such as deep learning. These devices allow processing close to where data are generated, avoiding bottlenecks due to communication limitations. The efficient integration of hardware, software and artificial intelligence capabilities deployed in real sensing contexts empowers the edge intelligence paradigm, which will ultimately contribute to the fostering of the offloading processing functionalities to the edge. In this Special Issue, researchers have contributed nine peer-reviewed papers covering a wide range of topics in the area of edge intelligence. Among them are hardware-accelerated implementations of deep neural networks, IoT platforms for extreme edge computing, neuro-evolvable and neuromorphic machine learning, and embedded recommender systems

    A Practical Hardware Implementation of Systemic Computation

    Get PDF
    It is widely accepted that natural computation, such as brain computation, is far superior to typical computational approaches addressing tasks such as learning and parallel processing. As conventional silicon-based technologies are about to reach their physical limits, researchers have drawn inspiration from nature to found new computational paradigms. Such a newly-conceived paradigm is Systemic Computation (SC). SC is a bio-inspired model of computation. It incorporates natural characteristics and defines a massively parallel non-von Neumann computer architecture that can model natural systems efficiently. This thesis investigates the viability and utility of a Systemic Computation hardware implementation, since prior software-based approaches have proved inadequate in terms of performance and flexibility. This is achieved by addressing three main research challenges regarding the level of support for the natural properties of SC, the design of its implied architecture and methods to make the implementation practical and efficient. Various hardware-based approaches to Natural Computation are reviewed and their compatibility and suitability, with respect to the SC paradigm, is investigated. FPGAs are identified as the most appropriate implementation platform through critical evaluation and the first prototype Hardware Architecture of Systemic computation (HAoS) is presented. HAoS is a novel custom digital design, which takes advantage of the inbuilt parallelism of an FPGA and the highly efficient matching capability of a Ternary Content Addressable Memory. It provides basic processing capabilities in order to minimize time-demanding data transfers, while the optional use of a CPU provides high-level processing support. It is optimized and extended to a practical hardware platform accompanied by a software framework to provide an efficient SC programming solution. The suggested platform is evaluated using three bio-inspired models and analysis shows that it satisfies the research challenges and provides an effective solution in terms of efficiency versus flexibility trade-off

    Low Computational Complexity Network Coding For Mobile Networks

    Get PDF

    Doctor of Philosophy

    Get PDF
    dissertationIn the static analysis of functional programs, control- ow analysis (k-CFA) is a classic method of approximating program behavior as a infinite state automata. CFA2 and abstract garbage collection are two recent, yet orthogonal improvements, on k-CFA. CFA2 approximates program behavior as a pushdown system, using summarization for the stack. CFA2 can accurately approximate arbitrarily-deep recursive function calls, whereas k-CFA cannot. Abstract garbage collection removes unreachable values from the store/heap. If unreachable values are not removed from a static analysis, they can become reachable again, which pollutes the final analysis and makes it less precise. Unfortunately, as these two techniques were originally formulated, they are incompatible. CFA2's summarization technique for managing the stack obscures the stack such that abstract garbage collection is unable to examine the stack for reachable values. This dissertation presents introspective pushdown control-flow analysis, which manages the stack explicitly through stack changes (pushes and pops). Because this analysis is able to examine the stack by how it has changed, abstract garbage collection is able to examine the stack for reachable values. Thus, introspective pushdown control-flow analysis merges successfully the benefits of CFA2 and abstract garbage collection to create a more precise static analysis. Additionally, the high-performance computing community has viewed functional programming techniques and tools as lacking the efficiency necessary for their applications. Nebo is a declarative domain-specific language embedded in C++ for discretizing partial differential equations for transport phenomena. For efficient execution, Nebo exploits a version of expression templates, based on the C++ template system, which is a type-less, completely-pure, Turing-complete functional language with burdensome syntax. Nebo's declarative syntax supports functional tools, such as point-wise lifting of complex expressions and functional composition of stencil operators. Nebo's primary abstraction is mathematical assignment, which separates what a calculation does from how that calculation is executed. Currently Nebo supports single-core execution, multicore (thread-based) parallel execution, and GPU execution. With single-core execution, Nebo performs on par with the loops and code that it replaces in Wasatch, a pre-existing high-performance simulation project. With multicore (thread-based) execution, Nebo can linearly scale (with roughly 90% efficiency) up to 6 processors, compared to its single-core execution. Moreover, Nebo's GPU execution can be up to 37x faster than its single-core execution. Finally, Wasatch (the pre-existing high-performance simulation project which uses Nebo) can scale up to 262K cores

    Doctor of Philosophy

    Get PDF
    dissertationWith the spread of internet and mobile devices, transferring information safely and securely has become more important than ever. Finite fields have widespread applications in such domains, such as in cryptography, error correction codes, among many others. In most finite field applications, the field size - and therefore the bit-width of the operands - can be very large. The high complexity of arithmetic operations over such large fields requires circuits to be (semi-) custom designed. This raises the potential for errors/bugs in the implementation, which can be maliciously exploited and can compromise the security of such systems. Formal verification of finite field arithmetic circuits has therefore become an imperative. This dissertation targets the problem of formal verification of hardware implementations of combinational arithmetic circuits over finite fields of the type F2k . Two specific problems are addressed: i) verifying the correctness of a custom-designed arithmetic circuit implementation against a given word-level polynomial specification over F2k ; and ii) gate-level equivalence checking of two different arithmetic circuit implementations. This dissertation proposes polynomial abstractions over finite fields to model and represent the circuit constraints. Subsequently, decision procedures based on modern computer algebra techniques - notably, Gr¨obner bases-related theory and technology - are engineered to solve the verification problem efficiently. The arithmetic circuit is modeled as a polynomial system in the ring F2k [x1, x2, · · · , xd], and computer algebrabased results (Hilbert's Nullstellensatz) over finite fields are exploited for verification. Using our approach, experiments are performed on a variety of custom-designed finite field arithmetic benchmark circuits. The results are also compared against contemporary methods, based on SAT and SMT solvers, BDDs, and AIG-based methods. Our tools can verify the correctness of, and detect bugs in, up to 163-bit circuits in F2163 , whereas contemporary approaches are infeasible beyond 48-bit circuits

    Embedded electronic systems driven by run-time reconfigurable hardware

    Get PDF
    Abstract This doctoral thesis addresses the design of embedded electronic systems based on run-time reconfigurable hardware technology –available through SRAM-based FPGA/SoC devices– aimed at contributing to enhance the life quality of the human beings. This work does research on the conception of the system architecture and the reconfiguration engine that provides to the FPGA the capability of dynamic partial reconfiguration in order to synthesize, by means of hardware/software co-design, a given application partitioned in processing tasks which are multiplexed in time and space, optimizing thus its physical implementation –silicon area, processing time, complexity, flexibility, functional density, cost and power consumption– in comparison with other alternatives based on static hardware (MCU, DSP, GPU, ASSP, ASIC, etc.). The design flow of such technology is evaluated through the prototyping of several engineering applications (control systems, mathematical coprocessors, complex image processors, etc.), showing a high enough level of maturity for its exploitation in the industry.Resumen Esta tesis doctoral abarca el diseño de sistemas electrónicos embebidos basados en tecnología hardware dinámicamente reconfigurable –disponible a través de dispositivos lógicos programables SRAM FPGA/SoC– que contribuyan a la mejora de la calidad de vida de la sociedad. Se investiga la arquitectura del sistema y del motor de reconfiguración que proporcione a la FPGA la capacidad de reconfiguración dinámica parcial de sus recursos programables, con objeto de sintetizar, mediante codiseño hardware/software, una determinada aplicación particionada en tareas multiplexadas en tiempo y en espacio, optimizando así su implementación física –área de silicio, tiempo de procesado, complejidad, flexibilidad, densidad funcional, coste y potencia disipada– comparada con otras alternativas basadas en hardware estático (MCU, DSP, GPU, ASSP, ASIC, etc.). Se evalúa el flujo de diseño de dicha tecnología a través del prototipado de varias aplicaciones de ingeniería (sistemas de control, coprocesadores aritméticos, procesadores de imagen, etc.), evidenciando un nivel de madurez viable ya para su explotación en la industria.Resum Aquesta tesi doctoral està orientada al disseny de sistemes electrònics empotrats basats en tecnologia hardware dinàmicament reconfigurable –disponible mitjançant dispositius lògics programables SRAM FPGA/SoC– que contribueixin a la millora de la qualitat de vida de la societat. S’investiga l’arquitectura del sistema i del motor de reconfiguració que proporcioni a la FPGA la capacitat de reconfiguració dinàmica parcial dels seus recursos programables, amb l’objectiu de sintetitzar, mitjançant codisseny hardware/software, una determinada aplicació particionada en tasques multiplexades en temps i en espai, optimizant així la seva implementació física –àrea de silici, temps de processat, complexitat, flexibilitat, densitat funcional, cost i potència dissipada– comparada amb altres alternatives basades en hardware estàtic (MCU, DSP, GPU, ASSP, ASIC, etc.). S’evalúa el fluxe de disseny d’aquesta tecnologia a través del prototipat de varies aplicacions d’enginyeria (sistemes de control, coprocessadors aritmètics, processadors d’imatge, etc.), demostrant un nivell de maduresa viable ja per a la seva explotació a la indústria

    Abstraction Raising in General-Purpose Compilers

    Get PDF

    A Survey on Data Plane Programming with P4: Fundamentals, Advances, and Applied Research

    Full text link
    With traditional networking, users can configure control plane protocols to match the specific network configuration, but without the ability to fundamentally change the underlying algorithms. With SDN, the users may provide their own control plane, that can control network devices through their data plane APIs. Programmable data planes allow users to define their own data plane algorithms for network devices including appropriate data plane APIs which may be leveraged by user-defined SDN control. Thus, programmable data planes and SDN offer great flexibility for network customization, be it for specialized, commercial appliances, e.g., in 5G or data center networks, or for rapid prototyping in industrial and academic research. Programming protocol-independent packet processors (P4) has emerged as the currently most widespread abstraction, programming language, and concept for data plane programming. It is developed and standardized by an open community and it is supported by various software and hardware platforms. In this paper, we survey the literature from 2015 to 2020 on data plane programming with P4. Our survey covers 497 references of which 367 are scientific publications. We organize our work into two parts. In the first part, we give an overview of data plane programming models, the programming language, architectures, compilers, targets, and data plane APIs. We also consider research efforts to advance P4 technology. In the second part, we analyze a large body of literature considering P4-based applied research. We categorize 241 research papers into different application domains, summarize their contributions, and extract prototypes, target platforms, and source code availability.Comment: Submitted to IEEE Communications Surveys and Tutorials (COMS) on 2021-01-2
    • …
    corecore