
    Verified compilation and optimization of floating-point kernels

    When verifying safety-critical code at the level of source code, we trust the compiler to produce machine code that preserves the behavior of the source code. Trusting a verified compiler is easy: a rigorous machine-checked proof shows that the compiler correctly translates source code into machine code. Modern verified compilers (e.g. CompCert and CakeML) have rich input languages, but only rudimentary support for floating-point arithmetic. In fact, state-of-the-art verified compilers only implement and verify an inflexible one-to-one translation from floating-point source code to machine code. This translation completely ignores that floating-point arithmetic is actually a discrete representation of the continuous real numbers. This thesis presents two extensions that improve floating-point arithmetic in CakeML. First, the thesis demonstrates verified compilation of elementary functions to floating-point code with: Dandelion, an automatic verifier for polynomial approximations of elementary functions; and libmGen, a proof-producing compiler that relates floating-point machine code to the real-numbered elementary function it implements. Second, the thesis demonstrates verified optimization of floating-point code with: Icing, a floating-point language that extends standard floating-point arithmetic with optimizations similar to those used by unverified compilers such as GCC and LLVM; and RealCake, an extension of CakeML with Icing into the first fully verified optimizing compiler for floating-point arithmetic.
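
    To make the effect of such optimizations concrete, the following minimal C sketch (illustrative only, not taken from the thesis; all values are chosen for the example) shows why a value-changing rewrite of the kind fast-math optimizers apply, contracting a*b + c into a single fused multiply-add, cannot simply be assumed to be behavior-preserving: the two forms round differently.

        #include <float.h>
        #include <math.h>
        #include <stdio.h>

        int main(void) {
            double a = 1.0e16;
            double b = 1.0 + DBL_EPSILON;    /* 1 + 2^-52, exactly representable */
            double c = -1.0e16;

            double prod  = a * b;            /* rounded once here ...            */
            double plain = prod + c;         /* ... and again here: prints 2     */
            double fused = fma(a, b, c);     /* exact a*b + c rounded once:      */
                                             /* prints ~2.2204460492503131       */
            printf("plain = %.17g\n", plain);
            printf("fused = %.17g\n", fused);
            return 0;
        }

    Built with a C99 toolchain (link with -lm), the separately rounded expression and the fused one give visibly different results, which is exactly the gap a verified optimizer has to account for.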

    High-level synthesis for FPGAs: From prototyping to deployment

    Escalating system-on-chip design complexity is pushing the design community to raise the level of abstraction beyond RTL. Despite the unsuccessful adoption of early generations of commercial high-level synthesis (HLS) systems, we believe that the tipping point for transitioning to an HLS methodology is happening now, especially for FPGA designs. The latest generation of HLS tools has made significant progress in providing wide language coverage and robust compilation technology, platform-based modeling, advancement in core HLS algorithms, and a domain-specific approach. In this paper we use AutoESL's AutoPilot HLS tool, coupled with domain-specific system-level implementation platforms developed by Xilinx, as an example to demonstrate the effectiveness of state-of-the-art C-to-FPGA synthesis solutions targeting multiple application domains. Complex industrial designs targeting Xilinx FPGAs are also presented as case studies, including a comparison of HLS solutions versus optimized manual designs.
    Index Terms: Domain-specific design, field-programmable gate array (FPGA), high-level synthesis (HLS), quality of results (QoR).
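
    As an illustration of the kind of C input such tools accept (a hypothetical sketch, not an example from the paper; the kernel, its names, and the directive spellings are invented here, following later Vivado HLS conventions and varying by tool and version), the fragment below is a fixed-size dot product annotated with HLS directives. Built as ordinary C, the pragmas are ignored and the program still runs.

        #include <stdio.h>

        /* Hypothetical HLS-style kernel: a fixed-size dot product. */
        #define N 128

        float dot(const float a[N], const float b[N]) {
        #pragma HLS INTERFACE ap_memory port=a
        #pragma HLS INTERFACE ap_memory port=b
            float acc = 0.0f;
            for (int i = 0; i < N; i++) {
        #pragma HLS PIPELINE II=1    /* ask the scheduler to start one iteration per clock */
                acc += a[i] * b[i];
            }
            return acc;
        }

        int main(void) {
            float a[N], b[N];
            for (int i = 0; i < N; i++) { a[i] = 1.0f; b[i] = 2.0f; }
            printf("%f\n", dot(a, b));   /* 256.0 */
            return 0;
        }

    The PIPELINE directive expresses a micro-architectural decision (iteration initiation interval) that would otherwise have to be hand-coded in RTL, which is the productivity argument the paper makes.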

    Domain specific high performance reconfigurable architecture for a communication platform


    End-to-end security in service-oriented architecture

    A service-oriented architecture (SOA)-based application is composed of a number of distributed and loosely coupled web services, which are orchestrated to accomplish a more complex functionality. Any of these web services is able to invoke other web services to offload part of its functionality. The main security challenge in SOA is that we cannot trust the participating web services in a service composition to behave as expected all the time. In addition, the chain of services involved in an end-to-end service invocation may not be visible to the clients. As a result, any violation of a client's policies could remain undetected. To address these challenges in SOA, we made the following contributions. First, we devised two composite trust schemes that use a graph abstraction to quantitatively maintain the trust levels of different services. The composite trust values are based on feedback from the actual execution of services and on the structure of the SOA application. To maintain the dynamic trust, we designed the trust manager, a trusted third-party service. Second, we developed an end-to-end inter-service policy monitoring and enforcement framework (PME framework), which is able to dynamically inspect the interactions between services at runtime and react to potentially malicious activities according to the client's policies. Third, we designed an intra-service policy monitoring and enforcement framework based on a taint-analysis mechanism to monitor the information flow within services and prevent information-disclosure incidents. Fourth, we proposed an adaptive and secure service composition engine (ASSC), which takes advantage of an efficient heuristic algorithm to generate optimal service compositions in SOA. The service compositions generated by ASSC maximize the trustworthiness of the selected services while meeting predefined QoS constraints. Finally, we have extensively studied the correctness and performance of the proposed security measures on a realistic SOA case study. All experimental studies validated the practicality and effectiveness of the presented solutions.
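
    As a rough illustration of composite trust (a hypothetical sketch, not the schemes proposed in the thesis, which operate on a graph abstraction of the composition; all names and numbers below are invented), one simple policy scores each service by the fraction of positive feedback it has received and takes the trust of an invocation chain to be the product of its members' scores, so a single weakly trusted service lowers the trust of the whole end-to-end composition.

        #include <stdio.h>

        struct service {
            const char *name;
            int positive;   /* positive feedback count */
            int total;      /* total feedback count    */
        };

        static double service_trust(const struct service *s) {
            return s->total ? (double)s->positive / (double)s->total : 0.5; /* neutral prior */
        }

        /* Trust of a chain of invocations: product over its members. */
        static double chain_trust(const struct service *chain, int n) {
            double t = 1.0;
            for (int i = 0; i < n; i++)
                t *= service_trust(&chain[i]);
            return t;
        }

        int main(void) {
            struct service chain[] = {
                { "payment",  95, 100 },
                { "shipping", 80, 100 },
                { "notify",   60, 100 },
            };
            printf("composite trust = %.3f\n", chain_trust(chain, 3));  /* 0.95*0.80*0.60 = 0.456 */
            return 0;
        }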

    Highly-configurable FPGA-based platform for wireless network research

    Thesis (Ph.D.) -- Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011, by Man Cheuk Ng. Cataloged from the PDF version of the thesis; includes bibliographical references (p. 155-164). Over the past few years, researchers have developed many cross-layer wireless protocols to improve the performance of wireless networks. Experimental evaluation of these protocols requires both high-speed simulation and real-time on-air experimentation. Unfortunately, radios implemented in pure software are usually inadequate for either because they are typically two to three orders of magnitude slower than commodity hardware. FPGA-based platforms provide much better speeds but are quite difficult to modify, because high-speed designs are typically implemented by trading modularity for performance. Experimenting with cross-layer protocols requires a flexible way to convey information beyond the data itself from lower to higher layers, a way for higher layers to configure lower layers dynamically and within some latency bounds, and the ability to modify a layer's processing pipeline without triggering a cascade of changes. In this thesis, we discuss an alternative approach to implementing a high-performance yet configurable radio design on an FPGA platform that satisfies these requirements. We propose that all modules in the design possess two important design properties, latency-insensitivity and data-driven control, which facilitate modular refinements. We have developed Airblue, an FPGA-based radio that has these properties and runs at speeds comparable to commodity hardware. Our baseline design is 802.11g compliant and achieves reliable communication at bit rates up to 24 Mbps. We show in the thesis that we can implement SoftRate, a cross-layer rate adaptation protocol, by modifying only 5.6% of the source code (967 lines). We also show that our modular design approach allows us to abstract the details of the FPGA platform from the main design, making the design portable across multiple FPGA platforms. By taking advantage of this virtualization capability, we were able to turn Airblue into a high-speed hardware/software co-simulator with simulation speed beyond 20 Mbps.
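
    The two design properties can be pictured with a small software model (a minimal sketch with invented names, not code from the thesis, which targets FPGA hardware): a stage fires only when its input queue holds data and its output queue has room, so extra buffering or added pipeline latency between stages changes timing but not results.

        #include <stdbool.h>
        #include <stdio.h>

        #define CAP 4

        typedef struct { int buf[CAP]; int head, count; } fifo_t;

        static bool can_deq(const fifo_t *f) { return f->count > 0; }
        static bool can_enq(const fifo_t *f) { return f->count < CAP; }
        static int  deq(fifo_t *f) { int v = f->buf[f->head]; f->head = (f->head + 1) % CAP; f->count--; return v; }
        static void enq(fifo_t *f, int v) { f->buf[(f->head + f->count) % CAP] = v; f->count++; }

        /* A stage is a guarded rule: it fires only when data and space are available. */
        static void scale_stage(fifo_t *in, fifo_t *out) {
            if (can_deq(in) && can_enq(out))    /* data-driven control */
                enq(out, 2 * deq(in));
        }

        int main(void) {
            fifo_t a = {0}, b = {0};
            for (int i = 0; i < 3; i++) enq(&a, i);
            for (int cycle = 0; cycle < 8; cycle++)  /* how often or when the stage is tried is irrelevant */
                scale_stage(&a, &b);
            while (can_deq(&b)) printf("%d ", deq(&b));  /* prints: 0 2 4 */
            printf("\n");
            return 0;
        }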

    A Memory-Centric Customizable Domain-Specific FPGA Overlay for Accelerating Machine Learning Applications

    Low-latency inferencing is of paramount importance to a wide range of real-time and user-facing Machine Learning (ML) applications. Field Programmable Gate Arrays (FPGAs) offer unique advantages in delivering low-latency as well as energy-efficient accelerators for low-latency inferencing. Unfortunately, creating machine learning accelerators in FPGAs is not easy, requiring the use of vendor-specific CAD tools and low-level digital and hardware microarchitecture design knowledge that the majority of ML researchers do not possess. The continued refinement of High Level Synthesis (HLS) tools can reduce but not eliminate the need for hardware-specific design knowledge, and the designs these tools produce can also use FPGA resources inefficiently, ultimately limiting the performance of the neural network. This research investigated a new FPGA-based software/hardware co-designed overlay architecture that opens the advantages of FPGAs to the broader ML user community. As an overlay, the proposed design allows rapid coding and deployment of different ML network configurations and different data widths, eliminating the prior barrier of needing to resynthesize each design; this brings important attributes of code portability across different FPGA families. The proposed overlay design is a Single-Instruction-Multiple-Data (SIMD) Processor-In-Memory (PIM) architecture developed as a programmable overlay for FPGAs. In contrast to point designs, it can be programmed to implement different types of machine learning algorithms. The overlay architecture integrates bit-serial Arithmetic Logic Units (ALUs) with distributed Block RAMs (BRAMs). The PIM design increases the size of arithmetic operations and the on-chip storage capacity. User-visible inference latencies are reduced by exploiting concurrent accesses to network parameters (weights and biases) and partial results stored throughout the distributed BRAMs. Run-time performance comparisons show that the proposed design achieves a speedup compared to HLS-based or custom-tuned equivalent designs. Notably, the proposed design is programmable, allowing rapid design-space exploration without the need to resynthesize when changing ML algorithms on the FPGA.
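
    To illustrate the bit-serial idea (an illustrative sketch with invented names, not the dissertation's overlay), the C function below adds two operands one bit position per step using a single carry bit, the way a tiny ALU placed next to each BRAM column can support arbitrary operand widths; throughput then comes from operating many such columns in parallel rather than from a wide datapath.

        #include <stdint.h>
        #include <stdio.h>

        /* Add two operands one bit per "cycle" with a single carry flip-flop. */
        static uint32_t bit_serial_add(uint32_t a, uint32_t b, int width) {
            uint32_t sum = 0;
            unsigned carry = 0;
            for (int i = 0; i < width; i++) {          /* one bit position per cycle */
                unsigned ai = (a >> i) & 1u;
                unsigned bi = (b >> i) & 1u;
                unsigned s  = ai ^ bi ^ carry;          /* full-adder sum bit   */
                carry       = (ai & bi) | (carry & (ai ^ bi)); /* full-adder carry */
                sum |= (uint32_t)s << i;
            }
            return sum;
        }

        int main(void) {
            printf("%u\n", bit_serial_add(1234u, 4321u, 16));  /* 5555 */
            return 0;
        }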

    Roadmap on Electronic Structure Codes in the Exascale Era

    Electronic structure calculations have been instrumental in providing many important insights into a range of physical and chemical properties of various molecular and solid-state systems. Their importance to various fields, including materials science, chemical sciences, computational chemistry and device physics, is underscored by the large fraction of available public supercomputing resources devoted to these calculations. As we enter the exascale era, exciting new opportunities to increase simulation numbers, sizes, and accuracies present themselves. In order to realize these promises, the community of electronic structure software developers will however first have to tackle a number of challenges pertaining to the efficient use of new architectures that will rely heavily on massive parallelism and hardware accelerators. This roadmap provides a broad overview of the state-of-the-art in electronic structure calculations and of the various new directions being pursued by the community. It covers 14 electronic structure codes, presenting their current status, their development priorities over the next five years, and their plans towards tackling the challenges and leveraging the opportunities presented by the advent of exascale computing

    Dynamically reconfigurable bio-inspired hardware

    During the last several years, reconfigurable computing devices have experienced an impressive development in their resource availability, speed, and configurability. Currently, commercial FPGAs offer the possibility of self-reconfiguring by partially modifying their configuration bitstream, providing high architectural flexibility while guaranteeing high performance. These configurability features have received special interest from computer architects: one can find several reconfigurable coprocessor architectures for cryptographic algorithms, image processing, automotive applications, and various general-purpose functions. On the other hand, we have bio-inspired hardware, a large research field that takes inspiration from living beings in order to design hardware systems and includes diverse topics: evolvable hardware, neural hardware, cellular automata, and fuzzy hardware, among others. Living beings are well known for their high adaptability to environmental changes, featuring very flexible adaptations at several levels. Bio-inspired hardware systems require such flexibility to be provided by the hardware platform on which the system is implemented. In general, bio-inspired hardware has been implemented on both custom and commercial hardware platforms. Custom platforms are specifically designed to support bio-inspired hardware systems, typically featuring special cellular architectures and enhanced reconfigurability capabilities, such as partial and dynamic reconfigurability. These aspects are very well appreciated for providing the performance and the high architectural flexibility required by bio-inspired systems. However, the limited availability and very high cost of such custom devices make them accessible to only a few research groups. Even though some commercial FPGAs provide enhanced reconfigurability features such as partial and dynamic reconfiguration, the use of these features is still in its early stages and is not well supported by FPGA vendors, which makes them difficult to include in existing bio-inspired systems. In this thesis, I present a set of architectures, techniques, and methodologies for benefiting from the configurability advantages of current commercial FPGAs in the design of bio-inspired hardware systems. Among the presented architectures are neural networks, spiking neuron models, fuzzy systems, cellular automata, and random Boolean networks. For these architectures, I propose several techniques for parametric and topological adaptation, such as Hebbian learning, evolutionary and co-evolutionary algorithms, and particle swarm optimization. Finally, as a case study I consider the implementation of bio-inspired hardware systems on two platforms: YaMoR (Yet another Modular Robot) and ROPES (Reconfigurable Object for Pervasive Systems), the development of both platforms having been co-supervised in the framework of this thesis.
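
    As a concrete example of the parametric adaptation mentioned above (a minimal software sketch with invented names, not the thesis's hardware implementation), the C fragment below applies a plain Hebbian update, strengthening a weight whenever its input and the neuron's output are active together.

        #include <stdio.h>

        #define N_IN 3

        /* Hebbian rule: dw_i = eta * x_i * y for every input weight. */
        static void hebbian_update(double w[N_IN], const double x[N_IN], double y, double eta) {
            for (int i = 0; i < N_IN; i++)
                w[i] += eta * x[i] * y;
        }

        int main(void) {
            double w[N_IN] = {0.1, 0.1, 0.1};
            double x[N_IN] = {1.0, 0.0, 1.0};

            for (int step = 0; step < 10; step++) {
                /* post-synaptic activity: weighted sum of the inputs */
                double y = 0.0;
                for (int i = 0; i < N_IN; i++) y += w[i] * x[i];
                hebbian_update(w, x, y, 0.1);
            }
            /* weights on the active inputs (0 and 2) have grown; the inactive one has not */
            printf("w = [%.3f, %.3f, %.3f]\n", w[0], w[1], w[2]);
            return 0;
        }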