4,581 research outputs found

    A Fast and Accurate Cost Model for FPGA Design Space Exploration in HPC Applications

    Heterogeneous High-Performance Computing (HPC) platforms present a significant programming challenge, especially because the key users of HPC resources are scientists, not parallel programmers. We contend that compiler technology has to evolve to automatically create the best program variant by transforming a given original program. We have developed a novel methodology based on type transformations for generating correct-by-construction design variants, and an associated lightweight cost model for evaluating these variants for implementation on FPGAs. In this paper we present a key enabler of our approach, the cost model. We discuss how we are able to quickly derive accurate estimates of performance and resource utilization from the design’s representation in our intermediate language. We show results confirming the accuracy of our cost model by testing it on three different scientific kernels. We conclude with a case study that compares a solution generated by our framework with one from a conventional high-level synthesis tool, showing better performance and power efficiency using our cost-model-based approach.

    Optimizing compilation with preservation of structural code coverage metrics to support software testing

    Code-coverage-based testing is a widely used testing strategy that aims to provide a meaningful decision criterion for the adequacy of a test suite. Code-coverage-based testing is also mandated for the development of safety-critical applications; for example, the DO-178B document requires the application of modified condition/decision coverage. One critical issue of code-coverage testing is that structural code coverage criteria are typically applied to source code, whereas the generated machine code may have a different code structure because of code optimizations performed by the compiler. In this work, we present the automatic calculation of coverage profiles describing which structural code-coverage criteria are preserved by which code optimization, independently of the concrete test suite. These coverage profiles make it easy to extend compilers with the feature of preserving any given code-coverage criterion by enabling only those code optimizations that preserve it. Furthermore, we describe the integration of these coverage profiles into the compiler GCC. With these coverage profiles, we answer the question of how much code optimization is possible without compromising the error-detection likelihood of a given test suite. Experimental results show that the performance cost of preserving structural code coverage in GCC is rather low. Peer reviewed. Submitted Version
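The selection step described in this abstract (enable only those optimizations whose coverage profile preserves the required criterion) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the pass names `pass_a`, `pass_b`, `pass_c`, the criterion labels, and the table entries are hypothetical placeholders, not GCC options and not results from the paper.

```python
# Hypothetical coverage profiles: each optimization pass maps to the set of
# structural coverage criteria it is assumed to preserve. The pass names and
# entries are made up for illustration; they are not taken from GCC or the paper.
COVERAGE_PROFILES = {
    "pass_a": {"statement", "decision", "mcdc"},
    "pass_b": {"statement"},
    "pass_c": {"statement", "decision"},
}


def allowed_optimizations(profiles, required):
    """Return the passes whose profile preserves every required criterion."""
    required = set(required)
    return sorted(
        name for name, preserved in profiles.items() if required <= preserved
    )


if __name__ == "__main__":
    # Only passes that preserve decision coverage remain enabled.
    print(allowed_optimizations(COVERAGE_PROFILES, ["decision"]))
```

Because the profiles are independent of any test suite, a table like this would only need to be computed once per compiler; the lookup itself is a plain subset check.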

    Vector processing-aware advanced clock-gating techniques for low-power fused multiply-add

    The need for power efficiency is driving a rethink of design decisions in processor architectures. While vector processors succeeded in the high-performance market in the past, they need a retailoring for the mobile market that they are entering now. Floating-point (FP) fused multiply-add (FMA), being a functional unit with high power consumption, deserves special attention. Although clock gating is a well-known method to reduce switching power in synchronous designs, there are unexplored opportunities for its application to vector processors, especially when considering active operating mode. In this research, we comprehensively identify, propose, and evaluate the most suitable clock-gating techniques for vector FMA units (VFUs). These techniques ensure power savings without jeopardizing the timing. We evaluate the proposed techniques using both synthetic and “real-world” application-based benchmarking. Using vector masking and vector multilane-aware clock gating, we report power reductions of up to 52%, assuming an active VFU operating at peak performance. Among other findings, we observe that vector instruction-based clock-gating techniques achieve power savings for all vector FP instructions. Finally, when evaluating all techniques together, using “real-world” benchmarking, the power reductions are up to 80%. Additionally, in accordance with processor design trends, we perform this research in a fully parameterizable and automated fashion. The research leading to these results has received funding from the RoMoL ERC Advanced Grant GA 321253 and is supported in part by the European Union (FEDER funds) under contract TTIN2015-65316-P. The work of I. Ratkovic was supported by an FPU research grant from the Spanish MECD. Peer Reviewed. Postprint (author's final draft)

    FlashProfile: A Framework for Synthesizing Data Profiles

    We address the problem of learning a syntactic profile for a collection of strings, i.e. a set of regex-like patterns that succinctly describe the syntactic variations in the strings. Real-world datasets, typically curated from multiple sources, often contain data in various syntactic formats. Thus, any data processing task is preceded by the critical step of data format identification. However, manual inspection of data to identify the different formats is infeasible in standard big-data scenarios. Prior techniques are restricted to a small set of pre-defined patterns (e.g. digits, letters, words, etc.), and provide no control over granularity of profiles. We define syntactic profiling as a problem of clustering strings based on syntactic similarity, followed by identifying patterns that succinctly describe each cluster. We present a technique for synthesizing such profiles over a given language of patterns, that also allows for interactive refinement by requesting a desired number of clusters. Using a state-of-the-art inductive synthesis framework, PROSE, we have implemented our technique as FlashProfile. Across 153 tasks over 75 large real datasets, we observe a median profiling time of only ∼0.7 s. Furthermore, we show that access to syntactic profiles may allow for more accurate synthesis of programs, i.e. using fewer examples, in programming-by-example (PBE) workflows such as FlashFill. Comment: 28 pages, SPLASH (OOPSLA) 201
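To make the idea of a syntactic profile concrete, the following is a minimal sketch of the simplest kind of profiling, closer to the fixed, pre-defined pattern classes the abstract attributes to prior techniques than to FlashProfile itself. It does not use PROSE, performs no similarity-based clustering, and offers no control over the number of clusters; the function names `char_class`, `pattern_of`, and `profile` are hypothetical.

```python
from collections import defaultdict
from itertools import groupby


def char_class(ch: str) -> str:
    """Map a character to a coarse class name, or to itself for other symbols."""
    if ch.isdigit():
        return "Digit"
    if ch.isupper():
        return "Upper"
    if ch.islower():
        return "Lower"
    return ch


def pattern_of(s: str) -> str:
    """Collapse each run of same-class characters into a single pattern token."""
    parts = []
    for cls, run in groupby(s, key=char_class):
        length = len(list(run))
        if cls in ("Digit", "Upper", "Lower"):
            parts.append(f"{cls}+")
        else:
            # Literal characters (punctuation, spaces) are kept verbatim, quoted.
            parts.append(f"'{cls * length}'")
    return " ".join(parts)


def profile(strings):
    """Group strings by their pattern; each group is one entry of the profile."""
    groups = defaultdict(list)
    for s in strings:
        groups[pattern_of(s)].append(s)
    return dict(groups)


if __name__ == "__main__":
    for pattern, members in profile(["2019-01-05", "2020-12-31", "Jan 5"]).items():
        print(pattern, "<-", members)
```

Here two strings fall into the same group only if their patterns match exactly, so granularity is fixed by the character classes chosen; the abstract's contribution is precisely to lift that restriction.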

    Superlattice Nanowire Pattern Transfer (SNAP)

    During the past 15 years or so, nanowires (NWs) have emerged as a new and distinct class of materials. Their novel structural and physical properties separate them from wires that can be prepared using the standard methods for manufacturing electronics. NW-based applications that range from traditional electronic devices (logic and memory) to novel biomolecular and chemical sensors, thermoelectric materials, and optoelectronic devices, all have appeared during the past few years. From a fundamental perspective, NWs provide a route toward the investigation of new physics in confined dimensions. Perhaps the most familiar fabrication method is the vapor−liquid−solid (VLS) growth technique, which produces semiconductor nanowires as bulk materials. However, other fabrication methods exist and have their own advantages. In this Account, I review a particular class of NWs produced by an alternative method called superlattice nanowire pattern transfer (SNAP). The SNAP method is distinct from other nanowire preparation methods in several ways. It can produce large NW arrays from virtually any thin-film material, including metals, insulators, and semiconductors. The dimensions of the NWs can be controlled with near-atomic precision, and NW widths and spacings can be as small as a few nanometers. In addition, SNAP is almost fully compatible with more traditional methods for manufacturing electronics. The motivation behind the development of SNAP was to have a general nanofabrication method for preparing electronics-grade circuitry, but one that would operate at macromolecular dimensions and with access to a broad materials set. Thus, electronics applications, including novel demultiplexing architectures; large-scale, ultrahigh-density memory circuits; and complementary symmetry nanowire logic circuits, have served as drivers for developing various aspects of the SNAP method. Some of that work is reviewed here. 
As the SNAP method has evolved into a robust nanofabrication method, it has become an enabling tool for the investigation of new physics. In particular, the application of SNAP toward understanding heat transport in low-dimensional systems is discussed. This work has led to the surprising discovery that Si NWs can serve as highly efficient thermoelectric materials. Finally, we turn toward the application of SNAP to the investigation of quasi-one-dimensional (quasi-1D) superconducting physics in extremely high aspect ratio Nb NWs.