224 research outputs found

    Near-threshold fatigue crack growth in bulk metallic glass composites

    A major drawback in using bulk metallic glasses (BMGs) as structural materials is their extremely poor fatigue performance. One way to alleviate this problem is through the composite route, in which second phases are introduced into the glass to arrest crack growth. In this paper, the fatigue crack growth behavior of in situ reinforced BMGs with crystalline dendrites, which are tailored to impart significant ductility and toughness to the BMG, was investigated. Three composites, all with equal volume fraction of dendrite phases, were examined to assess the influence of chemical composition on the near-threshold fatigue crack growth characteristics. While the ductility is enhanced at the cost of yield strength vis-à-vis that of the fully amorphous BMG, the threshold stress intensity factor range for fatigue crack initiation in composites was found to be enhanced by more than 100%. Crack blunting and trapping by the dendritic phases and constraining of the shear bands within the interdendritic regions are the micromechanisms responsible for this enhanced fatigue crack growth resistance.

    Texturizing PPCG: Supporting Texture Memory in a Polyhedral Compiler

    In this paper, we discuss techniques to transform sequential programs into CUDA programs optimized for texture and surface memory. We achieve this using PPCG, an automatic parallelizing compiler based on the polyhedral model. We implemented a static analysis in PPCG that validates the semantics of the texturized program. Depending on the results of the analysis, our algorithm chooses to use texture and/or surface memory and alters the abstract syntax tree accordingly. We also modified the code-generation phase of PPCG to handle various subtleties. We evaluated the texturization algorithm on the PolyBench (4.2.1 beta) benchmark and observed up to 1.6x speedup, with a geometric mean of 1.103x. Although the title and many passages of the paper use the term "texture memory", the optimizations apply to both texture and surface memory.

    Vectorization, Obfuscation and P4 LLVM Tool-chain

    This thesis broadly focuses on three areas: loop vectorization, code obfuscation, and the P4LLVM compiler. The work on loop vectorization starts with a comparison of the auto-vectorization of the GCC, ICC, and LLVM compilers and shows their strengths and weaknesses. In an attempt to improve LLVM's auto-vectorization, we propose improving loop distribution using exact dependences from Polly. Our work on loop distribution shows promising results. We developed an LLVM-based code obfuscation engine with various obfuscation techniques implemented as transformation passes; our techniques are novel and differ from existing works [1]. In hardware circuit obfuscation, several methods have been proposed at the hardware level to secure the IP. Our approach is to obfuscate the circuits at the software level, using code obfuscation techniques.

    Optimization and parallelization of tensor and ODE/PDE computations on GPU

    We propose a multi-level GPU-based parallelization algorithm to solve the multi-compartment Hodgkin-Huxley (HH) model equation, which requires solving the Hines matrix. We use a "parallel-in-time" algorithm (like the Parareal strategy) to obtain outer-level parallelism, and an Exact Domain Decomposition (EDD) algorithm with fine decomposition for inner-level parallelism. We show that our technique can also be applied to differential equations, such as the heat equation, that induce tridiagonal systems. Typically, a solution to the HH equation runs for hundreds to tens of thousands of time steps, solving a Hines matrix at each time step. Previous solutions to this problem by Michael Mascagni et al. (1991) and Hines et al. (2008) have tackled only solving the Hines matrix in parallel. Our approach uses the dynamic parallelism of CUDA to achieve multi-level parallelism on GPUs. Our solution outperforms the sequential-in-time method on standard neuron morphologies by up to 2.5x. We also show that the iterative part of the Parareal method converges in 5-7 iterations on average with an accuracy of 10^-6. We also propose a GPU optimization for the Higher Order Tensor Renormalization Group (HOTRG) problem, in which the tensor contraction operations inside HOTRG are optimized by a multi-GPU implementation using the cuBLASXt API.
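To make the remark about tridiagonal systems concrete, here is a minimal sequential sketch, not the GPU method of the thesis. It assumes a 1D heat equation discretized with implicit Euler and zero Dirichlet boundaries, and solves the resulting tridiagonal system with the standard Thomas algorithm. The names `thomas_solve` and `heat_step`, and the parameter `r = alpha * dt / dx**2`, are illustrative assumptions.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm.

    All four lists have length n. lower[i] is the sub-diagonal entry of
    row i (lower[0] is unused) and upper[i] is the super-diagonal entry
    of row i (upper[n-1] is unused).
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def heat_step(u, r):
    """One implicit Euler step of u_t = alpha * u_xx on interior points.

    Assumes zero Dirichlet boundaries and r = alpha * dt / dx**2, so each
    row reads: -r*u[i-1] + (1 + 2r)*u[i] - r*u[i+1] = u_old[i].
    """
    n = len(u)
    return thomas_solve([-r] * n, [1.0 + 2.0 * r] * n, [-r] * n, u)


if __name__ == "__main__":
    print(heat_step([0.0, 1.0, 0.0], 0.5))
```

Each time step costs one such solve, which is why a long simulation is dominated by repeated tridiagonal (or, for branched neurons, Hines) solves.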

    LLOV: A Fast Static Data-Race Checker for OpenMP Programs

    In the era of exascale computing, writing efficient parallel programs is indispensable, and at the same time, writing sound parallel programs is highly difficult. While parallel programming is easier with frameworks such as OpenMP, the possibility of data races in these programs still persists. In this paper, we propose a fast, lightweight, language-agnostic, static data race checker for OpenMP programs based on the LLVM compiler framework. We compare our tool with other state-of-the-art data race checkers on a variety of well-established benchmarks. We show that the precision, accuracy, and F1 score of our tool are comparable to those of other checkers, while our tool is orders of magnitude faster. To the best of our knowledge, this is the only tool among the state-of-the-art data race checkers that can verify a FORTRAN program to be data race free.
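As a toy illustration of what a data race in a parallel loop is, the sketch below enumerates the array accesses of each iteration for a fixed trip count and reports pairs of distinct iterations that touch the same element where at least one access is a write. This is a brute-force check for illustration only and is not how LLOV works, since LLOV analyses programs statically. The function `find_races` and the access encoding are hypothetical.

```python
def find_races(n, accesses):
    """Report conflicting accesses between distinct loop iterations.

    n        -- number of iterations (0 .. n-1), assumed to run in parallel
    accesses -- list of (kind, index_fn) pairs, where kind is "r" or "w"
                and index_fn maps an iteration number to an array index
    Returns a list of (index, iteration_a, iteration_b) tuples.
    """
    touched = {}  # array index -> list of (iteration, kind)
    for i in range(n):
        for kind, index_fn in accesses:
            touched.setdefault(index_fn(i), []).append((i, kind))

    races = []
    for index, hits in touched.items():
        for a in range(len(hits)):
            for b in range(a + 1, len(hits)):
                (i, kind_i), (j, kind_j) = hits[a], hits[b]
                # A race needs two different iterations and at least one write.
                if i != j and "w" in (kind_i, kind_j):
                    races.append((index, i, j))
    return races


if __name__ == "__main__":
    # a[i] = a[i-1] + 1 : iteration i reads what iteration i-1 writes.
    dependent = [("w", lambda i: i), ("r", lambda i: i - 1)]
    # a[i] = 2 * i : every iteration writes its own element only.
    independent = [("w", lambda i: i)]
    print(find_races(4, dependent))
    print(find_races(4, independent))
```

The first loop is unsafe to run as an OpenMP `parallel for` without synchronization, while the second has no cross-iteration conflicts.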

    RL4ReAl: Reinforcement Learning for Register Allocation

    We propose a novel solution for the register allocation problem, leveraging multi-agent hierarchical reinforcement learning. We formalize the constraints that precisely define the problem for a given instruction-set architecture, while ensuring that the generated code preserves semantic correctness. We also develop a gRPC-based framework providing a modular and efficient compiler interface for training and inference. Experimental results match or outperform the LLVM register allocators, targeting Intel x86 and ARM AArch64.

    Optimizations In Compiler: Vectorization, Reordering, Register Allocation And Verification Of Explicitly Parallel Programs

    Compiler optimizations form a very important part of compiler development, as they make the major difference between an average and a great compiler. A compiler has various modules, which open opportunities for optimization in various spheres. In this thesis, a comparative study of vectorization is done, exposing the strengths and weaknesses of various contemporary compilers. Additionally, a study of the impact of vectorization on tiled code is performed. Different strategies for loop nest optimization are explored. An algorithm for statement reordering in loops to enhance performance has been developed. An Integer Linear Program formulation is given to improve loop parallelism, which makes use of loop unrolling and explicitly parallel directives. Finally, an attempt at optimal loop distribution is made. Following the loop nest optimization chapter, an explanation of interprocedural register allocation (IPRA) for ARM32 and AArch64 is given. Additionally, a brief description of the problems in implementing IPRA for those architectures is presented. We conclude the chapter with the performance results of IPRA on those platforms. In the last chapter, a description of VoPiL, a static OpenMP verifier in LLVM, is presented. A brief description of the analysis and the results is included.

    Polyhedral Compilation: Applications, Approximations and GPU-specific Optimizations

    Polyhedral compilation has been successful in analyzing, optimizing, and automatically parallelizing affine computations for modern heterogeneous target architectures. Many tools have been developed to automate program analysis and transformation for the affine control parts of programs, including widely used open-source and production compilers such as GCC, LLVM, and IBM/XL. This thesis contributes to the polyhedral model in three orthogonal dimensions, as follows: • Applications: Applies polyhedral loop transformations to deep learning computation kernels to demonstrate the effectiveness of complex loop transformations on these kernels. • Approximations: Develops two efficient algorithms to over-approximate convex polyhedra into U-TVPI polyhedra, with applications in polyhedral compilation as well as automated program verification. • GPU-Specific Optimizations: Builds an end-to-end, fully automatic compiler framework that generates cache-optimized CUDA code from a sequential C program using polyhedral modelling techniques.

    The Effect of Dietary Supplementation with Spent Cider Yeast on the Swine Distal Gut Microbiome

    Peer-reviewed. Background: There is an increasing need for alternatives to antibiotics for promoting animal health, given the increasing problems associated with antibiotic resistance. In this regard, we evaluated spent cider yeast as a potential probiotic for modifying the gut microbiota in weanling pigs using pyrosequencing of 16S rRNA gene libraries. Methodology and Principal Findings: Piglets aged 24–26 days were assigned to one of two study groups: control (n = 12) and treatment (n = 12). The control animals were fed a basal diet, and the treatment animals were fed the basal diet in combination with a cider yeast supplement (500 ml cider yeast containing ~7.6 log CFU/ml) for 21 days. Faecal samples were collected for 16S rRNA gene compositional analysis. 16S rRNA compositional sequencing analysis of the faecal samples collected on day 0 and day 21 revealed marked differences in microbial diversity at both the phylum and genus levels between the control and treatment groups. This analysis confirmed that levels of Salmonella and Escherichia were significantly decreased in the treatment group compared with the control (P<0.001). These data suggest a positive influence of dietary supplementation with live cider yeast on the microbial diversity of the pig distal gut. Conclusions/Significance: The effect of dietary cider yeast on porcine gut microbial communities was characterized for the first time using 16S rRNA gene compositional sequencing. Dietary cider yeast can potentially alter the gut microbiota; however, such changes depend on the animals' endogenous microbiota, which causes a divergence in relative response to a given diet. This work was funded by Enterprise Ireland, under the Commercialisation Fund (Contract No: CFTD/05/117), the Irish Government under the National Development Plan, 2000–2006, the European Research and Development Fund and Science Foundation Ireland (SFI).