224 research outputs found
Near-threshold fatigue crack growth in bulk metallic glass composites
A major drawback in using bulk metallic glasses (BMGs) as structural materials is their extremely poor fatigue performance. One way to alleviate this problem is through the composite route, in which second phases are introduced into the glass to arrest crack growth. In this paper, the fatigue crack growth behavior of in situ reinforced BMGs with crystalline dendrites, which are tailored to impart significant ductility and toughness to the BMG, was investigated. Three composites, all with equal volume fraction of dendrite phases, were examined to assess the influence of chemical composition on the near-threshold fatigue crack growth characteristics. While the ductility is enhanced at the cost of yield strength vis-à-vis that of the fully amorphous BMG, the threshold stress intensity factor range for fatigue crack initiation in composites was found to be enhanced by more than 100%. Crack blunting and trapping by the dendritic phases and constraining of the shear bands within the interdendritic regions are the micromechanisms responsible for this enhanced fatigue crack growth resistance
Texturizing PPCG: Supporting Texture Memory in a Polyhedral Compiler
In this paper, we discuss techniques to transform
sequential programs to texture/surface memory optimized CUDA
programs. We achieve this by using PPCG, an automatic parallelizing
compiler based on the Polyhedral model. We implemented
a static analysis in PPCG which validates the semantics of the
texturized transformed program. Depending on the results of
the analysis, our algorithm chooses to use texture and/or surface
memory, and alters the Abstract Syntax Tree accordingly. We
also modified the code-generation phase of PPCG to take care
of various subtleties. We evaluated the texturization algorithm
on the PolyBench (4.2.1 beta) benchmark and observed up to
1.6x speedup with a geometric mean of 1.103x. The title and
many places in the paper use the term texture memory, but the
optimizations apply to both texture and surface memory.
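The abstract above summarizes per-benchmark speedups with a geometric mean. As a minimal sketch of that arithmetic, the following computes the geometric mean of a list of speedup ratios. The function name and the sample values are hypothetical and are not taken from the paper's results.

```python
import math


def geometric_mean(speedups):
    """Geometric mean of positive ratios, computed in log space."""
    if not speedups:
        raise ValueError("need at least one speedup")
    if any(s <= 0 for s in speedups):
        raise ValueError("speedups must be positive")
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


# Hypothetical per-benchmark speedups (illustrative only, NOT the paper's data).
example_speedups = [1.6, 1.0, 0.95, 1.2]
summary = geometric_mean(example_speedups)
```

The geometric mean is the usual choice for averaging ratios because a 2x speedup and a 0.5x slowdown cancel to exactly 1.0, which an arithmetic mean would not show.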
Vectorization, Obfuscation and P4 LLVM Tool-chain
This thesis broadly focuses on three different areas: Loop Vectorization, Code Obfuscation,
and the P4LLVM compiler. The work on Loop Vectorization starts with a
comparison of Auto-vectorization in the GCC, ICC and LLVM compilers and shows their
strengths and weaknesses. As an attempt to improve LLVM’s Auto-vectorization, we
propose to improve Loop Distribution using exact dependences from Polly. Our work
on Loop Distribution shows promising results. We developed an LLVM-based Code
Obfuscation engine with various obfuscation techniques as transformation passes; our
techniques are novel and differ from existing works [1]. In hardware circuit
obfuscation, several methods have been proposed at the hardware level to secure the IP.
Our approach is to obfuscate the circuits at the software level, using code obfuscation
techniques.
Optimization and parallelization of tensor and ODE/PDE computations on GPU
We propose a multi-level GPU-based parallelization algorithm to solve the multi-compartment
Hodgkin Huxley (HH) model equation that requires solving the Hines matrix. We use
a ‘parallel-in-time’ algorithm (like the Parareal strategy) for obtaining outer level parallelism,
and an Exact Domain Decomposition (EDD) algorithm with fine-decomposition for
inner-level parallelism. We show that our technique can also be applied to any differential
equation, such as the heat equation, that induces a tridiagonal system.
Typically, a solution to the HH equation runs for hundreds to tens of thousands of time-steps
while solving a Hines matrix at each time step. Previous solutions by Michael Mascagni
et al. (1991) and Hines et al. (2008) to this problem have tackled only solving the Hines
matrix in parallel.
Our approach uses the dynamic parallelism of CUDA to achieve multi-level parallelism
on GPUs. Our solution outperforms the sequential time method on standard neuron morphologies
by up to 2.5x. We also show that the iterative part of the Parareal method converges in 5-7
iterations on average with an accuracy of 10^-6.
We also propose a GPU optimization for the Higher Order Tensor Renormalization Group (HOTRG)
problem, where the tensor contraction operations inside HOTRG are optimized by a multi-GPU
implementation using the cuBLASXt API.
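The 'parallel-in-time' idea mentioned in this abstract can be illustrated with a small CPU-only sketch of the classic Parareal iteration on a scalar ODE. This is not the thesis's GPU implementation: the explicit Euler solvers, the function names, and the test problem dy/dt = -y are all assumptions chosen for brevity.

```python
def euler_steps(f, y, t0, t1, n):
    """Integrate dy/dt = f(t, y) from t0 to t1 with n explicit Euler steps."""
    h = (t1 - t0) / n
    t = t0
    for _ in range(n):
        y = y + h * f(t, y)
        t += h
    return y


def parareal(f, y0, t_end, n_slices, n_iters, fine_steps=100):
    """Parareal with a 1-step Euler coarse solver and a fine_steps-step Euler fine solver."""
    ts = [t_end * i / n_slices for i in range(n_slices + 1)]

    def coarse(y, a, b):
        return euler_steps(f, y, a, b, 1)

    def fine(y, a, b):
        return euler_steps(f, y, a, b, fine_steps)

    # Initial guess: one cheap sequential coarse sweep over all time slices.
    u = [y0]
    for i in range(n_slices):
        u.append(coarse(u[i], ts[i], ts[i + 1]))

    for _ in range(n_iters):
        # The fine solves depend only on the previous iterate, so they are
        # independent across slices; this is the part that can run in parallel.
        f_vals = [fine(u[i], ts[i], ts[i + 1]) for i in range(n_slices)]
        g_old = [coarse(u[i], ts[i], ts[i + 1]) for i in range(n_slices)]
        # Sequential correction sweep using only the cheap coarse solver.
        new_u = [y0]
        for i in range(n_slices):
            g_new = coarse(new_u[i], ts[i], ts[i + 1])
            new_u.append(g_new + f_vals[i] - g_old[i])
        u = new_u
    return ts, u


def sequential_fine(f, y0, t_end, n_slices, fine_steps=100):
    """Reference: run the fine solver slice after slice, with no parallelism."""
    y = y0
    for i in range(n_slices):
        y = euler_steps(f, y, t_end * i / n_slices, t_end * (i + 1) / n_slices, fine_steps)
    return y
```

In this sketch the loop computing `f_vals` is written sequentially, but each entry could be computed by a separate worker. With as many iterations as time slices, the result matches the sequential fine solution up to rounding, so the method only pays off when it converges in far fewer iterations than there are slices.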
LLOV: A Fast Static Data-Race Checker for OpenMP Programs
In the era of Exascale computing, writing efficient parallel programs is indispensable and at the same time,
writing sound parallel programs is highly difficult. While parallel programming is easier with frameworks
such as OpenMP, the possibility of data races in these programs still persists. In this paper, we propose a
fast, lightweight, language agnostic, and static data race checker for OpenMP programs based on the LLVM
compiler framework. We compare our tool with other state-of-the-art data race checkers on a variety of
well-established benchmarks. We show that the precision, accuracy, and F1 score of our tool are comparable
to those of other checkers, while our tool is orders of magnitude faster. To the best of our knowledge, this work is the only
tool among the state-of-the-art data race checkers that can verify a FORTRAN program to be data race free.
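The abstract above compares checkers by precision, accuracy, and F1 score. As a minimal sketch of how these metrics follow from a checker's verdicts, the function below derives them from confusion-matrix counts. The function name and the convention of returning 0.0 for an empty denominator are assumptions, not details from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, recall, accuracy and F1 from confusion-matrix counts.

    tp: racy programs reported as racy      fp: race-free programs reported as racy
    tn: race-free programs reported clean   fn: racy programs reported clean
    """
    def ratio(num, den):
        # Assumed convention: a metric with an empty denominator is reported as 0.0.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    f1 = ratio(2 * precision * recall, precision + recall)
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}
```

For example, a hypothetical checker with 6 true positives, 2 false positives, 10 true negatives and 2 false negatives has precision, recall, and F1 of 0.75 and accuracy of 0.8.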
RL4ReAl: Reinforcement Learning for Register Allocation
We propose a novel solution for the Register Allocation problem, leveraging
multi-agent hierarchical Reinforcement Learning. We formalize the constraints
that precisely define the problem for a given instruction-set architecture,
while ensuring that the generated code preserves semantic correctness. We also
develop a gRPC-based framework providing a modular and efficient compiler
interface for training and inference. Experimental results match or outperform
the LLVM register allocators, targeting Intel x86 and ARM AArch64.
Optimizations In Compiler: Vectorization, Reordering, Register Allocation And Verification Of Explicitly Parallel Programs
Compiler Optimizations form a very important part of compiler development, as they make a major
difference between an average and a great compiler. A compiler has various modules, which
open up opportunities for optimization in various areas. In this thesis, a comparative study of
vectorization is done exposing the strengths and weaknesses of various contemporary compilers.
Additionally, a study on the impact of vectorization on tiled code is performed. Different strategies
for loop nest optimization are explored. An algorithm for statement reordering in loops to enhance
performance has been developed. An Integer Linear Program formulation is done to improve loop
parallelism, which makes use of loop unrolling and explicitly parallel directives. Finally, an attempt
for optimal loop distribution is made. Following the loop nest optimization chapter, an explanation
of interprocedural register allocation (IPRA) for ARM32 and AArch64 is given. Additionally, a
brief description of the problems in implementing IPRA for those architectures is presented. We
conclude the chapter with the performance results of IPRA on those platforms. In the last
chapter, a description of VoPiL, a static OpenMP verifier in LLVM, is presented. A brief description
of the analysis and the results is included.
Polyhedral Compilation: Applications, Approximations and GPU-specific Optimizations
Polyhedral compilation has been successful in analyzing, optimizing, and automatically parallelizing
affine computations for modern heterogeneous target architectures. Many tools have been
developed to automate the process of program analysis and transformation for affine control parts
of programs, including widely used open-source and production compilers such as GCC, LLVM, and
IBM/XL. This thesis makes contributions to the polyhedral model in three orthogonal dimensions as
follows:
• Applications: Applies polyhedral loop transformations to deep learning computation kernels
to demonstrate the effectiveness of complex loop transformations on these kernels.
• Approximations: Develops two efficient algorithms to over-approximate convex polyhedra
into U-TVPI polyhedra, with applications in polyhedral compilation as well as automated
program verification.
• GPU-Specific Optimizations: Builds an end-to-end, fully automatic compiler framework to
generate cache-optimized CUDA code beginning from a sequential C program by using polyhedral
modelling techniques.
The Effect of Dietary Supplementation with Spent Cider Yeast on the Swine Distal Gut Microbiome
peer-reviewed
Background: There is an increasing need for alternatives to antibiotics for promoting animal health, given the increasing
problems associated with antibiotic resistance. In this regard, we evaluated spent cider yeast as a potential probiotic for
modifying the gut microbiota in weanling pigs using pyrosequencing of 16S rRNA gene libraries.
Methodology and Principal Findings: Piglets aged 24–26 days were assigned to one of two study groups; control (n = 12)
and treatment (n = 12). The control animals were fed with a basal diet and the treatment animals were fed with basal diet in
combination with cider yeast supplement (500 ml cider yeast containing ~7.6 log CFU/ml) for 21 days. Faecal samples were
collected for 16S rRNA gene compositional analysis. 16S rRNA compositional sequencing analysis of the faecal samples
collected from day 0 and day 21 revealed marked differences in microbial diversity at both the phylum and genus levels
between the control and treatment groups. This analysis confirmed that levels of Salmonella and Escherichia were
significantly decreased in the treatment group, compared with the control (P<0.001). These data suggest a positive influence
of dietary supplementation with live cider yeast on the microbial diversity of the pig distal gut.
Conclusions/Significance: The effect of dietary cider yeast on porcine gut microbial communities was characterized for the
first time using 16S rRNA gene compositional sequencing. Dietary cider yeast can potentially alter the gut microbiota;
however, such changes depend on the endogenous microbiota, which causes a divergence in relative response to a given
diet.
This work was funded by Enterprise Ireland, under the Commercialisation Fund (Contract No: CFTD/05/117), the Irish Government under the National
Development Plan, 2000–2006, the European Research and Development Fund and Science Foundation Ireland (SFI).