28 research outputs found

    Automatic Parallelization of Affine Loops using Dependence and Cache analysis in a Binary Rewriter

    Today, nearly all general-purpose computers are parallel, but nearly all software running on them is serial. Bridging this disconnect by manually rewriting source code in parallel is prohibitively expensive. Automatic parallelization technology is therefore an attractive alternative. We present a method to perform automatic parallelization in a binary rewriter. The input to the binary rewriter is the serial binary executable program and the output is a parallel binary executable. The advantages of parallelization in a binary rewriter versus a compiler include (i) compatibility with all compilers and languages; (ii) high economic feasibility from avoiding repeated compiler implementation; (iii) applicability to legacy binaries; and (iv) applicability to assembly-language programs. Adapting existing parallelizing compiler methods that work on source code to work on binary programs instead is a significant challenge. This is primarily because symbolic and array index information used in existing compiler parallelizers is not available in a binary. We show how to adapt existing parallelization methods to achieve equivalent parallelization from a binary without such information. We have also designed an affine cache reuse model that works inside a binary rewriter, building on the parallelization techniques. It quantifies cache reuse in terms of the number of cache lines that will be required when a loop dimension is considered for the innermost position in a loop nest. This cache metric can be used to reason about the code that results when affine code is transformed using affine transformations. Hence, it can be used to evaluate candidate transformation sequences to improve run time directly from a binary.
Results using our x86 binary rewriter, SecondWrite, on a suite of dense-matrix regular programs from the Polybench suite of benchmarks show a geometric-mean speedup of 6.81X from binary and 8.9X from source with 8 threads compared to the input serial binary on an x86 Xeon E5530 machine, and 8.31X from binary and 9.86X from source with 24 threads compared to the input serial binary on an x86 E7450 machine. Such regular loops are an important component of scientific and multimedia workloads, and are even present to a limited extent in otherwise non-regular programs. Further, in this thesis we present a novel algorithm that significantly enhances past techniques for loops with unknown loop bounds by guessing the loop bounds using only the memory expressions present in a loop. It then inserts run-time checks to see whether these guesses were correct; if they were, the parallel version of the loop executes, otherwise the serial version executes. These techniques are applied to the large affine benchmarks in SPEC2006 and OMP2001 and, unlike previous methods, the speedups from binary are as good as from source. We also present results on the number of loops parallelized directly from a binary with and without this algorithm. Across the 8 affine benchmarks in these suites, the best existing binary parallelization method achieves a geometric-mean speedup of 1.33X, whereas our method achieves a speedup of 2.96X. This is close to the speedup from source code of 2.8X.
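The cache metric described in this abstract, counting the cache lines needed when a loop dimension is placed innermost, can be illustrated with a minimal sketch. This is not the thesis's model: it is a textbook-style approximation, and the function names, the 64-byte line size, the assumption of line-aligned accesses, and the matrix-multiply example are all assumptions introduced here for illustration.

```python
from math import ceil


def cache_lines_touched(stride_bytes, trip_count, line_bytes=64):
    """Estimate the distinct cache lines one affine reference touches while
    the candidate innermost loop runs once (alignment effects are ignored)."""
    stride = abs(stride_bytes)
    if stride == 0:
        return 1  # address does not change with this loop: one line, fully reused
    if stride >= line_bytes:
        return trip_count  # every iteration lands on a different line
    return ceil(trip_count * stride / line_bytes)  # neighbouring iterations share lines


def best_innermost(strides, trip_counts, line_bytes=64):
    """strides maps each loop name to the byte strides of all references
    with respect to that loop. Returns (cheapest loop, cost per loop)."""
    costs = {
        loop: sum(cache_lines_touched(s, trip_counts[loop], line_bytes) for s in refs)
        for loop, refs in strides.items()
    }
    return min(costs, key=costs.get), costs


# Hypothetical example: C[i][j] += A[i][k] * B[k][j] on 512x512 row-major doubles.
# Per-loop byte strides are listed in the order (C, A, B).
example_strides = {"i": [4096, 4096, 0], "j": [8, 0, 8], "k": [0, 8, 4096]}
example_trips = {"i": 512, "j": 512, "k": 512}
best, costs = best_innermost(example_strides, example_trips)
print(best, costs)
```

Under these assumptions the `j` loop is the cheapest candidate for the innermost position, because two of the three references then walk memory with unit stride and the third is loop-invariant. In a binary, the strides would have to be recovered from address expressions rather than from array subscripts.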

    Retrofitting Security in COTS Software with Binary Rewriting

    We present a practical tool for inserting security features against low-level software attacks into third-party, proprietary or otherwise binary-only software. We are motivated by the inability of software users to select and use low-overhead protection schemes when source code is unavailable to them, by the lack of information as to what (if any) security mechanisms software producers have used in their toolchains, and by the high overhead and inaccuracy of solutions that treat software as a black box. Our approach is based on SecondWrite, an advanced binary rewriter that operates without the need for debugging information or other assistance. Using SecondWrite, we insert a variety of defenses into program binaries. Although the defenses are generally well known, they have not generally been used together because they are implemented by different (non-integrated) tools. We are also the first to demonstrate the use of such mechanisms in the absence of source code. We experimentally evaluate the effectiveness and performance impact of our approach. We show that it stops all variants of low-level software attacks at a very low performance overhead, without impacting original program functionality.

    Automatic parallelization in a binary rewriter

    Today, nearly all general-purpose computers are parallel, but nearly all software running on them is serial. However, bridging this disconnect by manually rewriting source code in parallel is prohibitively expensive. Automatic parallelization technology is therefore an attractive alternative. We present a method to perform automatic parallelization in a binary rewriter. The input to the binary rewriter is the serial binary executable program and the output is a parallel binary executable. The advantages of parallelization in a binary rewriter versus a compiler include (i) compatibility with all compilers and languages; (ii) high economic feasibility from avoiding repeated compiler implementation; (iii) applicability to legacy binaries; and (iv) applicability to assembly-language programs. Adapting existing parallelizing compiler methods that work on source code to work on binary programs instead is a significant challenge. This is primarily because symbolic and array index information used in existing compiler parallelizers is not available in a binary. We show how to adapt existing parallelization methods to achieve equivalent parallelization from a binary without such information. Preliminary results using our x86 binary rewriter, SecondWrite, on a suite of dense-matrix regular programs including the externally developed Polybench suite of benchmarks show an average speedup of 5.1 from binary and 5.7 from source with 8 threads compared to the input serial binary on an x86 Xeon E5530 machine, and 14.7 from binary and 15.4 from source with 32 threads compared to the input serial binary on a SPARC T2. Such regular loops are an important component of scientific and multimedia workloads, and are even present to a limited extent in otherwise non-regular programs.
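To make concrete what dependence analysis without array index information might look like, the sketch below applies the classic GCD dependence test to raw address expressions of the form `stride * i + offset`, which is the kind of information a binary still exposes. This is a stand-in chosen for illustration, not the method of the paper. The function name and the examples are hypothetical, and the sketch assumes all accesses have the same width and are aligned to it, so two accesses overlap only when their addresses are equal.

```python
from math import gcd


def may_depend(stride1, offset1, stride2, offset2):
    """GCD test: can stride1*i + offset1 == stride2*j + offset2 hold for some
    integers i and j? Loop bounds are ignored, so True only means 'possibly'."""
    g = gcd(stride1, stride2)  # math.gcd returns a non-negative result
    if g == 0:
        # Both strides are zero: two fixed addresses, compare them directly.
        return offset1 == offset2
    return (offset2 - offset1) % g == 0


# Hypothetical loop writing A[2*i] and reading A[2*i + 1] (8-byte elements):
# the addresses are 16*i and 16*i + 8, which can never coincide.
print(may_depend(16, 0, 16, 8))

# Hypothetical loop writing A[i] and reading A[i - 1]: addresses 8*i and 8*i - 8
# coincide for neighbouring iterations, so a dependence cannot be ruled out.
print(may_depend(8, 0, 8, -8))
```

The test is conservative: a `False` result proves the two references never touch the same address, while `True` leaves the question open. It also does not separate same-iteration from loop-carried solutions, so a practical parallelizer would need a more precise test that takes loop bounds into account.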

    Anti-inflammatory profile of Aegle marmelos (L) Correa (Bilva) with special reference to young roots grown in different parts of India

    Background: Aegle marmelos (Bilva) is used in Ayurveda for the treatment of several inflammatory disorders. The plant is a member of a fixed dose combination of Dashamoola in Ayurveda. However, the usage of roots/root bark or stems is associated with sustainability concerns. Objectives: The present study aims to compare the anti-inflammatory properties of different extracts of young roots (year-wise) and mature parts of Bilva plants collected from different geographical locations in India, so as to identify a sustainable source for Ayurvedic formulation. Materials and methods: A total of 191 extracts (petroleum ether, ethyl acetate, ethanol and aqueous) of roots, stems and leaves of A. marmelos (collected from the Gujarat, Maharashtra, Odisha, Chhattisgarh, Karnataka and Andhra Pradesh regions) were tested for anti-inflammatory effects in vitro on the isolated target enzymes cyclooxygenase-1 (COX-1), cyclooxygenase-2 (COX-2) and 5-lipoxygenase (5-LOX), in a lymphocyte proliferation assay (LPA), by cytokine profiling in an LPS-induced mouse macrophage (RAW 264.7) cell line, and in vivo in carrageenan-induced paw edema in mice. Results: Of 191 extracts, 44 extracts showed COX-2 inhibition and 38 extracts showed COX-1 inhibition, while none showed 5-LOX inhibition. Cytokine analysis of the 44 extracts showing inhibition of COX-2 suggested that only 17 extracts modulated the cytokines by increasing the anti-inflammatory cytokine IL-2 and reducing pro-inflammatory cytokines such as IL-1β, MIP1-α and IL-6. The young (2 and 3 years) roots of Bilva plants from Gujarat and young (1 year) roots from Odisha showed the most potent anti-inflammatory activity by suppressing the pro-inflammatory cytokines and inducing anti-inflammatory cytokines. These three extracts also showed in vivo anti-inflammatory activity comparable to that of adult stem and root barks.
Conclusion: The present study reveals that young roots of Bilva plants from the Gujarat and Odisha regions could form a sustainable source for use in Ayurvedic formulations with anti-inflammatory activities. The present study also indicates that the region in which the plants are grown and the age of the plants play an important role in the anti-inflammatory effect. Keywords: Aegle marmelos, Inflammation, Ayurveda, Cyclooxygenase-1 & 2, 5-Lipoxygenase, Immunomodulation