19 research outputs found

    ADA-GP: Accelerating DNN Training By Adaptive Gradient Prediction

    Neural network training is inherently sequential: the layers finish the forward propagation in succession, followed by the calculation and back-propagation of gradients (based on a loss function) starting from the last layer. These sequential computations significantly slow down training, especially for deeper networks. Prediction has been used successfully in many areas of computer architecture to speed up sequential processing. We therefore propose ADA-GP, which uses gradient prediction adaptively to speed up deep neural network (DNN) training while maintaining accuracy. ADA-GP incorporates a small neural network to predict gradients for the different layers of a DNN model and uses a novel tensor reorganization method to make predicting a large number of gradients feasible. ADA-GP alternates between DNN training using backpropagated gradients and DNN training using predicted gradients, and adaptively adjusts when, and for how long, gradient prediction is used to strike a balance between accuracy and performance. Last but not least, we provide a detailed hardware extension to a typical DNN accelerator to realize the speedup potential of gradient prediction. Our extensive experiments with fifteen DNN models show that ADA-GP achieves an average speedup of 1.47X with accuracy similar to, or even higher than, the baseline models. Moreover, it consumes, on average, 34% less energy than the baseline accelerator due to reduced off-chip memory accesses. (Comment: 13 pages, 21 figures, 5 tables)
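The abstract describes alternating between backpropagated and predicted gradients under an adaptive schedule. A minimal sketch of that scheduling logic (the class name, window sizes, and feedback rule are illustrative assumptions, not the paper's exact algorithm; the gradient-predictor network itself is out of scope here):

```python
class AdaGPSchedule:
    """Toy sketch of ADA-GP-style adaptive alternation (details hypothetical):
    train with true backpropagated gradients for `bp_steps` batches, then with
    predicted gradients for `pred_steps` batches, growing the prediction
    window when training feedback is good and shrinking it otherwise."""

    def __init__(self, bp_steps=2, pred_steps=4, min_pred=0, max_pred=8):
        self.bp_steps, self.pred_steps = bp_steps, pred_steps
        self.min_pred, self.max_pred = min_pred, max_pred
        self._pos = 0  # position within the current (bp + pred) cycle

    def next_mode(self):
        """Return 'backprop' or 'predict' for the next training batch."""
        cycle = self.bp_steps + self.pred_steps
        mode = "backprop" if self._pos < self.bp_steps else "predict"
        self._pos = (self._pos + 1) % max(cycle, 1)
        return mode

    def feedback(self, loss_improved):
        """Adapt the prediction window from training feedback."""
        if loss_improved:
            self.pred_steps = min(self.pred_steps + 1, self.max_pred)
        else:
            self.pred_steps = max(self.pred_steps - 1, self.min_pred)
        self._pos = 0  # restart the cycle under the new window
```

With `pred_steps` driven to zero, the schedule degenerates to plain backpropagation, which is how such a scheme can preserve baseline accuracy when prediction is unreliable.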

    Dynamically detecting and tolerating IF-Condition Data Races

    An IF-Condition Invariance Violation (ICIV) occurs when, after a thread has computed the control expression of an IF statement and while it is executing the THEN or ELSE clause, another thread updates variables in the IF's control expression. An ICIV can be easily detected and is likely a sign of a concurrency bug in the code. Typically, the ICIV is caused by a data race, which we call an IF-Condition Data Race (ICR). In this paper, we analyze the data races reported in the bug databases of popular software systems and show that ICRs occur relatively often. Then, we present two techniques to handle ICRs dynamically. They rely on simple code transformations and, in one case, additional hardware support. One of them (SW-IF) detects the races, while the other (HW-IF) detects and prevents them. We evaluate SW-IF and HW-IF using a variety of applications and show that these new techniques are effective at finding new data race bugs and run with low overhead. Specifically, HW-IF finds 5 new (unreported) race bugs and SW-IF finds 3 of them. In addition, 8-threaded executions of SPLASH-2 codes show that, on average, SW-IF adds 2% execution overhead, while HW-IF adds less than 1%.
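The core of a software-only detector like SW-IF is a code transformation that re-evaluates the IF condition after the branch body runs. A minimal sketch of that idea (illustrative only, not the paper's exact transformation; in reality the flipped condition would come from a concurrent writer thread, which the in-branch update stands in for here):

```python
def guarded_if(cond, then_branch, on_race):
    """Sketch of an SW-IF-style transformation: evaluate the IF condition,
    run the THEN clause, then re-evaluate the condition. A changed outcome
    means the condition's variables were updated while the branch executed,
    i.e. an IF-Condition Invariance Violation (ICIV), typically caused by
    an IF-Condition data race (ICR)."""
    if cond():
        then_branch()
        if not cond():  # condition flipped mid-branch: report a likely race
            on_race()


state = {"ready": True, "races": 0}
guarded_if(
    cond=lambda: state["ready"],
    # Deterministic stand-in for a racing writer on another thread:
    then_branch=lambda: state.update(ready=False),
    on_race=lambda: state.update(races=state["races"] + 1),
)
```

A real implementation would snapshot only the condition's input variables and distinguish the thread's own writes from remote ones; this sketch omits both refinements.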

    Synthesizing Programs with Continuous Optimization

    Automatic software generation from a specification is known as program synthesis. Most existing approaches formulate program synthesis as a search problem over discrete parameters. In this paper, we present a novel formulation of program synthesis as a continuous optimization problem and solve it with a state-of-the-art evolutionary approach, the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). We then propose a mapping scheme to convert the continuous formulation into actual programs. We compare our system, called GENESYS, with several recent program synthesis techniques (in both discrete and continuous domains) and show that GENESYS synthesizes more programs within a fixed time budget than those existing schemes. For example, for programs of length 10, GENESYS synthesizes 28% more programs than existing schemes within the same time budget.
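Any continuous formulation needs a decoder from the optimizer's real-valued vector to a discrete program. One plausible such mapping (an illustrative assumption, not necessarily GENESYS's actual scheme) scores every instruction at every program position and takes the argmax per position:

```python
def decode_program(vec, instructions, length):
    """Decode a real vector of size length * len(instructions) into a
    program: each consecutive block of scores selects the highest-scoring
    instruction for that position. CMA-ES (or any continuous optimizer)
    would search over `vec`; only the decoded program is ever executed."""
    k = len(instructions)
    assert len(vec) == length * k, "vector must hold one score per (position, instruction)"
    program = []
    for pos in range(length):
        scores = vec[pos * k:(pos + 1) * k]
        program.append(instructions[scores.index(max(scores))])
    return program
```

Because small perturbations of `vec` usually leave the argmax unchanged, the decoded program changes piecewise, which is the kind of landscape an evolution strategy can still navigate.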

    MERCURY: Accelerating DNN Training By Exploiting Input Similarity

    Deep neural networks (DNNs) are computationally intensive to train. Training consists of a large number of multidimensional dot products between many weights and input vectors. However, there can be significant similarity among input vectors: if one input vector is similar to another, its computations with the weights are similar to those of the other and can therefore be skipped by reusing the already-computed results. We propose a novel scheme, called MERCURY, to exploit input similarity during DNN training in a hardware accelerator. MERCURY uses Random Projection with Quantization (RPQ) to convert an input vector into a bit sequence, called a signature. A cache (MCACHE) stores the signatures of recent input vectors along with their computed results. If the signature of a new input vector matches that of a vector already in the MCACHE, the two vectors are deemed similar, and the already-computed result is reused for the new vector. To the best of our knowledge, MERCURY is the first work that exploits input similarity using RPQ to accelerate DNN training in hardware. The paper presents a detailed design, workflow, and implementation of MERCURY. Our experimental evaluation with twelve different deep learning models shows that MERCURY saves a significant number of computations and speeds up model training by an average of 1.97X with accuracy similar to the baseline system. (Comment: 13 pages, 18 figures, 4 tables)
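The RPQ-signature-plus-cache path described above can be sketched in a few lines (a software model of the idea only; the real MCACHE is a hardware structure with an eviction policy, and the projection count and cache organization here are illustrative assumptions):

```python
def rpq_signature(x, projections):
    """Random Projection with Quantization (RPQ): project the input vector
    onto fixed random directions and keep only the signs, yielding a compact
    bit signature. Nearby inputs tend to land on the same signature."""
    return tuple(
        1 if sum(p_i * x_i for p_i, x_i in zip(p, x)) >= 0 else 0
        for p in projections
    )


class MCache:
    """Simplified MCACHE reuse path: on a signature hit, return the stored
    result and skip the expensive dot products entirely. The reuse is
    approximate -- a hit returns the result computed for a *similar* input."""

    def __init__(self):
        self.table = {}
        self.hits = 0

    def compute(self, x, projections, expensive_fn):
        sig = rpq_signature(x, projections)
        if sig in self.table:
            self.hits += 1
            return self.table[sig]  # approximate reuse, computation skipped
        result = expensive_fn(x)
        self.table[sig] = result
        return result
```

In a real accelerator the projections would be cheap fixed hardware, so a signature costs far less than the dot products it can save.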

    A Zero-Positive Learning Approach for Diagnosing Software Performance Regressions

    The field of machine programming (MP), the automation of software development, is making notable research advances, in part due to the emergence of a wide range of novel techniques in machine learning. In this paper, we apply MP to the automation of software performance regression testing. A performance regression is a software performance degradation caused by a code change. We present AutoPerf, a novel approach to automating regression testing that utilizes three core techniques: (i) zero-positive learning, (ii) autoencoders, and (iii) hardware telemetry. We demonstrate AutoPerf's generality and efficacy against 3 types of performance regressions across 10 real performance bugs in 7 benchmark and open-source programs. On average, AutoPerf exhibits 4% profiling overhead and accurately diagnoses more performance bugs than prior state-of-the-art approaches. Thus far, AutoPerf has produced no false negatives.
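The zero-positive idea is that the detector is fit only on telemetry from known-good runs, so no labeled regressions are needed. A minimal sketch of that training/thresholding shape (a per-feature-mean "model" stands in for AutoPerf's autoencoder, and the 1.5x margin is an arbitrary illustrative choice):

```python
def recon_error(run, mean):
    """Squared reconstruction error of one telemetry sample against the model."""
    return sum((a - b) ** 2 for a, b in zip(run, mean))


def fit_nominal(runs):
    """Zero-positive learning sketch: fit only on nominal (non-regressed)
    telemetry. The 'model' is the per-counter mean; the anomaly threshold is
    the worst reconstruction error seen on nominal data, padded by a margin."""
    n = len(runs)
    mean = [sum(r[i] for r in runs) / n for i in range(len(runs[0]))]
    threshold = max(recon_error(r, mean) for r in runs) * 1.5
    return mean, threshold


def is_regression(run, mean, threshold):
    """Flag a run whose telemetry the nominal model cannot reconstruct."""
    return recon_error(run, mean) > threshold
```

An autoencoder plays the same role as the mean here, just with a far richer notion of "reconstructable": anything it was never trained on reconstructs poorly and is flagged.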

    Learning Fitness Functions for Machine Programming

    The problem of automatic software generation is known as machine programming. In this work, we propose a framework based on genetic algorithms to solve this problem. Although genetic algorithms have been used successfully for many problems, one criticism is that hand-crafting the fitness function, the test that aims to effectively guide the evolution, can be notably challenging. Our framework presents a novel approach to learning the fitness function, using neural networks to predict the values of ideal fitness functions. We also augment the evolutionary process with a minimally intrusive search heuristic. This heuristic improves the framework's ability to discover correct programs from ones that are approximately correct, and does so with negligible computational overhead. We compare our approach with several state-of-the-art program synthesis methods and demonstrate that it finds more correct programs with fewer candidate program generations.

    Effects of different stages of maturity and postharvest treatments on the extension of shelf life and quality of banana

    An experiment was carried out in the laboratories of the Departments of Horticulture and of Biochemistry and Molecular Biology, Bangladesh Agricultural University, Mymensingh, from 23 April to 10 May 2015. The two-factor experiment was conducted to extend the shelf life and quality of banana under different postharvest treatments. The first factor was stage of maturity, with three levels: 1) hard green (S1), 2) pale green (S2), and 3) optimum maturity (S3). The second factor was postharvest treatment, with five levels: 1) control (room temperature), 2) keeping fruits in a perforated plastic bag, 3) keeping fruits in a perforated plastic bag containing KMnO4, 4) treating fruits with hot water for 5 min at 50°C and then keeping them in a plastic bag containing KMnO4, and 5) precooling fruits for 30 min at 5°C and then keeping them in a plastic bag containing KMnO4. The pulp-to-peel ratio, total soluble solids, total sugar, reducing sugar, and titratable acidity were greater when fruits were harvested at the optimum maturity stage than at the hard green stage. Total soluble solids, total sugar, and reducing sugar increased with storage duration, but the increase was slower when fruits were precooled at 5°C for 30 min and kept in a plastic bag containing KMnO4. The longest shelf life (19 days) was observed when hard green fruits were precooled at 5°C for 30 min and kept in a plastic bag containing KMnO4, and the shortest shelf life (5.87 days) was observed for the combination of optimum maturity stage + control. It may therefore be concluded that precooling for 30 min at 5°C followed by storage in a plastic bag containing KMnO4 should be used to extend the shelf life and quality of banana.

    A Distance-Based Side-Channel Attack in Non Uniform Cache and Possible Defenses

    For a distributed last-level cache (LLC) in a large multicore chip, the access time to one LLC bank can differ significantly from that to another. The disparity in access time is due to the different physical distances to the target LLC slices. In this paper, we demonstrate the possibility of exploiting such a distance-based side channel by timing a vulnerable version of AES decryption and extracting part of the secret keys. We introduce several techniques to overcome the challenges of the attack, including using multiple attack threads to ensure LLC hits on the vulnerable memory locations and to time part of the decryption function. We further propose CAMOUFLAGE, an efficient architectural defense against the proposed distance-based side-channel attack. At runtime, when a potentially leaking memory instruction is executed by a victim function, CAMOUFLAGE uses a combination of jitter and bypass mechanisms to eliminate any LLC hit-time difference due to distance and thereby prevent the attack. We evaluate two versions of CAMOUFLAGE, CAMOUFLAGE-JITTER and CAMOUFLAGE-BYPASS, using the Gem5 simulator with PARSEC and Rodinia benchmarks, and show that they incur performance overheads of 14.14% or none over the baseline.
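The channel and the jitter-style defense both reduce to simple latency arithmetic, sketched below with a toy non-uniform-cache model (the base and per-hop numbers are purely illustrative, and the real CAMOUFLAGE mechanism operates in hardware, not software):

```python
def bank_hit_latency(bank, base=20, per_hop=4):
    """Toy NUCA model: LLC hit latency grows with the physical distance
    (hops) from the requesting core to the bank holding the line. This
    per-bank difference is exactly what the attacker times."""
    return base + per_hop * bank


def defended_latency(bank, num_banks, base=20, per_hop=4):
    """Jitter-style defense sketch: pad every protected hit to the latency
    of the most distant bank, so timing no longer reveals which bank (and
    hence which key-dependent line) was touched."""
    worst = bank_hit_latency(num_banks - 1, base, per_hop)
    return max(bank_hit_latency(bank, base, per_hop), worst)
```

Padding to the worst case closes the channel at the cost of making every protected hit as slow as the farthest bank, which is why a bypass alternative can be attractive.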

    Large Language Models Based Automatic Synthesis of Software Specifications

    Software configurations play a crucial role in determining the behavior of software systems. To ensure safe and error-free operation, the correct configurations, along with their valid bounds and rules, commonly referred to as software specifications, must be identified. As software systems grow in complexity and scale, the number of configurations and associated specifications required to ensure correct operation can become large and prohibitively difficult to manage manually. Due to the fast pace of software development, correct software specifications are often not thoroughly checked or validated within the software itself. Rather, they are frequently discussed and documented in a variety of external sources, including software manuals, code comments, and online discussion forums. It is therefore hard for a system administrator to know the correct specifications of configurations, given the lack of clarity, organization, and a centralized, unified source to consult. To address this challenge, we propose SpecSyn, a framework that leverages a state-of-the-art large language model to automatically synthesize software specifications from natural language sources. Our approach formulates software specification synthesis as a sequence-to-sequence learning problem and investigates the extraction of specifications from large contextual texts. This is the first work to use a large language model for end-to-end specification synthesis from natural language texts. Empirical results demonstrate that our system outperforms the prior state-of-the-art specification synthesis tool by 21% in F1 score and can find specifications from single as well as multiple sentences.
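To make the task framing concrete: free text goes in, a normalized specification comes out. The regex stand-in below only illustrates that input/output shape; the paper's approach is a learned sequence-to-sequence model, which generalizes far beyond any fixed pattern, and the output triple format here is an assumption:

```python
import re

def extract_spec(sentence):
    """Illustration of the specification-synthesis task shape only (NOT
    SpecSyn's method): map a natural-language sentence to a normalized
    (parameter, lower_bound, upper_bound) specification, or None if the
    sentence carries no recognizable bound."""
    m = re.search(r"(\w+)\s+must be between\s+(\d+)\s+and\s+(\d+)", sentence)
    if not m:
        return None
    param, lo, hi = m.groups()
    return (param, int(lo), int(hi))
```

A learned model would also handle paraphrases ("cannot exceed", "at least"), units, and specifications spread across multiple sentences, none of which a single pattern can.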

    Genomic surveillance uncovers a pandemic clonal lineage of the wheat blast fungus

    Wheat, one of the most important food crops, is threatened by a blast disease pandemic. Here, we show that a clonal lineage of the wheat blast fungus recently spread to Asia and Africa following two independent introductions from South America. Through a combination of genome analyses and laboratory experiments, we show that the decade-old blast pandemic lineage can be controlled by the Rmg8 disease resistance gene and is sensitive to strobilurin fungicides. However, we also highlight the potential of the pandemic clone to evolve fungicide-insensitive variants and to sexually recombine with African lineages. This underscores the urgent need for genomic surveillance to track and mitigate the spread of wheat blast outside of South America and to guide preemptive wheat breeding for blast resistance.