160 research outputs found

    Editor's Note on the Targeted Killing of Anwar Al-Aulaqi


    United Kingdom Libraries during World War II

    Cultural attacks, especially attacks on books, have been commonplace in conflicts throughout history (Stubbings, 1993). Libraries in particular have been targeted because books and libraries are symbols of culture (Valencia, 2002). Nazi Germany was no exception to such attacks on culture before and during World War II (Figure 1). Under the Nazis, a total war had begun, meaning that no area of society was exempt from attack (Valencia, 2002).

    Effects of Head Formation and Heat Treatment on the Mechanical Properties of Connecting Rod Bolts

    Oliver Racing Parts (ORP; Charlevoix, Michigan) is looking to optimize its manufacturing process for high-strength connecting rod bolts. A high yield strength is desired because bolt deformation would result in catastrophic engine failure. The bolts were made of H11, a chromium hot-work tool steel, and MLX17, a precipitation-hardenable stainless steel. Tensile testing was performed to determine the tensile and yield strengths of the bolts, and fracture surfaces were imaged by scanning electron microscopy to characterize the failure modes. To observe the effects of bolt heading on microstructure and strength, two batches of MLX17 were prepared: one batch was headed and then aged (Group A); the other was headed, solution annealed, and then aged (Group B). These bolts were compared with the H11 bolts to determine their viability for use. In order of highest to lowest yield strength, the results were H11 (272 ksi), MLX17 Group B (250 ksi), and MLX17 Group A (235 ksi); in order of highest to lowest tensile strength, H11 (300 ksi), MLX17 Group B (255 ksi), and MLX17 Group A (238 ksi). The bolt heading process appears to cause some overaging in the MLX17 samples, as shown by the increase in strength when the strain and aging introduced by heading are undone through heat treatment. The H11 bolts were the strongest tested; it is therefore recommended that H11 bolts not be replaced with MLX17, given the resulting decrease in strength.

    Dynamic Masking Rate Schedules for MLM Pretraining

    Most work on transformers trained with the Masked Language Modeling (MLM) objective uses the original BERT model's fixed masking rate of 15%. Our work instead dynamically schedules the masking rate throughout training. We found that linearly decreasing the masking rate from 30% to 15% over the course of pretraining improves average GLUE accuracy by 0.46% for BERT-base compared with the standard fixed 15% rate. Further analyses demonstrate that the gains from scheduling come from exposure to both high and low masking rate regimes. Our results show that masking rate scheduling is a simple way to improve the quality of masked language models, achieving up to a 1.89x speedup in pretraining.
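
    As a rough sketch of the schedule described above (not the authors' code), a linearly decaying masking rate is straightforward to implement. The function names, the step-based interface, and the 1M-step example are illustrative assumptions, and the masking itself is simplified (no 80/10/10 corruption split, no special-token handling).

```python
import numpy as np

def masking_rate(step: int, total_steps: int,
                 start: float = 0.30, end: float = 0.15) -> float:
    """Masking rate decayed linearly from `start` to `end` over pretraining."""
    progress = min(step / max(total_steps, 1), 1.0)
    return start + (end - start) * progress

def mask_tokens(token_ids: np.ndarray, rate: float,
                mask_token_id: int, rng: np.random.Generator):
    """Mask a `rate` fraction of tokens (simplified MLM corruption)."""
    mask = rng.random(token_ids.shape) < rate
    labels = np.where(mask, token_ids, -100)           # -100 = ignored by the MLM loss
    inputs = np.where(mask, mask_token_id, token_ids)  # replace masked positions with [MASK]
    return inputs, labels

# Example: halfway through a 1M-step run the rate is 0.30 + (0.15 - 0.30) * 0.5 = 0.225.
rng = np.random.default_rng(0)
rate = masking_rate(step=500_000, total_steps=1_000_000)
inputs, labels = mask_tokens(np.arange(10), rate, mask_token_id=103, rng=rng)  # 103 = [MASK] in the standard BERT vocab
```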

    On the special role of class-selective neurons in early training

    It is commonly observed that deep networks trained for classification exhibit class-selective neurons in their early and intermediate layers. Intriguingly, recent studies have shown that these class-selective neurons can be ablated without deteriorating network function. But if class-selective neurons are not necessary, why do they exist? We attempt to answer this question in a series of experiments on ResNet-50s trained on ImageNet. We first show that class-selective neurons emerge during the first few epochs of training before receding rapidly but not completely, suggesting that the class-selective neurons found in trained networks are in fact vestigial remains of early training. With single-neuron ablation experiments, we then show that class-selective neurons are important for network function in this early phase of training. We also observe that the network is close to a linear regime in this early phase; we thus speculate that class-selective neurons appear early in training as quasi-linear shortcut solutions to the classification task. Finally, in causal experiments where we regularize against class selectivity at different points in training, we show that the presence of class-selective neurons early in training is critical to the successful training of the network; in contrast, class-selective neurons can be suppressed later in training with little effect on final accuracy. It remains to be understood through which mechanism the presence of class-selective neurons early in training contributes to successful training.
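
    The abstract does not spell out how class selectivity is measured; as a hedged illustration, one widely used definition in this line of work (e.g., the selectivity index of Morcos et al., 2018) compares a unit's mean activation on its preferred class with its mean activation on all other classes. The function below is a minimal sketch under that assumption, not the paper's code.

```python
import numpy as np

def class_selectivity(mean_acts: np.ndarray, eps: float = 1e-7) -> float:
    """Class-selectivity index of a single unit.

    mean_acts[c] is the unit's mean (e.g., post-ReLU) activation on class c.
    Returns (mu_max - mu_rest) / (mu_max + mu_rest): 0 if the unit responds
    equally to every class, ~1 if it responds to only one class."""
    mu_max = mean_acts.max()
    mu_rest = np.delete(mean_acts, mean_acts.argmax()).mean()
    return float((mu_max - mu_rest) / (mu_max + mu_rest + eps))

# A unit that fires for only one of ten classes is maximally selective...
print(class_selectivity(np.array([2.0] + [0.0] * 9)))   # ~1.0
# ...while a unit that fires equally for every class is not selective at all.
print(class_selectivity(np.array([1.0] * 10)))          # 0.0
```

    Ablating such a unit, as in the single-neuron ablation experiments above, then amounts to zeroing its output (e.g., its channel in a convolutional layer) and re-measuring accuracy.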

    Reduce, Reuse, Recycle: Improving Training Efficiency with Distillation

    Methods for improving the efficiency of deep network training (i.e., the resources required to achieve a given level of model quality) are of immediate benefit to deep learning practitioners. Distillation is typically used to compress models or improve model quality, but it is unclear whether distillation actually improves training efficiency. Can the quality improvements of distillation be converted into training speed-ups, or do they simply increase final model quality with no resource savings? We conducted a series of experiments, on common enterprise hardware (8x NVIDIA A100), to investigate whether and how distillation can be used to accelerate training, using ResNet-50 trained on ImageNet and BERT trained on C4 with a masked language modeling objective and evaluated on GLUE. We found that distillation can speed up training by up to 1.96x for ResNet-50 trained on ImageNet and up to 1.42x for BERT when evaluated on GLUE. Furthermore, distillation for BERT yields optimal results when it is performed only for the first 20-50% of training. We also observed that training with distillation is almost always more efficient than training without it, even when using the poorest-quality model as a teacher, for both ResNet-50 and BERT. Finally, we found that it is possible to gain the benefit of distilling from an ensemble of teacher models, which has an O(n) runtime cost, by randomly sampling a single teacher from the pool on each step, which has only an O(1) runtime cost. Taken together, these results show that distillation can substantially improve training efficiency in both image classification and language modeling, and that a few simple optimizations to distillation protocols can further enhance these efficiency improvements.
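
    The O(1) ensemble trick described above lends itself to a short sketch: instead of querying all n teachers at every step, draw one teacher at random per step and distill from it alone. The sketch below uses PyTorch; the loss weighting `alpha` and temperature `T` are illustrative assumptions, not the paper's settings.

```python
import random
import torch
import torch.nn.functional as F

def distill_step(student, teachers, x, y, alpha=0.5, T=2.0):
    """One training step distilling from a randomly sampled teacher."""
    teacher = random.choice(teachers)        # O(1) per step instead of O(n) for the full ensemble
    with torch.no_grad():
        t_logits = teacher(x)                # teacher only provides soft targets
    s_logits = student(x)
    hard = F.cross_entropy(s_logits, y)      # usual supervised loss on the labels
    soft = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                    F.softmax(t_logits / T, dim=-1),
                    reduction="batchmean") * T * T
    return alpha * hard + (1 - alpha) * soft
```

    Over many steps the student sees soft targets from every teacher in expectation, which is what makes the per-step O(1) sampling a stand-in for the O(n) ensemble average.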