
    CES-KD: Curriculum-based Expert Selection for Guided Knowledge Distillation

    Knowledge distillation (KD) is an effective tool for compressing deep classification models for edge devices. However, the performance of KD is affected by the large capacity gap between the teacher and student networks. Recent methods have resorted to a multiple teacher assistant (TA) setting for KD, which sequentially decreases the size of the teacher model to progressively bridge the capacity gap between these models. This paper proposes a new technique called Curriculum Expert Selection for Knowledge Distillation (CES-KD) to efficiently enhance the learning of a compact student under the capacity gap problem. This technique is built upon the hypothesis that a student network should be guided gradually using a stratified teaching curriculum, as it learns easy (hard) data samples better and faster from a lower (higher) capacity teacher network. Specifically, our method is a gradual TA-based KD technique that selects a single teacher per input image, based on a curriculum driven by the difficulty of classifying the image. In this work, we empirically verify our hypothesis, rigorously experiment with the CIFAR-10, CIFAR-100, CINIC-10, and ImageNet datasets, and show improved accuracy on VGG-like, ResNet, and WideResNet architectures.
    Comment: ICPR202
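
    A minimal sketch of the per-sample teacher selection idea described above, assuming a pool of capacity-ordered teachers and using prediction entropy as the difficulty proxy. The paper's actual curriculum criterion and schedule may differ; all names and thresholds here are illustrative:

```python
import torch
import torch.nn.functional as F

def select_teacher(logits_largest, teachers, bins):
    """Pick one teacher per sample from a capacity-ordered pool.

    Difficulty proxy: entropy of the largest teacher's prediction
    (an assumption; the paper's curriculum may use another signal).
    Easy samples (low entropy) map to low-capacity teachers, hard
    samples to high-capacity ones.
    """
    probs = F.softmax(logits_largest, dim=1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    # Map each sample's difficulty to a teacher index via fixed bins.
    idx = torch.bucketize(entropy, bins)  # 0 = easiest bin
    return idx.clamp(max=len(teachers) - 1)

def ces_kd_loss(student_logits, teacher_pool_logits, teacher_idx, T=4.0):
    """Standard KL distillation loss against the selected teacher."""
    stacked = torch.stack(teacher_pool_logits, dim=1)        # (B, K, C)
    chosen = stacked[torch.arange(stacked.size(0)), teacher_idx]
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(chosen / T, dim=1),
        reduction="batchmean",
    ) * T * T
```

    In training, each teacher's logits for the batch would be precomputed once, so selection adds only the cost of the bucketing step.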

    FMAS: Fast Multi-Objective SuperNet Architecture Search for Semantic Segmentation

    We present FMAS, a fast multi-objective neural architecture search framework for semantic segmentation. FMAS subsamples the structure and pre-trained parameters of DeepLabV3+, without fine-tuning, dramatically reducing training time during the search. To further reduce candidate evaluation time, we use a subset of the validation dataset during the search. Only the final, Pareto non-dominated candidates are ultimately fine-tuned using the complete training set. We evaluate FMAS by searching for models that effectively trade off accuracy and computational cost on the PASCAL VOC 2012 dataset. FMAS finds competitive designs quickly, e.g., taking just 0.5 GPU-days to discover a DeepLabV3+ variant that reduces FLOPs and parameters by 10% and 20% respectively, for less than 3% increased error. We also search on an edge device called GAP8, using its latency as the metric. FMAS finds a network that is 2.2× faster with a 7.61% mIoU loss.
    Comment: Accepted as a full paper by the TinyML Research Symposium 202
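
    As a small illustration of the final selection step mentioned above (only Pareto non-dominated candidates are fine-tuned), here is a generic non-dominated filter over two minimized objectives. This is standard multi-objective bookkeeping, not FMAS's actual search code:

```python
def pareto_front(candidates):
    """Keep candidates not dominated on (error, cost); both minimized.

    candidates: list of (error, cost, model_id) tuples. A candidate is
    dominated if another is no worse on both objectives and strictly
    better on at least one.
    """
    front = []
    for i, (e_i, c_i, m_i) in enumerate(candidates):
        dominated = any(
            e_j <= e_i and c_j <= c_i and (e_j < e_i or c_j < c_i)
            for j, (e_j, c_j, _) in enumerate(candidates) if j != i
        )
        if not dominated:
            front.append((e_i, c_i, m_i))
    return front
```

    Only the models returned by such a filter would then pay the cost of full fine-tuning on the complete training set.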

    BD-KD: Balancing the Divergences for Online Knowledge Distillation

    Knowledge distillation (KD) has gained a lot of attention in the field of model compression for edge devices thanks to its effectiveness in compressing large, powerful networks into smaller, lower-capacity models. Online distillation, in which both the teacher and the student are learning collaboratively, has also gained much interest due to its ability to improve the performance of the networks involved. The Kullback-Leibler (KL) divergence governs the knowledge transfer between the teacher and student. However, most online KD techniques present bottlenecks under a network capacity gap: when the models are trained cooperatively and simultaneously, the KL divergence becomes incapable of properly minimizing the mismatch between the teacher's and student's distributions. Alongside accuracy, critical edge-device applications need well-calibrated compact networks, and confidence calibration provides a sensible way of obtaining trustworthy predictions. We propose BD-KD: Balancing of Divergences for online Knowledge Distillation. We show that adaptively balancing between the reverse and forward divergences shifts the focus of the training strategy to the compact student network without limiting the teacher network's learning process. We demonstrate that, by performing this balancing at the level of the student's distillation loss, we improve both the accuracy and the calibration of the compact student network. We conducted extensive experiments using a variety of network architectures and show improvements on multiple datasets, including CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet. We illustrate the effectiveness of our approach through comprehensive comparisons and ablations with current state-of-the-art online and offline KD techniques.
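
    A minimal sketch of balancing forward and reverse KL at the student's distillation loss. The balancing weight is a fixed constant here for clarity, whereas BD-KD balances adaptively; names and defaults are illustrative:

```python
import torch.nn.functional as F

def bd_kd_student_loss(student_logits, teacher_logits, alpha=0.5, T=4.0):
    """Weighted mix of forward and reverse KL for the student.

    Forward KL: KL(teacher || student); reverse KL: KL(student || teacher).
    alpha is the balancing weight (constant here; the paper adapts it).
    """
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    log_p_t = F.log_softmax(teacher_logits / T, dim=1)
    # F.kl_div(input=log_q, target=p) computes KL(p || q).
    fwd = F.kl_div(log_p_s, log_p_t.exp(), reduction="batchmean")
    rev = F.kl_div(log_p_t, log_p_s.exp(), reduction="batchmean")
    return (alpha * fwd + (1 - alpha) * rev) * T * T
```

    Forward KL is mode-covering while reverse KL is mode-seeking, which is why weighting between them changes where the low-capacity student spends its capacity.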

    Efficient Fine-Tuning of Compressed Language Models with Learners

    Fine-tuning BERT-based models is resource-intensive in memory, computation, and time. While many prior works aim to improve inference efficiency via compression techniques, e.g., pruning, these works do not explicitly address the computational challenges of training for downstream tasks. We introduce Learner modules and priming, novel methods for fine-tuning that exploit the overparameterization of pre-trained language models to gain benefits in convergence speed and resource utilization. Learner modules navigate the double bind of 1) training efficiently by fine-tuning a subset of parameters, and 2) training effectively by ensuring quick convergence and high metric scores. Our results on DistilBERT demonstrate that Learners perform on par with or surpass the baselines. Learners fine-tune 7x fewer parameters than state-of-the-art methods on GLUE. On CoLA, Learners fine-tune 20% faster and have significantly lower resource utilization.
    Comment: 8 pages, 9 figures, 2 tables, presented at the ICML 2022 Workshop on Hardware-Aware Efficient Training (HAET 2022)
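
    As a generic illustration of fine-tuning only a small subset of parameters (point 1 above), here is an adapter-style bottleneck module attached to a frozen backbone. The paper's Learner modules and priming procedure are not reproduced here, so treat this purely as a sketch with illustrative names:

```python
import torch.nn as nn

class Learner(nn.Module):
    """Small trainable bottleneck alongside a frozen backbone.

    A generic parameter-subset fine-tuning sketch; the paper's actual
    Learner design and priming schedule may differ.
    """
    def __init__(self, hidden, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)
        self.act = nn.GELU()

    def forward(self, x):
        # Residual update keeps the frozen backbone's features intact.
        return x + self.up(self.act(self.down(x)))

def freeze_backbone(model):
    """Train only Learner parameters; freeze everything else.

    Assumes Learner submodules are registered under attribute names
    containing "learner" (an illustrative convention).
    """
    for name, p in model.named_parameters():
        p.requires_grad = "learner" in name.lower()
```

    Because gradients flow only through the small bottleneck weights, both the optimizer state and the backward pass shrink, which is where the training-time savings come from.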

    Commissioning of the CMS High Level Trigger

    The CMS experiment will collect data from the proton-proton collisions delivered by the Large Hadron Collider (LHC) at a centre-of-mass energy of up to 14 TeV. The CMS trigger system is designed to cope with unprecedented luminosities and LHC bunch-crossing rates of up to 40 MHz. The unique CMS trigger architecture employs only two trigger levels. The Level-1 trigger is implemented using custom electronics, while the High Level Trigger (HLT) is based on software algorithms running on a large cluster of commercial processors, the Event Filter Farm. We present the major functionalities of the CMS High Level Trigger system as of the start of LHC beam operations in September 2008. The validation of the HLT system in the online environment with Monte Carlo simulated data and its commissioning during cosmic-ray data-taking campaigns are discussed in detail. We conclude with a description of HLT operations with the first circulating LHC beams, before the incident that occurred on 19 September 2008.

    Palaeoenvironmental control on distribution of crinoids in the Bathonian (Middle Jurassic) of England and France

    Bulk sampling of a number of different marine and marginal-marine lithofacies in the British Bathonian has allowed us to assess the palaeoenvironmental distribution of crinoids for the first time. Although remains are largely fragmentary, many species have been identified by comparison with articulated specimens from elsewhere, whilst the large and unbiased sample sizes allowed assessment of the relative proportions of different taxa. Results indicate that the distribution of crinoids corresponds well to particular facies. Ossicles of Chariocrinus and Balanocrinus dominate in deeper-water and lower-energy facies, with the former extending further into shallower-water facies than the latter. Isocrinus dominates in shallower-water carbonate facies, accompanied by rarer comatulids, and was also present in the more marine parts of lagoons. Pentacrinites remains are abundant in very high-energy oolite-shoal lithofacies. The presence of millericrinids within one partly allochthonous lithofacies suggests the presence of an otherwise unknown hard substrate from which they were transported. These results are compared with crinoid assemblages from other Mesozoic localities, and it is evident that the same morphological adaptations are present within crinoids from similar lithofacies throughout the Jurassic and Early Cretaceous.

    Theory of Low-Mass Stars and Substellar Objects

    Since the discovery of the first bona fide brown dwarfs and extra-solar planets in 1995, the field of low-mass stars and substellar objects has progressed considerably, from both theoretical and observational viewpoints. Recent developments in the physics entering the modeling of these objects have led to significant improvements in the theory and to a better understanding of their mechanical and thermal properties. This theory can now be confronted with observations directly in various observational diagrams (color-color, color-magnitude, mass-magnitude, mass-spectral type), a stringent and unavoidable constraint that became possible only recently, with the generation of synthetic spectra. In this paper, we present the current state-of-the-art general theory of low-mass stars and sub-stellar objects, from one solar mass to one Jupiter mass, regarding primarily their interior structure and evolution. This review is a natural complement to the previous review on the atmospheres of low-mass stars and brown dwarfs (Allard et al. 1997). Special attention is devoted to the comparison of the theory with various available observations. The contribution of low-mass stellar and sub-stellar objects to the Galactic mass budget is also analysed.
    Comment: 81 pages, LaTeX file, uses aasms4.sty; review for Annual Review of Astronomy and Astrophysics, vol. 38 (2000)
