CES-KD: Curriculum-based Expert Selection for Guided Knowledge Distillation
Knowledge distillation (KD) is an effective tool for compressing deep
classification models for edge devices. However, the performance of KD is
affected by the large capacity gap between the teacher and student networks.
Recent methods have resorted to a multiple teacher assistant (TA) setting for
KD, which sequentially decreases the size of the teacher model to gradually
bridge the size gap between these models. This paper proposes a new technique
called Curriculum Expert Selection for Knowledge Distillation (CES-KD) to
efficiently enhance the learning of a compact student under the capacity gap
problem. This technique is built upon the hypothesis that a student network
should be guided gradually using a stratified teaching curriculum, as it learns
easy (hard) data samples better and faster from a lower (higher) capacity
teacher network. Specifically, our method is a gradual TA-based KD technique
that selects a single teacher per input image based on a curriculum driven by
the difficulty in classifying the image. In this work, we empirically verify
our hypothesis and rigorously experiment with CIFAR-10, CIFAR-100, CINIC-10,
and ImageNet datasets, and show improved accuracy on VGG-like, ResNet, and
WideResNet architectures.
Comment: ICPR202
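The per-image teacher selection described in this abstract can be illustrated with a minimal sketch. The function name, the difficulty score in [0, 1], and the uniform mapping from difficulty to teacher index are all illustrative assumptions, not the authors' implementation:

```python
# Hypothetical sketch of curriculum-style expert selection: easier samples
# are routed to lower-capacity teachers, harder samples to higher-capacity
# ones. This is NOT the CES-KD code, just the routing idea.

def select_teacher(difficulty, num_teachers):
    """Map a difficulty score in [0, 1] to a teacher index.

    Teachers are assumed ordered from lowest capacity (index 0)
    to highest capacity (index num_teachers - 1).
    """
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must lie in [0, 1]")
    # min() keeps difficulty == 1.0 inside the valid index range.
    return min(int(difficulty * num_teachers), num_teachers - 1)

# With 4 teachers, an easy sample uses the smallest teacher, a hard one
# the largest.
print(select_teacher(0.1, 4))  # 0
print(select_teacher(0.9, 4))  # 3
```

In the paper itself the difficulty signal is driven by how hard the image is to classify; any such per-sample scalar would slot into this routing scheme.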
FMAS: Fast Multi-Objective SuperNet Architecture Search for Semantic Segmentation
We present FMAS, a fast multi-objective neural architecture search framework
for semantic segmentation. FMAS subsamples the structure and pre-trained
parameters of DeepLabV3+, without fine-tuning, dramatically reducing training
time during search. To further reduce candidate evaluation time, we use a
subset of the validation dataset during the search. Only the final, Pareto
non-dominated, candidates are ultimately fine-tuned using the complete training
set. We evaluate FMAS by searching for models that effectively trade accuracy
and computational cost on the PASCAL VOC 2012 dataset. FMAS finds competitive
designs quickly, e.g., taking just 0.5 GPU days to discover a DeepLabV3+
variant that reduces FLOPs and parameters by 10% and 20% respectively,
for less than 3% increased error. We also search on an edge device called
GAP8 and use its latency as the metric. FMAS is capable of finding a 2.2x
faster network with 7.61% MIoU loss.
Comment: Accepted as a full paper by the TinyML Research Symposium 202
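The Pareto filtering step mentioned in this abstract, where only non-dominated candidates are kept for fine-tuning, can be sketched as follows. The function name and the two-objective tuple layout are illustrative assumptions:

```python
# Hedged sketch of Pareto non-dominated filtering over two minimized
# objectives (error, cost). Not the FMAS code, just the selection rule
# the abstract refers to.

def pareto_front(candidates):
    """Return the candidates not dominated by any other.

    Each candidate is a tuple (error, cost); both objectives are minimized.
    A candidate is dominated if another candidate is no worse in both
    objectives and is a distinct point.
    """
    front = []
    for c in candidates:
        dominated = any(
            o[0] <= c[0] and o[1] <= c[1] and o != c
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front

# Four hypothetical (error, cost) candidates; the third is dominated by
# the second (worse error AND worse cost), so it is filtered out.
models = [(0.10, 5.0), (0.12, 3.0), (0.15, 4.0), (0.09, 9.0)]
print(pareto_front(models))  # [(0.1, 5.0), (0.12, 3.0), (0.09, 9.0)]
```

In a real search the objectives would be validation error and FLOPs or measured latency, but the dominance test is the same.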
BD-KD: Balancing the Divergences for Online Knowledge Distillation
Knowledge distillation (KD) has gained a lot of attention in the field of
model compression for edge devices thanks to its effectiveness in compressing
large powerful networks into smaller lower-capacity models. Online
distillation, in which both the teacher and the student are learning
collaboratively, has also gained much interest due to its ability to improve on
the performance of the networks involved. The Kullback-Leibler (KL) divergence
ensures the proper knowledge transfer between the teacher and student. However,
most online KD techniques suffer from bottlenecks when there is a capacity
gap between the networks. When the models are trained cooperatively and
simultaneously, the KL distance becomes incapable of properly minimizing the
divergence between the teacher's and student's distributions. Alongside
accuracy, critical edge-device applications are in
need of well-calibrated compact networks. Confidence calibration provides a
sensible way of getting trustworthy predictions. We propose BD-KD: Balancing of
Divergences for online Knowledge Distillation. We show that adaptively
balancing between the reverse and forward divergences shifts the focus of the
training strategy to the compact student network without limiting the teacher
network's learning process. We demonstrate that, by performing this balancing
design at the level of the student distillation loss, we improve upon both
performance accuracy and calibration of the compact student network. We
conducted extensive experiments using a variety of network architectures and
show improvements on multiple datasets including CIFAR-10, CIFAR-100,
Tiny-ImageNet, and ImageNet. We illustrate the effectiveness of our approach
through comprehensive comparisons and ablations with current state-of-the-art
online and offline KD techniques.
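The balancing idea in this abstract, mixing the forward and reverse KL divergences in the student's distillation loss, can be sketched numerically. The weight `alpha`, the temperature value, and all function names here are illustrative assumptions rather than the paper's exact formulation:

```python
import numpy as np

# Hedged sketch of a balanced-divergence distillation loss: a convex mix of
# the forward KL(teacher || student) and the reverse KL(student || teacher).
# Illustrative only; BD-KD's actual loss may differ in detail.

def softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def kl(p, q, eps=1e-12):
    """KL divergence KL(p || q) between discrete distributions."""
    p, q = np.asarray(p), np.asarray(q)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def balanced_kd_loss(teacher_logits, student_logits,
                     alpha=0.5, temperature=4.0):
    pt = softmax(teacher_logits, temperature)
    ps = softmax(student_logits, temperature)
    # alpha = 1 recovers standard forward-KL distillation;
    # alpha = 0 trains on the reverse divergence only.
    return alpha * kl(pt, ps) + (1.0 - alpha) * kl(ps, pt)

loss = balanced_kd_loss([3.0, 1.0, 0.2], [2.5, 1.2, 0.1])
print(loss >= 0.0)  # True; the loss is zero only when the two match
```

Because the two divergences penalize mismatches asymmetrically (forward KL is mode-covering, reverse KL mode-seeking), adjusting their balance shifts which errors the student is pushed hardest to fix.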
Efficient Fine-Tuning of Compressed Language Models with Learners
Fine-tuning BERT-based models is resource-intensive in memory, computation,
and time. While many prior works aim to improve inference efficiency via
compression techniques, e.g., pruning, these works do not explicitly address
the computational challenges of training to downstream tasks. We introduce
Learner modules and priming, novel methods for fine-tuning that exploit the
overparameterization of pre-trained language models to gain benefits in
convergence speed and resource utilization. Learner modules navigate the double
bind of 1) training efficiently by fine-tuning a subset of parameters, and 2)
training effectively by ensuring quick convergence and high metric scores. Our
results on DistilBERT demonstrate that learners perform on par with or surpass
the baselines. Learners train 7x fewer parameters than state-of-the-art methods
on GLUE. On CoLA, learners fine-tune 20% faster, and have significantly lower
resource utilization.
Comment: 8 pages, 9 figures, 2 tables, presented at the ICML 2022 workshop on
Hardware-Aware Efficient Training (HAET 2022)
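The core mechanism this abstract relies on, fine-tuning only a small subset of parameters while the rest stay frozen, can be sketched framework-free. The parameter names and the substring-matching rule are illustrative assumptions, not the Learner-module API:

```python
# Hedged sketch of parameter-efficient fine-tuning: partition a model's
# named parameters into a small trainable group and a frozen group.
# Names below are hypothetical, not from the paper.

def split_trainable(named_params, trainable_keys):
    """Partition parameters into trainable and frozen groups by name."""
    trainable, frozen = {}, {}
    for name, param in named_params.items():
        if any(key in name for key in trainable_keys):
            trainable[name] = param
        else:
            frozen[name] = param
    return trainable, frozen

# Toy parameter dictionary standing in for a pre-trained encoder.
params = {
    "encoder.layer0.weight": [0.1, 0.2],
    "encoder.layer1.weight": [0.3, 0.4],
    "learner.adapter.weight": [0.0, 0.0],
    "classifier.weight": [0.5, 0.6],
}
trainable, frozen = split_trainable(params, ["learner", "classifier"])
print(sorted(trainable))  # ['classifier.weight', 'learner.adapter.weight']
```

In a deep-learning framework the same split would be expressed by toggling each parameter's gradient flag and passing only the trainable group to the optimizer, which is where the memory and speed savings come from.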
Commissioning of the CMS High Level Trigger
The CMS experiment will collect data from the proton-proton collisions
delivered by the Large Hadron Collider (LHC) at a centre-of-mass energy up to
14 TeV. The CMS trigger system is designed to cope with unprecedented
luminosities and LHC bunch-crossing rates up to 40 MHz. The unique CMS trigger
architecture only employs two trigger levels. The Level-1 trigger is
implemented using custom electronics, while the High Level Trigger (HLT) is
based on software algorithms running on a large cluster of commercial
processors, the Event Filter Farm. We present the major functionalities of the
CMS High Level Trigger system as of the starting of LHC beams operations in
September 2008. The validation of the HLT system in the online environment with
Monte Carlo simulated data and its commissioning during cosmic rays data taking
campaigns are discussed in detail. We conclude with a description of HLT
operations with the first circulating LHC beams, before the incident that
occurred on 19 September 2008.
Palaeoenvironmental control on distribution of crinoids in the Bathonian (Middle Jurassic) of England and France
Bulk sampling of a number of different marine and marginal marine lithofacies in the British Bathonian has allowed us to assess the palaeoenvironmental distribution of crinoids for the first time. Although remains are largely fragmentary, many species have been identified by comparison with articulated specimens from elsewhere, whilst the large and unbiased sample sizes allowed assessment of relative proportions of different taxa. Results indicate that the distribution of crinoids corresponds well to particular facies. Ossicles of Chariocrinus and Balanocrinus dominate in deeper-water and lower-energy facies, with the former extending further into shallower-water facies than the latter. Isocrinus dominates in shallower-water carbonate facies, accompanied by rarer comatulids, and was also present in the more marine parts of lagoons. Pentacrinites remains are abundant in very high-energy oolite shoal lithofacies. The presence of millericrinids within one, partly allochthonous, lithofacies suggests the presence of an otherwise unknown hard substrate from which they have been transported. These results are compared to crinoid assemblages from other Mesozoic localities, and it is evident that the same morphological adaptations are present within crinoids from similar lithofacies throughout the Jurassic and Early Cretaceous.
Theory of Low-Mass Stars and Substellar Objects
Since the discovery of the first bona-fide brown dwarfs and extra-solar
planets in 1995, the field of low mass stars and substellar objects has
considerably progressed, both from theoretical and observational
viewpoints. Recent developments in the physics entering the modeling of these
objects have led to significant improvements in the theory and to a better
understanding of their mechanical and thermal properties. This theory can now
be confronted with observations directly in various observational diagrams
(color-color, color-magnitude, mass-magnitude, mass-spectral type), a stringent
and unavoidable constraint which became possible only recently, with the
generation of synthetic spectra. In this paper, we present the current
state-of-the-art general theory of low-mass stars and sub-stellar objects, from
one solar mass to one Jupiter mass, regarding primarily their interior
structure and evolution. This review is a natural complement to the previous
review on the atmosphere of low-mass stars and brown dwarfs (Allard et al.
1997). Special attention is devoted to the comparison of the theory with
various available observations. The contribution of low-mass stellar and
sub-stellar objects to the Galactic mass budget is also analysed.
Comment: 81 pages, Latex file, uses aasms4.sty, review for Annual Review of
Astronomy and Astrophysics, vol. 38 (2000)