Coupled Ensembles of Neural Networks
In this paper we investigate the architecture of deep convolutional networks.
Building on existing state-of-the-art models, we propose a reconfiguration of
the model parameters into several parallel branches at the global network
level, with each branch being a standalone CNN. We show that this arrangement
is an efficient way to significantly reduce the number of parameters without
losing performance, or to significantly improve the performance with the same
number of parameters. The use of branches brings an additional form of
regularization. In addition to the split into parallel branches, we propose a
tighter coupling of these branches by placing the "fuse (averaging) layer"
before the Log-Likelihood and SoftMax layers during training. This gives
another significant performance improvement, the tighter coupling favouring the
learning of better representations, even at the level of the individual
branches. We refer to this branched architecture as "coupled ensembles". The
approach is very generic and can be applied with almost any DCNN architecture.
With coupled ensembles of DenseNet-BC networks and a parameter budget of 25M, we
obtain error rates of 2.92%, 15.68%, and 1.50% on the CIFAR-10, CIFAR-100, and
SVHN tasks, respectively. For the same budget, DenseNet-BC has error rates of
3.46%, 17.18%, and 1.80%, respectively. With ensembles of coupled ensembles of
DenseNet-BC networks, with 50M total parameters, we obtain error rates of 2.72%,
15.13%, and 1.42% respectively on these tasks.
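A minimal PyTorch sketch of the coupling described above, averaging per-branch log-probabilities before the NLL loss; the branch count and the tiny linear stand-in branch are illustrative assumptions, not the paper's DenseNet-BC configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoupledEnsemble(nn.Module):
    """Parallel standalone branches, fused by averaging their log-probabilities."""
    def __init__(self, make_branch, num_branches=4):
        super().__init__()
        self.branches = nn.ModuleList([make_branch() for _ in range(num_branches)])

    def forward(self, x):
        # Each branch produces class scores; take log-probabilities per branch.
        log_probs = torch.stack([F.log_softmax(b(x), dim=1) for b in self.branches])
        # "Fuse (averaging) layer": average across branches before the loss.
        return log_probs.mean(dim=0)

# Hypothetical usage with a tiny stand-in branch (the paper uses DenseNet-BC branches).
model = CoupledEnsemble(lambda: nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)))
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
loss = F.nll_loss(model(x), y)  # negative log-likelihood on the fused log-probabilities
loss.backward()
```

Because the fused output feeds a single loss, the branches are trained jointly rather than independently, which is the tighter coupling the abstract credits for better representations even in individual branches.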
Multi-level Feature Fusion-based CNN for Local Climate Zone Classification from Sentinel-2 Images: Benchmark Results on the So2Sat LCZ42 Dataset
As a unique classification scheme for urban forms and functions, the local
climate zone (LCZ) system provides essential general information for any
studies related to urban environments, especially on a large scale. Remote
sensing data-based classification approaches are the key to large-scale mapping
and monitoring of LCZs. The potential of deep learning-based approaches is not
yet fully explored, even though advanced convolutional neural networks (CNNs)
continue to push the frontiers for various computer vision tasks. One reason is
that published studies are based on different datasets, usually at a regional
scale, which makes it impossible to fairly and consistently compare the
potential of different CNNs for real-world scenarios. This study is based on
the big So2Sat LCZ42 benchmark dataset dedicated to LCZ classification. Using
this dataset, we study a range of CNNs of varying sizes. In addition, we
propose Sen2LCZ-Net, a CNN for classifying LCZs from Sentinel-2 images. Building
on this base network, we propose fusing multi-level features in an extended
variant, Sen2LCZ-Net-MF. With this simple network architecture and the highly
competitive benchmark dataset, we obtain results better than those of
state-of-the-art CNNs, while requiring less computation with fewer layers and
parameters. Large-scale LCZ classification examples of
completely unseen areas are presented, demonstrating the potential of our
proposed Sen2LCZ-Net-MF as well as the So2Sat LCZ42 dataset. We also
intensively investigate the influence of network depth and width and the
effectiveness of the design choices made for Sen2LCZ-Net-MF. Our work will
provide important baselines for future CNN-based algorithm development, both
for LCZ classification and for other urban land cover/land use classification
tasks.
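A minimal PyTorch sketch of multi-level feature fusion in the spirit of Sen2LCZ-Net-MF: intermediate feature maps from several depths are globally pooled and concatenated before the classifier. The backbone, stage widths, and pooling choice are illustrative assumptions; the input shape (10-band, 32x32 Sentinel-2 patches) and the 17 LCZ classes follow the So2Sat LCZ42 setting:

```python
import torch
import torch.nn as nn

class MultiLevelFusionCNN(nn.Module):
    """Illustrative CNN that fuses globally pooled features from several depths."""
    def __init__(self, in_channels=10, num_classes=17):  # Sentinel-2 bands / LCZ classes
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(in_channels, 32, 3, padding=1),
                                    nn.ReLU(), nn.MaxPool2d(2))
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1),
                                    nn.ReLU(), nn.MaxPool2d(2))
        self.stage3 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU())
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(32 + 64 + 128, num_classes)

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        # Multi-level fusion: pool each stage's feature map and concatenate.
        fused = torch.cat([self.pool(f).flatten(1) for f in (f1, f2, f3)], dim=1)
        return self.classifier(fused)

logits = MultiLevelFusionCNN()(torch.randn(4, 10, 32, 32))  # (batch, classes) scores
```

Fusing shallow and deep features this way lets the classifier see both fine spatial detail and abstract semantics without adding many layers or parameters.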
Born Again Neural Networks
Knowledge distillation (KD) consists of transferring knowledge from one
machine learning model (the teacher) to another (the student). Commonly, the
teacher is a high-capacity model with formidable performance, while the student
is more compact. By transferring knowledge, one hopes to benefit from the
student's compactness while retaining performance close to the teacher's. We
study KD from a new perspective: rather than compressing models, we train
students parameterized identically to their teachers. Surprisingly, these
Born-Again Networks (BANs) outperform their teachers significantly,
both on computer vision and language modeling tasks. Our experiments with BANs
based on DenseNets demonstrate state-of-the-art performance on the CIFAR-10
(3.5%) and CIFAR-100 (15.5%) datasets, as measured by validation error. Additional
experiments explore two distillation objectives: (i) Confidence-Weighted by
Teacher Max (CWTM) and (ii) Dark Knowledge with Permuted Predictions (DKPP).
Both methods elucidate the essential components of KD, demonstrating the effect
of the teacher outputs on both predicted and non-predicted classes. We present
experiments with students of various capacities, focusing on the under-explored
case where students overpower teachers. Our experiments show significant
advantages from transferring knowledge between DenseNets and ResNets in either
direction.

Comment: Published at ICML 2018.
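A minimal PyTorch sketch of a born-again training step, assuming the generic KD objective of matching a frozen teacher's soft predictions alongside the hard labels; the equal loss weighting, the toy model, and the re-initialization scheme are illustrative, and the paper's CWTM and DKPP variants modify how the teacher term is weighted and permuted:

```python
import copy
import torch
import torch.nn.functional as F

def born_again_step(teacher, student, x, y, optimizer):
    """One step of born-again training: the student, parameterized identically
    to its teacher, fits the hard labels plus the teacher's soft predictions."""
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(x), dim=1)  # soft targets ("dark knowledge")
    student_log_probs = F.log_softmax(student(x), dim=1)
    loss = F.nll_loss(student_log_probs, y) \
         + F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Born-again setup: the student shares the teacher's architecture but starts fresh.
teacher = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10)).eval()
student = copy.deepcopy(teacher).train()
for p in student.parameters():          # re-initialize instead of copying weights
    torch.nn.init.normal_(p, std=0.01)
opt = torch.optim.SGD(student.parameters(), lr=0.1)
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
born_again_step(teacher, student, x, y, opt)
```

In the paper, a trained student of one generation can in turn serve as the teacher of the next; the key point the sketch captures is that the student has the same capacity as its teacher rather than being a compressed model.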