Born Again Neural Networks
Knowledge distillation (KD) consists of transferring knowledge from one
machine learning model (the teacher) to another (the student). Commonly, the
teacher is a high-capacity model with formidable performance, while the
student is more compact; by transferring knowledge, one hopes to obtain a
compact model with performance close to the teacher's. We study KD from a
new perspective: rather than compressing models, we train students
parameterized identically to their teachers. Surprisingly, these Born-Again
Networks (BANs) outperform their teachers significantly,
both on computer vision and language modeling tasks. Our experiments with BANs
based on DenseNets demonstrate state-of-the-art performance on the CIFAR-10
(3.5%) and CIFAR-100 (15.5%) datasets, by validation error. Additional
experiments explore two distillation objectives: (i) Confidence-Weighted by
Teacher Max (CWTM) and (ii) Dark Knowledge with Permuted Predictions (DKPP).
Both methods elucidate the essential components of KD, demonstrating the
effect of the teacher outputs on both predicted and non-predicted classes. We present
experiments with students of various capacities, focusing on the under-explored
case where students overpower teachers. Our experiments show significant
advantages from transferring knowledge between DenseNets and ResNets in either
direction.

Comment: Published @ICML 2018
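The distillation objective underlying these experiments is the standard one: the student is trained to match the teacher's softened output distribution, whose probabilities on the non-predicted classes carry the "dark knowledge" the abstract refers to. A minimal NumPy sketch of that loss (function names and the temperature value are illustrative, not taken from the paper):

```python
import numpy as np

def softened(logits, T):
    """Softmax at temperature T; higher T spreads mass onto non-predicted classes."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    so gradient magnitudes stay comparable across temperatures."""
    p = softened(teacher_logits, T)   # teacher's soft targets
    q = softened(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))
```

The loss is zero exactly when student and teacher logits agree; in practice it is combined with the usual hard-label cross-entropy, and variants such as CWTM and DKPP modify how the teacher's outputs enter this term.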
Distribution of Damages in Car Accidents through the Use of Neural Networks
After a traffic accident the damage has to be fairly divided
among the parties involved, and a ratio has to be determined.
There are many precedents for this, and judges have developed catalogues
suggesting ratios for common types of accidents.
The problem that "every case is different," however, remains.
Many cases have familiar aspects, but also unfamiliar ones. Even if
a case is composed of several familiar aspects with established ratios,
the question remains as to how these are to be figured into one
ratio. The first thought would be to invent a mathematical
formula, but such formulae are rigid and speculative. The body of
law has grown organically and must not be forced into a sleek system.
The distant consequences of using a mathematical formula
cannot be foreseen; they might well be grossly unjust.
I suggest using a neural network instead. Precedents may be
fed into the network directly as learning patterns. This has the
advantage that court rulings can be transferred directly and not via
a formula. Future modifications in court rulings also can be
adopted by the network. As far as the effect of the learning patterns
on new cases is concerned, a relatively safe assumption is that
they will fit in harmoniously with the precedents. This is due to
the network's structure—a number of simple decisional units,
which are interconnected, tune their activity to each other, thus
achieving a state of equilibrium. When the conditions of such an
equilibrium are translated back into the terms of the case, the solution
can hardly be totally unjust.
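Read as a machine-learning task, the proposal amounts to supervised learning: encode each precedent as a feature vector and its adjudicated ratio as the training target, and let a small network fit the mapping. A deliberately minimal sketch with synthetic data (the feature encoding, the weights, and all numbers are hypothetical, invented purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical precedent encoding: each case is a binary feature vector
# (e.g. right-of-way violated, speeding, poor visibility, ...) paired with
# the fault ratio in [0, 1] assigned by the court.
X = rng.integers(0, 2, size=(30, 5)).astype(float)   # 30 synthetic precedents
true_w = np.array([0.8, 0.5, -0.3, 0.2, -0.6])
y = 1 / (1 + np.exp(-(X @ true_w)))                  # synthetic ratios

def loss(w):
    p = 1 / (1 + np.exp(-(X @ w)))                   # predicted ratios
    return float(np.mean((p - y) ** 2))

# A single sigmoid unit trained by plain gradient descent on the precedents.
w = np.zeros(5)
before = loss(w)
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w)))
    grad = 2 / len(y) * X.T @ ((p - y) * p * (1 - p))
    w -= 0.5 * grad
after = loss(w)
```

A new case is then scored by the same sigmoid, so its predicted ratio interpolates among the precedents it resembles, which is the "fitting in harmoniously" behaviour the text argues for.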
Distributed representations accelerate evolution of adaptive behaviours
Animals with rudimentary innate abilities require substantial learning to transform those abilities into useful skills, where a skill can be considered as a set of sensory-motor associations. Using linear neural network models, it is proved that if skills are stored as distributed representations, then within-lifetime learning of part of a skill can induce automatic learning of the remaining parts of that skill. More importantly, it is shown that this "free-lunch" learning (FLL) is responsible for accelerated evolution of skills, when compared with networks which either 1) cannot benefit from FLL or 2) cannot learn. Specifically, it is shown that FLL accelerates the appearance of adaptive behaviour, both in its innate form and as FLL-induced behaviour, and that FLL can accelerate the rate at which learned behaviours become innate.
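The free-lunch effect can be reproduced in a few lines with the kind of linear network the abstract describes: because one weight matrix stores all associations of a skill as a distributed representation, fitting part of the skill also moves the weights toward the unseen part. A toy sketch (dimensions, sample counts, and the least-squares learner are illustrative assumptions, not the paper's exact model):

```python
import numpy as np

rng = np.random.default_rng(0)

# A "skill" is a set of sensory-motor associations generated by one
# underlying linear map, so the associations share distributed structure.
n_in, n_out = 8, 4
W_true = rng.normal(size=(n_out, n_in))
X = rng.normal(size=(n_in, 20))          # 20 sensory patterns
Y = W_true @ X                           # their correct motor outputs

# Within-lifetime learning of only PART of the skill (5 associations),
# via minimum-norm least squares on the shared weight matrix.
X_part, Y_part = X[:, :5], Y[:, :5]
W_learned = Y_part @ np.linalg.pinv(X_part)

def err(W, X, Y):
    return float(np.mean((W @ X - Y) ** 2))

W_naive = rng.normal(size=(n_out, n_in))  # a network that has not learned
# "Free lunch": error on the UNSEEN remainder of the skill also drops.
unseen_err_learned = err(W_learned, X[:, 5:], Y[:, 5:])
unseen_err_naive = err(W_naive, X[:, 5:], Y[:, 5:])
```

The learned network fits its five training associations exactly, yet its error on the fifteen associations it never saw is also far below that of the untrained network, because the shared weights project onto the subspace the whole skill lives in.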
Training Passive Photonic Reservoirs with Integrated Optical Readout
As Moore's law comes to an end, neuromorphic approaches to computing are on
the rise. One of these, passive photonic reservoir computing, is a strong
candidate for computing at high bitrates (> 10 Gbps) and with low energy
consumption. Currently though, both benefits are limited by the necessity to
perform training and readout operations in the electrical domain. Thus, efforts
are currently underway in the photonic community to design an integrated
optical readout, which makes it possible to perform all operations in the
optical domain.
In addition to the technological challenge of designing such a readout, new
algorithms have to be designed in order to train it. Foremost, suitable
algorithms need to be able to deal with the fact that the actual on-chip
reservoir states are not directly observable. In this work, we investigate
several options for such a training algorithm and propose a solution in which
the complex states of the reservoir can be observed by appropriately setting
the readout weights, while iterating over a predefined input sequence. We
perform numerical simulations in order to compare our method with an ideal
baseline requiring full observability as well as with an established black-box
optimization approach (CMA-ES).

Comment: Accepted for publication in IEEE Transactions on Neural Networks and
Learning Systems (TNNLS-2017-P-8539.R1), copyright 2018 IEEE. This research
was funded by the EU Horizon 2020 PHRESCO Grant (Grant No. 688579) and the
BELSPO IAP P7-35 program Photonics@be. 11 pages, 9 figures
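The "ideal baseline requiring full observability" mentioned above is the standard reservoir-computing recipe: drive a fixed random network with the input, record every node state, and fit only a linear readout by ridge regression. A toy real-valued sketch of that baseline (sizes, scalings, and the delay task are illustrative assumptions; the paper's reservoir is photonic, its states complex-valued and not directly observable on chip):

```python
import numpy as np

rng = np.random.default_rng(1)

n_nodes, n_steps = 16, 200
u = rng.uniform(-1, 1, n_steps)              # input sequence
W_res = rng.normal(scale=0.25, size=(n_nodes, n_nodes))  # fixed, untrained
w_in = rng.normal(size=n_nodes)

# Simulate the reservoir and record all node states (full observability).
states = np.zeros((n_steps, n_nodes))
x = np.zeros(n_nodes)
for t in range(n_steps):
    x = np.tanh(W_res @ x + w_in * u[t])
    states[t] = x

# Train ONLY the linear readout, here on a one-step memory task.
target = np.roll(u, 1)
lam = 1e-6                                   # ridge regularization
w_out = np.linalg.solve(states.T @ states + lam * np.eye(n_nodes),
                        states.T @ target)
pred = states @ w_out
```

The training approaches compared in the paper must approximate this fit without ever reading `states` directly, which is exactly what makes the integrated optical readout a new algorithmic problem.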
A new neural network technique for the design of multilayered microwave shielded bandpass filters
In this work, we propose a novel technique based on neural networks for the design of microwave filters in shielded printed technology. The technique uses radial basis function neural networks to represent the nonlinear relations between the quality factors and coupling coefficients and the geometrical dimensions of the resonators. The radial basis function neural networks are employed for the first time in the design task of shielded printed filters, and permit a fast and precise operation with only a limited set of training data. Thanks to a new cascade configuration, a set of two neural networks provide the dimensions of the complete filter in a fast and accurate way. To improve the calculation of the geometrical dimensions, the neural networks can take as inputs both electrical parameters and physical dimensions computed by other neural networks. The neural network technique is combined with gradient-based optimization methods to further improve the response of the filters. Results are presented to demonstrate the usefulness of the proposed technique for the design of practical microwave printed coupled line and hairpin filters.
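The core building block here, a radial basis function network fitted on a small training set, can be sketched compactly. The data below are synthetic and purely illustrative (a real design flow would obtain coupling-coefficient/dimension pairs from electromagnetic simulation, and the filter networks have more inputs and outputs):

```python
import numpy as np

def rbf_network(centers, targets, width):
    """Fit a Gaussian RBF network exactly by solving for its output weights."""
    def kernel(a, b):
        return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * width ** 2))
    weights = np.linalg.solve(kernel(centers, centers), targets)
    return lambda x: kernel(np.atleast_1d(x), centers) @ weights

# Hypothetical training set: resonator gap (mm) and resulting coupling
# coefficient k, following a synthetic monotone relation.
gaps = np.array([0.1, 0.2, 0.3, 0.4])
k = np.exp(-5.0 * gaps)

# Inverse model in the spirit of the paper: from a desired electrical
# parameter (k), predict the geometrical dimension (gap).
gap_of_k = rbf_network(k, gaps, width=0.15)
```

Because the Gaussian kernel matrix is invertible, the network interpolates the training pairs exactly while remaining smooth in between, which is what makes RBF networks attractive when only a limited set of simulated training points is available.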