    Born Again Neural Networks

    Knowledge distillation (KD) consists of transferring knowledge from one machine learning model (the teacher) to another (the student). Commonly, the teacher is a high-capacity model with formidable performance, while the student is more compact. By transferring knowledge, one hopes to benefit from the student's compactness. We study KD from a new perspective: rather than compressing models, we train students parameterized identically to their teachers. Surprisingly, these Born-Again Networks (BANs) outperform their teachers significantly, both on computer vision and language modeling tasks. Our experiments with BANs based on DenseNets demonstrate state-of-the-art performance on the CIFAR-10 (3.5%) and CIFAR-100 (15.5%) datasets, by validation error. Additional experiments explore two distillation objectives: (i) Confidence-Weighted by Teacher Max (CWTM) and (ii) Dark Knowledge with Permuted Predictions (DKPP). Both methods elucidate the essential components of KD, demonstrating the role of the teacher outputs on both predicted and non-predicted classes. We present experiments with students of various capacities, focusing on the under-explored case where students overpower teachers. Our experiments show significant advantages from transferring knowledge between DenseNets and ResNets in either direction. Comment: Published @ICML 201
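The abstract names two distillation objectives without defining them. Below is a minimal plain-Python sketch of one plausible reading, under stated assumptions: a basic KD loss (student cross-entropy against a teacher distribution, no temperature); CWTM as per-sample label cross-entropy weighted by the teacher's max probability, normalized over the batch; and DKPP as the teacher distribution with its non-argmax entries randomly permuted. The function names `kd_loss`, `cwtm_loss`, and `dkpp_targets` and the exact weighting are hypothetical illustrations, not the authors' code.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: Sequence[float], teacher_probs: Sequence[float]) -> float:
    """Cross-entropy of the student's prediction against teacher soft targets."""
    p_student = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, p_student))


def cwtm_loss(student_batch, teacher_batch, labels) -> float:
    """Hypothetical CWTM: per-sample label cross-entropy, weighted by the
    teacher's max probability; weights are normalized to sum to 1."""
    confidences = [max(softmax(t)) for t in teacher_batch]
    total = sum(confidences)
    loss = 0.0
    for s_logits, conf, y in zip(student_batch, confidences, labels):
        loss += (conf / total) * -math.log(softmax(s_logits)[y])
    return loss


def dkpp_targets(teacher_logits: Sequence[float], rng: random.Random) -> List[float]:
    """Hypothetical DKPP: keep the teacher's top class, shuffle the
    probabilities of all other classes among themselves."""
    p = softmax(teacher_logits)
    top = max(range(len(p)), key=p.__getitem__)
    others = [i for i in range(len(p)) if i != top]
    sources = others[:]
    rng.shuffle(sources)
    permuted = list(p)
    for dst, src in zip(others, sources):
        permuted[dst] = p[src]
    return permuted
```

Passing `softmax(teacher_logits)` to `kd_loss` gives plain KD; passing `dkpp_targets(teacher_logits, rng)` instead keeps only the teacher's top-class confidence intact, which is the kind of ablation the abstract uses to separate the roles of predicted and non-predicted classes.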

    Distribution of Damages in Car Accidents through the Use of Neural Networks

    After a traffic accident, the damage has to be fairly divided among the parties involved, and a ratio has to be determined. There are many precedents for this, and judges have developed catalogues suggesting ratios for common types of accidents. The problem that "every case is different," however, remains. Many cases have familiar aspects, but also unfamiliar ones. Even if a case is composed of several familiar aspects with established ratios, the question remains of how these are to be combined into one ratio. The first thought would be to invent a mathematical formula, but such formulae are rigid and speculative. The body of law has grown organically and must not be forced into a sleek system. The distant consequences of using a mathematical formula cannot be foreseen; they might well be grossly unjust. I suggest using a neural network instead. Precedents may be fed into the network directly as learning patterns. This has the advantage that court rulings can be transferred directly rather than via a formula. Future changes in court rulings can also be adopted by the network. As for the effect of the learning patterns on new cases, a relatively safe assumption is that the solutions for new cases will fit in harmoniously with the precedents. This is due to the network's structure: a number of simple, interconnected decisional units tune their activity to each other, thus achieving a state of equilibrium. When the conditions of such an equilibrium are translated back into the terms of the case, the solution can hardly be totally unjust.
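A minimal sketch of feeding precedents to a network as learning patterns, under loudly stated assumptions: a single logistic unit stands in for the network (far smaller than the abstract likely intends, and trained by gradient descent rather than relaxing to an equilibrium), accident aspects are hypothetical binary features, and the target ratios are made up to stand in for precedent outcomes. None of the numbers come from real case law or any judge's catalogue.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical features: (A ran a red light, A was speeding, B was speeding).
# Target: share of the damage borne by party A, in [0, 1].
Precedent = Tuple[List[float], float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Share of the damage assigned to party A for aspect vector x."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def loss(w: Sequence[float], b: float, precedents: List[Precedent]) -> float:
    """Mean cross-entropy between predicted and precedent ratios (soft targets)."""
    total = 0.0
    for x, t in precedents:
        p = predict(w, b, x)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(precedents)


def train(precedents: List[Precedent], lr: float = 0.5, epochs: int = 2000):
    """Full-batch gradient descent on the cross-entropy loss."""
    n = len(precedents[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, t in precedents:
            err = predict(w, b, x) - t  # gradient of cross-entropy w.r.t. the logit
            grad_b += err
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
        m = len(precedents)
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


precedents = [  # made-up illustrative ratios, not taken from any catalogue
    ([1.0, 0.0, 0.0], 0.8),
    ([0.0, 1.0, 0.0], 0.6),
    ([0.0, 0.0, 1.0], 0.3),
    ([0.0, 0.0, 0.0], 0.5),
]
w, b = train(precedents)
new_case = [1.0, 0.0, 1.0]  # A ran a red light and B was speeding: no single precedent
share_a = predict(w, b, new_case)
```

In this toy setup the combined case falls between the two precedents it draws on (0.3 and 0.8), without anyone having written down an explicit combination formula; the combination rule is implied by the learned weights instead.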

    Distributed representations accelerate evolution of adaptive behaviours

    Animals with rudimentary innate abilities require substantial learning to transform those abilities into useful skills, where a skill can be considered as a set of sensory-motor associations. Using linear neural network models, it is proved that if skills are stored as distributed representations, then within-lifetime learning of part of a skill can induce automatic learning of the remaining parts of that skill. More importantly, it is shown that this "free-lunch" learning (FLL) is responsible for accelerated evolution of skills, when compared with networks which either 1) cannot benefit from FLL or 2) cannot learn. Specifically, it is shown that FLL accelerates the appearance of adaptive behaviour, both in its innate form and as FLL-induced behaviour, and that FLL can accelerate the rate at which learned behaviours become innate.
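A minimal sketch of free-lunch learning in the simplest linear setting, under assumptions not taken from the abstract: a single linear unit stands in for the network, a skill is a set of random input/target associations that one weight vector `w0` solves exactly, forgetting is modelled as isotropic Gaussian noise added to the weights, and relearning uses the online delta rule on subset A1 only. Each delta-rule update moves the weights along an A1 input vector and shrinks the part of the weight error `w - w0` lying in the span of the A1 inputs; because the A2 inputs are not orthogonal to that span, the error on the untrained subset A2 drops on average too. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Pattern = Tuple[List[float], float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sq_error(w: Sequence[float], patterns: List[Pattern]) -> float:
    """Summed squared error of the linear unit w on (input, target) patterns."""
    return sum((y - dot(w, x)) ** 2 for x, y in patterns)


def delta_rule(w: Sequence[float], patterns: List[Pattern],
               lr: float = 0.02, epochs: int = 50) -> List[float]:
    """Online delta-rule (LMS) training; each update moves w along one input."""
    w = list(w)
    for _ in range(epochs):
        for x, y in patterns:
            err = y - dot(w, x)
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
    return w


def fll_trial(rng: random.Random, n_inputs: int = 10, n_a1: int = 5,
              n_a2: int = 5, noise: float = 1.0) -> Tuple[float, float]:
    """Learn a skill (w0), forget it (noise), relearn A1 only; return A2 error before/after."""
    w0 = [rng.gauss(0.0, 1.0) for _ in range(n_inputs)]

    def make_patterns(k: int) -> List[Pattern]:
        xs = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)] for _ in range(k)]
        return [(x, dot(w0, x)) for x in xs]

    a1, a2 = make_patterns(n_a1), make_patterns(n_a2)
    w_forgot = [wi + rng.gauss(0.0, noise) for wi in w0]
    w_relearned = delta_rule(w_forgot, a1)
    return sq_error(w_forgot, a2), sq_error(w_relearned, a2)


def mean_a2_error(n_trials: int = 200, seed: int = 0) -> Tuple[float, float]:
    """Average A2 error over trials, before and after relearning A1."""
    rng = random.Random(seed)
    results = [fll_trial(rng) for _ in range(n_trials)]
    before = sum(b for b, _ in results) / n_trials
    after = sum(a for _, a in results) / n_trials
    return before, after
```

The gain is statistical rather than guaranteed per trial, which is why the sketch averages over many random skills.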

    Training Passive Photonic Reservoirs with Integrated Optical Readout

    As Moore's law comes to an end, neuromorphic approaches to computing are on the rise. One of these, passive photonic reservoir computing, is a strong candidate for computing at high bitrates (> 10 Gbps) and with low energy consumption. Currently, however, both benefits are limited by the need to perform training and readout operations in the electrical domain. Thus, efforts are currently underway in the photonic community to design an integrated optical readout, which allows all operations to be performed in the optical domain. In addition to the technological challenge of designing such a readout, new algorithms have to be designed in order to train it. Foremost, suitable algorithms need to be able to deal with the fact that the actual on-chip reservoir states are not directly observable. In this work, we investigate several options for such a training algorithm and propose a solution in which the complex states of the reservoir can be observed by appropriately setting the readout weights, while iterating over a predefined input sequence. We perform numerical simulations in order to compare our method with an ideal baseline requiring full observability as well as with an established black-box optimization approach (CMA-ES). Comment: Accepted for publication in IEEE Transactions on Neural Networks and Learning Systems (TNNLS-2017-P-8539.R1), copyright 2018 IEEE. This research was funded by the EU Horizon 2020 PHRESCO Grant (Grant No. 688579) and the BELSPO IAP P7-35 program Photonics@be. 11 pages, 9 figures
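The key idea of observing complex on-chip states through a readout that only reports optical power can be illustrated with a small interference trick. A minimal sketch, assuming a noiseless readout that outputs |Σ w_i x_i|² for programmable complex weights w_i, and a reservoir that is driven with the same predefined input sequence before each measurement so that the states x_i at a given time step repeat exactly. `make_intensity_readout` and `recover_states` are hypothetical names, and this probing scheme is an illustration, not necessarily the procedure proposed in the paper.

```python
import math
from typing import Callable, List, Sequence

Measure = Callable[[Sequence[complex]], float]


def make_intensity_readout(states: Sequence[complex]) -> Measure:
    """Hypothetical readout: weighted optical sum of the hidden complex
    reservoir states, seen only through a photodetector: |sum w_i x_i|^2."""
    def measure(weights: Sequence[complex]) -> float:
        field = sum(w * x for w, x in zip(weights, states))
        return abs(field) ** 2
    return measure


def recover_states(measure: Measure, n: int) -> List[complex]:
    """Reconstruct n complex states up to a global phase from 3n - 2
    intensity measurements with chosen readout weights (state 0 must be non-zero)."""
    def weights(pairs) -> List[complex]:
        w = [0j] * n
        for idx, value in pairs:
            w[idx] = value
        return w

    power = [measure(weights([(i, 1)])) for i in range(n)]  # |x_i|^2
    if power[0] == 0.0:
        raise ValueError("reference state 0 must be non-zero")
    ref = math.sqrt(power[0])
    recovered = [complex(ref, 0.0)]  # fix the global phase so that state 0 is real
    for i in range(1, n):
        m_re = measure(weights([(0, 1), (i, 1)]))   # |x0 + x_i|^2
        m_im = measure(weights([(0, 1), (i, 1j)]))  # |x0 + j*x_i|^2
        # Interference terms give Re and Im of conj(x0) * x_i.
        cross = complex((m_re - power[0] - power[i]) / 2,
                        (power[0] + power[i] - m_im) / 2)
        recovered.append(cross / ref)
    return recovered
```

Because a photodetector is insensitive to a common phase factor, the states can only be recovered up to a global phase; the sketch fixes it by making state 0 real. Each call to `measure` corresponds to one replay of the input sequence, so the cost grows with the number of weight settings.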

    A new neural network technique for the design of multilayered microwave shielded bandpass filters

    In this work, we propose a novel technique based on neural networks for the design of microwave filters in shielded printed technology. The technique uses radial basis function neural networks to represent the nonlinear relations between the quality factors and coupling coefficients, on the one hand, and the geometrical dimensions of the resonators, on the other. Radial basis function neural networks are employed here for the first time in the design of shielded printed filters, and they permit fast and precise operation with only a limited set of training data. Thanks to a new cascade configuration, a set of two neural networks provides the dimensions of the complete filter quickly and accurately. To improve the calculation of the geometrical dimensions, the neural networks can take as inputs both electrical parameters and physical dimensions computed by other neural networks. The neural network technique is combined with gradient-based optimization methods to further improve the response of the filters. Results are presented to demonstrate the usefulness of the proposed technique for the design of practical microwave printed coupled-line and hairpin filters.
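A minimal sketch of a radial basis function network of the general kind the abstract describes: Gaussian basis functions centered on the training samples, with output weights obtained by solving one linear system (exact interpolation). This is the simplest RBF variant and an assumption; the paper's choice of centers, widths, and training procedure is not given in the abstract. The training data is a synthetic smooth curve, a made-up stand-in for a coupling coefficient versus a resonator gap, not electromagnetic simulation results.

```python
import math
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


class RBFNet:
    """Gaussian RBF network with one center per training sample (exact interpolation)."""

    def __init__(self, centers: Sequence[Sequence[float]],
                 targets: Sequence[float], width: float):
        self.centers = [tuple(c) for c in centers]
        self.width = width
        gram = [[self._phi(ci, cj) for cj in self.centers] for ci in self.centers]
        self.weights = _solve(gram, list(targets))

    def _phi(self, a: Sequence[float], b: Sequence[float]) -> float:
        r2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return math.exp(-r2 / self.width ** 2)

    def predict(self, x: Sequence[float]) -> float:
        return sum(w * self._phi(x, c) for w, c in zip(self.weights, self.centers))


# Synthetic, hypothetical training data: gap (arbitrary units) -> coupling coefficient.
gaps = [0.2, 0.4, 0.6, 0.8, 1.0]
coupling = [0.1 * math.exp(-2.0 * g) for g in gaps]
net = RBFNet([[g] for g in gaps], coupling, width=0.25)
k_mid = net.predict([0.5])  # query between training samples
```

Because the centers may be vectors, the same class could serve as the second stage of a cascade whose inputs combine electrical parameters with dimensions predicted by a first network; how the paper actually wires its two networks is not specified here.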