Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons
An activation boundary for a neuron refers to a separating hyperplane that
determines whether the neuron is activated or deactivated. It has long been
believed that the activations of neurons, rather than their exact output
values, play the most important role in forming classification-friendly
partitions of the hidden feature space. As far as we know, however, this
aspect of neural networks has not been considered in the literature on
knowledge transfer. In this paper, we propose a knowledge transfer method via
distillation of the activation boundaries formed by hidden neurons. For the
distillation, we propose an activation transfer loss that attains its minimum
when the boundaries generated by the student coincide with those generated by
the teacher. Since the activation transfer loss is not differentiable, we
design a piecewise differentiable loss that approximates it. With the proposed
method, the student learns the separating boundary between the activation and
deactivation regions formed by each neuron in the teacher. Experiments on
various aspects of knowledge transfer verify that the proposed method
outperforms the current state-of-the-art.
Comment: Accepted to AAAI 2019
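A piecewise differentiable surrogate of this kind can be written as an element-wise hinge-squared penalty keyed to the teacher's activation indicator. The sketch below is a minimal PyTorch rendering under that reading, not the authors' implementation; the margin `mu`, the names `t` and `s` (teacher and student pre-activations, i.e. values before the ReLU), and the mean reduction are assumptions on my part.

```python
import torch
import torch.nn.functional as F

def activation_boundary_loss(t, s, mu=1.0):
    """Hedged sketch of an activation-boundary transfer loss.

    t: teacher pre-activations, shape (batch, features)
    s: student pre-activations, same shape
    mu: margin pushing the student's pre-activations away from zero
    """
    active = (t > 0).float()  # teacher's binary activation indicator
    # Where the teacher neuron fires, push the student pre-activation
    # above +mu; where it does not, push it below -mu.
    loss = active * F.relu(mu - s).pow(2) + (1.0 - active) * F.relu(mu + s).pow(2)
    return loss.mean()
```

The loss is zero exactly when the student clears the margin on the correct side of every teacher boundary, so its minimum is reached when the two sets of activation boundaries coincide.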
Analysis of dropout learning regarded as ensemble learning
Deep learning is the state of the art in fields such as visual object
recognition and speech recognition. It uses a large number of layers and a
huge number of units and connections, so overfitting is a serious problem.
Dropout learning was proposed to avoid this problem: during training, each
input and hidden unit is neglected with probability p, and the neglected
units are then combined with the learned network to express the final output.
We find that the process of combining the neglected hidden units with the
learned network can be regarded as ensemble learning, so we analyze dropout
learning from this point of view.
Comment: 9 pages, 8 figures, submitted to a conference
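To make the setup concrete, here is a minimal sketch of a dropout forward pass. I use the common "inverted" formulation, which folds the test-time combination (scaling the kept weights by 1 - p) into the training pass; the paper's notation may differ. Each random mask defines one member of the implicit ensemble, which is what motivates the ensemble-learning view.

```python
import torch

def dropout_forward(x, p=0.5, train=True):
    """Drop each unit with probability p during training (inverted dropout)."""
    if not train:
        return x                             # test time: the full, combined network
    mask = (torch.rand_like(x) > p).float()  # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)              # rescale so the expected output matches
```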
Theory of Interacting Neural Networks
In this contribution we give an overview of recent work on the theory of
interacting neural networks. The model is defined in Section 2. The typical
teacher/student scenario is considered in Section 3: a static teacher network
presents training examples to an adaptive student network. In the case of
multilayer networks, the student shows a transition from a symmetric state to
specialisation. Neural networks can also generate a time series; training on
a time series and predicting it are studied in Section 4. When a network is
trained on its own output, it is interacting with itself. Such a scenario has
implications for the theory of prediction algorithms, as discussed in
Section 5. When a system of networks is trained on its minority decisions, it
may be considered a model for competition in closed markets, see Section 6.
In Section 7 we consider two mutually interacting networks, where a novel
phenomenon is observed: synchronisation by mutual learning. In Section 8 it
is shown how this phenomenon can be applied to cryptography: the generation
of a secret key over a public channel.
Comment: Contribution to Networks, ed. by H.G. Schuster and S. Bornholdt, to
be published by Wiley-VCH
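As a concrete illustration of the teacher/student scenario of Section 3, the toy example below (my own minimal sketch, not taken from the paper) trains a student perceptron on examples labelled by a fixed teacher perceptron; the teacher-student overlap R grows towards 1 as more examples are presented.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100                                     # input dimension

teacher = rng.standard_normal(N)            # static teacher network
teacher /= np.linalg.norm(teacher)
student = rng.standard_normal(N)            # adaptive student network

for step in range(20000):
    x = rng.standard_normal(N)              # teacher presents a training example
    label = np.sign(teacher @ x)
    if np.sign(student @ x) != label:       # perceptron rule: update on errors only
        student += label * x / np.sqrt(N)

R = (student @ teacher) / np.linalg.norm(student)
print(f"teacher-student overlap R = {R:.3f}")  # approaches 1 with training
```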