Incremental learning with respect to new incoming input attributes
Neural networks generally operate in a dynamic environment where training patterns or input attributes (features) are likely to be introduced into the current domain incrementally. This paper considers the situation where a new set of input attributes must be added to an existing neural network. The conventional method discards the existing network and redesigns one from scratch, which wastes the old knowledge and the previous effort. To reduce computational time, improve generalization accuracy, and enhance the intelligence of the learned models, we present the ILIA algorithms (ILIA1, ILIA2, ILIA3, ILIA4, and ILIA5), which are capable of Incremental Learning in terms of Input Attributes. With the ILIA algorithms, when new input attributes are introduced into the original problem, the existing neural network is retained, and a new sub-network is constructed and trained incrementally. The new sub-network is later merged with the old one to form a new network for the changed problem. In addition, the ILIA algorithms can decide whether the new input attributes are relevant to the output and consistent with the existing input attributes, and recommend accepting or rejecting them. Experimental results show that the ILIA algorithms are efficient and effective for both classification and regression problems.
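The retain, train, merge, and accept/reject workflow described above can be illustrated with a minimal sketch. It is not any of the ILIA1 to ILIA5 algorithms. Several simplifications are our own assumptions: there is a single new numeric attribute, a one-attribute linear model stands in for the new sub-network, the merge is additive, and acceptance is decided by held-out error (the consistency check the abstract mentions is not modelled). The function names are hypothetical.

```python
# Hypothetical sketch: NOT the ILIA1-ILIA5 algorithms. Assumes one new numeric
# attribute, a frozen old model, and an additive merge fitted by least squares.

def fit_linear_1d(xs, ys):
    """Least-squares fit of ys ~ w * xs + c for a single attribute."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my  # constant attribute: only the intercept can be fitted
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return w, my - w * mx


def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def add_attribute(old_model, old_x, new_x, ys, val, tol=1e-9):
    """Return (model, accepted); model takes (old_attr, new_attr).

    old_model is kept frozen. The sub-model is trained only on the new
    attribute, against the residuals the old model leaves behind.
    val = (old_x_val, new_x_val, ys_val) is held-out data for accept/reject.
    """
    residuals = [y - old_model(xo) for xo, y in zip(old_x, ys)]
    w, c = fit_linear_1d(new_x, residuals)

    def merged(xo, xn):
        # Merge step: old model output plus new sub-model output.
        return old_model(xo) + w * xn + c

    vo, vn, vy = val
    old_err = mse([old_model(xo) for xo in vo], vy)
    new_err = mse([merged(xo, xn) for xo, xn in zip(vo, vn)], vy)
    if new_err < old_err - tol:
        return merged, True
    # Reject: the new attribute does not reduce held-out error, so ignore it.
    return (lambda xo, xn: old_model(xo)), False


# Usage: the target depends on the new attribute (y = 2*old + 3*new).
old = lambda xo: 2 * xo
model, accepted = add_attribute(
    old, [0, 1, 2, 3], [1, 0, 2, 1], [3, 2, 10, 9],
    val=([1, 2], [2, 0], [8, 4]))
print(accepted, model(1, 2))  # True 8.0
```

Fitting the sub-model to the old model's residuals means the old model never has to be retrained. If the new attribute carries no information about the target, the held-out error does not drop and the attribute is rejected.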
BPGrad: Towards Global Optimality in Deep Learning via Branch and Pruning
Understanding global optimality in deep learning (DL) has attracted increasing attention recently. Conventional DL solvers, however, were not designed to seek such global optimality. In this paper, we propose a novel approximation algorithm, BPGrad, for optimizing deep models globally via branch and pruning. BPGrad assumes Lipschitz continuity in DL and, as a result, can adaptively determine the step size for the current gradient given the history of previous updates, such that, theoretically, no smaller step can achieve global optimality. We prove that, by repeating this branch-and-pruning procedure, we can locate the global optimum within a finite number of iterations. We also propose an efficient BPGrad-based solver for DL, which empirically outperforms conventional DL solvers such as Adagrad, Adadelta, RMSProp, and Adam on object recognition, detection, and segmentation tasks.
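The pruning argument behind a Lipschitz-based step size is illustrated below in a minimal sketch. It is not BPGrad's exact update rule. If f is L-Lipschitz, then f(y) ≥ f(x) − L‖y − x‖. Any y closer to x than (f(x) − f_target)/L therefore has f(y) > f_target, and that whole ball can be pruned. The sketch assumes a known target value f_target that is not below the true minimum. This fixed target stands in for the abstract's use of the update history. The names `prune_step` and `prune_descent` are hypothetical.

```python
import math

# Hypothetical sketch of a Lipschitz-based pruning step, NOT BPGrad's exact rule.
# Assumes f is L-Lipschitz (Euclidean norm) and f_target is a value we want to
# reach that is not below the true minimum.

def prune_step(x, f, grad, lipschitz, f_target):
    fx = f(x)
    g = grad(x)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0 or fx <= f_target:
        return x  # stationary point, or target already reached
    # Every y with ||y - x|| < radius has f(y) > f_target, so that ball is
    # pruned and the step jumps to its boundary along -grad.
    radius = (fx - f_target) / lipschitz
    return [xi - radius * gi / norm for xi, gi in zip(x, g)]


def prune_descent(x0, f, grad, lipschitz, f_target, max_iters=100, tol=1e-9):
    """Repeat prune_step until f_target is reached or max_iters runs out."""
    x = list(x0)
    for _ in range(max_iters):
        if f(x) - f_target <= tol:
            break
        x = prune_step(x, f, grad, lipschitz, f_target)
    return x


# Usage on f(x) = |x - 3|, which is 1-Lipschitz with minimum 0 at x = 3.
f = lambda x: abs(x[0] - 3.0)
grad = lambda x: [1.0 if x[0] > 3.0 else -1.0 if x[0] < 3.0 else 0.0]
print(prune_descent([0.0], f, grad, lipschitz=1.0, f_target=0.0))  # [3.0]
```

The step length is the radius of the pruned ball, so a shorter step would land only on points that are already known to miss the target. If f_target is set below the true minimum, the steps can overshoot, which is why the loop is capped by `max_iters`.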
Elimination of All Bad Local Minima in Deep Learning
In this paper, we theoretically prove that adding one special neuron per output unit eliminates all suboptimal local minima of any deep neural network, for multi-class classification, binary classification, and regression with an arbitrary loss function, under practical assumptions. At every local minimum of any deep neural network with these added neurons, the set of parameters of the original neural network (without added neurons) is guaranteed to be a global minimum of the original neural network. The effects of the added neurons are proven to vanish automatically at every local minimum. Moreover, we provide a novel theoretical characterization of a failure mode of eliminating suboptimal local minima via an additional theorem and several examples. This paper also introduces a novel proof technique based on the perturbable gradient basis (PGB) necessary condition of local minima, which provides new insight into the elimination of local minima and is applicable to analyzing various models and transformations of objective functions beyond the elimination of local minima.
Comment: Accepted to appear in AISTATS 202
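The sketch below shows what "adding one neuron per output unit" can look like at the level of the model output and the training objective. The abstract does not state the form of the added neuron. We assume an exponential unit a_k · exp(w_k · x + b_k) whose scale a_k carries an L2 penalty. The function names and the penalty weight `lam` are hypothetical.

```python
import math

# Hypothetical sketch: one extra unit per output, assumed here to be an
# exponential neuron a_k * exp(w_k . x + b_k) whose scale a_k is L2-penalised.
# The abstract does not state this form; names are illustrative.

def augmented_output(base_out, x, a, w, b):
    """Add one extra unit to each output of the original network."""
    return [
        fk + ak * math.exp(sum(wi * xi for wi, xi in zip(wk, x)) + bk)
        for fk, ak, wk, bk in zip(base_out, a, w, b)
    ]


def augmented_objective(loss, data, net, a, w, b, lam):
    """Mean loss of the augmented model plus (lam / 2) * ||a||^2."""
    total = sum(loss(augmented_output(net(x), x, a, w, b), y) for x, y in data)
    return total / len(data) + 0.5 * lam * sum(ak * ak for ak in a)


# Usage: squared loss, a one-output "network" f(x) = x[0].
sq_loss = lambda out, y: sum((o - t) ** 2 for o, t in zip(out, y))
net = lambda x: [x[0]]
data = [([1.0], [2.0]), ([2.0], [2.0])]
w, b = [[0.0]], [0.0]
print(augmented_objective(sq_loss, data, net, [0.0], w, b, lam=1.0))  # 0.5
```

When every a_k is 0, the extra units contribute nothing, and the augmented objective equals the original mean loss. This is consistent with the abstract's statement that the added neurons' effects vanish at every local minimum, but the sketch does not prove that result.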