4 research outputs found
Parsimonious Deep Learning: A Differential Inclusion Approach with Global Convergence
Over-parameterization is ubiquitous nowadays in training neural networks to
benefit both optimization in seeking global optima and generalization in
reducing prediction error. However, compressive networks are desired in many
real-world applications, and directly training small networks may get trapped in
local optima. In this paper, instead of pruning or distilling an
over-parameterized model into a compressive one, we propose a parsimonious
learning approach based on differential inclusions of inverse scale spaces,
which generates a family of models from simple to complex with better
efficiency and interpretability than stochastic gradient descent in exploring
the model space. It enjoys a simple discretization, the Split Linearized
Bregman Iteration, with provable global convergence: from any initialization,
the algorithmic iterates converge to a critical point of the empirical risk.
One may exploit the proposed method to progressively boost the complexity of
neural networks. Numerical experiments on MNIST, CIFAR-10/100, and ImageNet
show that the method is promising for training large-scale models with
favorable interpretability.
Comment: 25 pages, 7 figures
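For concreteness, here is a minimal sketch of the generic form a Split Linearized Bregman Iteration of this kind typically takes; the loss L, coupling parameter ν, step size α, damping κ, auxiliary variable Z, and sparse variable Γ below are illustrative symbols rather than the paper's exact notation:

\[
\bar{L}(W,\Gamma) \;=\; L(W) \;+\; \tfrac{1}{2\nu}\,\lVert W-\Gamma\rVert_2^2,
\]
\[
W_{k+1} = W_k - \kappa\,\alpha\,\nabla_W \bar{L}(W_k,\Gamma_k), \qquad
Z_{k+1} = Z_k - \alpha\,\nabla_\Gamma \bar{L}(W_k,\Gamma_k), \qquad
\Gamma_{k+1} = \kappa\,\mathrm{prox}_{\lVert\cdot\rVert_1}\!\left(Z_{k+1}\right),
\]

where \(\mathrm{prox}_{\lVert\cdot\rVert_1}\) is elementwise soft-thresholding. As k grows, the support of Γ expands from the empty set, which is what yields a path of models from simple to complex.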
Leveraging both Lesion Features and Procedural Bias in Neuroimaging: A Dual-Task Split Dynamics of Inverse Scale Space
The prediction and selection of lesion features are two important tasks in
voxel-based neuroimaging analysis. Existing multivariate learning models treat
the two tasks equivalently and optimize them simultaneously. However, in
addition to lesion features, we observe another type of feature, commonly
introduced during preprocessing, that can improve the prediction result. We
call this type of feature procedural bias. Therefore, in this paper, we propose
that the features/voxels in neuroimaging data consist of three orthogonal
parts: lesion features, procedural bias, and null features. To stably select
lesion features and leverage procedural bias for prediction, we propose an
iterative algorithm (termed GSplit LBI) as a discretization of a differential
inclusion of inverse scale space, which combines a variable splitting scheme
with the Linearized Bregman Iteration (LBI). Specifically, with a variable
splitting term, two estimators are introduced and split apart: one for feature
selection (the sparse estimator) and one for prediction (the dense estimator).
Implemented with LBI, the solution path of both estimators can be returned with
different sparsity levels on the sparse estimator for the selection of lesion
features. In addition, the dense estimator can leverage procedural bias to
further improve prediction results. To test the efficacy of our method, we
conduct experiments on a simulated study and the Alzheimer's Disease
Neuroimaging Initiative (ADNI) database. The validity and benefit of our model
are shown by the improved prediction results and the interpretability of the
visualized procedural bias and lesion features.
Comment: Thanks to Xinwei's girlfriend Yue Cao, for her love and support
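To make the role of the two estimators concrete, here is a small, purely illustrative post-processing sketch; the array names beta_dense and gamma_sparse and the thresholding rule are assumptions of this example, not the paper's interface:

```python
import numpy as np

def interpret_estimators(beta_dense, gamma_sparse, tol=1e-6):
    """Illustrative read-out of the two GSplit-LBI-style estimators.

    beta_dense  : dense estimator used for prediction (may retain procedural bias)
    gamma_sparse: sparse estimator used for lesion-feature selection
    Both are 1-D arrays over voxels; the names and thresholding rule are
    assumptions of this sketch.
    """
    # voxels selected by the sparse estimator are taken as lesion features
    lesion_voxels = np.flatnonzero(np.abs(gamma_sparse) > tol)
    # voxels that carry predictive weight in the dense estimator but are not
    # selected as lesions are candidates for procedural bias
    bias_candidates = np.flatnonzero((np.abs(gamma_sparse) <= tol) &
                                     (np.abs(beta_dense) > tol))
    return lesion_voxels, bias_candidates
```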
iSplit LBI: Individualized Partial Ranking with Ties via Split LBI
Due to the inherent uncertainty of data, the problem of predicting partial
ranking from pairwise comparison data with ties has attracted increasing
interest in recent years. However, in real-world scenarios, different
individuals often hold distinct preferences. It might be misleading to merely
look at a global partial ranking while ignoring personal diversity. In this
paper, instead of learning a global ranking that agrees with the consensus, we
pursue the tie-aware partial ranking from an individualized perspective. In
particular, we formulate a unified framework that can be used not only for
individualized partial ranking prediction but also for abnormal user selection.
This is realized by a variable-splitting-based algorithm called iSplit LBI.
Specifically, our algorithm generates a sequence of estimates along a
regularization path, where both the hyperparameters and the model parameters
are updated. At each step of the path, the parameters can be decomposed into
three orthogonal parts, namely abnormal signals, personalized signals, and
random noise. The abnormal signals serve the purpose of abnormal user
selection, while the abnormal signals and personalized signals together are
mainly responsible for individualized partial ranking prediction. Extensive
experiments on simulated and real-world datasets demonstrate that our new
approach significantly outperforms state-of-the-art alternatives. The code is
now available at https://github.com/qianqianxu010/NeurIPS2019-iSplitLBI.
Comment: Accepted by NeurIPS 2019
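As a concrete illustration of how the abnormal-signal component could drive abnormal user selection, here is a small sketch; the array name, its shape, and the rank-by-energy rule are assumptions of this example rather than the paper's procedure:

```python
import numpy as np

def flag_abnormal_users(abnormal_signals, top_k=10):
    """Rank users by the energy of their abnormal-signal component.

    abnormal_signals: array of shape (n_users, n_params) holding the
    abnormal-signal part of the decomposed parameters (an assumed layout).
    Returns the indices of the top_k users with the largest abnormal energy.
    """
    energy = np.linalg.norm(abnormal_signals, axis=1)  # per-user magnitude
    return np.argsort(-energy)[:top_k]
```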
DessiLBI: Exploring Structural Sparsity of Deep Networks via Differential Inclusion Paths
Over-parameterization is ubiquitous nowadays in training neural networks to
benefit both optimization in seeking global optima and generalization in
reducing prediction error. However, compressive networks are desired in many
real-world applications, and directly training small networks may get trapped in
local optima. In this paper, instead of pruning or distilling
over-parameterized models into compressive ones, we propose a new approach
based on differential inclusions of inverse scale spaces. Specifically, it
generates a family of models from simple to complex by coupling a pair of
parameters to simultaneously train the over-parameterized deep model and
explore structural sparsity on the weights of fully connected and convolutional
layers. Such a differential inclusion scheme has a simple discretization,
proposed as the Deep structurally splitting Linearized Bregman Iteration
(DessiLBI), for which a global convergence analysis in deep learning is
established: from any initialization, the algorithmic iterates converge to a
critical point of the empirical risk. Experimental evidence shows that DessiLBI
achieves comparable or even better performance than competitive optimizers in
exploring the structural sparsity of several widely used backbones on benchmark
datasets. Remarkably, with early stopping, DessiLBI unveils "winning tickets"
in early epochs: effective sparse structures with test accuracy comparable to
that of fully trained over-parameterized models.
Comment: conference, 23 pages, https://github.com/corwinliu9669/dS2LB
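For illustration, here is a minimal NumPy sketch of what one coupled update of the kind described above could look like for a single weight tensor; the function name, hyper-parameters, and the elementwise soft-thresholding proximal map are assumptions of this sketch, not the authors' released implementation:

```python
import numpy as np

def dessilbi_style_step(W, Gamma, Z, grad_W, alpha=0.1, kappa=1.0, nu=100.0):
    """One illustrative coupled update for a single weight tensor.

    W      : dense weights being trained
    Gamma  : sparse companion capturing structural sparsity
    Z      : auxiliary LBI variable accumulating the path
    grad_W : gradient of the data loss w.r.t. W (computed elsewhere)
    The coupling term ||W - Gamma||^2 / (2*nu) follows the abstract's
    description; step sizes and the proximal map are assumptions of this sketch.
    """
    # gradients of the augmented loss w.r.t. W and Gamma
    grad_aug_W = grad_W + (W - Gamma) / nu
    grad_Gamma = (Gamma - W) / nu

    W_new = W - kappa * alpha * grad_aug_W   # descend on the dense weights
    Z_new = Z - alpha * grad_Gamma           # LBI update of the auxiliary variable
    # soft-thresholding keeps Gamma sparse along the path
    Gamma_new = kappa * np.sign(Z_new) * np.maximum(np.abs(Z_new) - 1.0, 0.0)
    return W_new, Gamma_new, Z_new
```

Applying such updates layer by layer would let the support of Gamma grow over iterations, matching the "simple to complex" family of models the abstract describes.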