Receptive fields optimization in deep learning for enhanced interpretability, diversity, and resource efficiency.
In both supervised and unsupervised learning settings, deep neural networks (DNNs) are known to learn hierarchical and discriminative representations of data. They can automatically extract a rich hierarchy of features from raw data without the need for manual feature engineering. Over the past few years, the general trend has been for DNNs to grow deeper and larger, amounting to a huge number of parameters and a highly nonlinear cascade of features, thus improving the flexibility and accuracy of the resulting models. To account for the scale, diversity, and difficulty of the data DNNs learn from, architectural complexity and an excessive number of weights are often deliberately built into their design. This flexibility and performance usually come with high computational and memory demands during both training and inference. In addition, insight into the mappings DNN models perform, and the human ability to understand them, remains very limited. This dissertation addresses some of these limitations by balancing three conflicting objectives: computational/memory demands, interpretability, and accuracy. It first introduces some unsupervised feature learning methods in the broader context of dictionary learning. It also sets the tone for deep autoencoder learning and constraints on data representations aimed at removing some of the aforementioned bottlenecks, such as improving the feature interpretability of deep learning models through nonnegativity constraints on receptive fields. In addition, the two main classes of solution to the drawbacks associated with overparameterization/over-complete representation in deep learning models are presented. Subsequently, two novel methods, one for each solution class, are introduced to address the problems resulting from the over-complete representations exhibited by most deep learning models. The first method achieves inference-cost-efficient models by eliminating redundant features with negligible deterioration of prediction accuracy, which is especially important for deploying deep learning models on resource-limited portable devices. The second method diversifies the features of DNNs during the learning phase to improve their performance without increasing their size or capacity. Lastly, feature diversification is used to stabilize adversarial learning, and extensive experimental results show that these methods have the potential to advance the current state of the art on different learning tasks and benchmark datasets.
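
To make the nonnegativity idea concrete, here is a minimal sketch (illustrative only, not the dissertation's method; the toy architecture and hyperparameters are assumptions) of constraining an autoencoder's receptive fields by projecting the encoder weights onto the nonnegative orthant after each gradient step:

    import torch
    import torch.nn as nn

    class NonnegAutoencoder(nn.Module):
        """Toy autoencoder whose receptive fields are kept nonnegative."""
        def __init__(self, n_input=784, n_hidden=128):
            super().__init__()
            self.encoder = nn.Linear(n_input, n_hidden)
            self.decoder = nn.Linear(n_hidden, n_input)

        def forward(self, x):
            h = torch.sigmoid(self.encoder(x))
            return torch.sigmoid(self.decoder(h))

    model = NonnegAutoencoder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    def train_step(x):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), x)
        loss.backward()
        opt.step()
        with torch.no_grad():
            # Project receptive fields onto the nonnegative orthant; this
            # is what tends to yield part-based, more interpretable features.
            model.encoder.weight.clamp_(min=0.0)
        return loss.item()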
Exemplar-Free Continual Transformer with Convolutions
Continual Learning (CL) involves training a machine learning model in a
sequential manner to learn new information while retaining previously learned
tasks without the presence of previous training data. Although there has been
significant interest in CL, most recent CL approaches in computer vision have
focused on convolutional architectures only. However, with the recent success
of vision transformers, there is a need to explore their potential for CL.
Although there have been some recent CL approaches for vision transformers,
they either store training instances of previous tasks or require a task
identifier during test time, which can be limiting. This paper proposes a new
exemplar-free approach for class/task incremental learning, called ConTraCon,
which does not require the task identifier to be explicitly present during
inference and avoids the need to store previous training instances. The proposed approach
leverages the transformer architecture and involves re-weighting the key,
query, and value weights of the multi-head self-attention layers of a
transformer trained on a similar task. The re-weighting is done using
convolution, which enables the approach to maintain low parameter requirements
per task. Additionally, an image augmentation-based entropic task
identification approach is used to predict tasks without requiring task-ids
during inference. Experiments on four benchmark datasets demonstrate that the
proposed approach outperforms several competitive approaches while requiring
fewer parameters.

Comment: Accepted in ICCV 2023
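
As a rough single-head sketch of the re-weighting idea described above (hedged: the shapes, kernel size, and initialization here are my own assumptions, and the paper's multi-head formulation will differ in detail), a small per-task convolution can be applied over a frozen attention weight matrix:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConvReweightedAttention(nn.Module):
        """Adapt frozen QKV weights with a small per-task convolution."""
        def __init__(self, base_qkv_weight, kernel_size=3):
            super().__init__()
            self.register_buffer("base_w", base_qkv_weight)  # (3*d, d), frozen
            # The only per-task parameters: one tiny conv kernel.
            self.kernel = nn.Parameter(torch.zeros(1, 1, kernel_size, kernel_size))
            nn.init.dirac_(self.kernel)                      # start as identity
            self.pad = kernel_size // 2

        def adapted_weight(self):
            w = self.base_w[None, None]                      # (1, 1, 3*d, d)
            return F.conv2d(w, self.kernel, padding=self.pad)[0, 0]

        def forward(self, x):                                # x: (batch, seq, d)
            d = x.shape[-1]
            q, k, v = F.linear(x, self.adapted_weight()).split(d, dim=-1)
            attn = torch.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
            return attn @ v

Since only the small kernel is trained per task, the added parameter cost per task stays negligible compared with the frozen backbone, which matches the low-parameter claim in the abstract.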
A Survey on Dropout Methods and Experimental Verification in Recommendation
Overfitting is a common problem in machine learning: the model fits the
training data too closely while performing poorly on test data. Among the
various methods for coping with overfitting, dropout is one of the most
representative. From randomly dropping neurons to dropping entire neural
structures, dropout has achieved great success in improving model performance.
Although various dropout methods have been designed and widely applied in past
years, their effectiveness, application scenarios, and contributions have not
yet been comprehensively summarized or empirically compared. It is therefore
time for a comprehensive survey.
In this paper, we systematically review previous dropout methods and classify
them into three major categories according to the stage at which the dropout
operation is performed. Specifically, more than seventy dropout methods published
in top AI conferences or journals (e.g., TKDE, KDD, TheWebConf, SIGIR) are covered.
The designed taxonomy is easy to understand and capable of including new
dropout methods. Then, we further discuss their application scenarios,
connections, and contributions. To verify the effectiveness of distinct dropout
methods, extensive experiments are conducted on recommendation scenarios with
abundant heterogeneous information. Finally, we propose some open problems and
potential research directions for dropout that are worth further exploration.

Comment: 26 pages
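
For orientation, the "randomly dropping neurons" baseline that this line of work builds on is standard inverted dropout; a minimal NumPy sketch (illustrative, not taken from the survey):

    import numpy as np

    def dropout(x, p=0.5, training=True, rng=None):
        """Inverted dropout: zero each unit with probability p during training."""
        if not training or p == 0.0:
            return x
        rng = rng or np.random.default_rng()
        mask = rng.random(x.shape) >= p        # keep each unit with prob. 1 - p
        return x * mask / (1.0 - p)            # rescale so E[output] equals x

    # e.g. h = dropout(h, p=0.3, training=True) inside a forward pass

The rescaling by 1/(1 - p) is what lets the network be used unchanged at test time, since activations match their training-time expectation.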
Bayesian Continual Learning via Spiking Neural Networks
Among the main features of biological intelligence are energy efficiency,
capacity for continual adaptation, and risk management via uncertainty
quantification. Neuromorphic engineering has thus far been driven mostly by the
goal of implementing energy-efficient machines that take inspiration from the
time-based computing paradigm of biological brains. In this paper, we take
steps towards the design of neuromorphic systems that are capable of adaptation
to changing learning tasks, while producing well-calibrated uncertainty
quantification estimates. To this end, we derive online learning rules for
spiking neural networks (SNNs) within a Bayesian continual learning framework.
In this framework, each synaptic weight is represented by parameters that
quantify the current epistemic uncertainty arising from prior knowledge and observed data.
The proposed online rules update the distribution parameters in a streaming
fashion as data are observed. We instantiate the proposed approach for both
real-valued and binary synaptic weights. Experimental results using Intel's
Lava platform show the merits of Bayesian over frequentist learning in terms of
capacity for adaptation and uncertainty quantification.

Comment: Accepted for publication in Frontiers in Computational Neuroscience
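
The Bayesian ingredient can be illustrated with a generic Gaussian mean-field weight updated one minibatch at a time via the reparameterization trick (a hedged, Bayes-by-backprop-style sketch under assumptions of my own, not the paper's SNN learning rules):

    import torch
    import torch.nn.functional as F

    class BayesianWeight:
        """A weight with Gaussian posterior (mu, rho), sigma = softplus(rho)."""
        def __init__(self, shape, lr=1e-2):
            self.mu = torch.zeros(shape, requires_grad=True)
            self.rho = torch.full(shape, -3.0, requires_grad=True)
            self.opt = torch.optim.SGD([self.mu, self.rho], lr=lr)

        def sample(self):
            # Reparameterization trick: a differentiable posterior sample.
            return self.mu + F.softplus(self.rho) * torch.randn(self.mu.shape)

        def step(self, loss):
            # Streaming update: one gradient step per observed minibatch.
            self.opt.zero_grad()
            loss.backward()
            self.opt.step()

        def uncertainty(self):
            # Epistemic uncertainty summary: the posterior standard deviation.
            return F.softplus(self.rho).detach()

A prediction then uses sample() in place of a point weight; drawing repeated samples at test time yields the kind of uncertainty estimates the abstract refers to.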