Representation Learning with Fine-grained Patterns
With the development of computational power and techniques for data collection, deep learning demonstrates superior performance over most existing algorithms on benchmark data sets. Many efforts have been devoted to studying the mechanism of deep learning. One important observation is that deep learning can learn discriminative patterns from raw data directly in a task-dependent manner. Therefore, the representations obtained by deep learning significantly outperform hand-crafted features. However, those patterns are often learned from super-class labels due to the limited availability of fine-grained labels, while fine-grained patterns are desired in many real-world applications, such as visual search in online shopping. To mitigate this challenge, we propose an algorithm that learns fine-grained patterns sufficiently when only super-class labels are available. The effectiveness of our method is guaranteed by theoretical analysis. Extensive experiments on real-world data sets demonstrate that the proposed method can significantly improve performance on target tasks corresponding to fine-grained classes when only super-class information is available for training.
On the Informativeness of Supervision Signals
Learning transferable representations by training a classifier is a well-established technique in deep learning (e.g., ImageNet pretraining), but it remains an open theoretical question why this kind of task-specific pre-training should result in "good" representations that actually capture the underlying structure of the data. We conduct an information-theoretic analysis of several commonly used supervision signals from contrastive learning and classification to determine how they contribute to representation learning performance and how the dynamics of learning are affected by training parameters such as the number of labels, classes, and dimensions in the training dataset. We validate these results empirically in a series of simulations and conduct a cost-benefit analysis to establish a tradeoff curve that enables users to optimize the cost of supervising representation learning on their own datasets.
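To make the information-theoretic framing concrete (this is a generic illustration, not the paper's own derivation), the empirical entropy of a label distribution upper-bounds how many bits a single supervision signal can convey per example; a minimal numpy sketch:

```python
import numpy as np

def label_entropy_bits(labels):
    """Empirical entropy of a label distribution in bits: an upper bound on
    how much information one supervision signal can carry per example."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# A K-way class label conveys at most log2(K) bits per example, whereas a
# binary same/different pair label (as in contrastive learning) conveys
# at most 1 bit.
print(label_entropy_bits([0, 1, 2, 3] * 250))  # 2.0 bits for 4 balanced classes
print(label_entropy_bits([0, 1] * 500))        # 1.0 bit for balanced pair labels
```

This gap between log2(K) bits per class label and 1 bit per pairwise comparison is one reason the number of classes enters such a cost-benefit analysis.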
Deep Multiview Clustering by Contrasting Cluster Assignments
Multiview clustering (MVC) aims to reveal the underlying structure of
multiview data by categorizing data samples into clusters. Deep learning-based
methods exhibit strong feature learning capabilities on large-scale datasets.
For most existing deep MVC methods, exploring the invariant representations of
multiple views is still an intractable problem. In this paper, we propose a
cross-view contrastive learning (CVCL) method that learns view-invariant
representations and produces clustering results by contrasting the cluster
assignments among multiple views. Specifically, we first employ deep
autoencoders to extract view-dependent features in the pretraining stage. Then,
a cluster-level CVCL strategy is presented to explore consistent semantic label
information among the multiple views in the fine-tuning stage. Thus, the
proposed CVCL method is able to produce more discriminative cluster assignments
by virtue of this learning strategy. Moreover, we provide a theoretical
analysis of soft cluster assignment alignment. Extensive experimental results
obtained on several datasets demonstrate that the proposed CVCL method
outperforms several state-of-the-art approaches.
Comment: 10 pages, 7 figures
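The cluster-level contrastive idea can be sketched in a few lines (a hypothetical numpy rendering of the general scheme, not the authors' implementation): the columns of the soft assignment matrices from two views are treated as cluster-level features, and an InfoNCE-style loss pulls the same cluster's assignment vectors together across views:

```python
import numpy as np

def cluster_contrastive_loss(p_a, p_b, tau=0.5):
    """InfoNCE-style loss over cluster assignments from two views.

    p_a, p_b: (N, K) soft cluster assignment matrices; column k is the
    assignment vector of cluster k, the unit being contrasted."""
    a = p_a / np.linalg.norm(p_a, axis=0, keepdims=True)
    b = p_b / np.linalg.norm(p_b, axis=0, keepdims=True)
    logits = (a.T @ b) / tau                      # (K, K) cluster similarities
    logits -= logits.max(axis=1, keepdims=True)   # stabilize the softmax
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.diag(log_prob).mean())       # cluster k should match cluster k

rng = np.random.default_rng(0)
z = rng.normal(size=(200, 10))
p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)   # random soft assignments
aligned = cluster_contrastive_loss(p, p)                      # views agree
shifted = cluster_contrastive_loss(p, np.roll(p, 1, axis=1))  # clusters permuted
assert aligned < shifted
```

The loss is minimized when the two views assign samples to corresponding clusters, which is why contrasting assignments (rather than instance features) can directly sharpen the clustering.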
Unsupervised Heterogeneous Coupling Learning for Categorical Representation
Complex categorical data is often hierarchically coupled, with heterogeneous relationships between attributes and attribute values and couplings between objects. Such value-to-object couplings are heterogeneous, with complementary and inconsistent interactions and distributions. Limited research exists on representations of unlabeled categorical data; existing work ignores the heterogeneous and hierarchical couplings, underestimates data characteristics and complexities, and overuses redundant information. Deep representation learning of unlabeled categorical data is also challenging: it tends to overlook such value-to-object couplings and their complementarity and inconsistency, and it requires large data, disentanglement, and high computational power. This work introduces a shallow but powerful UNsupervised heTerogeneous couplIng lEarning (UNTIE) approach for representing coupled categorical data by untying the interactions between couplings and revealing the heterogeneous distributions embedded in each type of coupling. UNTIE is efficiently optimized w.r.t. a kernel k-means objective function for unsupervised representation learning of heterogeneous and hierarchical value-to-object couplings. Theoretical analysis shows that UNTIE can represent categorical data with maximal separability while effectively representing heterogeneous couplings and disclosing their roles in categorical data. The UNTIE-learned representations yield significant performance improvements over state-of-the-art categorical representations and deep representation models on 25 categorical data sets with diverse characteristics.
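For readers unfamiliar with the objective UNTIE optimizes, standard Lloyd-style kernel k-means (sketched here in numpy as a generic reference point; UNTIE's own optimization details are in the paper) assigns each point to the cluster whose mean in feature space is nearest, using only a precomputed kernel matrix:

```python
import numpy as np

def kernel_kmeans(K, n_clusters, init, n_iter=50):
    """Lloyd-style kernel k-means on a precomputed kernel matrix K (n x n).

    Squared feature-space distance from point i to the mean of cluster c:
    K_ii - (2/|c|) * sum_{j in c} K_ij + (1/|c|^2) * sum_{j,l in c} K_jl."""
    labels = np.asarray(init).copy()
    n = K.shape[0]
    for _ in range(n_iter):
        dist = np.full((n, n_clusters), np.inf)
        for c in range(n_clusters):
            mask = labels == c
            m = mask.sum()
            if m == 0:
                continue            # empty clusters stay at infinite distance
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, mask].sum(axis=1) / m
                          + K[np.ix_(mask, mask)].sum() / m**2)
        new = dist.argmin(axis=1)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels

# Two well-separated 2-D blobs under an RBF kernel: a slightly corrupted
# initial labeling is repaired within a few iterations.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
               rng.normal(5.0, 0.1, size=(20, 2))])
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq)
init = np.array([0] * 19 + [1] + [1] * 19 + [0])   # one wrong label per blob
labels = kernel_kmeans(K, 2, init)
```

Because all distances are expressed through kernel entries, the same machinery applies to categorical similarities, which is what makes a kernel k-means objective a natural fit for coupled categorical representations.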
Deep Network Regularization with Representation Shaping
Thesis (Ph.D.) -- Seoul National University Graduate School: Graduate School of Convergence Science and Technology, Dept. of Transdisciplinary Studies (Digital Information Convergence), 2019. 2. Rhee, Wonjong.
The statistical characteristics of learned representations, such as correlation and representational sparsity, are known to be relevant to the performance of deep learning methods. Also, learning meaningful and useful data representations using regularization methods has been one of the central concerns in deep learning. In this dissertation, deep network regularization using representation shaping is studied. Roughly, the following questions are answered: what are the common statistical characteristics of representations that high-performing networks share? Do these characteristics have a causal relationship with performance? To answer these questions, five representation regularizers are proposed: class-wise Covariance Regularizer (cw-CR), Variance Regularizer (VR), class-wise Variance Regularizer (cw-VR), Rank Regularizer (RR), and class-wise Rank Regularizer (cw-RR). Significant performance improvements were found with the regularizers for a variety of tasks over popular benchmark datasets. The visualization of learned representations shows that the regularizers used in this work indeed perform distinct representation shaping. Then, with a variety of representation regularizers, several statistical characteristics of learned representations, including covariance, correlation, sparsity, dead units, and rank, are investigated. Our theoretical analysis and experimental results indicate that all the statistical characteristics considered in this work fail to show any general or causal pattern for improving performance. The mutual information quantities I(Z; X) and I(Z; Y) are examined as well, and it is shown that regularizers can affect I(Z; X) and thus indirectly influence performance.
Finally, two practical ways of using representation regularizers are presented to address their usefulness: using a set of representation regularizers as a performance tuning tool and enhancing network compression with representation regularizers.
Table of contents:
Chapter 1. Introduction
1.1 Background and Motivation
1.2 Contributions
Chapter 2. Generalization, Regularization, and Representation in Deep Learning
2.1 Deep Networks
2.2 Generalization
2.2.1 Capacity, Overfitting, and Generalization
2.2.2 Generalization in Deep Learning
2.3 Regularization
2.3.1 Capacity Control and Regularization
2.3.2 Regularization for Deep Learning
2.4 Representation
2.4.1 Representation Learning
2.4.2 Representation Shaping
Chapter 3. Representation Regularizer Design with Class Information
3.1 Class-wise Representation Regularizers: cw-CR and cw-VR
3.1.1 Basic Statistics of Representations
3.1.2 cw-CR
3.1.3 cw-VR
3.1.4 Penalty Loss Functions and Gradients
3.2 Experiments
3.2.1 Image Classification Task
3.2.2 Image Reconstruction Task
3.3 Analysis of Representation Characteristics
3.3.1 Visualization
3.3.2 Quantitative Analysis
3.4 Layer Dependency
Chapter 4. Representation Characteristics and Their Relationship with Performance
4.1 Representation Characteristics
4.2 Experimental Results of Representation Regularization
4.3 Scaling, Permutation, Covariance, and Correlation
4.3.1 Identical Output Network (ION)
4.3.2 Possible Extensions for ION
4.4 Sparsity, Dead Unit, and Rank
4.4.1 Analytical Relationship
4.4.2 Rank Regularizer
4.4.3 A Controlled Experiment on Data Generation Process
4.5 Mutual Information
Chapter 5. Practical Ways of Using Representation Regularizers
5.1 Tuning Deep Network Performance Using Representation Regularizers
5.1.1 Experimental Settings and Conditions
5.1.2 Consistently Well-performing Regularizer
5.1.3 Performance Improvement Using Regularizers as a Set
5.2 Enhancing Network Compression Using Representation Regularizers
5.2.1 The Need for Network Compression
5.2.2 Three Typical Approaches for Network Compression
5.2.3 Proposed Approaches and Experimental Results
Chapter 6. Discussion
6.1 Implication
6.1.1 Usefulness of Class Information
6.1.2 Comparison with Non-penalty Regularizers: Dropout and Batch Normalization
6.1.3 Identical Output Network
6.1.4 Using Representation Regularizers for Performance Tuning
6.1.5 Benefits and Drawbacks of Different Statistical Characteristics of Representations
6.2 Limitation
6.2.1 Understanding the Underlying Mechanism of Representation Regularization
6.2.2 Manipulating Representation Characteristics other than Covariance and Variance for ReLU Networks
6.2.3 Investigating Representation Characteristics of Complicated Tasks
6.3 Possible Future Work
6.3.1 Interpreting Learned Representations via Visualization
6.3.2 Designing a Regularizer Utilizing Mutual Information
6.3.3 Applying Multiple Representation Regularizers to a Network
6.3.4 Enhancing Deep Network Compression via Representation Manipulation
Chapter 7. Conclusion
Bibliography
Appendix
A. Principal Component Analysis of Learned Representations
B. Proofs
Acknowledgement
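As a concrete rendering of the class-wise variance idea described in the abstract (a minimal sketch consistent with that description; the dissertation's exact penalty form and normalization may differ), cw-VR penalizes the variance of hidden activations computed within each class:

```python
import numpy as np

def cw_vr_penalty(h, y):
    """Class-wise Variance Regularizer (cw-VR) penalty sketch: the per-class
    variance of hidden activations, averaged over classes, which pulls
    same-class representations together when added to the task loss."""
    classes = np.unique(y)
    per_class = [h[y == c].var(axis=0).sum() for c in classes]
    return float(np.mean(per_class))

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
h_tight = np.repeat(rng.normal(size=(2, 8)), 50, axis=0)  # identical within class
h_loose = rng.normal(size=(100, 8))                       # unstructured
assert cw_vr_penalty(h_tight, y) < 1e-20   # collapsed classes incur no penalty
assert cw_vr_penalty(h_loose, y) > 0.0     # spread-out classes are penalized
```

Dropping the class split (computing one variance over the whole batch) gives the plain VR variant, and replacing variance with covariance or rank statistics yields the cw-CR and RR/cw-RR counterparts.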
Learning Purified Feature Representations from Task-irrelevant Labels
Learning an empirically effective model with generalization using limited
data is a challenging task for deep neural networks. In this paper, we propose
a novel learning framework called PurifiedLearning to exploit task-irrelevant
features extracted from task-irrelevant labels when training models on
small-scale datasets. Particularly, we purify feature representations by using
the expression of task-irrelevant information, thus facilitating the learning
process of classification. Our work is built on solid theoretical analysis and
extensive experiments, which demonstrate the effectiveness of PurifiedLearning.
According to our theoretical results, PurifiedLearning is model-agnostic and places no restrictions on the model used, so it can easily be combined with any existing deep neural network to achieve better performance. The source code of this paper will be available in the future for reproducibility.
Comment: arXiv admin note: substantial text overlap with arXiv:2011.0847
The Neural Race Reduction: Dynamics of Abstraction in Gated Networks
Our theoretical understanding of deep learning has not kept pace with its empirical success. While network architecture is known to be critical, we do not yet understand its effect on learned representations and network behavior, or how this architecture should reflect task structure. In this work, we begin to address this gap by introducing the Gated Deep Linear Network framework, which schematizes how pathways of information flow impact learning dynamics within an architecture. Crucially, because of the gating, these networks can compute nonlinear functions of their input. We derive an exact reduction and, for certain cases, exact solutions to the dynamics of learning. Our analysis demonstrates that the learning dynamics in structured networks can be conceptualized as a neural race with an implicit bias towards shared representations, which then govern the model's ability to systematically generalize, multi-task, and transfer. We validate our key insights on naturalistic datasets and with relaxed assumptions. Taken together, our work gives rise to general hypotheses relating neural architecture to learning and provides a mathematical approach towards understanding the design of more complex architectures and the role of modularity and compositionality in solving real-world problems. The code and results are available at this https URL.
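The core mechanism can be sketched in a few lines (an illustrative toy construction, not the paper's code): every layer is linear, but a binary gate vector, chosen per input context, switches hidden pathways on and off, so each context sees a different composed linear map and the network as a whole is nonlinear:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # input -> hidden pathway weights
W2 = rng.normal(size=(2, 4))   # hidden -> output pathway weights

def gdln_forward(x, gate):
    """Gated deep linear network: gate is a 0/1 vector that multiplies the
    hidden layer, selecting which pathways carry information."""
    return W2 @ (gate * (W1 @ x))

x = rng.normal(size=3)
g = np.array([1.0, 1.0, 0.0, 0.0])
# With the gate fixed, the network is exactly the linear map W2 diag(g) W1 ...
assert np.allclose(gdln_forward(x, g), W2 @ np.diag(g) @ W1 @ x)
# ... but a different gate pattern yields a different linear map, so the
# overall input-to-output function is nonlinear once gates depend on context.
g2 = np.array([0.0, 0.0, 1.0, 1.0])
assert not np.allclose(gdln_forward(x, g), gdln_forward(x, g2))
```

Because each gate configuration reduces to a product of matrices, the learning dynamics per pathway stay analytically tractable, which is what enables the exact reductions described above.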