
    Representation Learning with Fine-grained Patterns

    With the development of computational power and data collection techniques, deep learning has demonstrated superior performance over most existing algorithms on benchmark data sets. Many efforts have been devoted to studying the mechanism of deep learning. One important observation is that deep learning can learn discriminative patterns directly from raw data in a task-dependent manner. Therefore, the representations obtained by deep learning significantly outperform hand-crafted features. However, those patterns are often learned from super-class labels due to the limited availability of fine-grained labels, while fine-grained patterns are desired in many real-world applications such as visual search in online shopping. To mitigate this challenge, we propose an algorithm that learns fine-grained patterns sufficiently when only super-class labels are available. The effectiveness of our method is supported by theoretical analysis. Extensive experiments on real-world data sets demonstrate that the proposed method can significantly improve performance on target tasks corresponding to fine-grained classes when only super-class information is available for training.
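    The abstract above does not spell out the algorithm, so the following is only a minimal sketch of the general idea of combining coarse supervision with an instance-level term that preserves fine-grained structure; the NT-Xent pairing, the weight alpha, and the temperature are illustrative assumptions, not the paper's method.

    import torch
    import torch.nn.functional as F

    def coarse_plus_instance_loss(embeddings, logits, super_labels, temperature=0.1, alpha=0.5):
        """embeddings: (2B, D) from two augmented views per image; logits: (2B, C_super)."""
        # Supervised term on the coarse (super-class) labels.
        ce = F.cross_entropy(logits, super_labels)
        # Instance-level contrastive term (SimCLR-style NT-Xent), so that samples of the
        # same super-class are not collapsed onto one point and fine-grained structure
        # can survive in the embedding space.
        z = F.normalize(embeddings, dim=1)
        sim = z @ z.t() / temperature                               # (2B, 2B) cosine similarities
        n = z.size(0) // 2
        positives = torch.arange(2 * n, device=z.device).roll(n)   # view i <-> view i+n are positives
        mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
        sim = sim.masked_fill(mask, float('-inf'))                 # exclude self-similarity
        nt_xent = F.cross_entropy(sim, positives)
        return ce + alpha * nt_xent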

    On the Informativeness of Supervision Signals

    Learning transferable representations by training a classifier is a well-established technique in deep learning (e.g., ImageNet pretraining), but it remains an open theoretical question why this kind of task-specific pre-training should result in "good" representations that actually capture the underlying structure of the data. We conduct an information-theoretic analysis of several commonly used supervision signals from contrastive learning and classification to determine how they contribute to representation learning performance and how the dynamics of learning are affected by training parameters such as the number of labels, classes, and dimensions in the training dataset. We validate these results empirically in a series of simulations and conduct a cost-benefit analysis to establish a tradeoff curve that enables users to optimize the cost of supervising representation learning on their own datasets.
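    As a rough, hedged illustration of the kind of quantity such an information-theoretic analysis reasons about, the sketch below estimates how informative a learned representation is about labels using a nonparametric mutual-information estimator from scikit-learn. Summing per-dimension estimates is only a crude proxy for I(Z; Y), and nothing here is the paper's actual estimator or simulation setup; the function name and the commented comparison are assumptions.

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    def representation_label_information(z, y):
        """z: (N, D) learned representations; y: (N,) integer labels."""
        mi_per_dim = mutual_info_classif(z, y, discrete_features=False)
        # Sum of per-dimension estimates: a crude proxy for I(Z; Y) that ignores redundancy.
        return mi_per_dim, mi_per_dim.sum()

    # Hypothetical usage: compare representations pretrained with different supervision signals.
    # _, info_contrastive = representation_label_information(z_contrastive, y)
    # _, info_classifier  = representation_label_information(z_classifier, y)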

    Deep Multiview Clustering by Contrasting Cluster Assignments

    Multiview clustering (MVC) aims to reveal the underlying structure of multiview data by categorizing data samples into clusters. Deep learning-based methods exhibit strong feature learning capabilities on large-scale datasets. For most existing deep MVC methods, exploring the invariant representations of multiple views remains an intractable problem. In this paper, we propose a cross-view contrastive learning (CVCL) method that learns view-invariant representations and produces clustering results by contrasting the cluster assignments among multiple views. Specifically, we first employ deep autoencoders to extract view-dependent features in the pretraining stage. Then, a cluster-level CVCL strategy is presented to explore consistent semantic label information among the multiple views in the fine-tuning stage. Thus, the proposed CVCL method is able to produce more discriminative cluster assignments by virtue of this learning strategy. Moreover, we provide a theoretical analysis of soft cluster assignment alignment. Extensive experimental results obtained on several datasets demonstrate that the proposed CVCL method outperforms several state-of-the-art approaches. Comment: 10 pages, 7 figures.
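    The mechanism described above (per-view soft cluster assignments contrasted at the cluster level) can be sketched as follows. The exact loss used by CVCL may differ; the temperature and the symmetric InfoNCE form are my own assumptions for illustration.

    import torch
    import torch.nn.functional as F

    def cluster_contrastive_loss(p_a, p_b, temperature=0.5):
        """p_a, p_b: (B, K) soft cluster assignments of the same batch seen from two views."""
        # Each column (length B) describes how one cluster is populated in this batch.
        c_a = F.normalize(p_a.t(), dim=1)        # (K, B)
        c_b = F.normalize(p_b.t(), dim=1)        # (K, B)
        logits = c_a @ c_b.t() / temperature     # (K, K) cluster-to-cluster similarities
        targets = torch.arange(c_a.size(0), device=logits.device)
        # Symmetric InfoNCE over clusters: cluster k in view a should match cluster k in view b.
        return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))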

    Unsupervised Heterogeneous Coupling Learning for Categorical Representation

    Complex categorical data is often hierarchically coupled, with heterogeneous relationships between attributes and attribute values and couplings between objects. Such value-to-object couplings are heterogeneous, with complementary and inconsistent interactions and distributions. The limited existing research on representing unlabeled categorical data ignores these heterogeneous and hierarchical couplings, underestimates data characteristics and complexities, and overuses redundant information. Deep representation learning of unlabeled categorical data is challenging: it tends to overlook such value-to-object couplings and their complementarity and inconsistency, and it requires large amounts of data, disentanglement, and high computational power. This work introduces a shallow but powerful UNsupervised heTerogeneous couplIng lEarning (UNTIE) approach for representing coupled categorical data by untying the interactions between couplings and revealing the heterogeneous distributions embedded in each type of coupling. UNTIE is efficiently optimized w.r.t. a kernel k-means objective function for unsupervised representation learning of heterogeneous and hierarchical value-to-object couplings. Theoretical analysis shows that UNTIE can represent categorical data with maximal separability while effectively representing heterogeneous couplings and disclosing their roles in categorical data. The UNTIE-learned representations yield significant performance improvements over state-of-the-art categorical representations and deep representation models on 25 categorical data sets with diverse characteristics.
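    UNTIE's kernel k-means optimization is not reproduced here, but the notion of representing categorical data through value couplings can be illustrated with a simple (and much cruder) encoding: each attribute value is described by its co-occurrence profile with the values of the other attributes, and an object is the concatenation of its values' profiles. The function name and the pandas-based construction are purely illustrative assumptions, not the paper's procedure.

    import numpy as np
    import pandas as pd

    def coupling_encode(df):
        """df: DataFrame of categorical columns. Returns an (N, D) numeric array."""
        blocks = []
        for col in df.columns:
            profiles = []
            for other in df.columns:
                if other == col:
                    continue
                # Conditional distribution P(other value | col value), one row per value of `col`.
                profiles.append(pd.crosstab(df[col], df[other], normalize='index'))
            value_profile = pd.concat(profiles, axis=1)           # indexed by the values of `col`
            blocks.append(value_profile.loc[df[col]].to_numpy())  # map each object to its value's profile
        return np.hstack(blocks)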

    Deep Network Regularization with Representation Shaping

    Thesis (Ph.D.) -- Seoul National University, Graduate School of Convergence Science and Technology, Department of Transdisciplinary Studies (Digital Information Convergence major), February 2019. Rhee, Wonjong.
    The statistical characteristics of learned representations, such as correlation and representational sparsity, are known to be relevant to the performance of deep learning methods. Learning meaningful and useful data representations through regularization has also been one of the central concerns in deep learning. In this dissertation, deep network regularization using representation shaping is studied. Roughly, the following questions are answered: what are the common statistical characteristics of representations that high-performing networks share? Do these characteristics have a causal relationship with performance? To answer these questions, five representation regularizers are proposed: class-wise Covariance Regularizer (cw-CR), Variance Regularizer (VR), class-wise Variance Regularizer (cw-VR), Rank Regularizer (RR), and class-wise Rank Regularizer (cw-RR). Significant performance improvements were found with these regularizers for a variety of tasks over popular benchmark datasets. Visualization of the learned representations shows that the regularizers used in this work indeed perform distinct representation shaping. Then, with a variety of representation regularizers, several statistical characteristics of learned representations, including covariance, correlation, sparsity, dead units, and rank, are investigated. Our theoretical analysis and experimental results indicate that all the statistical characteristics considered in this work fail to show any general or causal pattern for improving performance. The mutual information quantities I(z; x) and I(z; y) are examined as well, and it is shown that regularizers can affect I(z; x) and thus indirectly influence performance. Finally, two practical ways of using representation regularizers are presented: using a set of representation regularizers as a performance tuning tool, and enhancing network compression with representation regularizers.
    Table of contents:
    Chapter 1. Introduction: 1.1 Background and Motivation; 1.2 Contributions
    Chapter 2. Generalization, Regularization, and Representation in Deep Learning: 2.1 Deep Networks; 2.2 Generalization (2.2.1 Capacity, Overfitting, and Generalization; 2.2.2 Generalization in Deep Learning); 2.3 Regularization (2.3.1 Capacity Control and Regularization; 2.3.2 Regularization for Deep Learning); 2.4 Representation (2.4.1 Representation Learning; 2.4.2 Representation Shaping)
    Chapter 3. Representation Regularizer Design with Class Information: 3.1 Class-wise Representation Regularizers: cw-CR and cw-VR (3.1.1 Basic Statistics of Representations; 3.1.2 cw-CR; 3.1.3 cw-VR; 3.1.4 Penalty Loss Functions and Gradients); 3.2 Experiments (3.2.1 Image Classification Task; 3.2.2 Image Reconstruction Task); 3.3 Analysis of Representation Characteristics (3.3.1 Visualization; 3.3.2 Quantitative Analysis); 3.4 Layer Dependency
    Chapter 4. Representation Characteristics and Their Relationship with Performance: 4.1 Representation Characteristics; 4.2 Experimental Results of Representation Regularization; 4.3 Scaling, Permutation, Covariance, and Correlation (4.3.1 Identical Output Network (ION); 4.3.2 Possible Extensions for ION); 4.4 Sparsity, Dead Unit, and Rank (4.4.1 Analytical Relationship; 4.4.2 Rank Regularizer; 4.4.3 A Controlled Experiment on Data Generation Process); 4.5 Mutual Information
    Chapter 5. Practical Ways of Using Representation Regularizers: 5.1 Tuning Deep Network Performance Using Representation Regularizers (5.1.1 Experimental Settings and Conditions; 5.1.2 Consistently Well-performing Regularizer; 5.1.3 Performance Improvement Using Regularizers as a Set); 5.2 Enhancing Network Compression Using Representation Regularizers (5.2.1 The Need for Network Compression; 5.2.2 Three Typical Approaches for Network Compression; 5.2.3 Proposed Approaches and Experimental Results)
    Chapter 6. Discussion: 6.1 Implication (6.1.1 Usefulness of Class Information; 6.1.2 Comparison with Non-penalty Regularizers: Dropout and Batch Normalization; 6.1.3 Identical Output Network; 6.1.4 Using Representation Regularizers for Performance Tuning; 6.1.5 Benefits and Drawbacks of Different Statistical Characteristics of Representations); 6.2 Limitation (6.2.1 Understanding the Underlying Mechanism of Representation Regularization; 6.2.2 Manipulating Representation Characteristics other than Covariance and Variance for ReLU Networks; 6.2.3 Investigating Representation Characteristics of Complicated Tasks); 6.3 Possible Future Work (6.3.1 Interpreting Learned Representations via Visualization; 6.3.2 Designing a Regularizer Utilizing Mutual Information; 6.3.3 Applying Multiple Representation Regularizers to a Network; 6.3.4 Enhancing Deep Network Compression via Representation Manipulation)
    Chapter 7. Conclusion
    Bibliography; Appendix (A. Principal Component Analysis of Learned Representations; B. Proofs); Acknowledgement
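    As a concrete illustration of the class-wise regularizers listed above, a class-wise variance penalty (in the spirit of cw-VR) can be sketched as below: for each class in a batch, the variance of its hidden representations is penalized on top of the task loss. The normalization and the way the penalty is averaged are my own choices, not necessarily those of the dissertation.

    import torch

    def class_wise_variance_penalty(h, y):
        """h: (B, D) hidden-layer activations; y: (B,) integer class labels."""
        penalty = h.new_zeros(())
        for c in y.unique():
            h_c = h[y == c]
            if h_c.size(0) > 1:
                # Mean (over units) of the within-class variance of the activations.
                penalty = penalty + h_c.var(dim=0, unbiased=False).mean()
        return penalty

    # Hypothetical usage: loss = task_loss + lam * class_wise_variance_penalty(hidden, labels)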

    Learning Purified Feature Representations from Task-irrelevant Labels

    Learning a model that generalizes well from limited data is a challenging task for deep neural networks. In this paper, we propose a novel learning framework called PurifiedLearning to exploit task-irrelevant features extracted from task-irrelevant labels when training models on small-scale datasets. In particular, we purify feature representations using the expression of task-irrelevant information, thus facilitating the learning process for classification. Our work is built on solid theoretical analysis and extensive experiments, which demonstrate the effectiveness of PurifiedLearning. According to our theory, PurifiedLearning is model-agnostic and imposes no restrictions on the model, so it can easily be combined with any existing deep neural network to achieve better performance. The source code of this paper will be made available in the future for reproducibility. Comment: arXiv admin note: substantial text overlap with arXiv:2011.0847

    The Neural Race Reduction: Dynamics of Abstraction in Gated Networks

    Our theoretical understanding of deep learning has not kept pace with its empirical success. While network architecture is known to be critical, we do not yet understand its effect on learned representations and network behavior, or how this architecture should reflect task structure. In this work, we begin to address this gap by introducing the Gated Deep Linear Network framework, which schematizes how pathways of information flow impact learning dynamics within an architecture. Crucially, because of the gating, these networks can compute nonlinear functions of their input. We derive an exact reduction and, for certain cases, exact solutions to the dynamics of learning. Our analysis demonstrates that the learning dynamics in structured networks can be conceptualized as a neural race with an implicit bias towards shared representations, which then govern the model's ability to systematically generalize, multi-task, and transfer. We validate our key insights on naturalistic datasets and with relaxed assumptions. Taken together, our work gives rise to general hypotheses relating neural architecture to learning and provides a mathematical approach towards understanding the design of more complex architectures and the role of modularity and compositionality in solving real-world problems. The code and results are available at this https URL.
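    The framework's central object, a deep linear network whose pathways are switched by multiplicative gates, can be sketched as follows: every layer is linear (no pointwise nonlinearity), but context-dependent gates route information along different pathways, so the overall input-output map is nonlinear. The class name, the 0/1 gate encoding, and the example dimensions are assumptions for illustration only, not the paper's exact formulation.

    import torch
    import torch.nn as nn

    class GatedDeepLinearNet(nn.Module):
        def __init__(self, dims):                  # e.g. dims = [784, 256, 256, 10]
            super().__init__()
            self.layers = nn.ModuleList(
                nn.Linear(d_in, d_out, bias=False) for d_in, d_out in zip(dims[:-1], dims[1:])
            )

        def forward(self, x, gates):
            """gates: one (hidden_dim,) 0/1 tensor per hidden layer, chosen by the task/context."""
            h = x
            for layer, g in zip(self.layers[:-1], gates):
                h = g * layer(h)                   # linear map followed by multiplicative gating
            return self.layers[-1](h)

    # Hypothetical usage: two gate patterns route the same input through different pathways.
    # net = GatedDeepLinearNet([784, 256, 256, 10])
    # gates_a = [torch.ones(256), torch.cat([torch.ones(128), torch.zeros(128)])]
    # out_a = net(torch.randn(4, 784), gates_a)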