
    C3: Cross-instance guided Contrastive Clustering

    Clustering is the task of gathering similar data samples into clusters without using any predefined labels. It has been widely studied in the machine learning literature, and recent advances in deep learning have revived interest in the field. Contrastive clustering (CC) models are a staple of deep clustering: positive and negative pairs of each data instance are generated through data augmentation, and the models aim to learn a feature space where instance-level and cluster-level representations of positive pairs are grouped together. Despite improving the state of the art, these algorithms ignore cross-instance patterns, which carry essential information for improving clustering performance; this increases the model's false-negative-pair rate while decreasing its true-positive-pair rate. In this paper, we propose a novel contrastive clustering method, Cross-instance guided Contrastive Clustering (C3), that considers cross-sample relationships to increase the number of positive pairs and to mitigate the impact of false-negative, noisy, and anomalous samples on the learned representation of the data. In particular, we define a new loss function that identifies similar instances using the instance-level representation and encourages them to aggregate. Moreover, we propose a novel weighting method to select negative samples more efficiently. Extensive experimental evaluations show that our proposed method outperforms state-of-the-art algorithms on benchmark computer vision datasets: we improve clustering accuracy by 6.6%, 3.3%, 5.0%, 1.3%, and 0.3% on CIFAR-10, CIFAR-100, ImageNet-10, ImageNet-Dogs, and Tiny-ImageNet, respectively.
    Comment: 10 pages, 8 figures, 1 table
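    A minimal PyTorch sketch of the cross-instance idea described above. The similarity threshold `zeta`, temperature `tau`, and function names are illustrative assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def cross_instance_positives(z, zeta=0.8):
    """Mark pairs of samples whose instance-level representations exceed
    a cosine-similarity threshold `zeta` as extra positive pairs."""
    z = F.normalize(z, dim=1)       # unit-length instance features
    sim = z @ z.t()                 # pairwise cosine similarities
    mask = sim > zeta               # candidate cross-instance positives
    mask.fill_diagonal_(False)      # ignore trivial self-pairs
    return mask

def c3_style_loss(z, zeta=0.8, tau=0.5):
    """Pull the detected cross-instance positives together with an
    InfoNCE-style objective over all pairwise similarities."""
    z = F.normalize(z, dim=1)
    pos = cross_instance_positives(z, zeta)
    sim = (z @ z.t()) / tau
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    return -(log_prob * pos).sum() / pos.sum().clamp(min=1)
```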

    Contrastive Clustering

    In this paper, we propose a one-stage online clustering method called Contrastive Clustering (CC) which explicitly performs instance- and cluster-level contrastive learning. Specifically, for a given dataset, positive and negative instance pairs are constructed through data augmentations and then projected into a feature space. Therein, instance- and cluster-level contrastive learning are conducted in the row and column space, respectively, by maximizing the similarities of positive pairs while minimizing those of negative ones. Our key observation is that the rows of the feature matrix can be regarded as soft labels of instances, and accordingly the columns can be regarded as cluster representations. By simultaneously optimizing the instance- and cluster-level contrastive losses, the model jointly learns representations and cluster assignments in an end-to-end manner. Extensive experimental results show that CC remarkably outperforms 17 competitive clustering methods on six challenging image benchmarks. In particular, CC achieves an NMI of 0.705 (0.431) on the CIFAR-10 (CIFAR-100) dataset, an up to 19% (39%) performance improvement over the best baseline.
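    A hedged sketch of the row/column duality described above: the same InfoNCE-style loss is applied to the rows of the instance head and, after transposition, to the columns of the cluster head. Head names and temperatures are illustrative, and CC's cluster-entropy regularizer is omitted for brevity:

```python
import torch
import torch.nn.functional as F

def info_nce(a, b, tau=0.5):
    """Symmetric InfoNCE: row i of `a` and row i of `b` form the positive
    pair; every other row serves as a negative."""
    a, b = F.normalize(a, dim=1), F.normalize(b, dim=1)
    logits = (a @ b.t()) / tau
    labels = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

# h1, h2: instance-head outputs of two augmented views (N x d);
# y1, y2: cluster-head softmax outputs (N x K). Rows act as soft labels,
# so transposing makes each column a cluster representation.
def cc_style_loss(h1, h2, y1, y2, tau_i=0.5, tau_c=1.0):
    instance_loss = info_nce(h1, h2, tau_i)          # contrast rows
    cluster_loss = info_nce(y1.t(), y2.t(), tau_c)   # contrast columns
    return instance_loss + cluster_loss
```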

    Information Maximization Clustering via Multi-View Self-Labelling

    Image clustering is a particularly challenging computer vision task which aims to generate annotations without human supervision. Recent advances focus on the use of self-supervised learning strategies in image clustering, first learning valuable semantics and then clustering the image representations. These multi-phase algorithms, however, increase the computational time, and their final performance depends on the first stage. By extending the self-supervised approach, we propose a novel single-phase clustering method that simultaneously learns meaningful representations and assigns the corresponding annotations. This is achieved by integrating a discrete representation into the self-supervised paradigm through a classifier network. Specifically, the proposed clustering objective employs mutual information and maximizes the dependency between the integrated discrete representation and a discrete probability distribution. The discrete probability distribution is derived through the self-supervised process by comparing the learnt latent representation with a set of trainable prototypes. To enhance the learning performance of the classifier, we jointly apply the mutual-information objective across multi-crop views. Our empirical results show that the proposed framework outperforms state-of-the-art techniques, with average accuracies of 89.1% and 49.0% on the CIFAR-10 and CIFAR-100/20 datasets, respectively. Finally, the proposed method also demonstrates attractive robustness to parameter settings, making it readily applicable to other datasets.
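    The mutual-information objective sketched above can be read through the standard decomposition below (notation illustrative: X the image, C the discrete assignment). Maximizing I(X; C) rewards confident per-image assignments (low conditional entropy) while keeping the cluster marginal balanced (high marginal entropy):

```latex
\begin{align}
  I(X; C) &= H(C) - H(C \mid X) \\
          &= -\sum_{c} p(c)\,\log p(c)
             \;+\; \mathbb{E}_{x}\!\left[\sum_{c} p(c \mid x)\,\log p(c \mid x)\right]
\end{align}
```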

    Temporal Action Segmentation: An Analysis of Modern Techniques

    Temporal action segmentation (TAS) in videos aims to densely label the frames of minutes-long videos with multiple action classes. As a long-range video understanding task, it has spawned an extended collection of methods whose performance has been examined on various benchmarks. Despite the rapid growth of TAS techniques in recent years, no systematic survey of the field has been conducted. This survey analyzes and summarizes the most significant contributions and trends. In particular, we first examine the task definition, common benchmarks, types of supervision, and prevalent evaluation measures. In addition, we systematically investigate two essential techniques of this topic, i.e., frame representation and temporal modeling, which have been studied extensively in the literature. We then conduct a thorough review of existing TAS works categorized by their level of supervision and conclude our survey by identifying and emphasizing several research gaps. We have also curated a list of TAS resources, available at https://github.com/nus-cvml/awesome-temporal-action-segmentation.
    Comment: 19 pages, 9 figures, 8 tables
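    As a toy illustration of the task definition above (not a tool from the surveyed benchmarks), the dense frame-wise output of a TAS model can be collapsed into its equivalent segment-level view:

```python
from itertools import groupby

def frames_to_segments(frame_labels):
    """Collapse a dense per-frame label sequence into (label, start, end)
    segments -- the two equivalent views of a TAS prediction."""
    segments, start = [], 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        segments.append((label, start, start + length - 1))
        start += length
    return segments

# frames_to_segments(["pour", "pour", "stir", "stir", "stir"])
# -> [("pour", 0, 1), ("stir", 2, 4)]
```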

    Deep Clustering: A Comprehensive Survey

    Cluster analysis plays an indispensable role in machine learning and data mining, and learning a good data representation is crucial for clustering algorithms. Recently, deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied to a wide range of clustering tasks. Existing surveys of deep clustering focus mainly on single-view settings and network architectures, ignoring the complex application scenarios of clustering. To address this issue, in this paper we provide a comprehensive survey of deep clustering from the perspective of data sources. Given different data sources and initial conditions, we systematically distinguish clustering methods in terms of methodology, prior knowledge, and architecture. Concretely, deep clustering methods are introduced according to four categories, i.e., traditional single-view deep clustering, semi-supervised deep clustering, deep multi-view clustering, and deep transfer clustering. Finally, we discuss the open challenges and potential future opportunities in different fields of deep clustering.

    Unsupervised Representation Learning for Homogeneous, Heterogeneous, and Tree-Shaped Graphs

    Doctoral dissertation (Ph.D.) -- Seoul National University Graduate School, Department of Electrical and Computer Engineering, College of Engineering, August 2022. Advisor: Jin Young Choi.
    The goal of unsupervised graph representation learning is to extract useful node-wise or graph-wise vector representations that are aware of the intrinsic structures of the graph and its attributes. Recently, designing unsupervised graph representation learning methods based on graph neural networks has attracted growing attention owing to their powerful representation ability. Many methods focus on homogeneous graphs, i.e., networks with a single type of node and a single type of edge. However, as many kinds of relationships exist in the world, graphs can also be classified into various types by their structural and semantic properties. For this reason, to learn useful representations from graphs, an unsupervised learning framework must consider the characteristics of the input graph. In this dissertation, we focus on designing unsupervised learning models using graph neural networks for three widely available graph structures: homogeneous graphs, tree-like graphs, and heterogeneous graphs. First, we propose a symmetric graph convolutional autoencoder which produces a low-dimensional latent representation from a homogeneous graph. In contrast to existing graph autoencoders with asymmetric decoder parts, the proposed autoencoder has a newly designed decoder which builds a completely symmetric autoencoder form. For the reconstruction of node features, the decoder is designed based on Laplacian sharpening as the counterpart of the Laplacian smoothing of the encoder, which allows the graph structure to be utilized throughout the proposed autoencoder architecture. To prevent the numerical instability introduced by Laplacian sharpening, we further propose a new numerically stable form of Laplacian sharpening that incorporates signed graphs. The experimental results of clustering, link prediction, and visualization tasks on homogeneous graphs strongly support that the proposed model is stable and outperforms various state-of-the-art algorithms. Second, we analyze how unsupervised tasks can benefit from representations learned in hyperbolic space. To explore how well the hierarchical structure of unlabeled data can be represented in hyperbolic spaces, we design a novel hyperbolic message passing autoencoder whose entire auto-encoding is performed in hyperbolic space, fully utilizing hyperbolic geometry in message passing. Through extensive quantitative and qualitative analyses, we validate the properties and benefits of unsupervised hyperbolic representations of tree-like graphs. Third, we propose the novel concept of the metanode for message passing, to learn both heterogeneous and homogeneous relationships between any two nodes without meta-paths and meta-graphs. Unlike conventional methods, metanodes do not require a predetermined step to manipulate the given relations between different types to enrich relational information.
    Going one step further, we propose a metanode-based message passing layer and a contrastive learning model using the proposed layer. In our experiments, we show the competitive performance of the proposed metanode-based message passing method on node clustering and node classification tasks compared to state-of-the-art message passing networks for heterogeneous graphs.
    Table of contents:
    1 Introduction
    2 Representation Learning on Graph-Structured Data
      2.1 Basic Introduction
        2.1.1 Notations
      2.2 Traditional Approaches
        2.2.1 Graph Statistic
        2.2.2 Neighborhood Overlap
        2.2.3 Graph Kernel
        2.2.4 Spectral Approaches
      2.3 Node Embeddings I: Factorization and Random Walks
        2.3.1 Factorization-based Methods
        2.3.2 Random Walk-based Methods
      2.4 Node Embeddings II: Graph Neural Networks
        2.4.1 Overview of Framework
        2.4.2 Representative Models
      2.5 Learning in Unsupervised Environments
        2.5.1 Predictive Coding
        2.5.2 Contrastive Coding
      2.6 Applications
        2.6.1 Classifications
        2.6.2 Link Prediction
    3 Autoencoder Architecture for Homogeneous Graphs
      3.1 Overview
      3.2 Preliminaries
        3.2.1 Spectral Convolution on Graphs
        3.2.2 Laplacian Smoothing
      3.3 Methodology
        3.3.1 Laplacian Sharpening
        3.3.2 Numerically Stable Laplacian Sharpening
        3.3.3 Subspace Clustering Cost for Image Clustering
        3.3.4 Training
      3.4 Experiments
        3.4.1 Datasets
        3.4.2 Experimental Settings
        3.4.3 Comparing Methods
        3.4.4 Node Clustering
        3.4.5 Image Clustering
        3.4.6 Ablation Studies
        3.4.7 Link Prediction
        3.4.8 Visualization
      3.5 Summary
    4 Autoencoder Architecture for Tree-like Graphs
      4.1 Overview
      4.2 Preliminaries
        4.2.1 Hyperbolic Embeddings
        4.2.2 Hyperbolic Geometry
      4.3 Methodology
        4.3.1 Geometry-Aware Message Passing
        4.3.2 Nonlinear Activation
        4.3.3 Loss Function
      4.4 Experiments
        4.4.1 Datasets
        4.4.2 Compared Methods
        4.4.3 Experimental Details
        4.4.4 Node Clustering and Link Prediction
        4.4.5 Image Clustering
        4.4.6 Structure-Aware Unsupervised Embeddings
        4.4.7 Hyperbolic Distance to Filter Training Samples
        4.4.8 Ablation Studies
      4.5 Further Discussions
        4.5.1 Connection to Contrastive Learning
        4.5.2 Failure Cases of Hyperbolic Embedding Spaces
      4.6 Summary
    5 Contrastive Learning for Heterogeneous Graphs
      5.1 Overview
      5.2 Preliminaries
        5.2.1 Meta-path
        5.2.2 Representation Learning on Heterogeneous Graphs
        5.2.3 Contrastive Methods for Heterogeneous Graphs
      5.3 Methodology
        5.3.1 Definitions
        5.3.2 Metanode-based Message Passing Layer
        5.3.3 Contrastive Learning Framework
      5.4 Experiments
        5.4.1 Experimental Details
        5.4.2 Node Classification
        5.4.3 Node Clustering
        5.4.4 Visualization
        5.4.5 Effectiveness of Metanodes
      5.5 Summary
    6 Conclusions
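    A minimal sketch, in standard GCN notation, of the Laplacian smoothing/sharpening pair that the symmetric autoencoder in the abstract above builds on (unit step size assumed; the dissertation's numerically stable signed-graph variant differs in detail):

```latex
% Encoder: smoothing pulls each node toward its neighbourhood mean.
% Decoder: sharpening pushes it away from that mean, mirroring the encoder.
\begin{align}
  \text{smoothing (encoder):}  \quad H^{(l+1)} &= \sigma\big(\tilde{A}\,H^{(l)}W^{(l)}\big),
    \qquad \tilde{A} = \tilde{D}^{-1/2}(A + I)\,\tilde{D}^{-1/2} \\
  \text{sharpening (decoder):} \quad H^{(l+1)} &= \sigma\big((2I - \tilde{A})\,H^{(l)}W^{(l)}\big)
\end{align}
```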