1,375 research outputs found

    Progressive Cross-camera Soft-label Learning for Semi-supervised Person Re-identification

    In this paper, we focus on the semi-supervised person re-identification (Re-ID) case, which only has the intra-camera (within-camera) labels but not inter-camera (cross-camera) labels. In real-world applications, these intra-camera labels are much easier to obtain than cross-camera labels, for example from tracking algorithms or a few manual annotations. In this case, it is very difficult to explore the relationships between cross-camera persons in the training stage due to the lack of cross-camera label information. To deal with this issue, we propose a novel Progressive Cross-camera Soft-label Learning (PCSL) framework for the semi-supervised person Re-ID task, which can generate cross-camera soft-labels and utilize them to optimize the network. Concretely, we calculate an affinity matrix based on person-level features and adapt it to produce the similarities between cross-camera persons (i.e., cross-camera soft-labels). To exploit these soft-labels to train the network, we investigate the weighted cross-entropy loss and the weighted triplet loss from the classification and discrimination perspectives, respectively. In particular, the proposed framework alternately generates progressive cross-camera soft-labels and gradually improves feature representations over the whole learning course. Extensive experiments on five large-scale benchmark datasets show that PCSL significantly outperforms the state-of-the-art unsupervised methods that employ labeled source domains or the images generated by GAN-based models. Furthermore, the proposed method is even competitive with deep supervised Re-ID methods. Comment: Accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
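
    The soft-label idea in this abstract can be illustrated with a small sketch. It is not the PCSL implementation: the cosine affinity, the softmax with a `temperature` parameter, and the function names are all assumptions made for illustration, and the abstract does not say how the affinity matrix is actually adapted.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def soft_labels(query, gallery, temperature=0.1):
    """Turn affinities between one person and the persons seen by another
    camera into a probability vector (assumed form: softmax over cosine)."""
    sims = [cosine(query, g) / temperature for g in gallery]
    peak = max(sims)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_cross_entropy(probs, weights, eps=1e-12):
    """Cross-entropy in which each class term is weighted by a soft label."""
    return -sum(w * math.log(p + eps) for w, p in zip(weights, probs))

# Hypothetical person-level features: one person from camera A,
# two candidate persons from camera B.
cam_a_person = [1.0, 0.0]
cam_b_persons = [[0.9, 0.1], [0.0, 1.0]]

labels = soft_labels(cam_a_person, cam_b_persons)
loss = weighted_cross_entropy([0.7, 0.3], labels)
```

    In this toy case the first camera-B person receives almost all of the soft-label mass, so the loss is dominated by the network's predicted probability for that person.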

    Semi-Supervised and Unsupervised Deep Visual Learning: A Survey

    State-of-the-art deep learning models are often trained with a large amount of costly labeled training data. However, requiring exhaustive manual annotations may degrade the model's generalizability in the limited-label regime. Semi-supervised learning and unsupervised learning offer promising paradigms to learn from an abundance of unlabeled visual data. Recent progress in these paradigms has indicated the strong benefits of leveraging unlabeled data to improve model generalization and provide better model initialization. In this survey, we review the recent advanced deep learning algorithms on semi-supervised learning (SSL) and unsupervised learning (UL) for visual recognition from a unified perspective. To offer a holistic understanding of the state-of-the-art in these areas, we propose a unified taxonomy. We categorize existing representative SSL and UL with comprehensive and insightful analysis to highlight their design rationales in different learning scenarios and applications in different computer vision tasks. Lastly, we discuss the emerging trends and open challenges in SSL and UL to shed light on future critical research directions.

    A Survey on Metric Learning for Feature Vectors and Structured Data

    The need for appropriate ways to measure the distance or similarity between data is ubiquitous in machine learning, pattern recognition and data mining, but handcrafting such good metrics for specific problems is generally difficult. This has led to the emergence of metric learning, which aims at automatically learning a metric from data and has attracted a lot of interest in machine learning and related fields for the past ten years. This survey paper proposes a systematic review of the metric learning literature, highlighting the pros and cons of each approach. We pay particular attention to Mahalanobis distance metric learning, a well-studied and successful framework, but additionally present a wide range of methods that have recently emerged as powerful alternatives, including nonlinear metric learning, similarity learning and local metric learning. Recent trends and extensions, such as semi-supervised metric learning, metric learning for histogram data and the derivation of generalization guarantees, are also covered. Finally, this survey addresses metric learning for structured data, in particular edit distance learning, and attempts to give an overview of the remaining challenges in metric learning for the years to come. Comment: Technical report, 59 pages. Changes in v2: fixed typos and improved presentation. Changes in v3: fixed typos. Changes in v4: fixed typos and new method
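
    For readers unfamiliar with the Mahalanobis framework mentioned above, the following is a minimal sketch of the distance itself, not of any learning algorithm from the survey. It assumes the standard parameterisation M = LᵀL, under which the distance equals the Euclidean norm of L(x − y); the function name and the example matrices are hypothetical.

```python
import math

def mahalanobis(x, y, L):
    """Distance sqrt((x - y)^T M (x - y)) with M = L^T L,
    computed as the Euclidean norm of L applied to (x - y)."""
    diff = [a - b for a, b in zip(x, y)]
    projected = [sum(l_ij * d for l_ij, d in zip(row, diff)) for row in L]
    return math.sqrt(sum(p * p for p in projected))

identity = [[1.0, 0.0], [0.0, 1.0]]  # M = I recovers the Euclidean distance
stretch = [[2.0, 0.0], [0.0, 1.0]]   # doubles the influence of the first feature

d_euclid = mahalanobis([0.0, 0.0], [3.0, 4.0], identity)
d_stretch = mahalanobis([0.0, 0.0], [3.0, 4.0], stretch)
```

    Metric learning in this framework amounts to choosing L (equivalently, a positive semidefinite M) from data rather than fixing it by hand as done here.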

    Visual Learning in Limited-Label Regime.

    PhD thesis. Deep learning algorithms and architectures have greatly advanced the state-of-the-art in a wide variety of computer vision tasks, such as object recognition and image retrieval. To achieve human- or even super-human-level performance in most visual recognition tasks, large collections of labelled data are generally required to formulate meaningful supervision signals for model training. The standard supervised learning paradigm, however, is undesirable in several respects. First, constructing large-scale labelled datasets not only requires exhaustive manual annotation efforts, but may also be legally prohibited. Second, deep neural networks trained with full label supervision upon a limited amount of labelled data are weak at generalising to new unseen data captured from a different data distribution. This thesis aims to solve the critical problem of lacking sufficient label annotations in deep learning. More specifically, we investigate four different deep learning paradigms in the limited-label regime, including close-set semi-supervised learning, open-set semi-supervised learning, open-set cross-domain learning, and unsupervised learning. The former two paradigms are explored in visual classification, which aims to recognise different categories in the images, while the latter two paradigms are studied in visual search, particularly in person re-identification, which aims to discriminate different but similar persons in a finer-grained manner and can be extended to the discrimination of other objects of high visual similarity. We detail our studies of these paradigms as follows. Chapter 3: Close-Set Semi-Supervised Learning (Figure 1 (I)) is a fundamental semi-supervised learning paradigm that aims to learn from a small set of labelled data and a large set of unlabelled data, where the two sets are assumed to lie in the same label space. 
To address this problem, existing semi-supervised deep learning methods often rely on the up-to-date “network-in-training” to formulate the semi-supervised learning objective, which ignores both the discriminative feature representation and the model inference uncertainty revealed by the network in the preceding learning iterations, referred to as the memory of model learning. In this work, we proposed to augment the deep neural network with a lightweight memory mechanism [Chen et al., 2018b], which captures the underlying manifold structure of the labelled data at the per-class level, and further imposes auxiliary unsupervised constraints to fit the unlabelled data towards the underlying manifolds. This work established a simple yet efficient close-set semi-supervised deep learning scheme to boost model generalisation in visual classification by learning from sparsely labelled data and abundant unlabelled data. Chapter 4: Open-Set Semi-Supervised Learning (Figure 1 (II)) further explores the potential of learning from abundant noisy unlabelled data. While existing SSL methods artificially assume that small labelled data and large unlabelled data are drawn from the same class distribution, we consider a more realistic and uncurated open-set semi-supervised learning paradigm. Considering that visual data is always growing in many visual recognition tasks, it is implausible to pre-define a fixed label space for the unlabelled data in advance. To investigate this new challenging learning paradigm, we established the first systematic work to tackle the open-set semi-supervised learning problem in visual classification by a novel approach: uncertainty-aware self-distillation [Chen et al., 2020b], which selectively propagates the soft label assignments on the unlabelled visual data for model optimisation. Built upon an accumulative ensembling strategy, our approach can jointly capture the model uncertainty to discard out-of-distribution samples, and propagate less overconfident label assignments on the unlabelled data to avoid catastrophic error propagation. As one of the pioneers to explore this learning paradigm, this work opens up new avenues for research in more realistic semi-supervised learning scenarios.
Figure 1: An overview of the main studies in this thesis, which covers four different deep learning paradigms in the limited-label regime, including (I) close-set semi-supervised learning (Chapter 3), (II) open-set semi-supervised learning (Chapter 4), (III) open-set cross-domain learning (Chapter 5), and (IV) unsupervised learning (Chapter 6). Each chapter studies a specific deep learning paradigm that requires propagating, selectively propagating, transferring, or discovering label information for model optimisation, so as to minimise the manual effort of label annotation. While the former two paradigms focus on semi-supervised learning for visual classification, i.e. recognising different visual categories, the latter two paradigms focus on semi-supervised and unsupervised learning for visual search, i.e. discriminating different instances such as persons.
Chapter 5: Open-Set Cross-Domain Learning (Figure 1 (III)) is a challenging semi-supervised learning paradigm of great practical value. When training a visual recognition model in an operating visual environment (i.e.
source domain, such as the laboratory, simulation, or known scene), and then deploying it to unknown real-world scenes (i.e. target domain), it is likely that the model would fail to generalise well in the unseen visual target domain, especially when the target domain data comes from a disjoint label space with heterogeneous domain drift. Unlike prior works in domain adaptation that mostly consider a shared label space across two domains, we studied the more demanding open-set domain adaptation problem, where both label spaces and domains are disjoint across the labelled and unlabelled datasets. To learn from these heterogeneous datasets, we designed a novel domain context rendering scheme for open-set cross-domain learning in visual search [Chen et al., 2019a], particularly for person re-identification, i.e. a realistic testbed to evaluate the representational power of fine-grained discrimination among very similar instances. Our key idea is to transfer the source identity labels into diverse target domain contexts. Our approach enables the generation of an abundant amount of synthetic training data that selectively blends label information from the source domain and context information from the target domain. By training upon such synthetic data, our model can learn a more identity-discriminative and context-invariant representation for effective visual search in the target domain. This work sets a new state-of-the-art in cross-domain person re-identification and provides a novel and generic solution for open-set domain adaptation. Chapter 6: Unsupervised Learning (Figure 1 (IV)) considers the learning scenario with no labelled data. In this work, we explore unsupervised learning in visual search, particularly for person re-identification, a realistic testbed to study unsupervised learning, where person identity labels are generally very difficult to acquire over a wide surveillance space [Chen et al., 2018a]. 
In contrast to existing methods in person re-identification that require exhaustive manual efforts for labelling cross-view pairwise data, we aim to learn visual representations without using any manual labels. Our generic rationale is to formulate auxiliary supervision signals that learn to uncover the underlying data distribution, consequently grouping the visual data in a meaningful and structural way. To learn from the unlabelled data in a fully unsupervised manner, we proposed a novel deep association learning scheme to uncover the underlying data-to-data association. Specifically, two unsupervised constraints (temporal consistency and cycle consistency) are formulated upon neighbourhood consistency to progressively associate visual features within and across video sequences of tracked persons. This work sets the new state-of-the-art in video-based unsupervised person re-identification and advances the automatic exploitation of video data in real-world surveillance. In summary, the goal of all these studies is to build efficient and scalable visual learning models in the limited-label regime, which can learn more powerful and reliable representations from complex unlabelled visual data and consequently facilitate better visual recognition and visual search.

    Multi-view Fuzzy Representation Learning with Rules based Model

    Unsupervised multi-view representation learning has been extensively studied for mining multi-view data. However, some critical challenges remain. On the one hand, existing methods cannot explore multi-view data comprehensively: they usually learn only a common representation between views, although multi-view data contains both the common information between views and the specific information within each view. On the other hand, to mine the nonlinear relationship between data, kernel or neural network methods are commonly used for multi-view representation learning. However, these methods lack interpretability. To this end, this paper proposes a new multi-view fuzzy representation learning method based on the interpretable Takagi-Sugeno-Kang (TSK) fuzzy system (MVRL_FS). The method realizes multi-view representation learning from two aspects. First, multi-view data are transformed into a high-dimensional fuzzy feature space, while the common information between views and specific information of each view are explored simultaneously. Second, a new regularization method based on L_(2,1)-norm regression is proposed to mine the consistency information between views, while the geometric structure of the data is preserved through the Laplacian graph. Finally, extensive experiments on many benchmark multi-view datasets are conducted to validate the superiority of the proposed method. Comment: This work has been accepted by IEEE Transactions on Knowledge and Data Engineering
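
    As a reminder of what an L_(2,1)-norm measures, here is a minimal sketch. It assumes the common row-wise convention (the sum of the Euclidean norms of the rows of a matrix); the abstract does not state which convention the paper uses, and the function name and example matrix are hypothetical.

```python
import math

def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W (row-wise L2,1 norm)."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)

# Rows with norms 5, 0 and 1; an all-zero row adds nothing to the norm,
# which is why this penalty tends to favour row-sparse solutions.
W = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
value = l21_norm(W)
```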