2,872 research outputs found

    Deep Learning vs Spectral Clustering into an active clustering with pairwise constraints propagation

    Get PDF
    International audienceIn our data driven world, categorization is of major importance to help end-users and decision makers understanding information structures. Supervised learning techniques rely on annotated samples that are often difficult to obtain and training often overfits. On the other hand, unsupervised clustering techniques study the structure of the data without disposing of any training data. Given the difficulty of the task, supervised learning often outperforms unsupervised learning. A compromise is to use a partial knowledge, selected in a smart way, in order to boost performance while minimizing learning costs, what is called semi-supervised learning. In such use case, Spectral Clustering proved to be an efficient method. Also, Deep Learning outperformed several state of the art classification approaches and it is interesting to test it in our context. In this paper, we firstly introduce the concept of Deep Learning into an active semi-supervised clustering process and compare it with Spectral Clustering. Secondly, we introduce constraint propagation and demonstrate how it maximizes partitioning quality while reducing annotation costs. Experimental validation is conducted on two different real datasets. Results show the potential of the clustering methods

    Multimodal Clustering with Role Induced Constraints for Speaker Diarization

    Full text link
    Speaker clustering is an essential step in conventional speaker diarization systems and is typically addressed as an audio-only speech processing task. The language used by the participants in a conversation, however, carries additional information that can help improve the clustering performance. This is especially true in conversational interactions, such as business meetings, interviews, and lectures, where specific roles assumed by interlocutors (manager, client, teacher, etc.) are often associated with distinguishable linguistic patterns. In this paper we propose to employ a supervised text-based model to extract speaker roles and then use this information to guide an audio-based spectral clustering step by imposing must-link and cannot-link constraints between segments. The proposed method is applied on two different domains, namely on medical interactions and on podcast episodes, and is shown to yield improved results when compared to the audio-only approach.Comment: Submitted at Interspeech 202

    Optimization for Image Segmentation

    Get PDF
    Image segmentation, i.e., assigning each pixel a discrete label, is an essential task in computer vision with lots of applications. Major techniques for segmentation include for example Markov Random Field (MRF), Kernel Clustering (KC), and nowadays popular Convolutional Neural Networks (CNN). In this work, we focus on optimization for image segmentation. Techniques like MRF, KC, and CNN optimize MRF energies, KC criteria, or CNN losses respectively, and their corresponding optimization is very different. We are interested in the synergy and the complementary benefits of MRF, KC, and CNN for interactive segmentation and semantic segmentation. Our first contribution is pseudo-bound optimization for binary MRF energies that are high-order or non-submodular. Secondly, we propose Kernel Cut, a novel formulation for segmentation, which combines MRF regularization with Kernel Clustering. We show why to combine KC with MRF and how to optimize the joint objective. In the third part, we discuss how deep CNN segmentation can benefit from non-deep (i.e., shallow) methods like MRF and KC. In particular, we propose regularized losses for weakly-supervised CNN segmentation, in which we can integrate MRF energy or KC criteria as part of the losses. Minimization of regularized losses is a principled approach to semi-supervised learning, in general. Our regularized loss method is very simple and allows different kinds of regularization losses for CNN segmentation. We also study the optimization of regularized losses beyond gradient descent. Our regularized losses approach achieves state-of-the-art accuracy in semantic segmentation with near full supervision quality

    Coupled Deep Learning for Heterogeneous Face Recognition

    Full text link
    Heterogeneous face matching is a challenge issue in face recognition due to large domain difference as well as insufficient pairwise images in different modalities during training. This paper proposes a coupled deep learning (CDL) approach for the heterogeneous face matching. CDL seeks a shared feature space in which the heterogeneous face matching problem can be approximately treated as a homogeneous face matching problem. The objective function of CDL mainly includes two parts. The first part contains a trace norm and a block-diagonal prior as relevance constraints, which not only make unpaired images from multiple modalities be clustered and correlated, but also regularize the parameters to alleviate overfitting. An approximate variational formulation is introduced to deal with the difficulties of optimizing low-rank constraint directly. The second part contains a cross modal ranking among triplet domain specific images to maximize the margin for different identities and increase data for a small amount of training samples. Besides, an alternating minimization method is employed to iteratively update the parameters of CDL. Experimental results show that CDL achieves better performance on the challenging CASIA NIR-VIS 2.0 face recognition database, the IIIT-D Sketch database, the CUHK Face Sketch (CUFS), and the CUHK Face Sketch FERET (CUFSF), which significantly outperforms state-of-the-art heterogeneous face recognition methods.Comment: AAAI 201
    corecore