83 research outputs found

    CoopHash: Cooperative Learning of Multipurpose Descriptor and Contrastive Pair Generator via Variational MCMC Teaching for Supervised Image Hashing

    Leveraging supervised information can lead to superior retrieval performance in the image hashing domain, but the performance degrades significantly without enough labeled data. One effective solution to boost the performance is to employ generative models, such as Generative Adversarial Networks (GANs), to generate synthetic data in an image hashing model. However, GAN-based methods are difficult to train and suffer from mode collapse, which prevents the hashing approaches from jointly training the generative models and the hash functions. This limitation results in sub-optimal retrieval performance. To overcome this limitation, we propose a novel framework, the generative cooperative hashing network (CoopHash), which is based on energy-based cooperative learning. CoopHash jointly learns a powerful generative representation of the data and a robust hash function. CoopHash has two components: a top-down contrastive pair generator that synthesizes contrastive images and a bottom-up multipurpose descriptor that simultaneously represents the images from multiple perspectives, including probability density, hash code, latent code, and category. The two components are jointly learned via a novel likelihood-based cooperative learning scheme. We conduct experiments on several real-world datasets and show that the proposed method outperforms competing supervised hashing methods, achieving up to 10% relative improvement over the current state-of-the-art supervised hashing methods, and exhibits significantly better performance in out-of-distribution retrieval.

    Deep Clustering: A Comprehensive Survey

    Cluster analysis plays an indispensable role in machine learning and data mining. Learning a good data representation is crucial for clustering algorithms. Recently, deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks. Existing surveys of deep clustering mainly focus on single-view settings and network architectures, ignoring the complex application scenarios of clustering. To address this issue, in this paper we provide a comprehensive survey of deep clustering from the perspective of data sources. With different data sources and initial conditions, we systematically distinguish the clustering methods in terms of methodology, prior knowledge, and architecture. Concretely, deep clustering methods are introduced according to four categories, i.e., traditional single-view deep clustering, semi-supervised deep clustering, deep multi-view clustering, and deep transfer clustering. Finally, we discuss the open challenges and potential future opportunities in different fields of deep clustering.

    Representation Learning with Adversarial Latent Autoencoders

    A large number of deep learning methods applied to computer vision problems require encoder-decoder maps. These methods include, but are not limited to, self-representation learning, generalization, few-shot learning, and novelty detection. Encoder-decoder maps are also useful for photo manipulation, photo editing, superresolution, etc. Encoder-decoder maps are typically learned using autoencoder networks. Traditionally, autoencoder reciprocity is achieved in the image space using a pixel-wise similarity loss, which has a widely known flaw of producing non-realistic reconstructions. This flaw is typical for the Variational Autoencoder (VAE) family and is not limited to pixel-wise similarity losses, but is common to all methods relying upon the explicit maximum likelihood training paradigm, as opposed to an implicit one. Likelihood maximization, coupled with a poor decoder distribution, leads to poor or blurry reconstructions at best. Generative Adversarial Networks (GANs), on the other hand, perform an implicit maximization of the likelihood by solving a minimax game, thus bypassing the issues derived from the explicit maximization. This provides GAN architectures with remarkable generative power, enabling the generation of high-resolution images of humans, which are indistinguishable from real photos to the naked eye. However, GAN architectures lack inference capabilities, which makes them unsuitable for training encoder-decoder maps, effectively limiting their application space. We introduce an autoencoder architecture that (a) is free from the consequences of maximizing the likelihood directly, (b) produces reconstructions competitive in quality with state-of-the-art GAN architectures, and (c) allows learning disentangled representations, which makes it useful in a variety of problems. We show that the proposed architecture and training paradigm significantly improve the state of the art in novelty and anomaly detection methods, enable novel kinds of image manipulations, and have significant potential for other applications.

    Deep Learning for Free-Hand Sketch: A Survey

    Free-hand sketches are highly illustrative, and have been widely used by humans to depict objects or stories from ancient times to the present. The recent prevalence of touchscreen devices has made sketch creation a much easier task than ever and consequently made sketch-oriented applications increasingly popular. The progress of deep learning has immensely benefited free-hand sketch research and applications. This paper presents a comprehensive survey of the deep learning techniques oriented at free-hand sketch data, and the applications that they enable. The main contents of this survey include: (i) A discussion of the intrinsic traits and unique challenges of free-hand sketch, to highlight the essential differences between sketch data and other data modalities, e.g., natural photos. (ii) A review of the developments of free-hand sketch research in the deep learning era, by surveying existing datasets, research topics, and the state-of-the-art methods through a detailed taxonomy and experimental evaluation. (iii) Promotion of future work via a discussion of bottlenecks, open problems, and potential research directions for the community. Comment: This paper is accepted by IEEE TPAM

    Unleashing the Power of Edge-Cloud Generative AI in Mobile Networks: A Survey of AIGC Services

    Artificial Intelligence-Generated Content (AIGC) is an automated method for generating, manipulating, and modifying valuable and diverse data using AI algorithms creatively. This survey paper focuses on the deployment of AIGC applications, e.g., ChatGPT and Dall-E, at mobile edge networks, namely mobile AIGC networks, that provide personalized and customized AIGC services in real time while maintaining user privacy. We begin by introducing the background and fundamentals of generative models and the lifecycle of AIGC services at mobile AIGC networks, which includes data collection, training, fine-tuning, inference, and product management. We then discuss the collaborative cloud-edge-mobile infrastructure and technologies required to support AIGC services and enable users to access AIGC at mobile edge networks. Furthermore, we explore AIGC-driven creative applications and use cases for mobile AIGC networks. Additionally, we discuss the implementation, security, and privacy challenges of deploying mobile AIGC networks. Finally, we highlight some future research directions and open issues for the full realization of mobile AIGC networks.

    Deep Hashing for Image Similarity Search

    Hashing for similarity search is one of the most widely used methods to solve the approximate nearest neighbor search problem. In this method, one first maps data items from a real-valued high-dimensional space to a suitable low-dimensional binary code space and then performs the approximate nearest neighbor search in this code space instead. This is beneficial because the search in the code space can be solved more efficiently in terms of runtime complexity and storage consumption. For this method to succeed, similar data items must be mapped to binary code words that have a small Hamming distance. For real-world data such as images, one usually proceeds as follows. For each data item, a pre-processing algorithm removes noise and insignificant information and extracts important discriminating information to generate a feature vector that captures the important semantic content. Next, a vector hash function maps this real-valued feature vector to a binary code word. It is also possible to use the raw feature vectors afterwards to further process the search result candidates produced by binary hash codes. In this dissertation we focus on the following. First, developing a learning-based counterpart for the MinHash hashing algorithm. Second, presenting a new unsupervised hashing method, UmapHash, to map the neighborhood relations of data items from the feature vector space to the binary hash code space. Finally, an application of the aforementioned hashing methods for rapid face image recognition.
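
The two-stage pipeline described in the abstract above (binary codes, a Hamming-distance shortlist, then re-ranking with the raw feature vectors) can be illustrated with a minimal sketch. It assumes sign random projection (random hyperplane) hashing as a stand-in hash function; this is not the dissertation's learned MinHash counterpart or UmapHash, and all function names and parameters here are hypothetical.

```python
import random


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random Gaussian hyperplanes (assumed stand-in hash function)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vec, planes):
    """Map a real-valued feature vector to a binary code stored as an int."""
    code = 0
    for plane in planes:
        dot = sum(p * x for p, x in zip(plane, vec))
        code = (code << 1) | (1 if dot >= 0 else 0)  # one bit per hyperplane
    return code


def hamming(a, b):
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")


def search(query, items, codes, planes, n_candidates=3, top_k=1):
    """Shortlist by Hamming distance, then re-rank by squared Euclidean distance."""
    q_code = hash_code(query, planes)
    # Stage 1: cheap comparison in the binary code space.
    shortlist = sorted(range(len(items)),
                       key=lambda i: hamming(q_code, codes[i]))[:n_candidates]

    # Stage 2: refine the candidates using the raw feature vectors.
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(query, items[i]))

    return sorted(shortlist, key=sq_dist)[:top_k]


# Toy feature vectors; in practice these would come from a feature extractor.
items = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [-1.0, 0.0, 0.0], [0.0, -1.0, 0.5]]
planes = make_hyperplanes(n_bits=16, dim=3)
codes = [hash_code(v, planes) for v in items]
print(search([1.0, 0.05, 0.0], items, codes, planes))
```

With random hyperplanes, vectors separated by a small angle tend to agree on most bits, so the Hamming shortlist is likely, though not guaranteed, to contain the true nearest neighbors; the re-ranking step only corrects the ordering within that shortlist.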