83 research outputs found
CoopHash: Cooperative Learning of Multipurpose Descriptor and Contrastive Pair Generator via Variational MCMC Teaching for Supervised Image Hashing
Leveraging supervised information can lead to superior retrieval performance
in the image hashing domain but the performance degrades significantly without
enough labeled data. One effective solution to boost the performance is to
employ generative models, such as Generative Adversarial Networks (GANs), to
generate synthetic data in an image hashing model. However, GAN-based methods
are difficult to train and suffer from mode collapse issue, which prevents the
hashing approaches from jointly training the generative models and the hash
functions. This limitation results in sub-optimal retrieval performance. To
overcome this limitation, we propose a novel framework, the generative
cooperative hashing network (CoopHash), which is based on the energy-based
cooperative learning. CoopHash jointly learns a powerful generative
representation of the data and a robust hash function. CoopHash has two
components: a top-down contrastive pair generator that synthesizes contrastive
images and a bottom-up multipurpose descriptor that simultaneously represents
the images from multiple perspectives, including probability density, hash
code, latent code, and category. The two components are jointly learned via a
novel likelihood-based cooperative learning scheme. We conduct experiments on
several real-world datasets and show that the proposed method outperforms the
competing hashing supervised methods, achieving up to 10% relative improvement
over the current state-of-the-art supervised hashing methods, and exhibits a
significantly better performance in out-of-distribution retrieval
Deep Clustering: A Comprehensive Survey
Cluster analysis plays an indispensable role in machine learning and data
mining. Learning a good data representation is crucial for clustering
algorithms. Recently, deep clustering, which can learn clustering-friendly
representations using deep neural networks, has been broadly applied in a wide
range of clustering tasks. Existing surveys for deep clustering mainly focus on
the single-view fields and the network architectures, ignoring the complex
application scenarios of clustering. To address this issue, in this paper we
provide a comprehensive survey for deep clustering in views of data sources.
With different data sources and initial conditions, we systematically
distinguish the clustering methods in terms of methodology, prior knowledge,
and architecture. Concretely, deep clustering methods are introduced according
to four categories, i.e., traditional single-view deep clustering,
semi-supervised deep clustering, deep multi-view clustering, and deep transfer
clustering. Finally, we discuss the open challenges and potential future
opportunities in different fields of deep clustering
Representation Learning with Adversarial Latent Autoencoders
A large number of deep learning methods applied to computer vision problems require encoder-decoder maps. These methods include, but are not limited to, self-representation learning, generalization, few-shot learning, and novelty detection. Encoder-decoder maps are also useful for photo manipulation, photo editing, superresolution, etc. Encoder-decoder maps are typically learned using autoencoder networks.Traditionally, autoencoder reciprocity is achieved in the image-space using pixel-wisesimilarity loss, which has a widely known flaw of producing non-realistic reconstructions. This flaw is typical for the Variational Autoencoder (VAE) family and is not only limited to pixel-wise similarity losses, but is common to all methods relying upon the explicit maximum likelihood training paradigm, as opposed to an implicit one. Likelihood maximization, coupled with poor decoder distribution leads to poor or blurry reconstructions at best. Generative Adversarial Networks (GANs) on the other hand, perform an implicit maximization of the likelihood by solving a minimax game, thus bypassing the issues derived from the explicit maximization. This provides GAN architectures with remarkable generative power, enabling the generation of high-resolution images of humans, which are indistinguishable from real photos to the naked eye. However, GAN architectures lack inference capabilities, which makes them unsuitable for training encoder-decoder maps, effectively limiting their application space.We introduce an autoencoder architecture that (a) is free from the consequences ofmaximizing the likelihood directly, (b) produces reconstructions competitive in quality with state-of-the-art GAN architectures, and (c) allows learning disentangled representations, which makes it useful in a variety of problems. We show that the proposed architecture and training paradigm significantly improves the state-of-the-art in novelty and anomaly detection methods, it enables novel kinds of image manipulations, and has significant potential for other applications
Deep Learning for Free-Hand Sketch: A Survey
Free-hand sketches are highly illustrative, and have been widely used by
humans to depict objects or stories from ancient times to the present. The
recent prevalence of touchscreen devices has made sketch creation a much easier
task than ever and consequently made sketch-oriented applications increasingly
popular. The progress of deep learning has immensely benefited free-hand sketch
research and applications. This paper presents a comprehensive survey of the
deep learning techniques oriented at free-hand sketch data, and the
applications that they enable. The main contents of this survey include: (i) A
discussion of the intrinsic traits and unique challenges of free-hand sketch,
to highlight the essential differences between sketch data and other data
modalities, e.g., natural photos. (ii) A review of the developments of
free-hand sketch research in the deep learning era, by surveying existing
datasets, research topics, and the state-of-the-art methods through a detailed
taxonomy and experimental evaluation. (iii) Promotion of future work via a
discussion of bottlenecks, open problems, and potential research directions for
the community.Comment: This paper is accepted by IEEE TPAM
Unleashing the Power of Edge-Cloud Generative AI in Mobile Networks: A Survey of AIGC Services
Artificial Intelligence-Generated Content (AIGC) is an automated method for
generating, manipulating, and modifying valuable and diverse data using AI
algorithms creatively. This survey paper focuses on the deployment of AIGC
applications, e.g., ChatGPT and Dall-E, at mobile edge networks, namely mobile
AIGC networks, that provide personalized and customized AIGC services in real
time while maintaining user privacy. We begin by introducing the background and
fundamentals of generative models and the lifecycle of AIGC services at mobile
AIGC networks, which includes data collection, training, finetuning, inference,
and product management. We then discuss the collaborative cloud-edge-mobile
infrastructure and technologies required to support AIGC services and enable
users to access AIGC at mobile edge networks. Furthermore, we explore
AIGCdriven creative applications and use cases for mobile AIGC networks.
Additionally, we discuss the implementation, security, and privacy challenges
of deploying mobile AIGC networks. Finally, we highlight some future research
directions and open issues for the full realization of mobile AIGC networks
Deep Hashing for Image Similarity Search
Hashing for similarity search is one of the most widely used methods to solve the approximate nearest neighbor search problem. In this method, one first maps data items from a real valued high-dimensional space to a suitable low dimensional binary code space and then performs the approximate nearest neighbor search in this code space instead. This is beneficial because the search in the code space can be solved more efficiently in terms of runtime complexity and storage consumption. Obviously, for this method to succeed, it is necessary that similar data items be mapped to binary code words that have small Hamming distance. For real-world data such as images, one usually proceeds as follows. For each data item, a pre-processing algorithm removes noise and insignificant information and extracts important discriminating information to generate a feature vector that captures the important semantic content. Next, a vector hash function maps this real valued feature vector to a binary code word. It is also possible to use the raw feature vectors afterwards to further process the search result candidates produced by binary hash codes. In this dissertation we focus on the following. First, developing a learning based counterpart for the MinHash hashing algorithm. Second, presenting a new unsupervised hashing method UmapHash to map the neighborhood relations of data items from the feature vector space to the binary hash code space. Finally, an application of the aforementioned hashing methods for rapid face image recognition
- …