Unsupervised learning of generative topic saliency for person re-identification
© 2014. The copyright of this document resides with its authors. It may be distributed unchanged freely in print or electronic forms.
Existing approaches to person re-identification (re-id) are dominated by supervised learning methods that focus on learning optimal similarity distance metrics. However, supervised models require a large number of manually labelled pairs of person images across every pair of camera views, which limits their ability to scale to large camera networks. To overcome this problem, this paper proposes a novel unsupervised re-id modelling approach based on generative probabilistic topic modelling. Given abundant unlabelled data, our topic model learns simultaneously to (1) discover localised person foreground appearance saliency (salient image patches) that is more informative for re-id matching, and (2) remove the busy background clutter surrounding a person. Extensive experiments demonstrate that the proposed model outperforms existing unsupervised learning re-id methods with significantly simplified model complexity. At the same time, it retains re-id accuracy comparable to state-of-the-art supervised re-id methods without any need for pair-wise labelled training data.
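The foreground-saliency idea can be illustrated with a toy sketch (plain NumPy, not the authors' learned model): given an assumed topic-word matrix in which some topics represent person foreground and others background clutter, each image patch's quantized appearance "word" yields a topic posterior, and saliency is the posterior mass on the foreground topics. All numbers and the split into foreground/background topics are illustrative assumptions.

```python
import numpy as np

# Toy setup: 4 topics over 6 visual words; topics 0-1 stand for person
# foreground appearance, topics 2-3 for background clutter.
# These probabilities are made up for illustration, not learned.
topic_word = np.array([
    [0.40, 0.30, 0.10, 0.10, 0.05, 0.05],  # foreground topic 0
    [0.10, 0.40, 0.30, 0.10, 0.05, 0.05],  # foreground topic 1
    [0.05, 0.05, 0.10, 0.30, 0.40, 0.10],  # background topic 2
    [0.05, 0.05, 0.10, 0.10, 0.30, 0.40],  # background topic 3
])
topic_prior = np.full(4, 0.25)  # uniform prior over topics

def patch_saliency(word_id):
    """Posterior P(topic | word) via Bayes' rule; saliency is the
    posterior mass on the two foreground topics."""
    joint = topic_prior * topic_word[:, word_id]
    posterior = joint / joint.sum()
    return posterior[:2].sum()

saliencies = np.array([patch_saliency(w) for w in range(6)])
# Patches whose visual words are typical of foreground topics score
# higher and would carry more weight in re-id matching.
print(saliencies.round(2))
```

In this sketch a patch quantized to word 0 (foreground-typical) receives high saliency, while one quantized to word 5 (background-typical) is down-weighted, mirroring the background-suppression effect described above.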
Variational Deep Semantic Hashing for Text Documents
As the amount of textual data has been rapidly increasing over the past
decade, efficient similarity search methods have become a crucial component of
large-scale information retrieval systems. A popular strategy is to represent
original data samples by compact binary codes through hashing. A spectrum of
machine learning methods have been utilized, but they often lack expressiveness
and flexibility in modeling to learn effective representations. The recent
advances of deep learning in a wide range of applications has demonstrated its
capability to learn robust and powerful feature representations for complex
data. Especially, deep generative models naturally combine the expressiveness
of probabilistic generative models with the high capacity of deep neural
networks, which is very suitable for text modeling. However, little work has
leveraged the recent progress in deep learning for text hashing.
In this paper, we propose a series of novel deep document generative models
for text hashing. The first proposed model is unsupervised while the second one
is supervised by utilizing document labels/tags for hashing. The third model
further considers document-specific factors that affect the generation of
words. The probabilistic generative formulation of the proposed models provides
a principled framework for model extension, uncertainty estimation, simulation,
and interpretability. Based on variational inference and reparameterization,
the proposed models can be interpreted as encoder-decoder deep neural networks
and thus they are capable of learning complex nonlinear distributed
representations of the original documents. We conduct a comprehensive set of
experiments on four public testbeds. The experimental results have demonstrated
the effectiveness of the proposed supervised learning models for text hashing.
Comment: 11 pages, 4 figures
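The encode-then-binarize idea behind variational text hashing can be sketched as follows (plain NumPy with random stand-in weights; the names `encode`, `reparameterize`, and `to_binary_code` are mine, not the paper's API): a document's bag-of-words vector is mapped to the parameters of a Gaussian latent, the reparameterization trick makes sampling differentiable during training, and at retrieval time the latent mean is thresholded into a binary hash code.

```python
import numpy as np

rng = np.random.default_rng(42)
vocab_size, code_bits = 100, 16

# Random "trained" encoder weights -- stand-ins for a learned network.
W_mu = rng.normal(0, 0.1, (vocab_size, code_bits))
W_logvar = rng.normal(0, 0.1, (vocab_size, code_bits))

def encode(bow):
    """Map a bag-of-words vector to Gaussian latent parameters."""
    return bow @ W_mu, bow @ W_logvar

def reparameterize(mu, logvar):
    """z = mu + sigma * eps: sampling expressed as a deterministic
    function of the parameters plus noise, so gradients flow."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def to_binary_code(mu):
    """At retrieval time, threshold the latent mean into hash bits."""
    return (mu > 0).astype(np.uint8)

doc = rng.poisson(0.3, vocab_size).astype(float)  # toy word counts
mu, logvar = encode(doc)
z = reparameterize(mu, logvar)   # stochastic latent used in training
code = to_binary_code(mu)        # deterministic 16-bit hash code
print(code)
```

Similar documents would tend to receive codes at small Hamming distance, which is what makes such codes usable for fast similarity search.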
Embedding-based Method for the Supervised Link Prediction in Social Networks
In recent years, social network analysis has received a lot of interest. Link prediction is an important area of research in this field that uses information from current networks to predict the likely links that will emerge in the future. It has attracted considerable attention from interdisciplinary research communities due to its ubiquitous applications in biological networks, computer science, transportation networks, bioinformatics, telecommunication networks, and so on. Currently, supervised machine learning is one of the key techniques for the link prediction task. Several algorithms have been developed to predict future links in a network, but there is still scope to improve on previous approaches. In the supervised link prediction process, feature selection is a crucial step. Most existing algorithms use a single type of similarity-based feature to represent the data, which describes it poorly given the large scale and heterogeneity of social networks. One of the newest techniques for link prediction is embedding methods, which are used to prepare a feature vector for each non-existing link in the network. In this paper, we introduce a novel approach to supervised link prediction based on feature embedding methods in order to achieve better performance. Our contribution considers a set of embedding methods as the feature vector for training machine learning classifiers. The main focus of this work is to investigate how well different feature embedding methods improve the performance of supervised link prediction models. Experimental results on several real-world temporal networks show satisfactory performance and encourage further analysis.
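The feature-construction step described above can be sketched in a few lines (random embeddings standing in for a method such as node2vec or DeepWalk; the Hadamard-product edge feature is one common choice, not necessarily the one used in the paper):

```python
import numpy as np

rng = np.random.default_rng(7)
num_nodes, dim = 50, 8

# Stand-in node embeddings; in practice these would come from an
# embedding method such as node2vec or DeepWalk.
emb = rng.normal(size=(num_nodes, dim))

def edge_features(u, v):
    """Hadamard (element-wise) product of the endpoint embeddings --
    a standard way to turn two node vectors into one edge vector."""
    return emb[u] * emb[v]

# Candidate (non-existing) links to score; a downstream classifier
# (e.g. logistic regression) would be trained on such feature vectors
# with labels from the observed network snapshots.
candidates = [(0, 1), (2, 3), (4, 5)]
X = np.stack([edge_features(u, v) for u, v in candidates])
print(X.shape)  # one feature vector per candidate link
```

Other symmetric combiners (average, absolute difference) are also common; the key point is that each candidate link becomes a fixed-length vector a standard classifier can consume.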
Simultaneous Feature Learning and Hash Coding with Deep Neural Networks
Similarity-preserving hashing is a widely-used method for nearest neighbour
search in large-scale image retrieval tasks. For most existing hashing methods,
an image is first encoded as a vector of hand-engineering visual features,
followed by another separate projection or quantization step that generates
binary codes. However, such visual feature vectors may not be optimally
compatible with the coding process, thus producing sub-optimal hashing codes.
In this paper, we propose a deep architecture for supervised hashing, in which
images are mapped into binary codes via carefully designed deep neural
networks. The pipeline of the proposed deep architecture consists of three
building blocks: 1) a sub-network with a stack of convolution layers to produce
the effective intermediate image features; 2) a divide-and-encode module to
divide the intermediate image features into multiple branches, each encoded
into one hash bit; and 3) a triplet ranking loss designed to characterize that
one image is more similar to the second image than to the third one. Extensive
evaluations on several benchmark image datasets show that the proposed
simultaneous feature learning and hash coding pipeline brings substantial
improvements over other state-of-the-art supervised or unsupervised hashing
methods.
Comment: This paper has been accepted to the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 201
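The triplet ranking loss in building block (3) can be sketched as follows (plain NumPy; the margin value is an illustrative assumption): it penalizes the network whenever the anchor image is not closer to the similar image than to the dissimilar one by at least the margin.

```python
import numpy as np

def triplet_ranking_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss encouraging the anchor to lie closer (in squared
    Euclidean distance) to the positive image than to the negative."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # similar image: small distance to anchor
n = np.array([2.0, 0.0])   # dissimilar image: large distance
print(triplet_ranking_loss(a, p, n))  # well-separated triplet -> 0.0
```

Because the loss is zero once the ranking constraint is satisfied with margin, training effort concentrates on triplets the current embedding still gets wrong.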
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
While it is nearly effortless for humans to quickly assess the perceptual
similarity between two images, the underlying processes are thought to be quite
complex. Despite this, the most widely used perceptual metrics today, such as
PSNR and SSIM, are simple, shallow functions, and fail to account for many
nuances of human perception. Recently, the deep learning community has found
that features of the VGG network trained on ImageNet classification have been
remarkably useful as a training loss for image synthesis. But how perceptual
are these so-called "perceptual losses"? What elements are critical for their
success? To answer these questions, we introduce a new dataset of human
perceptual similarity judgments. We systematically evaluate deep features
across different architectures and tasks and compare them with classic metrics.
We find that deep features outperform all previous metrics by large margins on
our dataset. More surprisingly, this result is not restricted to
ImageNet-trained VGG features, but holds across different deep architectures
and levels of supervision (supervised, self-supervised, or even unsupervised).
Our results suggest that perceptual similarity is an emergent property shared
across deep visual representations.
Comment: Accepted to CVPR 2018; Code and data available at
https://www.github.com/richzhang/PerceptualSimilarit
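The kind of deep-feature distance the paper studies can be sketched in miniature (random arrays standing in for VGG activations; this follows the common recipe of unit-normalizing each spatial feature vector, taking per-layer squared differences, averaging spatially, and summing over layers, rather than reproducing the paper's exact learned weighting):

```python
import numpy as np

rng = np.random.default_rng(3)

def normalize_channels(feat):
    """Unit-normalize the channel vector at each spatial position."""
    return feat / (np.linalg.norm(feat, axis=0, keepdims=True) + 1e-10)

def deep_feature_distance(feats_a, feats_b):
    """Sum over layers of the spatially averaged squared difference
    between channel-normalized feature maps."""
    total = 0.0
    for fa, fb in zip(feats_a, feats_b):
        diff = normalize_channels(fa) - normalize_channels(fb)
        total += np.mean(np.sum(diff ** 2, axis=0))
    return total

# Two-layer toy "network": (channels, height, width) activations
# standing in for real VGG feature maps of two images.
feats_x = [rng.normal(size=(8, 4, 4)), rng.normal(size=(16, 2, 2))]
feats_y = [f + 0.01 * rng.normal(size=f.shape) for f in feats_x]

print(deep_feature_distance(feats_x, feats_y))
```

Identical inputs give distance zero, and the distance grows as the feature maps diverge, which is the behaviour a perceptual metric needs before any calibration against human judgments.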
LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching
Obtaining large pre-trained models that can be fine-tuned to new tasks with
limited annotated samples has remained an open challenge for medical imaging
data. While pre-trained deep networks on ImageNet and vision-language
foundation models trained on web-scale data are prevailing approaches, their
effectiveness on medical tasks is limited due to the significant domain shift
between natural and medical images. To bridge this gap, we introduce LVM-Med,
the first family of deep networks trained on large-scale medical datasets. We
have collected approximately 1.3 million medical images from 55 publicly
available datasets, covering a large number of organs and modalities such as
CT, MRI, X-ray, and Ultrasound. We benchmark several state-of-the-art
self-supervised algorithms on this dataset and propose a novel self-supervised
contrastive learning algorithm using a graph-matching formulation. The proposed
approach makes three contributions: (i) it integrates prior pair-wise image
similarity metrics based on local and global information; (ii) it captures the
structural constraints of feature embeddings through a loss function
constructed via a combinatorial graph-matching objective; and (iii) it can be
trained efficiently end-to-end using modern gradient-estimation techniques for
black-box solvers. We thoroughly evaluate the proposed LVM-Med on 15 downstream
medical tasks ranging from segmentation and classification to object detection,
in both in-distribution and out-of-distribution settings. LVM-Med empirically
outperforms a number of state-of-the-art supervised, self-supervised, and
foundation models. For challenging tasks such as Brain Tumor Classification or
Diabetic Retinopathy Grading, LVM-Med improves previous vision-language models
trained on 1 billion masks by 6-7% while using only a ResNet-50.
Comment: Update Appendi
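The paper's graph-matching formulation is beyond a short sketch, but the generic contrastive objective it builds on can be illustrated (plain NumPy InfoNCE over two batches of embeddings, where row i of each batch is a view of the same image; this is a standard baseline, not LVM-Med's actual loss):

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """Generic InfoNCE contrastive loss: pull matching rows of z1 and
    z2 together, push non-matching rows apart."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature             # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # positives on the diagonal

rng = np.random.default_rng(1)
z = rng.normal(size=(8, 32))
aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))
shuffled = info_nce(z, rng.permutation(z))
print(aligned, shuffled)  # aligned views yield the lower loss
```

LVM-Med replaces the simple diagonal pairing with a combinatorial graph-matching objective over pairwise similarities, trained end-to-end via gradient estimation for the black-box solver, as described above.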