Cluster-wise Unsupervised Hashing for Cross-Modal Similarity Search
Large-scale cross-modal similarity retrieval based on hashing has attracted
increasing attention in modern search applications such as search engines and
autopilot systems, owing to its great advantages in computation and storage.
However, current unsupervised cross-modal hashing methods still have some
limitations: (1) many methods relax the discrete constraints to solve the
optimization objective, which may significantly degrade retrieval performance;
(2) most existing hashing models project heterogeneous data into a common
latent space, which may lose sight of the diversity of heterogeneous data; and
(3) transforming real-valued data points into binary codes always results in a
substantial loss of information, producing a suboptimal continuous latent
space. To overcome these problems, this paper proposes a novel Cluster-wise
Unsupervised Hashing (CUH) method. Specifically, CUH jointly performs
multi-view clustering, which projects the original data points of each
modality into its own low-dimensional latent semantic space and finds the
cluster centroids and the common clustering indicators in that space, and
learns compact hash codes together with the corresponding linear hash
functions. A discrete optimization framework is developed to learn unified
binary codes across modalities under the guidance of cluster-wise code
prototypes. The reasonableness and effectiveness of CUH are demonstrated by
comprehensive experiments on diverse benchmark datasets.
Comment: 13 pages, 26 figures
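The retrieval step implied above, in which linear hash functions map each modality to binary codes that are then compared in a shared Hamming space, can be sketched as follows. This is a minimal illustration under stated assumptions, not CUH itself: the projection matrices `W_img` and `W_txt` and the toy data are hypothetical stand-ins for the hash functions CUH learns through multi-view clustering and discrete optimization, and mapping sign(0) to +1 is an arbitrary convention.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # one row per output bit


def linear_hash(W: Matrix, x: Vector) -> List[int]:
    """Binary code b = sign(W x) in {-1, +1}; sign(0) is mapped to +1 (an assumption)."""
    return [1 if sum(w * xi for w, xi in zip(row, x)) >= 0 else -1 for row in W]


def hamming(a: List[int], b: List[int]) -> int:
    """Number of bit positions where two codes disagree."""
    return sum(1 for ai, bi in zip(a, b) if ai != bi)


def rank_by_hamming(query: List[int], database: List[List[int]]) -> List[int]:
    """Database indices sorted by Hamming distance to the query (stable for ties)."""
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))


# Hypothetical 4-bit projections for a 3-d image modality and a 2-d text modality.
W_img = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, -1, 0]]
W_txt = [[1, 0], [0, 1], [1, 1], [1, -1]]

text_db = [[0.9, -0.2], [-0.5, 0.8], [0.1, 0.1]]
text_codes = [linear_hash(W_txt, t) for t in text_db]
img_query = linear_hash(W_img, [0.7, -0.3, 0.5])
print(rank_by_hamming(img_query, text_codes))  # [0, 2, 1]
```

Because codes are compared by counting disagreeing bits rather than by floating-point distances, ranking stays cheap even for large databases, which is the computational advantage of hashing that the abstract refers to.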
Jointly Deep Multi-View Learning for Clustering Analysis
In this paper, we propose a novel Joint framework for Deep Multi-view
Clustering (DMJC), in which multiple deep embedded features, a multi-view
fusion mechanism, and clustering assignments are learned simultaneously. Our
key idea is that the joint learning strategy can sufficiently exploit
clustering-friendly multi-view features and useful complementary multi-view
information to improve clustering performance. The primary challenge is how to
realize multi-view fusion in such a joint framework. To address it, we design
two variants of deep multi-view joint clustering models under the proposed
framework, in which multi-view fusion is implemented by two different schemes.
The first model, called DMJC-S, performs multi-view fusion implicitly via a
novel multi-view soft assignment distribution. The second model, termed
DMJC-T, defines a novel multi-view auxiliary target distribution to conduct
multi-view fusion explicitly. Both DMJC-S and DMJC-T are optimized under a
KL-divergence-like clustering objective. Experiments on six challenging image
datasets demonstrate the superiority of both DMJC-S and DMJC-T over
single-view and multi-view baselines and state-of-the-art multi-view
clustering methods, which confirms the effectiveness of the proposed DMJC
framework. To the best of our knowledge, this is the first work to model
multi-view clustering in a deep joint framework, and it offers a meaningful
perspective on unsupervised multi-view learning.
Comment: 10 pages, 4 figures
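The abstract mentions a soft assignment distribution, an auxiliary target distribution, and a KL-divergence-like objective. Assuming these follow the widely used DEC-style definitions (a Student's t kernel for soft assignments and a squared, frequency-normalized target), the pieces can be sketched as below. The `fuse_views` averaging is a hypothetical placeholder only; it is not the fusion scheme of DMJC-S or DMJC-T, and the toy embeddings and centroids are invented.

```python
import math
from typing import List

Matrix = List[List[float]]


def soft_assign(z: Matrix, mu: Matrix, alpha: float = 1.0) -> Matrix:
    """DEC-style Student's t soft assignment q_ij of embedding z_i to centroid mu_j."""
    q = []
    for zi in z:
        row = []
        for mj in mu:
            d2 = sum((a - b) ** 2 for a, b in zip(zi, mj))
            row.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])
    return q


def target_distribution(q: Matrix) -> Matrix:
    """Sharpened target p_ij proportional to q_ij**2 / f_j, where f_j = sum_i q_ij."""
    f = [sum(row[j] for row in q) for j in range(len(q[0]))]
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(len(row))]
        total = sum(w)
        p.append([v / total for v in w])
    return p


def kl_div(p: Matrix, q: Matrix) -> float:
    """KL(P || Q) summed over all samples and clusters."""
    return sum(
        pij * math.log(pij / qij)
        for prow, qrow in zip(p, q)
        for pij, qij in zip(prow, qrow)
        if pij > 0
    )


def fuse_views(qs: List[Matrix]) -> Matrix:
    """Hypothetical fusion by averaging per-view soft assignments (not DMJC's scheme)."""
    n, k = len(qs[0]), len(qs[0][0])
    return [[sum(q[i][j] for q in qs) / len(qs) for j in range(k)] for i in range(n)]


# Toy example: three samples seen in two views, two clusters per view.
view1 = soft_assign([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0]], [[0.0, 0.0], [3.0, 3.0]])
view2 = soft_assign([[1.0], [1.1], [-2.0]], [[1.0], [-2.0]])
q = fuse_views([view1, view2])
p = target_distribution(q)
loss = kl_div(p, q)  # in DEC-style training, P is held fixed while Q is updated
```

The target squares each assignment before renormalizing, so confident assignments become more confident, while dividing by the cluster frequency `f_j` keeps large clusters from dominating.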
Robust Localized Multi-view Subspace Clustering
In multi-view clustering, different views may have different confidence
levels when learning a consensus representation. Existing methods usually
address this by assigning distinct weights to different views. However, due
to the noisy nature of real-world applications, the confidence levels of
samples within the same view may also vary. Thus, assigning a single weight to
an entire view may lead to suboptimal solutions. In this paper, we propose a
novel localized multi-view subspace clustering model that considers the
confidence levels of both views and samples. By properly assigning a weight to
each sample under each view, we obtain a robust consensus representation by
fusing the noiseless structures among views and samples. We further develop a
regularizer on the weight parameters based on convex conjugacy theory, so that
sample weights are determined adaptively. An efficient iterative algorithm is
developed with a convergence guarantee. Experimental results on four
benchmarks demonstrate the correctness and effectiveness of the proposed model.
Comment: 7 pages
Tracking Persons-of-Interest via Unsupervised Representation Adaptation
Multi-face tracking in unconstrained videos is a challenging problem as faces
of one person often appear drastically different in multiple shots due to
significant variations in scale, pose, expression, illumination, and make-up.
Existing multi-target tracking methods often use low-level features which are
not sufficiently discriminative for identifying faces with such large
appearance variations. In this paper, we tackle this problem by learning
discriminative, video-specific face representations using convolutional neural
networks (CNNs). Unlike existing CNN-based approaches, which are trained
offline only on large-scale face image datasets, we use contextual constraints
to generate a large number of training samples for a given video, and further
adapt the pre-trained face CNN to specific videos using the discovered training
samples. Using these training samples, we optimize the embedding space so that
Euclidean distances correspond to a measure of semantic face similarity, by
minimizing a triplet loss function. With the learned discriminative features,
we apply hierarchical clustering to link tracklets across multiple shots and
generate trajectories. We extensively evaluate the proposed algorithm on two
sets of TV sitcoms and YouTube music videos, analyze the contribution of each
component, and demonstrate significant performance improvement over existing
techniques.
Comment: Project page: http://vllab1.ucmerced.edu/~szhang/FaceTracking
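The triplet objective described above can be illustrated with a short sketch. It assumes a squared-Euclidean hinge formulation with a hypothetical margin of 0.2, and it assumes the contextual constraints take a form commonly used in face tracking: faces in the same tracklet share an identity, while tracklets that co-occur in one frame show different people. The names `mine_triplets` and `tracks` and the toy embeddings are illustrative, not the authors' code.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Vec = List[float]


def sq_dist(a: Vec, b: Vec) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Vec, pos: Vec, neg: Vec, margin: float = 0.2) -> float:
    """Hinge loss: max(0, d(anchor, pos) - d(anchor, neg) + margin)."""
    return max(0.0, sq_dist(anchor, pos) - sq_dist(anchor, neg) + margin)


def mine_triplets(
    tracklets: Dict[str, List[Vec]], co_occurring: List[Tuple[str, str]]
) -> List[Tuple[Vec, Vec, Vec]]:
    """Assumed constraints: faces in one tracklet share an identity (anchor/positive);
    a tracklet co-occurring in the same frame shows someone else (negative)."""
    triplets = []
    for same, other in co_occurring:
        for anchor, pos in combinations(tracklets[same], 2):
            for neg in tracklets[other]:
                triplets.append((anchor, pos, neg))
    return triplets


# Toy data: tracklet "A" has two faces, tracklet "B" co-occurs with it.
tracks = {"A": [[0.0, 0.0], [0.1, 0.0]], "B": [[1.0, 1.0]]}
trips = mine_triplets(tracks, [("A", "B")])
mean_loss = sum(triplet_loss(*t) for t in trips) / len(trips)  # 0.0: already separated
```

The loss is zero once a positive is closer to the anchor than the negative by at least the margin, so only triplets that still violate the margin drive adaptation of the embedding.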
Object Detection with Pixel Intensity Comparisons Organized in Decision Trees
We describe a method for visual object detection based on an ensemble of
optimized decision trees organized in a cascade of rejectors. The trees use
pixel intensity comparisons in their internal nodes, which enables them to
process image regions very quickly. An experimental analysis is provided on a
face detection problem. The obtained results are encouraging and demonstrate
that the method has practical value. Additionally, we analyse its sensitivity
to noise and show how to perform fast rotation invariant object detection.
Complete source code is provided at https://github.com/nenadmarkus/pico
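To make the idea of pixel-comparison trees organized in a cascade concrete, here is a simplified sketch; it is not the pico implementation linked above. Assumptions: each tree is complete with one binary intensity test per internal node, tests use integer offsets inside the scanned window, leaves hold real-valued scores, and scores accumulate across stages, with a window rejected as soon as the running score falls to or below a stage threshold. `Tree`, `cascade_detect`, and all numeric values are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]
BinTest = Tuple[int, int, int, int]  # (r1, c1, r2, c2) offsets inside the window


class Tree:
    """Complete binary tree of depth `depth`: 2**depth - 1 tests and 2**depth leaves."""

    def __init__(self, depth: int, tests: List[BinTest], leaves: List[float]):
        assert len(tests) == 2 ** depth - 1 and len(leaves) == 2 ** depth
        self.depth, self.tests, self.leaves = depth, tests, leaves

    def evaluate(self, img: Image, row: int, col: int) -> float:
        node = 0
        for _ in range(self.depth):
            r1, c1, r2, c2 = self.tests[node]
            # Binary test: one comparison of two pixel intensities.
            bit = 1 if img[row + r1][col + c1] <= img[row + r2][col + c2] else 0
            node = 2 * node + 1 + bit  # left child if bit == 0, right child if bit == 1
        return self.leaves[node - (2 ** self.depth - 1)]


def cascade_detect(
    img: Image, row: int, col: int, stages: List[Tuple[List[Tree], float]]
) -> bool:
    """Run stages in order; reject the window once the running score <= stage threshold."""
    score = 0.0
    for trees, threshold in stages:
        score += sum(t.evaluate(img, row, col) for t in trees)
        if score <= threshold:
            return False
    return True


tree = Tree(1, [(0, 0, 0, 1)], [-1.0, 1.0])
stages = [([tree], 0.0)]
print(cascade_detect([[10, 200], [30, 40]], 0, 0, stages))  # True
print(cascade_detect([[200, 10], [30, 40]], 0, 0, stages))  # False
```

Each test needs only two memory reads and one comparison, and most windows are rejected by the first stages, which is why such cascades can scan image regions quickly. A practical detector would also scale test coordinates to the window size, which this sketch omits.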
Unsupervised robotic sorting: Towards autonomous decision making robots
Autonomous sorting is a crucial task in industrial robotics which can be very
challenging depending on the expected amount of automation. Usually, to decide
where to sort an object, the system needs to solve either an instance retrieval
(known object) or a supervised classification (predefined set of classes)
problem. In this paper, we introduce a new decision making module, where the
robotic system chooses how to sort the objects in an unsupervised way. We call
this problem Unsupervised Robotic Sorting (URS) and propose an implementation
on an industrial robotic system, using deep CNN feature extraction and standard
clustering algorithms. We carry out extensive experiments on various standard
datasets to demonstrate the efficiency of the proposed image clustering
pipeline. To evaluate the robustness of our URS implementation, we also
introduce a complex real-world dataset containing images of objects under
various backgrounds and lighting conditions. This dataset is used to fine-tune
the design choices (CNN and clustering algorithm) for URS. Finally, we propose
a method combining our pipeline with ensemble clustering to use multiple images
of each object. This redundancy of information about the objects is shown to
improve the clustering results.
Comment: Paper published in International Journal of Artificial Intelligence
and Applications (IJAIA), March 2018, Volume 9, Number 2. 17 pages, 5 figures,
7 tables. arXiv admin note: text overlap with arXiv:1707.0170
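One common way to combine several clusterings, one per image of each object, is a co-association (evidence accumulation) matrix. The sketch below assumes that approach; the abstract does not state which ensemble method URS uses. The labelings, the 0.5 threshold, and the linking rule (connected components over object pairs above the threshold) are hypothetical choices.

```python
from typing import Dict, List


def co_association(labelings: List[List[int]]) -> List[List[float]]:
    """Fraction of labelings in which objects i and j fall in the same cluster."""
    n, m = len(labelings[0]), len(labelings)
    return [[sum(1 for lab in labelings if lab[i] == lab[j]) / m for j in range(n)] for i in range(n)]


def consensus_clusters(labelings: List[List[int]], threshold: float = 0.5) -> List[int]:
    """Link objects whose co-association exceeds `threshold`; return component ids."""
    c = co_association(labelings)
    n = len(c)
    parent = list(range(n))

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if c[i][j] > threshold:
                parent[find(i)] = find(j)
    roots: Dict[int, int] = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Four objects, three images each: labeling k clusters the k-th image of every object.
labelings = [[0, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0]]
print(consensus_clusters(labelings))  # [0, 0, 1, 1]
```

Co-association only asks whether two objects share a cluster, so it is unaffected by arbitrary cluster ids across labelings; the third labeling above swaps the ids and still counts as agreement.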
Detecting Text in the Wild with Deep Character Embedding Network
Most text detection methods hypothesize that text is horizontal or
multi-oriented and thus define quadrangles as the basic detection unit.
However, text in the wild is usually perspectively distorted or curved, which
cannot be easily handled by existing approaches. In this paper, we propose a
deep character embedding network (CENet) that simultaneously predicts the
bounding boxes of characters and their embedding vectors, thus making text
detection a simple clustering task in the character embedding space. The
proposed method does not require the strong assumption that characters form a
straight line, which provides flexibility for arbitrarily curved or
perspectively distorted text. For the character detection task, a dense
prediction subnetwork is designed to obtain the confidence scores and bounding
boxes of characters. For the character embedding task, a subnetwork is trained
with a contrastive loss to project detected characters into the embedding
space. The two tasks share a backbone CNN from which multi-scale feature maps
are extracted. The final text regions are obtained by a thresholding process on
character confidence and the embedding distance of character pairs. We
evaluated our method on ICDAR13, ICDAR15, MSRA-TD500, and Total-Text. The
proposed method achieves state-of-the-art or comparable performance on all
these datasets, and shows substantial improvement on the irregular-text
dataset, i.e., Total-Text.
Comment: Asian Conference on Computer Vision 201
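The final grouping step, where text instances are recovered by thresholding character confidence and pairwise embedding distance, can be sketched as connected components over linked character pairs. The sketch also shows the standard margin-based contrastive loss. Whether CENet uses exactly this loss form, the threshold and margin values, and the connected-components linking rule are assumptions, and `group_characters` is a hypothetical helper.

```python
import math
from typing import Dict, List, Tuple

Vec = List[float]


def euclid(a: Vec, b: Vec) -> float:
    """Euclidean distance between two character embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a: Vec, b: Vec, same_word: bool, margin: float = 1.0) -> float:
    """Pull same-word pairs together; push other pairs at least `margin` apart."""
    d = euclid(a, b)
    return d ** 2 if same_word else max(0.0, margin - d) ** 2


def group_characters(
    chars: List[Tuple[float, Vec]], conf_thr: float = 0.5, dist_thr: float = 0.5
) -> List[List[int]]:
    """Keep characters with confidence >= conf_thr, link pairs whose embedding
    distance is below dist_thr, and return each connected component as one text."""
    keep = [i for i, (conf, _) in enumerate(chars) if conf >= conf_thr]
    parent = {i: i for i in keep}

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for pos, i in enumerate(keep):
        for j in keep[pos + 1:]:
            if euclid(chars[i][1], chars[j][1]) < dist_thr:
                parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for i in keep:
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# (confidence, embedding) per detected character; index 3 is a low-confidence detection.
chars = [(0.9, [0.0, 0.0]), (0.8, [0.1, 0.0]), (0.95, [2.0, 2.0]), (0.2, [0.05, 0.0])]
print(group_characters(chars))  # [[0, 1], [2]]
```

Because grouping depends only on embedding distances, characters along a curved baseline can still end up in one component, which is the flexibility the abstract claims over quadrangle-based detection.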
Kernelized Multiview Subspace Analysis by Self-weighted Learning
With the popularity of multimedia technology, information is often
represented or transmitted in multiple views. Most existing algorithms are
graph-based methods that learn the complex structures within multiview data
but overlook the information within the data representations. Furthermore,
many existing works treat multiple views discriminatively by introducing
additional hyperparameters, which is undesirable in practice. Abundant
multiview methods have been proposed for dimension reduction; however, no
research has yet integrated the existing work into a unified framework. To
address this issue, in this paper, we propose a general framework for
multiview data dimension reduction, named Kernelized Multiview Subspace
Analysis (KMSA). It directly handles the multi-view feature representations in
the kernel space, which provides a feasible channel for direct manipulation of
multiview data with different dimensions. Meanwhile, compared with graph-based
methods, KMSA can fully exploit the information in multiview data without
loss. Furthermore, since different views have different influences on KMSA, we
propose a self-weighted strategy to treat different views discriminatively
according to their contributions. A co-regularized term is proposed to promote
mutual learning across views. KMSA combines self-weighted learning with the
co-regularized term to learn appropriate weights for all views. We also
discuss the influence of the parameters in KMSA on the view weights. We
evaluate our proposed framework on six multiview datasets for classification
and image retrieval. The experimental results validate the advantages of our
proposed method.
Comment: Accepted by IEEE Transactions on Multimedia with Minor Revision
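The point that kernel space allows direct manipulation of multiview data with different dimensions can be illustrated briefly: an n x n kernel matrix has the same shape whatever the feature dimension of its view, so views can be combined entry by entry. This sketch uses an RBF kernel and a fixed weighted sum; the kernel choice, `gamma`, and the weights are hypothetical inputs, and KMSA's self-weighted learning and co-regularized term are not reproduced.

```python
import math
from typing import List

Matrix = List[List[float]]


def rbf_kernel(X: Matrix, gamma: float = 1.0) -> Matrix:
    """n x n Gaussian kernel; its shape depends on n only, not on the feature dimension."""
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj))) for xj in X]
        for xi in X
    ]


def combine_kernels(kernels: List[Matrix], weights: List[float]) -> Matrix:
    """Weighted sum of per-view kernels, with weights normalized to sum to 1."""
    total = sum(weights)
    w = [x / total for x in weights]
    n = len(kernels[0])
    return [[sum(wv * K[i][j] for wv, K in zip(w, kernels)) for j in range(n)] for i in range(n)]


# Three samples described by a 2-d view and a 4-d view (toy data).
view_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
view_b = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 1.0]]
K = combine_kernels([rbf_kernel(view_a), rbf_kernel(view_b)], weights=[2.0, 1.0])
```

In a self-weighted scheme the `weights` argument would be learned from each view's contribution rather than fixed by hand as it is here.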
Multi-graph Fusion for Multi-view Spectral Clustering
A panoply of multi-view clustering algorithms has been developed to deal with
prevalent multi-view data. Among them, spectral clustering-based methods have
drawn much attention and recently demonstrated promising results. Despite this
progress, two fundamental questions remain unanswered. First, how should
different views be fused into one graph? More often than not, the similarities
between samples may be manifested differently by different views. Many
existing algorithms either simply take the average of multiple views or just
learn a common graph. These simple approaches fail to consider the flexible
local manifold structures of all views, so the rich heterogeneous information
is not fully exploited. Second, how can an explicit cluster structure be
learned? Most existing methods do not pay attention to the quality of the
graphs and perform graph learning and spectral clustering separately. Such
unreliable graphs may lead to suboptimal clustering results. To fill these
gaps, in this paper, we propose a novel multi-view spectral clustering model
that performs graph fusion and spectral clustering simultaneously. The fusion
graph approximates the original graph of each individual view while
maintaining an explicit cluster structure. Experiments on four widely used
data sets confirm the superiority of the proposed method.
Comment: Submitted to Knowledge-Based Systems
Multi-Similarity Loss with General Pair Weighting for Deep Metric Learning
A family of loss functions built on pair-based computation has been proposed
in the literature, providing a myriad of solutions for deep metric learning.
In this paper, we provide a general weighting framework for understanding
recent pair-based loss functions. Our contributions are three-fold: (1) we
establish a General Pair Weighting (GPW) framework, which casts the sampling
problem of deep metric learning into a unified view of pair weighting through
gradient analysis, providing a powerful tool for understanding recent
pair-based loss functions; (2) we show that with GPW, various existing
pair-based methods can be compared and discussed comprehensively, with clear
differences and key limitations identified; (3) we propose a new loss called
multi-similarity loss (MS loss) under GPW, which is implemented in two
iterative steps (i.e., mining and weighting). This allows it to fully consider
three similarities for pair weighting, providing a more principled approach
for collecting and weighting informative pairs. Finally, the proposed MS loss
obtains new state-of-the-art performance on four image retrieval benchmarks,
where it outperforms the most recent approaches, such as ABE [Kim_2018_ECCV]
and HTL, by a large margin: from 60.6% to 65.7% on CUB200, and from 80.9% to
88.0% on the In-Shop Clothes Retrieval dataset at Recall@1. Code is available
at https://github.com/MalongTech/research-ms-loss.
Comment: Accepted at CVPR 2019; main method rewritten to be clearer
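The two iterative steps, mining and then weighting, can be sketched for a single mini-batch. This assumes the formulation commonly given for MS loss: cosine similarities of L2-normalized embeddings; a negative is kept if its similarity exceeds the hardest positive's minus a margin `eps`; a positive is kept if its similarity is below the hardest negative's plus `eps`; and kept pairs are soft-weighted through log-sum-exp terms scaled by `alpha` and `beta` around an offset `lam`. The hyperparameter defaults are illustrative and may differ from the paper's settings; this is not the code in the linked repository.

```python
import math
from typing import List

Vec = List[float]


def normalize(v: Vec) -> Vec:
    """Scale an embedding to unit L2 norm so dot products are cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def ms_loss(emb: List[Vec], labels: List[int], alpha: float = 2.0, beta: float = 50.0,
            lam: float = 0.5, eps: float = 0.1) -> float:
    """Multi-similarity loss over one batch (illustrative hyperparameters)."""
    z = [normalize(v) for v in emb]
    m = len(z)
    total = 0.0
    for i in range(m):
        sims = [sum(a * b for a, b in zip(z[i], z[k])) for k in range(m)]
        pos = [sims[k] for k in range(m) if k != i and labels[k] == labels[i]]
        neg = [sims[k] for k in range(m) if labels[k] != labels[i]]
        if not pos or not neg:
            continue
        # Mining: keep pairs that are hard relative to the hardest pair of the other kind.
        neg_kept = [s for s in neg if s + eps > min(pos)]
        pos_kept = [s for s in pos if s - eps < max(neg)]
        # Weighting: soft-weight the kept pairs via log-sum-exp terms.
        if pos_kept:
            total += math.log1p(sum(math.exp(-alpha * (s - lam)) for s in pos_kept)) / alpha
        if neg_kept:
            total += math.log1p(sum(math.exp(beta * (s - lam)) for s in neg_kept)) / beta
    return total / m


labels = [0, 0, 1, 1]
separated = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
hard = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]
print(ms_loss(separated, labels))  # 0.0: no pair survives mining
```

In the well-separated batch every positive is far more similar than every negative, so mining discards all pairs and the loss is exactly zero; in the `hard` batch the positives are orthogonal while negatives are nearly parallel, so pairs survive mining and the loss is positive.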