Person Search in Videos with One Portrait Through Visual and Temporal Links
In real-world applications, e.g. law enforcement and video retrieval, one
often needs to search for a certain person in long videos with just one portrait.
This is much more challenging than the conventional settings for person
re-identification, as the search may need to be carried out in environments
different from where the portrait was taken. In this paper, we aim to tackle
this challenge and propose a novel framework, which takes into account the
identity invariance along a tracklet, thus allowing person identities to be
propagated via both visual and temporal links. We also develop a novel
scheme called Progressive Propagation via Competitive Consensus, which
significantly improves the reliability of the propagation process. To promote
the study of person search, we construct a large-scale benchmark, which
contains 127K manually annotated tracklets from 192 movies. Experiments show
that our approach remarkably outperforms mainstream person re-id methods,
raising the mAP from 42.16% to 62.27%.
Comment: European Conference on Computer Vision (ECCV), 2018
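As a rough illustration of the propagation idea (not the authors' implementation), the sketch below propagates identity probabilities over a tracklet graph with a competitive, max-based consensus rule; the weight matrix W, the update schedule, and the normalization are all assumptions of this sketch.

```python
import numpy as np

def propagate_identities(W, P, n_iters=10):
    """Propagate identity probabilities over a tracklet graph.

    W: (n, n) non-negative link weights mixing visual similarity and
       temporal adjacency between tracklets.
    P: (n, c) identity probabilities; seed tracklets matched to the
       portrait are one-hot.
    """
    seeds = P.max(axis=1) == 1.0                   # keep seed labels fixed
    for _ in range(n_iters):
        # votes[i, j, c]: confidence neighbour j gives identity c to tracklet i
        votes = W[:, :, None] * P[None, :, :]
        Q = votes.max(axis=1)                      # competitive consensus:
        Q /= Q.sum(axis=1, keepdims=True) + 1e-12  # trust only the strongest vote
        P = np.where(seeds[:, None], P, Q)
    return P
```

Taking the maximum over neighbours, instead of the weighted sum of classic label propagation, suppresses the accumulation of many weak, noisy votes, which is the intuition behind a competitive consensus.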
An Overview of Cross-media Retrieval: Concepts, Methodologies, Benchmarks and Challenges
Multimedia retrieval plays an indispensable role in big data utilization.
Past efforts mainly focused on single-media retrieval. However, the
requirements of users are highly flexible, such as retrieving relevant audio
clips with an image query. Consequently, challenges stemming from the "media
gap", which means that representations of different media types are
inconsistent, have attracted increasing attention. Cross-media retrieval is
designed for the scenarios where the queries and retrieval results are of
different media types. As a relatively new research topic, its concepts,
methodologies and benchmarks are still not clear in the literature. To address
these issues, we review more than 100 references, give an overview including
the concepts, methodologies, major challenges and open issues, as well as build
up the benchmarks including datasets and experimental results. Researchers can
directly adopt the benchmarks to promptly evaluate their proposed methods. This
will help them focus on algorithm design rather than on the time-consuming
reimplementation of compared methods and results. Notably, we have constructed a new
dataset XMedia, which is the first publicly available dataset with up to five
media types (text, image, video, audio and 3D model). We believe this overview
will attract more researchers to focus on cross-media retrieval and be helpful
to them.
Comment: 14 pages, accepted by IEEE Transactions on Circuits and Systems for
Video Technology
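To make the task concrete, here is a minimal sketch of the common-space paradigm that most surveyed methods follow: both modalities are assumed to be already projected into one shared semantic space (for example by CCA or a two-branch network), after which cross-media ranking reduces to nearest-neighbour search.

```python
import numpy as np

def cross_media_retrieve(query_vec, gallery_vecs, k=10):
    """Rank gallery items of a different media type (e.g. audio clips)
    against one query (e.g. an image), assuming both were already
    projected into a shared semantic space so that cosine similarity
    is meaningful across the media gap."""
    q = query_vec / (np.linalg.norm(query_vec) + 1e-12)
    g = gallery_vecs / (np.linalg.norm(gallery_vecs, axis=1, keepdims=True) + 1e-12)
    sims = g @ q                        # cosine similarity per gallery item
    return np.argsort(-sims)[:k]        # indices of the top-k cross-media results
```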
Triplet-Based Deep Hashing Network for Cross-Modal Retrieval
Given the benefits of its low storage requirements and high retrieval
efficiency, hashing has recently received increasing attention. In
particular, cross-modal hashing has been widely and successfully used in
multimedia similarity search applications. However, almost all existing
cross-modal hashing methods cannot obtain powerful hash codes because they
ignore the relative similarity between heterogeneous data, which carries
richer semantic information; this leads to unsatisfactory retrieval performance.
In this paper, we propose a triplet-based deep hashing (TDH) network for
cross-modal retrieval. First, we utilize triplet labels, which describe the
relative relationships among three instances, as supervision in order to
capture more general semantic correlations between cross-modal instances. We
then establish a loss function from the inter-modal view and the intra-modal
view to boost the discriminative abilities of the hash codes. Finally, graph
regularization is introduced into our proposed TDH method to preserve the
original semantic similarity between hash codes in Hamming space. Experimental
results show that our proposed method outperforms several state-of-the-art
approaches on two popular cross-modal datasets.
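As an illustration of the triplet supervision described above (a generic sketch, not the paper's exact formulation), the inter-modal loss below pulls a matching image-text pair together and pushes a mismatched pair at least a margin apart over relaxed, real-valued hash codes.

```python
import numpy as np

def triplet_hash_loss(anchor, positive, negative, margin=1.0):
    """Inter-modal triplet loss on relaxed (real-valued) hash codes.

    anchor:   (n, b) image codes.
    positive: (n, b) text codes sharing the anchor's semantics.
    negative: (n, b) text codes with different semantics.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return np.mean(np.maximum(0.0, d_pos - d_neg + margin))

# After training, binary codes are obtained by quantizing the relaxed
# codes, e.g. codes = np.sign(relaxed_codes).
```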
Deep Exemplar-based Colorization
We propose the first deep learning approach for exemplar-based local
colorization. Given a reference color image, our convolutional neural network
directly maps a grayscale image to an output colorized image. Rather than using
hand-crafted rules as in traditional exemplar-based methods, our end-to-end
colorization network learns how to select, propagate, and predict colors from
large-scale data. The approach performs robustly and generalizes well even
when using reference images that are unrelated to the input grayscale image.
More importantly, as opposed to other learning-based colorization methods, our
network allows the user to achieve customizable results by simply feeding
different references. In order to further reduce manual effort in selecting the
references, the system automatically recommends references with our proposed
image retrieval algorithm, which considers both semantic and luminance
information. The colorization can be performed fully automatically by simply
picking the top reference suggestion. Our approach is validated through a user
study and favorable quantitative comparisons to state-of-the-art methods.
Furthermore, our approach can be naturally extended to video colorization. Our
code and models will be freely available for public use.
Comment: To appear in SIGGRAPH 2018
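The reference-recommendation step can be pictured with the toy scorer below, which mixes semantic (pre-trained CNN feature) similarity with a luminance-histogram match; the mixing weight alpha and the histogram-intersection term are assumptions of this sketch, not the paper's exact algorithm.

```python
import numpy as np

def rank_references(gray_feat, gray_lum_hist, ref_feats, ref_lum_hists, alpha=0.7):
    """Score colour references for a grayscale input by combining
    semantic similarity (deep features) with a luminance match
    (histogram intersection). alpha is a made-up mixing knob."""
    sem = ref_feats @ gray_feat / (
        np.linalg.norm(ref_feats, axis=1) * np.linalg.norm(gray_feat) + 1e-12)
    lum = np.minimum(ref_lum_hists, gray_lum_hist).sum(axis=1)  # histogram overlap
    scores = alpha * sem + (1.0 - alpha) * lum
    return np.argsort(-scores)          # best reference first
```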
Social Anchor-Unit Graph Regularized Tensor Completion for Large-Scale Image Retagging
Image retagging aims to improve tag quality of social images by refining
their original tags or assigning new high-quality tags. Recent approaches
simultaneously explore visual, user and tag information to improve the
performance of image retagging by constructing and exploring an image-tag-user
graph. However, such methods will become computationally infeasible with the
rapidly increasing number of images, tags and users. It has been proven that
Anchor Graph Regularization (AGR) can significantly accelerate large-scale
graph learning models by exploring only a small number of anchor points.
Inspired by this, we propose a novel Social anchor-Unit GrAph Regularized
Tensor Completion (SUGAR-TC) method to effectively refine the tags of social
images, which is insensitive to the scale of the applied data. First, we
construct an anchor-unit graph across multiple domains (e.g., image and user
domains) rather than a traditional anchor graph in a single domain. Second,
SUGAR-based tensor completion is performed on the original image-tag-user
tensor to refine the tags of the anchor images. Third, we efficiently assign
tags to non-anchor images by leveraging the relationship between the non-anchor
images and the anchor units. Experimental results on a real-world social image
database clearly demonstrate the effectiveness of SUGAR-TC, which outperforms
several related methods.
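The third step, assigning tags to non-anchor images, can be sketched as follows: each image's tag scores are a convex combination of the refined tags of its nearby anchor units, so the cost scales with the number of anchors rather than the number of images. The affinity matrix Z is assumed here to come from a standard anchor-graph construction.

```python
import numpy as np

def assign_tags(Z, anchor_tags, top_k=5):
    """Infer tags for non-anchor images from refined anchor-unit tags.

    Z:           (n_images, n_anchors) non-negative affinities between
                 non-anchor images and anchor units (rows sum to 1).
    anchor_tags: (n_anchors, n_tags) refined tag matrix for the anchors.
    """
    scores = Z @ anchor_tags                          # convex combination
    return np.argsort(-scores, axis=1)[:, :top_k]     # top tags per image
```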
Recent Advance in Content-based Image Retrieval: A Literature Survey
The explosive increase and ubiquitous accessibility of visual data on the Web
have led to the prosperity of research activity in image search or retrieval.
Because they ignore visual content as a ranking clue, methods that apply text
search techniques to visual retrieval may suffer from inconsistency between
text words and visual content. Content-based image retrieval (CBIR), which
makes use of the representation of visual content to identify relevant images,
has attracted sustained attention over the past two decades. The problem is
challenging due to the intention gap and the semantic gap problems. Numerous
techniques have been developed for content-based image retrieval in the last
decade. The purpose of this paper is to categorize and evaluate those
algorithms proposed during the period of 2003 to 2016. We conclude with several
promising directions for future research.
Comment: 22 pages
RANet: Ranking Attention Network for Fast Video Object Segmentation
Although online learning (OL) techniques have boosted the performance of
semi-supervised video object segmentation (VOS) methods, the huge time costs of
OL greatly restrict their practicality. Matching-based and propagation-based
methods run at a faster speed by avoiding OL techniques. However, they are
limited by sub-optimal accuracy due to mismatching and drifting problems. In
this paper, we develop a real-time yet very accurate Ranking Attention Network
(RANet) for VOS. Specifically, to integrate the insights of matching-based and
propagation-based methods, we employ an encoder-decoder framework to learn
pixel-level similarity and segmentation in an end-to-end manner. To better
utilize the similarity maps, we propose a novel ranking attention module, which
automatically ranks and selects these maps for fine-grained VOS performance.
Experiments on DAVIS-16 and DAVIS-17 datasets show that our RANet achieves the
best speed-accuracy trade-off, e.g., with 33 milliseconds per frame and
J&F=85.5% on DAVIS-16. With OL, our RANet reaches J&F=87.1% on DAVIS-16,
exceeding state-of-the-art VOS methods. The code can be found at
https://github.com/Storife/RANet.
Comment: Accepted by ICCV 2019. 10 pages, 7 figures, 6 tables. The
supplementary file can be found at
https://csjunxu.github.io/paper/2019ICCV/RANet_supp.pdf; code is available
at https://github.com/Storife/RANet
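A rough stand-in for the ranking attention module is sketched below: the real module scores the similarity maps with a learned subnetwork, whereas this toy version uses each map's peak response as a proxy score before discarding the unreliable maps.

```python
import numpy as np

def ranking_attention(sim_maps, K=128):
    """Rank pixel-level similarity maps and keep the best K for the decoder.

    sim_maps: (C, H, W), one map per template pixel, giving its
    similarity to every pixel of the current frame.
    """
    scores = sim_maps.reshape(sim_maps.shape[0], -1).max(axis=1)  # proxy score
    keep = np.argsort(-scores)[:K]                                # drop weak maps
    return sim_maps[keep]               # (K, H, W), ordered by rank
```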
SSDH: Semi-supervised Deep Hashing for Large Scale Image Retrieval
Hashing methods have been widely used for efficient similarity retrieval on
large-scale image databases. Traditional hashing methods learn hash functions to
generate binary codes from hand-crafted features, which achieve limited
accuracy since the hand-crafted features cannot optimally represent the image
content and preserve the semantic similarity. Recently, several deep hashing
methods have shown better performance because the deep architectures generate
more discriminative feature representations. However, these deep hashing
methods are mainly designed for supervised scenarios, which only exploit the
semantic similarity information, but ignore the underlying data structures. In
this paper, we propose the semi-supervised deep hashing (SSDH) approach, to
perform more effective hash function learning by simultaneously preserving
semantic similarity and underlying data structures. The main contributions are
as follows: (1) We propose a semi-supervised loss to jointly minimize the
empirical error on labeled data, as well as the embedding error on both labeled
and unlabeled data, which can preserve the semantic similarity and capture the
meaningful neighbors on the underlying data structures for effective hashing.
(2) A semi-supervised deep hashing network is designed to extensively exploit
both labeled and unlabeled data, in which we propose an online graph
construction method to benefit from the evolving deep features during training
to better capture semantic neighbors. To the best of our knowledge, the
proposed deep network is the first deep hashing method that can perform hash
code learning and feature learning simultaneously in a semi-supervised fashion.
Experimental results on 5 widely-used datasets show that our proposed approach
outperforms the state-of-the-art hashing methods.
Comment: 14 pages, accepted by IEEE Transactions on Circuits and Systems for
Video Technology
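A schematic version of such a semi-supervised objective (a sketch with a made-up margin and weighting, not the paper's exact loss) combines a supervised pairwise term on labeled samples with a graph-embedding term over all samples.

```python
import numpy as np

def ssdh_style_loss(codes, labels, mask, S, margin=4.0, lam=0.1):
    """codes: (n, b) relaxed hash codes from the network.
    labels: (n,) class ids; entries where mask is False are ignored.
    mask:   (n,) bool, True for labeled samples.
    S:      (n, n) 0/1 neighbour graph over deep features, rebuilt
            online as the features evolve in the real method."""
    D = ((codes[:, None, :] - codes[None, :, :]) ** 2).sum(-1)  # pairwise dists
    lp = mask[:, None] & mask[None, :]                          # labeled pairs
    same = labels[:, None] == labels[None, :]
    # supervised: same-label pairs close, different-label pairs margin apart
    sup = np.where(same, D, np.maximum(0.0, margin - D))[lp].mean()
    emb = (S * D).mean()                # graph neighbours share similar codes
    return sup + lam * emb
```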
Rapid Probabilistic Interest Learning from Domain-Specific Pairwise Image Comparisons
A great deal of work aims to discover large, general-purpose models of image
interest or memorability for visual search and information retrieval. This
paper argues that image interest is often domain and user specific, and that
efficient mechanisms for learning about this domain-specific image interest as
quickly as possible, while limiting the amount of data-labelling required, are
often more useful to end-users. This work uses pairwise image comparisons to
reduce the labelling burden on these users, and introduces an image interest
estimation approach that performs similarly to recent data-hungry deep learning
approaches trained using pairwise ranking losses. Here, we use a Gaussian
process model to interpolate image interest inferred using a Bayesian ranking
approach over image features extracted using a pre-trained convolutional neural
network. Results show that fitting a Gaussian process in high-dimensional image
feature space is not only computationally feasible, but also effective across a
broad range of domains. The proposed probabilistic interest estimation approach
produces image interests paired with uncertainties that can be used to identify
images for which additional labelling is required and measure inference
convergence, allowing for sample efficient active model training. Importantly,
the probabilistic formulation allows for effective visual search and
information retrieval when limited labelling data is available.
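A rough two-stage sketch of this pipeline, assuming scikit-learn is available: a simple Bradley-Terry-style update stands in for the paper's Bayesian ranking step, and a Gaussian process smooths the inferred scores over CNN feature space, with its predictive standard deviation serving as the labelling-priority signal.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def fit_interest_model(feats, comparisons, n_iters=50, lr=0.1):
    """feats: (n, d) pre-trained CNN features.
    comparisons: list of (winner_idx, loser_idx) pairs from users."""
    s = np.zeros(len(feats))
    for _ in range(n_iters):            # Bradley-Terry-style score updates
        for w, l in comparisons:
            p_win = 1.0 / (1.0 + np.exp(s[l] - s[w]))
            s[w] += lr * (1.0 - p_win)
            s[l] -= lr * (1.0 - p_win)
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=10.0), alpha=1e-2)
    gp.fit(feats, s)                    # interpolate interest over feature space
    mean, std = gp.predict(feats, return_std=True)
    return mean, std                    # high std = worth labelling next
```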
An Uncertainty-Aware Approach for Exploratory Microblog Retrieval
Although there has been a great deal of interest in analyzing customer
opinions and breaking news in microblogs, progress has been hampered by the
lack of an effective mechanism to discover and retrieve data of interest from
microblogs. To address this problem, we have developed an uncertainty-aware
visual analytics approach to retrieve salient posts, users, and hashtags. We
extend an existing ranking technique to compute a multifaceted retrieval
result: the mutual reinforcement rank of a graph node, the uncertainty of each
rank, and the propagation of uncertainty among different graph nodes. To
illustrate the three facets, we have also designed a composite visualization
with three visual components: a graph visualization, an uncertainty glyph, and
a flow map. The graph visualization with glyphs, the flow map, and the
uncertainty analysis together enable analysts to effectively find the most
uncertain results and interactively refine them. We have applied our approach
to several Twitter datasets. Qualitative evaluation and two real-world case
studies demonstrate the promise of our approach for retrieving high-quality
microblog data.
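The mutual-reinforcement idea can be illustrated with a HITS-style iteration between two node types; the paper extends such a ranking scheme with a per-rank uncertainty and its propagation, which this toy version omits.

```python
import numpy as np

def mutual_reinforcement_rank(A, n_iters=50):
    """HITS-style mutual reinforcement between users and posts: a post
    is salient if authored by salient users, and a user is salient if
    they author salient posts.

    A: (n_users, n_posts) adjacency (user i authored/retweeted post j)."""
    u = np.ones(A.shape[0])
    p = np.ones(A.shape[1])
    for _ in range(n_iters):
        p = A.T @ u
        p /= np.linalg.norm(p) + 1e-12  # normalize post saliency
        u = A @ p
        u /= np.linalg.norm(u) + 1e-12  # normalize user saliency
    return u, p
```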