Effective Image Retrieval via Multilinear Multi-index Fusion
Multi-index fusion has demonstrated impressive performance in retrieval tasks
by integrating different visual representations in a unified framework.
However, previous works mainly consider propagating similarities via the
neighborhood structure, ignoring the high-order information among different
visual representations. In this paper, we propose a new multi-index fusion scheme for
image retrieval. By formulating this procedure as a multilinear based
optimization problem, the complementary information hidden in different indexes
can be explored more thoroughly. Specifically, we first build our multiple indexes
from various visual representations. Then a so-called index-specific functional
matrix, which aims to propagate similarities, is introduced for updating the
original index. The functional matrices are then optimized in a unified tensor
space to achieve a refinement, such that relevant images are pushed closer
together. The optimization problem can be solved efficiently by the augmented
Lagrangian method with a theoretical convergence guarantee. Unlike the
traditional multi-index fusion scheme, our approach embeds the multi-index
subspace structure into the new indexes with a sparsity constraint, so it incurs
little additional memory consumption in the online query stage. Experimental
evaluation on three benchmark datasets reveals that the proposed approach
achieves state-of-the-art performance: an N-score of 3.94 on UKBench, and mAP
of 94.1% on Holiday and 62.39% on Market-1501.
Comment: 12 pages
Multi-feature Fusion for Image Retrieval Using Constrained Dominant Sets
Aggregating different image features for image retrieval has recently shown
its effectiveness. However, the question of how to amplify the impact of the
features best suited to a specific query image remains an open problem in
computer vision. In this paper, we propose a computationally
efficient approach to fuse several hand-crafted and deep features, based on the
probabilistic distribution of a given membership score of a constrained cluster
in an unsupervised manner. First, we introduce an incremental nearest neighbor
(NN) selection method, whereby we dynamically select the k nearest neighbors of the query. We then
build several graphs from the obtained NN sets and employ constrained dominant
sets (CDS) on each graph G to assign edge weights that reflect the intrinsic
manifold structure of the graph, and to detect false matches to the query.
Finally, we compute a feature positive-impact weight (PIW) based on the
dispersion of the characteristic vector; to this end, we exploit the entropy
of the cluster membership-score distribution. In addition, the final NN set
bypasses a heuristic voting scheme. Experiments on several
retrieval benchmark datasets show that our method can improve the
state-of-the-art results.
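As a rough illustration of the entropy-based weighting idea, the sketch below (all names hypothetical; the paper's actual PIW computation runs through constrained dominant sets) weights each feature by the inverse entropy of its cluster membership-score distribution, so that a feature with a peaked, confident distribution receives a higher weight than one with a dispersed distribution:

```python
import math

def entropy(scores):
    """Shannon entropy of a (possibly unnormalized) membership-score distribution."""
    total = sum(scores)
    probs = [s / total for s in scores if s > 0]
    return -sum(p * math.log(p) for p in probs)

def positive_impact_weights(score_dists):
    """Hypothetical PIW: features whose membership scores are less
    dispersed (lower entropy) get higher weight; weights sum to 1."""
    inv = [1.0 / (1.0 + entropy(d)) for d in score_dists]
    z = sum(inv)
    return [w / z for w in inv]
```

A feature whose best cluster clearly dominates (e.g. scores 0.9/0.05/0.05) would thus outweigh a feature whose scores are nearly uniform.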
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
This paper presents the futuristic challenges discussed in the
cvpaper.challenge. In 2015 and 2016, we thoroughly studied 1,600+ papers from
several conferences and journals, including CVPR, ICCV, ECCV, NIPS, PAMI, and IJCV.
cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey
The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki
Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers
on computer vision, pattern recognition, and related fields. For this
particular review, we focused on reading all 602 conference papers presented
at CVPR2015, the premier annual computer vision event held in June 2015, in
order to grasp the trends in the field. Further, we propose "DeepSurvey" as a
mechanism embodying the entire process, from reading all the papers, through
the generation of ideas, to the writing of a paper.
Comment: Survey Paper
Indexing of CNN Features for Large Scale Image Search
Convolutional neural network (CNN) features give a good description of image
content and usually represent images as single global vectors. Although
compact compared to local descriptors, they still cannot handle large-scale
image retrieval efficiently, because computation and storage costs grow
linearly with the database size. To address this issue, we build a simple
but effective indexing framework based on an inverted table, which significantly
decreases both the search time and memory usage. In addition, several
strategies are fully investigated under an indexing framework to adapt it to
CNN features and compensate for quantization errors. First, we use multiple
assignment for the query and database images to increase the probability that
relevant images co-exist in the same Voronoi cells obtained via the
clustering algorithm. Then, we introduce embedding codes to further improve
precision by removing false matches during a search. We demonstrate that by
using hashing schemes to calculate the embedding codes and by changing the
ranking rule, the speed of the indexing framework can be greatly improved. Extensive
experiments conducted on several unsupervised and supervised benchmarks support
these results and the superiority of the proposed indexing framework. We also
provide a fair comparison among popular CNN features.
Comment: 21 pages, 9 figures, submitted to Multimedia Tools and Applications
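The multiple-assignment idea can be sketched minimally as follows. Toy Euclidean vectors and hand-picked centroids stand in for real CNN features and learned cluster centers, and all function names are illustrative; each database image is stored under its m nearest cells, and a query probes its m nearest cells and unions the candidate lists:

```python
def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_inverted_table(db_vectors, centroids, m=1):
    """Assign each database vector to its m nearest centroids
    (m > 1 gives multiple assignment on the database side)."""
    table = {c: [] for c in range(len(centroids))}
    for img_id, v in enumerate(db_vectors):
        nearest = sorted(range(len(centroids)),
                         key=lambda c: dist2(v, centroids[c]))[:m]
        for c in nearest:
            table[c].append(img_id)
    return table

def search(table, centroids, q, m=2):
    """Probe the m cells nearest to the query (multiple assignment on
    the query side) and return the union of their candidate lists."""
    nearest = sorted(range(len(centroids)),
                     key=lambda c: dist2(q, centroids[c]))[:m]
    return set().union(*(set(table[c]) for c in nearest))
```

Probing several cells trades a larger candidate set for a higher chance that the true neighbor survives quantization, which is exactly the error the embedding codes then clean up.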
Towards Storytelling from Visual Lifelogging: An Overview
Visual lifelogging consists of acquiring images that capture the user's daily
experiences through a wearable camera worn over a long period of time. The
pictures taken offer considerable potential for knowledge mining concerning how
people live their lives, hence, they open up new opportunities for many
potential applications in fields including healthcare, security, leisure and
the quantified self. However, automatically building a story from a huge
collection of unstructured egocentric data presents major challenges. This
paper provides a thorough review of advances made so far in egocentric data
analysis, and in view of the current state of the art, indicates new lines of
research to move us towards storytelling from visual lifelogging.
Comment: 16 pages, 11 figures, submitted to IEEE Transactions on Human-Machine Systems
Hashing with Mutual Information
Binary vector embeddings enable fast nearest neighbor retrieval in large
databases of high-dimensional objects, and play an important role in many
practical applications, such as image and video retrieval. We study the problem
of learning binary vector embeddings under a supervised setting, also known as
hashing. We propose a novel supervised hashing method based on optimizing an
information-theoretic quantity: mutual information. We show that optimizing
mutual information can reduce ambiguity in the induced neighborhood structure
in the learned Hamming space, which is essential in obtaining high retrieval
performance. To this end, we optimize mutual information in deep neural
networks with minibatch stochastic gradient descent, with a formulation that
maximally and efficiently utilizes available supervision. Experiments on four
image retrieval benchmarks, including ImageNet, confirm the effectiveness of
our method in learning high-quality binary embeddings for nearest neighbor
retrieval.
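To make the objective concrete, the quantity being optimized can be caricatured with a plug-in estimate of the mutual information between the Hamming distance of a pair and its neighbor/non-neighbor label. This toy empirical estimate (not the paper's differentiable minibatch formulation; names are illustrative) is high when distance cleanly separates neighbors from non-neighbors and zero when it carries no information:

```python
import math
from collections import Counter

def hamming(a, b):
    """Hamming distance between two equal-length binary codes."""
    return sum(x != y for x, y in zip(a, b))

def mutual_information(pairs):
    """pairs: list of (hamming_distance, is_neighbor) observations.
    Plug-in estimate of I(D; S) from the empirical joint distribution."""
    n = len(pairs)
    joint = Counter(pairs)
    p_d = Counter(d for d, _ in pairs)
    p_s = Counter(s for _, s in pairs)
    mi = 0.0
    for (d, s), c in joint.items():
        p_joint = c / n
        mi += p_joint * math.log(p_joint * n * n / (p_d[d] * p_s[s]))
    return mi
```

When every neighbor pair sits at distance 0 and every non-neighbor pair at a larger distance, the estimate reaches its maximum (log 2 for a balanced pair set), which is the unambiguous neighborhood structure the method aims for.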
Label-Specific Training Set Construction from Web Resource for Image Annotation
Recently, many research efforts have been devoted to image annotation by
leveraging the associated tags/keywords of web images as training labels. A
key issue to resolve is the relatively low accuracy of the tags. In this paper,
we propose a novel semi-automatic framework to construct a more accurate and
effective training set from these web media resources for each label that we
want to learn. Experiments conducted on a real-world dataset demonstrate that
the constructed training set can result in higher accuracy for image
annotation.
Comment: 4 pages, 5 figures
Ego-Surfing: Person Localization in First-Person Videos Using Ego-Motion Signatures
We envision a future in which wearable cameras are worn by the masses,
recording first-person point-of-view videos of everyday life. While these
cameras can enable new assistive technologies and novel research challenges,
they also raise serious privacy concerns. For example, first-person videos
passively recorded by wearable cameras will necessarily include anyone who
comes into the view of a camera -- with or without consent. Motivated by these
benefits and risks, we developed a self-search technique tailored to
first-person videos. The key observation of our work is that the egocentric
head motion of a target person (i.e., the self) is observed in the
point-of-view videos of both the target and the observer. The motion correlation between
the target person's video and the observer's video can then be used to identify
instances of the self uniquely. We incorporate this feature into the proposed
approach that computes the motion correlation over densely-sampled trajectories
to search for a target individual in observer videos. Our approach
significantly improves self-search performance over several well-known face
detectors and recognizers. Furthermore, we show how our approach can enable
several practical applications such as privacy filtering, target video
retrieval, and social group clustering.
Comment: To appear in IEEE TPAMI
Visual Concept Detection and Real Time Object Detection
A bag-of-words model is implemented and evaluated on a 10-class visual concept
detection problem. The experimental results show that "DURF+ERT+SVM"
outperforms "SIFT+ERT+SVM" in both detection performance and computational
efficiency. Moreover, combining DURF and SIFT yields even better detection
performance. Real-time object detection using SIFT and RANSAC is also tried on
simple objects, e.g., a drink can, and good results are achieved.
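The geometric-verification step can be illustrated with a toy RANSAC that fits a pure-translation model to putative keypoint matches. Real SIFT matching would estimate a full homography; this sketch, with illustrative names throughout, only shows the sample-score-keep loop that discards geometrically inconsistent matches:

```python
import random

def ransac_translation(matches, thresh=2.0, iters=100, seed=0):
    """matches: list of ((x1, y1), (x2, y2)) putative keypoint pairs.
    Repeatedly sample one match, hypothesize a translation (dx, dy),
    count matches consistent with it, and return the largest inlier set."""
    rng = random.Random(seed)
    best = []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        inliers = [((a, b), (c, d)) for (a, b), (c, d) in matches
                   if abs(c - a - dx) < thresh and abs(d - b - dy) < thresh]
        if len(inliers) > len(best):
            best = inliers
    return best
```

Matches that agree on a common motion survive, while spurious descriptor matches are rejected, which is why even a simple object like a drink can is detected reliably.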