15,854 research outputs found
Recent Advance in Content-based Image Retrieval: A Literature Survey
The explosive increase and ubiquitous accessibility of visual data on the Web
have led to the prosperity of research activity in image search or retrieval.
Because they ignore visual content as a ranking cue, methods that apply text
search techniques to visual retrieval may suffer from inconsistency between the
text words and the visual content. Content-based image retrieval (CBIR), which
makes use of the representation of visual content to identify relevant images,
has attracted sustained attention over the past two decades. The problem is
challenging due to the intention gap and the semantic gap problems. Numerous
techniques have been developed for content-based image retrieval in the last
decade. The purpose of this paper is to categorize and evaluate those
algorithms proposed during the period of 2003 to 2016. We conclude with several
promising directions for future research. Comment: 22 pages
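As a toy illustration of the core ranking step in CBIR (not a method from the survey itself), the sketch below ranks database images by the cosine similarity of their visual descriptors to a query descriptor; how the descriptors are extracted (e.g. from local features or a CNN) is assumed to happen elsewhere, and all names and values are made up.

```python
import numpy as np

def rank_by_similarity(query_vec, db_vecs):
    """Rank database images by cosine similarity of their descriptors
    to the query descriptor; returns indices, best match first."""
    q = query_vec / np.linalg.norm(query_vec)
    db = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    scores = db @ q                 # cosine similarity per database image
    return np.argsort(-scores)

# toy database of three 2-D descriptors; the query matches image 2 exactly
db = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
order = rank_by_similarity(np.array([0.6, 0.8]), db)
print(order[0])  # 2
```

Real systems replace the linear scan with an index (e.g. inverted files or approximate nearest-neighbor search) to scale to Web-sized collections.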
cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey
The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki
Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers
on computer vision, pattern recognition, and related fields. For this
particular review, we focused on reading all 602 conference papers
presented at the CVPR2015, the premier annual computer vision event held in
June 2015, in order to grasp the trends in the field. Further, we propose
"DeepSurvey" as a mechanism embodying the entire process, from reading
all the papers, through the generation of ideas, to the writing of a paper. Comment: Survey Paper
Image Captioning based on Deep Learning Methods: A Survey
Image captioning is a challenging task that is attracting increasing attention
in the field of Artificial Intelligence; it can be applied to efficient
image retrieval, intelligent blind guidance, human-computer interaction,
and more. In this paper, we present a survey of advances in image captioning
based on Deep Learning methods, covering the Encoder-Decoder structure,
improved methods in the Encoder, improved methods in the Decoder, and other
improvements. Furthermore, we discuss future research directions.
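To make the Encoder-Decoder control flow concrete, here is a minimal, framework-free sketch of greedy caption decoding; `toy_step` and its tiny vocabulary are invented stand-ins for a trained CNN encoder and RNN/Transformer decoder, used only to show the loop structure.

```python
import numpy as np

def greedy_decode(image_feature, step_fn, vocab, max_len=10, end_token="<end>"):
    """Greedy decoding: at each step the decoder sees the encoder's image
    feature and the previous word, and we take the most probable next word."""
    caption, word = [], "<start>"
    for _ in range(max_len):
        probs = step_fn(image_feature, word)   # decoder step -> vocab distribution
        word = vocab[int(np.argmax(probs))]
        if word == end_token:
            break
        caption.append(word)
    return caption

# toy "decoder": a lookup table standing in for a trained network
vocab = ["a", "cat", "sits", "<end>"]
nxt = {"<start>": 0, "a": 1, "cat": 2, "sits": 3}
def toy_step(feature, prev_word):
    probs = np.zeros(len(vocab))
    probs[nxt[prev_word]] = 1.0
    return probs

print(greedy_decode(np.zeros(4), toy_step, vocab))  # ['a', 'cat', 'sits']
```

Many of the surveyed improvements (attention, beam search) refine exactly this step function or replace the greedy argmax.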
Review of Visual Saliency Detection with Comprehensive Information
Visual saliency detection models simulate the human visual system to perceive
the scene, and have been widely used in many vision tasks. With the development
of acquisition technology, more comprehensive information, such as depth cues,
inter-image correspondence, or temporal relationship, is available to extend
image saliency detection to RGBD saliency detection, co-saliency detection, or
video saliency detection. The RGBD saliency detection model focuses on
extracting salient regions from RGBD images by incorporating depth information.
The co-saliency detection model introduces an inter-image correspondence
constraint to discover the common salient object in an image group. The goal of
video saliency detection model is to locate the motion-related salient object
in video sequences, which considers the motion cue and spatiotemporal
constraint jointly. In this paper, we review different types of saliency
detection algorithms, summarize the important issues of the existing methods,
and discuss the open problems and future work. Moreover, the evaluation
datasets and quantitative measurements are briefly introduced, and
experimental analysis and discussion are conducted to provide a holistic
overview of different saliency detection methods. Comment: 18 pages, 11 figures, 7 tables, accepted by IEEE Transactions on
Circuits and Systems for Video Technology 2018, https://rmcong.github.io
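One common baseline in RGBD saliency work is late fusion of a color-based and a depth-based cue map. The sketch below is a generic illustration of that idea, not a method from the papers surveyed: each cue map is min-max normalized and the two are blended with a weight.

```python
import numpy as np

def fuse_rgbd_saliency(color_map, depth_map, w=0.5):
    """Late fusion of RGBD saliency cues: min-max normalize each cue map
    to [0, 1], then blend with weight w on the color cue."""
    def norm(m):
        m = m.astype(float)
        rng = m.max() - m.min()
        return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)
    return w * norm(color_map) + (1.0 - w) * norm(depth_map)

color = np.array([[0.0, 2.0], [4.0, 8.0]])   # toy color-contrast map
depth = np.array([[1.0, 1.0], [1.0, 3.0]])   # toy depth-contrast map
fused = fuse_rgbd_saliency(color, depth)     # peak where both cues agree
```

More sophisticated models fuse the cues earlier (at the feature level) or learn the fusion weights, but the normalize-then-blend pattern is the simplest reference point.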
Self-Supervised Visual Place Recognition Learning in Mobile Robots
Place recognition is a critical component of robot navigation that enables a
robot to recognize previously visited locations, and simultaneously use this
information to correct the drift incurred in its dead-reckoned estimate. In
this work, we develop a self-supervised approach to place recognition in
robots. The task of visual loop-closure identification is cast as a metric
learning problem, where the labels for positive and negative examples of
loop-closures can be bootstrapped using a GPS-aided navigation solution that
the robot already uses. By leveraging the synchronization between sensors, we
show that we are able to learn an appropriate distance metric for arbitrary
real-valued image descriptors (including state-of-the-art CNN models), that is
specifically geared for visual place recognition in mobile robots. Furthermore,
we show that the newly learned embedding can be particularly powerful in
disambiguating visual scenes for the task of vision-based loop-closure
identification in mobile robots. Comment: Presented at Learning for Localization and Mapping Workshop at IROS
201
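The metric-learning formulation can be illustrated with a standard triplet margin loss, where GPS proximity supplies the positive/negative labels; this is a generic sketch of the idea, not the paper's exact loss or architecture, and the embeddings below are toy values.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet margin loss on image embeddings: the positive (same place,
    per GPS) should sit closer to the anchor than the negative (different
    place) by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])       # anchor embedding
p = np.array([0.0, 0.1])       # same place (GPS says nearby)
n = np.array([1.0, 0.0])       # different place
print(triplet_loss(a, p, n))   # 0.0 -- triplet already satisfied
```

Because the labels come from the robot's existing GPS-aided navigation solution, no manual annotation is needed, which is what makes the training self-supervised.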
Visual Relationship Detection using Scene Graphs: A Survey
Understanding a scene by decoding the visual relationships depicted in an
image has been a long-studied problem. While recent advances in deep
learning and the use of deep neural networks have achieved near-human
accuracy on many tasks, there still exists a considerable gap between human-
and machine-level performance on various visual relationship
detection tasks. Building on earlier tasks like object recognition,
segmentation and captioning which focused on a relatively coarser image
understanding, newer tasks have been introduced recently to deal with a finer
level of image understanding. A Scene Graph is one such technique to better
represent a scene and the various relationships present in it. With its wide
number of applications in various tasks like Visual Question Answering,
Semantic Image Retrieval, Image Generation, among many others, it has proved to
be a useful tool for deeper and better visual relationship understanding. In
this paper, we present a detailed survey of the various techniques for scene
graph generation, their efficacy in representing visual relationships, and how
scene graphs have been used to solve various downstream tasks. We also analyze
the various directions in which the field might advance. As
one of the first papers to give a detailed survey on this topic, we also hope
to give a succinct introduction to scene graphs and to guide practitioners in
developing approaches for their applications.
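At its core, a scene graph is a set of (subject, predicate, object) triples over detected object instances. The toy scene below is invented for illustration, showing how such a structure supports relationship queries of the kind used in Visual Question Answering or Semantic Image Retrieval.

```python
# toy scene graph: nodes are detected objects, edges are relationship triples
triples = {
    ("man", "riding", "horse"),
    ("man", "wearing", "hat"),
    ("horse", "on", "grass"),
}

def relations_of(subject, graph):
    """All (predicate, object) pairs for a given subject node, sorted for
    deterministic output."""
    return sorted((p, o) for s, p, o in graph if s == subject)

print(relations_of("man", triples))  # [('riding', 'horse'), ('wearing', 'hat')]
```

Scene graph generation is the task of predicting such triples directly from an image; the downstream tasks then operate on the graph rather than on raw pixels.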
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
The paper presents the futuristic challenges discussed in the cvpaper.challenge. In
2015 and 2016, we thoroughly studied 1,600+ papers from several
conferences/journals, including CVPR/ICCV/ECCV/NIPS/PAMI/IJCV.
Visualizing Natural Language Descriptions: A Survey
A natural language interface exploits the conceptual simplicity and
naturalness of the language to create a high-level user-friendly communication
channel between humans and machines. One of the promising applications of such
interfaces is generating visual interpretations of semantic content of a given
natural language that can be then visualized either as a static scene or a
dynamic animation. This survey discusses requirements and challenges of
developing such systems and reports on 26 graphical systems that exploit
natural language interfaces, addressing both artificial intelligence and
visualization aspects. This work serves as a frame of reference for researchers
and aims to enable further advances in the field. Comment: Due to copyright, most of the figures appear only in the journal
version
A Survey on Content-Aware Video Analysis for Sports
Sports data analysis is becoming increasingly large-scale, diversified, and
shared, but difficulty persists in rapidly accessing the most crucial
information. Previous surveys have focused on the methodologies of sports video
analysis from the spatiotemporal viewpoint instead of a content-based
viewpoint, and few of these studies have considered semantics. This study
develops a deeper interpretation of content-aware sports video analysis by
examining the insight offered by research into the structure of content under
different scenarios. On the basis of this insight, we provide an overview of
the themes particularly relevant to the research on content-aware systems for
broadcast sports. Specifically, we focus on the video content analysis
techniques applied in sportscasts over the past decade from the perspectives of
fundamentals and general review, a content hierarchical model, and trends and
challenges. Content-aware analysis methods are discussed with respect to
object-, event-, and context-oriented groups. In each group, the gap between
sensation and content excitement must be bridged using proper strategies. In
this regard, a content-aware approach is required to determine user demands.
Finally, the paper summarizes the future trends and challenges for sports video
analysis. We believe that our findings can advance the field of research on
content-aware video analysis for broadcast sports.Comment: Accepted for publication in IEEE Transactions on Circuits and Systems
for Video Technology (TCSVT
Automatic video scene segmentation based on spatial-temporal clues and rhythm
With ever-increasing computing power and data storage capacity, the potential
for large digital video libraries is growing rapidly. However, the widespread
use of video is currently limited by its opaque characteristics. Indeed, a
user who has to browse and retrieve video sequentially needs too much time to
find segments of interest within a video. Therefore, providing an
environment both convenient and efficient for video storage and retrieval,
especially for content-based searching as exists in traditional text-based
database systems, has been the focus of recent and important efforts by a large
research community.
In this paper, we propose a new automatic video scene segmentation method
that exploits two main video features: the spatial-temporal relationship
between shots and the rhythm of shots. The experimental evidence we obtained
from an 80-minute video showed that our prototype provides very high accuracy
for video segmentation. Comment: 25 pages, 12 figures
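The paper's method combines spatial-temporal relationships with shot rhythm; as a much simpler point of reference, the sketch below shows the basic shot-grouping idea alone: consecutive shots are merged into one scene until the visual similarity between adjacent shots drops below a threshold. The function and threshold are illustrative assumptions, not the paper's algorithm.

```python
def segment_scenes(shot_sims, threshold=0.5):
    """Group consecutive shots into scenes. shot_sims[i] is the visual
    similarity between shot i and shot i+1; a similarity below `threshold`
    is treated as a scene boundary. Returns (first_shot, last_shot) pairs."""
    scenes, start = [], 0
    for i, sim in enumerate(shot_sims):
        if sim < threshold:
            scenes.append((start, i))
            start = i + 1
    scenes.append((start, len(shot_sims)))   # last scene ends at the final shot
    return scenes

# five shots with a clear similarity drop between shots 2 and 3
print(segment_scenes([0.9, 0.8, 0.2, 0.7]))  # [(0, 2), (3, 4)]
```

A fixed threshold is brittle in practice, which is exactly why methods like the one above incorporate additional cues such as shot rhythm.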