Geometry-Aware Recurrent Neural Networks for Active Visual Recognition
We present recurrent geometry-aware neural networks that integrate visual
information across multiple views of a scene into 3D latent feature tensors,
while maintaining a one-to-one mapping between 3D physical locations in the
world scene and latent feature locations. Object detection, object
segmentation, and 3D reconstruction are then carried out directly using the
constructed 3D feature memory, as opposed to any of the input 2D images. The
proposed models are equipped with differentiable egomotion-aware feature
warping and (learned) depth-aware unprojection operations to achieve
geometrically consistent mapping between the features in the input frame and
the constructed latent model of the scene. We empirically show the proposed
model generalizes much better than geometry-unaware LSTM/GRU networks,
especially in the presence of multiple objects and cross-object occlusions.
Combined with active view selection policies, our model learns to select
informative viewpoints to integrate information from by "undoing" cross-object
occlusions, seamlessly combining geometry with learning from experience.
Comment: To appear in NIPS201
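The abstract above mentions depth-aware unprojection and egomotion-aware warping into a 3D feature memory. As a rough illustration of the underlying geometry only, the sketch below lifts pixels to 3D with a standard pinhole camera model, moves them into a world frame with a rigid transform, and maps them to cells of a fixed voxel grid. It is not the authors' implementation: the function names, the intrinsics `fx`, `fy`, `cx`, `cy`, and the grid parameters `origin` and `voxel_size` are all assumptions made for this example, and the paper's learned depth and feature tensors are replaced here by a plain depth map.

```python
def unproject_depth(depth, fx, fy, cx, cy):
    """Lift every pixel (u, v) with depth z to a 3D point in the camera frame.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy); these parameters are hypothetical, not from the paper.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def camera_to_world(points, rotation, translation):
    """Apply the rigid transform p_world = R p_cam + t to each point."""
    world = []
    for p in points:
        world.append(tuple(
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        ))
    return world


def voxel_index(point, origin, voxel_size):
    """Map a world point to integer coordinates in a fixed 3D grid."""
    return tuple(int((point[i] - origin[i]) // voxel_size) for i in range(3))


# Usage: a 2x2 depth map, identity rotation, camera shifted 1 unit along x.
depth = [[2.0, 2.0], [2.0, 2.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cam_points = unproject_depth(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
world_points = camera_to_world(cam_points, identity, (1.0, 0.0, 0.0))
cells = [voxel_index(p, origin=(-2.0, -2.0, 0.0), voxel_size=0.5)
         for p in world_points]
```

Because every view is mapped into the same world-aligned grid, features observed from different viewpoints at the same physical location end up in the same cell, which is the kind of one-to-one correspondence the abstract describes.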
A Survey on Deep Learning Methods for Robot Vision
Deep learning has allowed a paradigm shift in pattern recognition, from using
hand-crafted features together with statistical classifiers to using
general-purpose learning procedures for learning data-driven representations,
features, and classifiers together. The application of this new paradigm has
been particularly successful in computer vision, in which the development of
deep learning methods for vision applications has become a hot research topic.
Given that deep learning has already attracted the attention of the robot
vision community, the main purpose of this survey is to address the use of deep
learning in robot vision. To achieve this, a comprehensive overview of deep
learning and its usage in computer vision is given, which includes a description
of the most frequently used neural models and their main application areas.
Then, the standard methodology and tools used for designing deep-learning-based
vision systems are presented. Afterwards, a review of the principal work using
deep learning in robot vision is presented, as well as current and future
trends related to the use of deep learning in robotics. This survey is intended
to be a guide for the developers of robot vision systems.
New region force for variational models in image segmentation and high dimensional data clustering
We propose an effective framework for multi-phase image segmentation and
semi-supervised data clustering by introducing a novel region force term into
the Potts model. Assume the probability that a pixel or a data point belongs to
each class is known a priori. We show that the corresponding indicator function
obeys the Bernoulli distribution and the new region force function can be
computed as the negative log-likelihood function under the Bernoulli
distribution. We solve the Potts model by the primal-dual hybrid gradient
method and the augmented Lagrangian method, which are based on two different
dual problems of the same primal problem. Empirical evaluations of the Potts
model with the new region force function on benchmark problems show that it is
competitive with existing variational methods in both image segmentation and
semi-supervised data clustering.
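The abstract above describes the region force as a negative log-likelihood under the Bernoulli distribution but does not give its exact form. The sketch below therefore uses the textbook Bernoulli negative log-likelihood, -u log p - (1 - u) log(1 - p), as a stand-in, evaluated at u = 1 for each class. The function names and the clamping constant `eps` are assumptions for this example, and the Potts regularization term is left out entirely, so the labeling step is only a pointwise decision.

```python
import math


def bernoulli_nll(u, p, eps=1e-12):
    """Negative log-likelihood of indicator u in {0, 1} when P(u = 1) = p.

    p is clamped to [eps, 1 - eps] to keep the logarithm finite.
    """
    p = min(max(p, eps), 1.0 - eps)
    return -(u * math.log(p) + (1 - u) * math.log(1.0 - p))


def region_forces(class_probs):
    """Cost of assigning one pixel or data point to each class (u = 1)."""
    return [bernoulli_nll(1, p) for p in class_probs]


def pointwise_label(class_probs):
    """Pick the class with the smallest region force, ignoring regularization."""
    forces = region_forces(class_probs)
    return min(range(len(forces)), key=forces.__getitem__)


# Usage: a priori class probabilities for a single data point (made-up values).
probs = [0.2, 0.7, 0.1]
forces = region_forces(probs)
label = pointwise_label(probs)
```

In a full model these per-class costs would enter the data term of the Potts energy, and the boundary regularizer would then trade them off against label smoothness.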
Convex variational methods for multiclass data segmentation on graphs
Graph-based variational methods have recently been shown to be highly competitive
for various classification problems of high-dimensional data, but are
inherently difficult to handle from an optimization perspective. This paper
proposes a convex relaxation for a certain set of graph-based multiclass data
segmentation problems, featuring region homogeneity terms, supervised
information and/or certain constraints or penalty terms acting on the class
sizes. Particular applications include semi-supervised classification of
high-dimensional data and unsupervised segmentation of unstructured 3D point
clouds. Theoretical analysis indicates that the convex relaxation closely
approximates the original NP-hard problems, and these observations are also
confirmed experimentally. An efficient duality based algorithm is developed
that handles all constraints on the labeling function implicitly. Experiments
on semi-supervised classification indicate consistently higher accuracies than
related local minimization approaches, and considerably so when the training
data are not uniformly distributed among the data set. The accuracies are also
highly competitive against a wide range of other established methods on three
benchmark datasets. Experiments on 3D point clouds acquired by a LaDAR in
outdoor scenes demonstrate that the scenes can accurately be segmented into
object classes such as vegetation, the ground plane and human-made structures.
Beyond Pixels: A Comprehensive Survey from Bottom-up to Semantic Image Segmentation and Cosegmentation
Image segmentation refers to the process of dividing an image into
nonoverlapping meaningful regions according to human perception, and has
been a classic topic since the early days of computer vision. A lot of
research has been conducted and has resulted in many applications. However,
while many segmentation algorithms exist, only a few sparse and
outdated summaries are available, and an overview of the recent achievements and
issues is lacking. We aim to provide a comprehensive review of the recent
progress in this field. Covering 180 publications, we give an overview of broad
areas of segmentation topics including not only the classic bottom-up
approaches, but also recent developments in superpixels, interactive methods,
object proposals, semantic image parsing and image cosegmentation. In addition,
we review the existing influential datasets and evaluation metrics.
Finally, we suggest some design flavors and directions for future
research in image segmentation.
Comment: submitted to Elsevier Journal of Visual Communications and Image
Representatio
Deep Q Learning Driven CT Pancreas Segmentation with Geometry-Aware U-Net
Segmentation of the pancreas is important for medical image analysis, yet it
faces great challenges of class imbalance, background distractions and
non-rigid geometrical features. To address these difficulties, we introduce a
Deep Q Network (DQN) driven approach with a deformable U-Net to accurately segment
the pancreas by explicitly interacting with contextual information and extracting
anisotropic features from the pancreas. The DQN-based model learns a
context-adaptive localization policy to produce a visually tightened and
precise localization bounding box of the pancreas. Furthermore, the deformable
U-Net captures geometry-aware information of the pancreas by learning geometrically
deformable filters for feature extraction. Experiments on the NIH dataset validate
the effectiveness of the proposed framework in pancreas segmentation.
Comment: in IEEE Transactions on Medical Imaging (2019
Static Visual Spatial Priors for DoA Estimation
As we interact with the world, for example when we communicate with our
colleagues in a large open space or meeting room, we continuously analyse the
surrounding environment and, in particular, localise and recognise acoustic
events. While we largely take such abilities for granted, they represent a
challenging problem for current robots or smart voice assistants as they can be
easily fooled by a high degree of sound interference in acoustically complex
environments. Preventing such failures when using solely audio data is
challenging, if not impossible, since the algorithms need to take into account
the wider context and often understand the scene on a semantic level. In this
paper, we propose what to our knowledge is the first multi-modal approach to
direction of arrival (DoA) estimation of sound, which uses a static visual
spatial prior providing auxiliary information about the environment to suppress
some of the false DoA detections. We validate our approach on a newly collected
real-world dataset, and show that it consistently improves over classic DoA
baselines.
Comment: 6 pages, 6 figures, 3 table
A Survey on Object Detection in Optical Remote Sensing Images
Object detection in optical remote sensing images, being a fundamental but
challenging problem in the field of aerial and satellite image analysis, plays
an important role in a wide range of applications and has received significant
attention in recent years. While a large number of methods exist, a deep review of the
literature concerning generic object detection is still lacking. This paper
aims to provide a review of the recent progress in this field. Unlike
several previously published surveys that focus on a specific object class such
as buildings or roads, we concentrate on more generic object categories
including, but not limited to, road, building, tree, vehicle, ship,
airport, and urban area. Covering about 270 publications, we survey 1) template
matching-based object detection methods, 2) knowledge-based object detection
methods, 3) object-based image analysis (OBIA)-based object detection methods,
4) machine learning-based object detection methods, and 5) five publicly
available datasets and three standard evaluation metrics. We also discuss the
challenges of current studies and propose two promising research directions,
namely deep learning-based feature representation and weakly supervised
learning-based geospatial object detection. It is our hope that this survey
will help researchers gain a better understanding of this
research field.
Comment: This manuscript is the accepted version for ISPRS Journal of
Photogrammetry and Remote Sensin
Iris Recognition Based on LBP and Combined LVQ Classifier
Iris recognition is considered one of the best biometric methods for
human identification and verification, because the iris has unique features
that differ from one person to another and because of its importance in the
security field. This paper proposes an algorithm for iris recognition and
classification that uses Local Binary Pattern (LBP) and histogram properties as
statistical approaches for feature extraction, and a Combined Learning Vector
Quantization (LVQ) classifier as a neural network approach for classification,
in order to build a hybrid model that depends on both features. Localization and
segmentation are performed using both Canny edge detection and the Hough
Circular Transform in order to isolate the iris from the whole eye image and for
noise detection. Feature vectors resulting from LBP are applied to a Combined LVQ
classifier with different classes to determine the minimum acceptable
performance, and the result is based on majority voting among several LVQ
classifiers. Different iris datasets (CASIA, MMU1, MMU2, and LEI) with different
extensions and sizes are presented. Since LBP works on grayscale values,
colored iris images are first transformed to grayscale. The proposed
system gives a high recognition rate of 99.87% on different iris datasets
compared with other methods.
Comment: 12 Pages, 12 Figure
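Two steps named in the abstract above, LBP histogram features and majority voting among several classifiers, can be sketched in a few lines. The code below uses the basic 3x3 LBP operator on a grayscale image stored as a list of rows, and a plain majority vote over predicted labels. It is only an illustration under assumptions: the abstract does not say which LBP variant or neighbour ordering the authors use, the LVQ classifiers themselves are not implemented, and all function names are hypothetical.

```python
from collections import Counter

# Neighbour offsets (row, col), clockwise from the top-left pixel.
# The ordering is a convention chosen for this sketch.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """Basic 3x3 LBP: set one bit per neighbour that is >= the centre pixel."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


def majority_vote(labels):
    """Return the most frequent label among several classifier outputs."""
    return Counter(labels).most_common(1)[0][0]


# Usage: a tiny grayscale patch and three made-up classifier predictions.
patch = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
features = lbp_histogram(patch)
decision = majority_vote(["subject_a", "subject_b", "subject_a"])
```

The histogram would serve as the feature vector passed to each classifier, and the vote combines their individual decisions into one result.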
Survey on RGB, 3D, Thermal, and Multimodal Approaches for Facial Expression Recognition: History, Trends, and Affect-related Applications
Facial expressions are an important way through which humans interact
socially. Building a system capable of automatically recognizing facial
expressions from images and video has been an intense field of study in recent
years. Interpreting such expressions remains challenging and much research is
needed about the way they relate to human affect. This paper presents a general
overview of automatic RGB, 3D, thermal and multimodal facial expression
analysis. We define a new taxonomy for the field, encompassing all steps from
face detection to facial expression recognition, and describe and classify the
state-of-the-art methods accordingly. We also present the important datasets
and the benchmarking of the most influential methods. We conclude with a general
discussion about trends, important questions and future lines of research.