Search CORE

14,929 research outputs found

Geometry-Aware Recurrent Neural Networks for Active Visual Recognition

Author: Cheng Ricson
Fragkiadaki Katerina
Wang Ziyan
Publication venue
Publication date: 13/11/2018
Field of study

We present recurrent geometry-aware neural networks that integrate visual information across multiple views of a scene into 3D latent feature tensors, while maintaining an one-to-one mapping between 3D physical locations in the world scene and latent feature locations. Object detection, object segmentation, and 3D reconstruction is then carried out directly using the constructed 3D feature memory, as opposed to any of the input 2D images. The proposed models are equipped with differentiable egomotion-aware feature warping and (learned) depth-aware unprojection operations to achieve geometrically consistent mapping between the features in the input frame and the constructed latent model of the scene. We empirically show the proposed model generalizes much better than geometryunaware LSTM/GRU networks, especially under the presence of multiple objects and cross-object occlusions. Combined with active view selection policies, our model learns to select informative viewpoints to integrate information from by "undoing" cross-object occlusions, seamlessly combining geometry with learning from experience.Comment: To appear in NIPS201

arXiv.org e-Print Archive

A Survey on Deep Learning Methods for Robot Vision

Author: Loncomilla Patricio
Ruiz-del-Solar Javier
Soto Naiomi
Publication venue
Publication date: 28/03/2018
Field of study

Deep learning has allowed a paradigm shift in pattern recognition, from using hand-crafted features together with statistical classifiers to using general-purpose learning procedures for learning data-driven representations, features, and classifiers together. The application of this new paradigm has been particularly successful in computer vision, in which the development of deep learning methods for vision applications has become a hot research topic. Given that deep learning has already attracted the attention of the robot vision community, the main purpose of this survey is to address the use of deep learning in robot vision. To achieve this, a comprehensive overview of deep learning and its usage in computer vision is given, that includes a description of the most frequently used neural models and their main application areas. Then, the standard methodology and tools used for designing deep-learning based vision systems are presented. Afterwards, a review of the principal work using deep learning in robot vision is presented, as well as current and future trends related to the use of deep learning in robotics. This survey is intended to be a guide for the developers of robot vision systems

arXiv.org e-Print Archive

New region force for variational models in image segmentation and high dimensional data clustering

Author: Chan Tony F.
Tai Xue-Cheng
Wei Ke
Yin Ke
Publication venue
Publication date: 26/04/2017
Field of study

We propose an effective framework for multi-phase image segmentation and semi-supervised data clustering by introducing a novel region force term into the Potts model. Assume the probability that a pixel or a data point belongs to each class is known a priori. We show that the corresponding indicator function obeys the Bernoulli distribution and the new region force function can be computed as the negative log-likelihood function under the Bernoulli distribution. We solve the Potts model by the primal-dual hybrid gradient method and the augmented Lagrangian method, which are based on two different dual problems of the same primal problem. Empirical evaluations of the Potts model with the new region force function on benchmark problems show that it is competitive with existing variational methods in both image segmentation and semi-supervised data clustering

arXiv.org e-Print Archive

Convex variational methods for multiclass data segmentation on graphs

Author: Bae Egil
Merkurjev Ekaterina
Publication venue
Publication date: 16/02/2017
Field of study

Graph-based variational methods have recently shown to be highly competitive for various classification problems of high-dimensional data, but are inherently difficult to handle from an optimization perspective. This paper proposes a convex relaxation for a certain set of graph-based multiclass data segmentation problems, featuring region homogeneity terms, supervised information and/or certain constraints or penalty terms acting on the class sizes. Particular applications include semi-supervised classification of high-dimensional data and unsupervised segmentation of unstructured 3D point clouds. Theoretical analysis indicates that the convex relaxation closely approximates the original NP-hard problems, and these observations are also confirmed experimentally. An efficient duality based algorithm is developed that handles all constraints on the labeling function implicitly. Experiments on semi-supervised classification indicate consistently higher accuracies than related local minimization approaches, and considerably so when the training data are not uniformly distributed among the data set. The accuracies are also highly competitive against a wide range of other established methods on three benchmark datasets. Experiments on 3D point clouds acquired by a LaDAR in outdoor scenes, demonstrate that the scenes can accurately be segmented into object classes such as vegetation, the ground plane and human-made structures

arXiv.org e-Print Archive

Beyond Pixels: A Comprehensive Survey from Bottom-up to Semantic Image Segmentation and Cosegmentation

Author: Cai Jianfei
Lu Shijian
Meng Fanman
Zhu Hongyuan
Publication venue
Publication date: 02/02/2015
Field of study

Image segmentation refers to the process to divide an image into nonoverlapping meaningful regions according to human perception, which has become a classic topic since the early ages of computer vision. A lot of research has been conducted and has resulted in many applications. However, while many segmentation algorithms exist, yet there are only a few sparse and outdated summarizations available, an overview of the recent achievements and issues is lacking. We aim to provide a comprehensive review of the recent progress in this field. Covering 180 publications, we give an overview of broad areas of segmentation topics including not only the classic bottom-up approaches, but also the recent development in superpixel, interactive methods, object proposals, semantic image parsing and image cosegmentation. In addition, we also review the existing influential datasets and evaluation metrics. Finally, we suggest some design flavors and research directions for future research in image segmentation.Comment: submitted to Elsevier Journal of Visual Communications and Image Representatio

arXiv.org e-Print Archive

Deep Q Learning Driven CT Pancreas Segmentation with Geometry-Aware U-Net

Author: Feng Junyi
Huang Yangsibo
Li Xi
Man Yunze
Wu Fei
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 19/04/2019
Field of study

Segmentation of pancreas is important for medical image analysis, yet it faces great challenges of class imbalance, background distractions and non-rigid geometrical features. To address these difficulties, we introduce a Deep Q Network(DQN) driven approach with deformable U-Net to accurately segment the pancreas by explicitly interacting with contextual information and extract anisotropic features from pancreas. The DQN based model learns a context-adaptive localization policy to produce a visually tightened and precise localization bounding box of the pancreas. Furthermore, deformable U-Net captures geometry-aware information of pancreas by learning geometrically deformable filters for feature extraction. Experiments on NIH dataset validate the effectiveness of the proposed framework in pancreas segmentation.Comment: in IEEE Transactions on Medical Imaging (2019

arXiv.org e-Print Archive

Static Visual Spatial Priors for DoA Estimation

Author: Miksik Ondrej
Swietojanski Pawel
Publication venue
Publication date: 30/03/2019
Field of study

As we interact with the world, for example when we communicate with our colleagues in a large open space or meeting room, we continuously analyse the surrounding environment and, in particular, localise and recognise acoustic events. While we largely take such abilities for granted, they represent a challenging problem for current robots or smart voice assistants as they can be easily fooled by high degree of sound interference in acoustically complex environments. Preventing such failures when using solely audio data is challenging, if not impossible since the algorithms need to take into account wider context and often understand the scene on a semantic level. In this paper, we propose what to our knowledge is the first multi-modal direction of arrival (DoA) of sound, which uses static visual spatial prior providing an auxiliary information about the environment to suppress some of the false DoA detections. We validate our approach on a newly collected real-world dataset, and show that our approach consistently improves over classic DoA baselinesComment: 6 pages, 6 figures, 3 table

arXiv.org e-Print Archive

A Survey on Object Detection in Optical Remote Sensing Images

Author: Cheng Gong
Han Junwei
Publication venue: 'Elsevier BV'
Publication date: 22/03/2016
Field of study

Object detection in optical remote sensing images, being a fundamental but challenging problem in the field of aerial and satellite image analysis, plays an important role for a wide range of applications and is receiving significant attention in recent years. While enormous methods exist, a deep review of the literature concerning generic object detection is still lacking. This paper aims to provide a review of the recent progress in this field. Different from several previously published surveys that focus on a specific object class such as building and road, we concentrate on more generic object categories including, but are not limited to, road, building, tree, vehicle, ship, airport, urban-area. Covering about 270 publications we survey 1) template matching-based object detection methods, 2) knowledge-based object detection methods, 3) object-based image analysis (OBIA)-based object detection methods, 4) machine learning-based object detection methods, and 5) five publicly available datasets and three standard evaluation metrics. We also discuss the challenges of current studies and propose two promising research directions, namely deep learning-based feature representation and weakly supervised learning-based geospatial object detection. It is our hope that this survey will be beneficial for the researchers to have better understanding of this research field.Comment: This manuscript is the accepted version for ISPRS Journal of Photogrammetry and Remote Sensin

arXiv.org e-Print Archive

Iris Recognition Based on LBP and Combined LVQ Classifier

Author: El-Awady R. M.
Nomir O.
Rashad M. Z.
Shams M. Y.
Publication venue: 'Academy and Industry Research Collaboration Center (AIRCC)'
Publication date: 07/11/2011
Field of study

Iris recognition is considered as one of the best biometric methods used for human identification and verification, this is because of its unique features that differ from one person to another, and its importance in the security field. This paper proposes an algorithm for iris recognition and classification using a system based on Local Binary Pattern and histogram properties as a statistical approaches for feature extraction, and Combined Learning Vector Quantization Classifier as Neural Network approach for classification, in order to build a hybrid model depends on both features. The localization and segmentation techniques are presented using both Canny edge detection and Hough Circular Transform in order to isolate an iris from the whole eye image and for noise detection .Feature vectors results from LBP is applied to a Combined LVQ classifier with different classes to determine the minimum acceptable performance, and the result is based on majority voting among several LVQ classifier. Different iris datasets CASIA, MMU1, MMU2, and LEI with different extensions and size are presented. Since LBP is working on a grayscale level so colored iris images should be transformed into a grayscale level. The proposed system gives a high recognition rate 99.87 % on different iris datasets compared with other methods.Comment: 12 Pages, 12 Figure

arXiv.org e-Print Archive

Survey on RGB, 3D, Thermal, and Multimodal Approaches for Facial Expression Recognition: History, Trends, and Affect-related Applications

Author: Cohn Jeffrey F.
Corneanu Ciprian
Escalera Sergio
Oliu Marc
Publication venue
Publication date: 10/06/2016
Field of study

Facial expressions are an important way through which humans interact socially. Building a system capable of automatically recognizing facial expressions from images and video has been an intense field of study in recent years. Interpreting such expressions remains challenging and much research is needed about the way they relate to human affect. This paper presents a general overview of automatic RGB, 3D, thermal and multimodal facial expression analysis. We define a new taxonomy for the field, encompassing all steps from face detection to facial expression recognition, and describe and classify the state of the art methods accordingly. We also present the important datasets and the bench-marking of most influential methods. We conclude with a general discussion about trends, important questions and future lines of research

arXiv.org e-Print Archive