Learning to Extract Motion from Videos in Convolutional Neural Networks
This paper shows how to extract dense optical flow from videos with a
convolutional neural network (CNN). The proposed model constitutes a potential
building block for deeper architectures to allow using motion without resorting
to an external algorithm, e.g., for recognition in videos. We derive our network
architecture from signal processing principles to provide desired invariances
to image contrast, phase and texture. We constrain weights within the network
to enforce strict rotation invariance and substantially reduce the number of
parameters to learn. We demonstrate end-to-end training on only 8 sequences of
the Middlebury dataset, orders of magnitude less than competing CNN-based
motion estimation methods, and obtain comparable performance to classical
methods on the Middlebury benchmark. Importantly, our method outputs a
distributed representation of motion that allows representing multiple,
transparent motions, and dynamic textures. Our contributions on network design
and rotation invariance offer insights that are not specific to motion estimation.
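The abstract does not spell out the weight-constraint scheme used to enforce rotation invariance; a minimal, hypothetical illustration of the general idea is to parameterize a convolution kernel by radius alone, which makes the kernel rotation invariant by construction and cuts the parameter count from k*k to roughly k/2:

```python
import numpy as np

def radial_kernel(profile):
    """Build a 2D convolution kernel that is rotation invariant by
    construction: weights depend only on distance from the center.

    `profile` holds one learnable weight per integer radius, so a k x k
    kernel needs only len(profile) parameters instead of k*k.
    (Illustrative sketch only; not the paper's actual constraint.)
    """
    k = 2 * (len(profile) - 1) + 1           # kernel side length
    c = len(profile) - 1                     # center index
    yy, xx = np.mgrid[0:k, 0:k]
    r = np.sqrt((yy - c) ** 2 + (xx - c) ** 2)
    # Linearly interpolate the radial profile at each pixel's radius;
    # radii beyond the profile get weight 0.
    return np.interp(r, np.arange(len(profile)), profile, right=0.0)

kernel = radial_kernel([1.0, 0.5, 0.1])      # 3 parameters -> 5x5 kernel
assert kernel.shape == (5, 5)
# Rotating the kernel by 90 degrees leaves it unchanged
assert np.allclose(kernel, np.rot90(kernel))
```

Any such weight tying both enforces the invariance and shrinks the learnable parameter set, which is consistent with the abstract's claim of training on only 8 sequences.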
Learning and Searching Methods for Robust, Real-Time Visual Odometry.
Accurate position estimation provides a critical foundation for mobile robot perception and control. While well-studied, it remains difficult to provide timely, precise, and robust position estimates for applications that operate in uncontrolled environments, such as robotic exploration and autonomous driving. Continuous, high-rate egomotion estimation is possible using cameras and Visual Odometry (VO), which tracks the movement of sparse scene content known as image keypoints or features. However, high update rates, often 30 Hz or greater, leave little computation time per frame, while variability in scene content stresses robustness. Due to these challenges, implementing an accurate and robust visual odometry system remains difficult.
This thesis investigates fundamental improvements throughout all stages of a visual odometry system, and has three primary contributions: The first contribution is a machine learning method for feature detector design. This method considers end-to-end motion estimation accuracy during learning. Consequently, accuracy and robustness are improved across multiple challenging datasets in comparison to state-of-the-art alternatives. The second contribution is a proposed feature descriptor, TailoredBRIEF, that builds upon recent advances in fast, low-memory descriptor extraction and matching. TailoredBRIEF is an in-situ descriptor learning method that improves feature matching accuracy by efficiently customizing descriptor structures on a per-feature basis. Further, a common asymmetry in vision system design between reference and query images is described and exploited, enabling approaches that would otherwise exceed runtime constraints. The final contribution is a new algorithm for visual motion estimation: Perspective Alignment Search (PAS). Many vision systems depend on the unique appearance of features during matching, despite a large quantity of non-unique features in otherwise barren environments. A search-based method, PAS, is proposed to employ features that lack unique appearance through descriptorless matching. This method simplifies visual odometry pipelines, defining one method that subsumes feature matching, outlier rejection, and motion estimation.
Throughout this work, evaluations of the proposed methods and systems are carried out on ground-truth datasets, often generated with custom experimental platforms in challenging environments. Particular focus is placed on preserving runtimes compatible with real-time operation, as is necessary for deployment in the field.
PhD thesis, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113365/1/chardson_1.pd
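TailoredBRIEF's per-feature descriptor customization is not detailed in the abstract, but the BRIEF family it builds on can be sketched: each descriptor bit is an intensity comparison between a pre-chosen pixel pair inside a patch, and matching cost is the Hamming distance. The pair layout below is random (a hypothetical stand-in, not the learned structures the thesis proposes):

```python
import numpy as np

rng = np.random.default_rng(0)

def brief_descriptor(patch, pairs):
    """Binary descriptor in the spirit of BRIEF: each bit records whether
    one pixel of a pre-chosen pair is darker than the other.
    (Random test layout; TailoredBRIEF's per-feature customization of the
    test structure is not reproduced here.)
    """
    (y1, x1) = pairs[:, 0].T
    (y2, x2) = pairs[:, 1].T
    return (patch[y1, x1] < patch[y2, x2]).astype(np.uint8)

def hamming(d1, d2):
    """Matching cost between two binary descriptors: count of differing bits."""
    return int(np.count_nonzero(d1 != d2))

# 256 random pixel pairs inside a 32x32 patch (a common BRIEF-256 setup)
pairs = rng.integers(0, 32, size=(256, 2, 2))
patch = rng.random((32, 32))
d = brief_descriptor(patch, pairs)
assert d.shape == (256,)
assert hamming(d, d) == 0        # identical patches match perfectly
```

Binary descriptors of this kind are fast and low-memory because extraction is a handful of comparisons and matching reduces to XOR plus a popcount, which is what makes the tight per-frame budgets described above feasible.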
Photometric redshift estimation via deep learning
The need to analyze the available large synoptic multi-band surveys drives
the development of new data-analysis methods. Photometric redshift estimation
is one field of application where such new methods improved the results
substantially. Up to now, the vast majority of applied redshift estimation
methods have utilized photometric features. We aim to develop a method to
derive probabilistic photometric redshifts directly from multi-band imaging
data, rendering pre-classification of objects and feature extraction obsolete.
A modified version of a deep convolutional network was combined with a mixture
density network. The estimates are expressed as Gaussian mixture models
representing the probability density functions (PDFs) in the redshift space. In
addition to the traditional scores, the continuous ranked probability score
(CRPS) and the probability integral transform (PIT) were applied as performance
criteria. We have adopted a feature based random forest and a plain mixture
density network to compare performances on experiments with data from SDSS
(DR9). We show that the proposed method is able to predict redshift PDFs
independently from the type of source, for example galaxies, quasars or stars.
Thereby the prediction performance is better than both presented reference
methods and is comparable to results from the literature. The presented method
is extremely general and allows us to solve any kind of probabilistic
regression problems based on imaging data, for example estimating metallicity
or star formation rate of galaxies. This kind of methodology is tremendously
important for the next generation of surveys.
Comment: 16 pages, 12 figures, 6 tables. Accepted for publication on A&
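The CRPS mentioned above scores an entire predicted PDF against the observed redshift, not just a point estimate. For a single Gaussian component it has a well-known closed form, shown in this sketch (for the paper's Gaussian mixtures the score is a weighted generalization, or can be estimated by sampling):

```python
import math

def crps_gaussian(y, mu, sigma):
    """Continuous ranked probability score of a Gaussian forecast N(mu, sigma^2)
    against observation y, via the closed form
        CRPS = sigma * ( z*(2*Phi(z) - 1) + 2*phi(z) - 1/sqrt(pi) ),
    where z = (y - mu)/sigma. Lower is better.
    """
    z = (y - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # Phi(z)
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # phi(z)
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

# A perfectly centered forecast still pays for its spread:
print(round(crps_gaussian(0.0, 0.0, 1.0), 4))   # ~0.2337
# Sharper forecasts at the right location score better:
assert crps_gaussian(0.0, 0.0, 0.5) < crps_gaussian(0.0, 0.0, 1.0)
```

Unlike a plain point-estimate error, the CRPS rewards both calibration and sharpness, which is why it is a natural companion to the PIT when judging predicted redshift PDFs.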
Unconstrained Face Verification using Deep CNN Features
In this paper, we present an algorithm for unconstrained face verification
based on deep convolutional features and evaluate it on the newly released
IARPA Janus Benchmark A (IJB-A) dataset. The IJB-A dataset includes real-world
unconstrained faces from 500 subjects with full pose and illumination
variations, which are much harder than the traditional Labeled Faces in the Wild
(LFW) and YouTube Faces (YTF) datasets. The deep convolutional neural network
(DCNN) is trained using the CASIA-WebFace dataset. Extensive experiments on the
IJB-A dataset are provided.
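The abstract does not state the similarity metric or decision rule used at verification time; a common, minimal sketch (hypothetical embedding size and threshold) is to L2-normalize the DCNN features of the two faces and threshold their cosine similarity:

```python
import numpy as np

def verify(feat_a, feat_b, threshold=0.5):
    """Face verification from deep features: L2-normalize the embeddings and
    declare 'same identity' if their cosine similarity clears a threshold.
    (Illustrative; the paper's exact metric and threshold are not given here.)
    """
    a = feat_a / np.linalg.norm(feat_a)
    b = feat_b / np.linalg.norm(feat_b)
    return float(a @ b) >= threshold

rng = np.random.default_rng(1)
f1 = rng.standard_normal(512)               # hypothetical 512-d embedding
f2 = f1 + 0.1 * rng.standard_normal(512)    # same identity, slight variation
assert verify(f1, f2)                       # near-duplicates verify as same
assert not verify(f1, -f1)                  # opposite embeddings do not
```

In practice the threshold is tuned on a held-out split to trade false accepts against false rejects, which is exactly what benchmark protocols such as IJB-A's verification track measure.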