2,451 research outputs found

    Learning to Extract Motion from Videos in Convolutional Neural Networks

    This paper shows how to extract dense optical flow from videos with a convolutional neural network (CNN). The proposed model constitutes a potential building block for deeper architectures to allow using motion without resorting to an external algorithm, e.g. for recognition in videos. We derive our network architecture from signal processing principles to provide desired invariances to image contrast, phase and texture. We constrain weights within the network to enforce strict rotation invariance and substantially reduce the number of parameters to learn. We demonstrate end-to-end training on only 8 sequences of the Middlebury dataset, orders of magnitude less than competing CNN-based motion estimation methods, and obtain comparable performance to classical methods on the Middlebury benchmark. Importantly, our method outputs a distributed representation of motion that allows representing multiple, transparent motions and dynamic textures. Our contributions on network design and rotation invariance offer insights that are not specific to motion estimation.
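    The abstract gives no code; as a rough, hedged sketch of the general idea (a small fully convolutional network mapping a pair of frames to a dense two-channel flow field), the snippet below uses PyTorch. The layer sizes and the name TinyFlowNet are illustrative assumptions, not the authors' architecture.

    # Minimal sketch (not the authors' architecture): a tiny fully
    # convolutional network that maps two stacked grayscale frames to a
    # dense 2-channel (u, v) optical-flow field. Assumes PyTorch.
    import torch
    import torch.nn as nn

    class TinyFlowNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(2, 32, kernel_size=7, padding=3),  # frames t and t+1 stacked
                nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=5, padding=2),
                nn.ReLU(),
            )
            self.flow_head = nn.Conv2d(64, 2, kernel_size=3, padding=1)  # (u, v) per pixel

        def forward(self, frame_pair):
            # frame_pair: (batch, 2, H, W) -> flow: (batch, 2, H, W)
            return self.flow_head(self.features(frame_pair))

    # Example: predict flow for one 64x64 frame pair.
    model = TinyFlowNet()
    flow = model(torch.randn(1, 2, 64, 64))
    print(flow.shape)  # torch.Size([1, 2, 64, 64])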

    Learning and Searching Methods for Robust, Real-Time Visual Odometry

    Accurate position estimation provides a critical foundation for mobile robot perception and control. While well-studied, it remains difficult to provide timely, precise, and robust position estimates for applications that operate in uncontrolled environments, such as robotic exploration and autonomous driving. Continuous, high-rate egomotion estimation is possible using cameras and Visual Odometry (VO), which tracks the movement of sparse scene content known as image keypoints or features. However, high update rates, often 30 Hz or greater, leave little computation time per frame, while variability in scene content stresses robustness. Due to these challenges, implementing an accurate and robust visual odometry system remains difficult. This thesis investigates fundamental improvements throughout all stages of a visual odometry system, and has three primary contributions: The first contribution is a machine learning method for feature detector design. This method considers end-to-end motion estimation accuracy during learning. Consequently, accuracy and robustness are improved across multiple challenging datasets in comparison to state-of-the-art alternatives. The second contribution is a proposed feature descriptor, TailoredBRIEF, that builds upon recent advances in the field in fast, low-memory descriptor extraction and matching. TailoredBRIEF is an in-situ descriptor learning method that improves feature matching accuracy by efficiently customizing descriptor structures on a per-feature basis. Further, a common asymmetry in vision system design between reference and query images is described and exploited, enabling approaches that would otherwise exceed runtime constraints. The final contribution is a new algorithm for visual motion estimation: Perspective Alignment Search (PAS). Many vision systems depend on the unique appearance of features during matching, despite a large quantity of non-unique features in otherwise barren environments. A search-based method, PAS, is proposed to employ features that lack unique appearance through descriptorless matching. This method simplifies visual odometry pipelines, defining one method that subsumes feature matching, outlier rejection, and motion estimation. Throughout this work, evaluations of the proposed methods and systems are carried out on ground-truth datasets, often generated with custom experimental platforms in challenging environments. Particular focus is placed on preserving runtimes compatible with real-time operation, as is necessary for deployment in the field.
    PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/113365/1/chardson_1.pd
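    TailoredBRIEF itself is not reproduced here; as a rough illustration of the BRIEF family of binary descriptors it builds on, the sketch below computes descriptors from random intensity comparisons around each keypoint and matches them by Hamming distance, using NumPy. The sampling pattern, descriptor length, and patch size are placeholder assumptions.

    # Rough illustration of a BRIEF-style binary descriptor (not TailoredBRIEF):
    # each bit records whether one pixel in a patch is brighter than another,
    # and descriptors are matched by Hamming distance. Assumes NumPy.
    import numpy as np

    rng = np.random.default_rng(0)
    N_BITS, PATCH = 256, 15  # placeholder descriptor length and patch radius
    # Fixed random comparison pattern: pairs of (dy, dx) offsets inside the patch.
    pattern = rng.integers(-PATCH, PATCH + 1, size=(N_BITS, 4))

    def describe(image, keypoint):
        """Binary descriptor at one keypoint (y, x) from intensity comparisons."""
        y, x = keypoint
        bits = np.empty(N_BITS, dtype=np.uint8)
        for i, (dy1, dx1, dy2, dx2) in enumerate(pattern):
            bits[i] = image[y + dy1, x + dx1] < image[y + dy2, x + dx2]
        return bits

    def hamming(a, b):
        return int(np.count_nonzero(a != b))

    img = rng.integers(0, 256, size=(100, 100))
    d1 = describe(img, (50, 50))
    d2 = describe(img, (50, 52))
    print(hamming(d1, d2))  # small distance for nearby, similar patches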

    Photometric redshift estimation via deep learning

    The need to analyze the available large synoptic multi-band surveys drives the development of new data-analysis methods. Photometric redshift estimation is one field of application where such new methods improved the results substantially. Up to now, the vast majority of applied redshift estimation methods have utilized photometric features. We aim to develop a method to derive probabilistic photometric redshifts directly from multi-band imaging data, rendering pre-classification of objects and feature extraction obsolete. A modified version of a deep convolutional network was combined with a mixture density network. The estimates are expressed as Gaussian mixture models representing the probability density functions (PDFs) in redshift space. In addition to the traditional scores, the continuous ranked probability score (CRPS) and the probability integral transform (PIT) were applied as performance criteria. We have adopted a feature-based random forest and a plain mixture density network to compare performances on experiments with data from SDSS (DR9). We show that the proposed method is able to predict redshift PDFs independently of the type of source, for example galaxies, quasars or stars. The prediction performance is thereby better than both presented reference methods and is comparable to results from the literature. The presented method is extremely general and allows us to solve any kind of probabilistic regression problem based on imaging data, for example estimating the metallicity or star formation rate of galaxies. This kind of methodology is tremendously important for the next generation of surveys.
    Comment: 16 pages, 12 figures, 6 tables. Accepted for publication on A&
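    As a hedged illustration of the mixture-density idea described above (not the paper's network), the snippet below turns a raw output vector into Gaussian-mixture parameters and evaluates the resulting redshift PDF with NumPy; the component count, the example values, and the function names are assumptions.

    # Illustrative only: convert a raw network output vector into the parameters
    # of a Gaussian mixture and evaluate the redshift PDF p(z). Not the paper's
    # model; component count and names are assumptions. Uses NumPy.
    import numpy as np

    K = 3  # assumed number of mixture components

    def mixture_params(raw):
        """Split a length-3K output vector into weights, means, and sigmas."""
        logits, means, log_sigmas = np.split(raw, 3)
        weights = np.exp(logits - logits.max())
        weights /= weights.sum()            # softmax -> mixture weights
        sigmas = np.exp(log_sigmas)         # positive standard deviations
        return weights, means, sigmas

    def redshift_pdf(z, weights, means, sigmas):
        """p(z) as a weighted sum of Gaussian densities."""
        norm = 1.0 / (np.sqrt(2 * np.pi) * sigmas)
        dens = norm * np.exp(-0.5 * ((z[:, None] - means) / sigmas) ** 2)
        return dens @ weights

    raw_output = np.array([0.2, 1.0, -0.5,    # mixture-weight logits
                           0.3, 0.9, 1.5,     # component means in redshift
                           -2.0, -1.5, -1.8]) # log standard deviations
    w, mu, sig = mixture_params(raw_output)
    z_grid = np.linspace(0.0, 2.0, 401)
    pdf = redshift_pdf(z_grid, w, mu, sig)
    print(np.trapz(pdf, z_grid))  # integrates to approximately 1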

    Unconstrained Face Verification using Deep CNN Features

    In this paper, we present an algorithm for unconstrained face verification based on deep convolutional features and evaluate it on the newly released IARPA Janus Benchmark A (IJB-A) dataset. The IJB-A dataset includes real-world unconstrained faces from 500 subjects with full pose and illumination variations, which makes it much harder than the traditional Labeled Faces in the Wild (LFW) and YouTube Faces (YTF) datasets. The deep convolutional neural network (DCNN) is trained using the CASIA-WebFace dataset. Extensive experiments on the IJB-A dataset are provided.
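    The verification step such a pipeline usually reduces to is a comparison of the deep features of two face images. The sketch below, assuming NumPy, L2-normalized embeddings, and an illustrative similarity threshold (not the paper's protocol), shows that comparison.

    # Minimal verification sketch: decide whether two deep CNN feature vectors
    # belong to the same subject by thresholding their cosine similarity.
    # The threshold value is illustrative, not taken from the paper. Uses NumPy.
    import numpy as np

    def cosine_similarity(a, b):
        a = a / np.linalg.norm(a)
        b = b / np.linalg.norm(b)
        return float(a @ b)

    def same_subject(feat_a, feat_b, threshold=0.5):
        """True if the similarity of the two embeddings exceeds the threshold."""
        return cosine_similarity(feat_a, feat_b) >= threshold

    rng = np.random.default_rng(1)
    f1 = rng.normal(size=512)             # e.g. a 512-D DCNN embedding
    f2 = f1 + 0.1 * rng.normal(size=512)  # a slightly perturbed embedding
    print(same_subject(f1, f2))           # True: the embeddings are very similar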