74 research outputs found

    Terrain Classification from Body-mounted Cameras during Human Locomotion

    Abstract—This paper presents a novel algorithm for terrain-type classification based on monocular video captured from the viewpoint of human locomotion. A texture-based algorithm is developed to classify the path ahead into multiple groups that can be used to support terrain classification. Gait is taken into account in two ways. Firstly, for key-frame selection: when regions with homogeneous texture characteristics are updated, the frequency variations of the textured surface are analysed and used to adaptively define filter coefficients. Secondly, it is incorporated in the parameter estimation process, where probabilities of path consistency are employed to improve terrain-type estimation. When tested with multiple classes that directly affect mobility (a hard surface, a soft surface, and an unwalkable area), our proposed method outperforms existing methods by up to 16% and also provides improved robustness.
    Index Terms—texture, classification, recursive filter, terrain classification
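    The texture-driven classification described in the abstract can be illustrated with a much-simplified sketch: compute a small texture descriptor per image patch and assign the patch to the nearest class centroid. The feature choices, class names, and toy data below are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def texture_features(patch):
    """Simple texture descriptor: mean intensity, local variance,
    and mean gradient magnitude of a grayscale patch."""
    gy, gx = np.gradient(patch.astype(float))
    return np.array([patch.mean(), patch.var(), np.hypot(gx, gy).mean()])

def classify_patch(patch, centroids):
    """Assign the patch to the class whose feature centroid is nearest."""
    f = texture_features(patch)
    dists = {name: np.linalg.norm(f - c) for name, c in centroids.items()}
    return min(dists, key=dists.get)

# Toy "training" set: smooth patches stand in for a hard surface,
# high-variance noisy patches for a soft one.
rng = np.random.default_rng(0)
hard = [np.full((16, 16), 0.5) + rng.normal(0, 0.01, (16, 16)) for _ in range(10)]
soft = [rng.uniform(0, 1, (16, 16)) for _ in range(10)]
centroids = {
    "hard": np.mean([texture_features(p) for p in hard], axis=0),
    "soft": np.mean([texture_features(p) for p in soft], axis=0),
}

test_patch = rng.uniform(0, 1, (16, 16))      # high-variance texture
print(classify_patch(test_patch, centroids))  # expected: soft
```

    A real system would replace these hand-picked statistics with the adaptive filter bank the paper derives from gait frequency, and smooth the per-frame decisions with the path-consistency probabilities mentioned above.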

    Efficient Methodologies for Single-Image Blind Deconvolution and Deblurring


    Image Motion Analysis using Inertial Sensors


    Foundations, Inference, and Deconvolution in Image Restoration

    Image restoration is a critical preprocessing step in computer vision, producing images with reduced noise, blur, and pixel defects. This enables precise higher-level reasoning as to the scene content in later stages of the vision pipeline (e.g., object segmentation, detection, recognition, and tracking). Restoration techniques have found extensive usage in a broad range of applications from industry, medicine, astronomy, biology, and photography. The recovery of high-grade results requires models of the image degradation process, giving rise to a class of often heavily underconstrained, inverse problems. A further challenge specific to the problem of blur removal is noise amplification, which may cause strong distortion by ringing artifacts. This dissertation presents new insights and problem-solving procedures for three areas of image restoration, namely (1) model foundations, (2) Bayesian inference for high-order Markov random fields (MRFs), and (3) blind image deblurring (deconvolution). As basic research on model foundations, we contribute to reconciling the perceived differences between probabilistic MRFs on the one hand, and deterministic variational models on the other. To do so, we restrict the variational functional to locally supported finite elements (FE) and integrate over the domain. This yields a sum of terms depending locally on FE basis coefficients, and by identifying the latter with pixels, the terms resolve to MRF potential functions. In contrast with previous literature, we place special emphasis on robust regularizers used commonly in contemporary computer vision. Moreover, we draw samples from the derived models to further demonstrate the probabilistic connection. Another focal issue is a class of high-order Field of Experts MRFs which are learned generatively from natural image data and yield best quantitative results under Bayesian estimation. This involves minimizing an integral expression, which has no closed-form solution in general. However, the MRF class under study has Gaussian mixture potentials, permitting expansion by indicator variables as a technical measure. As an approximate inference method, we study Gibbs sampling in the context of non-blind deblurring and obtain excellent results, yet at the cost of high computational effort. In response, we turn to the mean field algorithm, and show that it scales quadratically in the clique size for a standard restoration setting with a linear degradation model. An empirical study of mean field over several restoration scenarios confirms advantageous properties with regard to both image quality and computational runtime. This dissertation further examines the problem of blind deconvolution, beginning with localized blur from fast-moving objects in the scene, or from camera defocus. Forgoing dedicated hardware or user labels, we rely only on the image as input and introduce a latent variable model to explain the non-uniform blur. The inference procedure estimates freely varying kernels, and we demonstrate its generality by extensive experiments. We further present a discriminative method for blind removal of camera shake. In particular, we interleave discriminative non-blind deconvolution steps with kernel estimation and leverage the error cancellation effects of the Regression Tree Field model to attain a deblurring process with tightly linked sequential stages.
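    The mean field inference analysed in the abstract applies to far richer Field-of-Experts models; as a hedged toy illustration of the same principle, here is mean field inference for a pairwise Ising MRF used for binary image denoising. The model, coupling strengths, and test image are all illustrative assumptions, not the dissertation's setup.

```python
import numpy as np

def mean_field_ising(y, beta=2.0, lam=1.5, iters=30):
    """Mean field inference for a pairwise Ising MRF: each pixel x_i in
    {-1,+1} couples to its 4-neighbours (strength beta) and to the noisy
    observation y_i (strength lam). q approximates E[x] per pixel."""
    q = y.astype(float).copy()
    for _ in range(iters):
        # Sum of neighbouring mean-field expectations (zero outside the image).
        nb = np.zeros_like(q)
        nb[1:, :] += q[:-1, :]; nb[:-1, :] += q[1:, :]
        nb[:, 1:] += q[:, :-1]; nb[:, :-1] += q[:, 1:]
        q = np.tanh(beta * nb + lam * y)  # standard mean field update
    return np.sign(q)

rng = np.random.default_rng(1)
clean = np.ones((32, 32)); clean[:, 16:] = -1                     # two-tone image
noisy = clean * np.where(rng.random(clean.shape) < 0.15, -1, 1)   # flip 15% of pixels
denoised = mean_field_ising(noisy)
print((denoised == clean).mean())  # fraction of correctly restored pixels
```

    Each update is a local expectation, which is what makes mean field cheap relative to Gibbs sampling; the quadratic clique-size scaling discussed above only becomes visible with the high-order cliques of the actual models studied.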

    Video Depth-From-Defocus

    Many compelling video post-processing effects, in particular aesthetic focus editing and refocusing effects, are feasible if per-frame depth information is available. Existing computational methods to capture RGB and depth either purposefully modify the optics (coded aperture, light-field imaging), or employ active RGB-D cameras. Since these methods are less practical for users with normal cameras, we present an algorithm to capture all-in-focus RGB-D video of dynamic scenes with an unmodified commodity video camera. Our algorithm turns the often unwanted defocus blur into a valuable signal. The input to our method is a video in which the focus plane is continuously moving back and forth during capture, so that defocus blur is provoked and strongly visible. This can be achieved by manually turning the focus ring of the lens during recording. The core algorithmic ingredient is a new video-based depth-from-defocus algorithm that computes space-time-coherent depth maps, deblurred all-in-focus video, and the focus distance for each frame. We extensively evaluate our approach, and show that it enables compelling video post-processing effects, such as different types of refocusing.
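    The core cue (sharpness varying across a focal sweep) can be sketched with synthetic data: simulate frames whose blur grows with the distance between the focus plane and the per-pixel depth, then recover depth as the index of the sharpest frame. The box-blur model and two-plane scene below are toy assumptions, not the paper's space-time-coherent algorithm.

```python
import numpy as np

def box_blur(img, r):
    """Crude defocus model: average over a (2r+1)x(2r+1) window (r=0: in focus)."""
    if r == 0:
        return img.astype(float)
    acc = np.zeros_like(img, dtype=float); n = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            acc += np.roll(np.roll(img, dy, 0), dx, 1); n += 1
    return acc / n

def sharpness(img):
    """Local Laplacian energy, aggregated over a small window."""
    lap = (np.roll(img, 1, 0) + np.roll(img, -1, 0)
         + np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4 * img)
    return box_blur(np.abs(lap), 2)

rng = np.random.default_rng(2)
tex = rng.uniform(0, 1, (40, 40))                         # textured scene
depth = np.zeros((40, 40), dtype=int); depth[:, 20:] = 2  # two depth planes

# Simulate a focal sweep: in frame f, blur radius = |f - pixel depth|.
frames = []
for f in range(4):
    img = np.empty_like(tex)
    for d in (0, 2):
        mask = depth == d
        img[mask] = box_blur(tex, abs(f - d))[mask]
    frames.append(img)

# Depth from defocus: per pixel, the sharpest frame indexes the focus distance.
stack = np.stack([sharpness(fr) for fr in frames])
est = np.argmax(stack, axis=0)
print((est == depth).mean())  # fraction of pixels with correct depth index
```

    The actual method additionally enforces temporal coherence across frames, jointly deblurs the video, and estimates the (unknown) focus distance of each frame rather than assuming it.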

    Learning From Multi-Frame Data

    Multi-frame data-driven methods bear the promise that aggregating multiple observations leads to better estimates of target quantities than a single (still) observation. This thesis examines how data-driven approaches such as deep neural networks should be constructed to improve over single-frame-based counterparts. Besides algorithmic changes, for example in the design of neural network architectures or the training algorithm itself, such an examination is inextricably linked with the synthesis of training data of sufficient size (even when no annotations are available) and quality (when real ground-truth acquisition is not possible) that captures all temporal effects with high fidelity. We start by introducing a new algorithm that accelerates a nonparametric learning algorithm with a GPU-adapted implementation of nearest-neighbour search. It clearly surpasses previously known approaches and empirically shows that the generated data can be managed in reasonable time and that several inputs can be processed in parallel even under hardware restrictions. Building on a learning-based solution, we introduce a novel training protocol that reduces the need for carefully curated training data, and demonstrate better performance and robustness than nonparametric nearest-neighbour search via temporal video alignments. Effective learning in the absence of labels is required when dealing with larger amounts of data that are easy to capture but infeasible, or at least costly, to label. In addition, we show new ways to generate plausible and realistic synthesized data, and their necessity for closing the gap to expensive and almost infeasible real-world acquisition. These methods eventually achieve state-of-the-art results in classical image processing tasks such as reflection removal and video deblurring.
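    The GPU nearest-neighbour search mentioned above is, at heart, a batched brute-force distance computation. A minimal CPU sketch using NumPy broadcasting (the batch size, data, and the trick of dropping the constant per-query norm are choices of this sketch, not of the thesis):

```python
import numpy as np

def batched_nearest_neighbors(queries, database, batch=256):
    """Brute-force nearest-neighbour search, vectorised per batch of queries.
    On a GPU the same pattern maps to one large matrix multiply per batch;
    here NumPy broadcasting stands in for the parallel hardware."""
    db_sq = (database ** 2).sum(axis=1)  # precompute ||d||^2 once
    idx = np.empty(len(queries), dtype=int)
    for s in range(0, len(queries), batch):
        q = queries[s:s + batch]
        # ||q - d||^2 = ||q||^2 - 2 q.d + ||d||^2; ||q||^2 is constant per
        # query row, so it cannot change the argmin and is dropped.
        d2 = db_sq - 2.0 * q @ database.T
        idx[s:s + batch] = np.argmin(d2, axis=1)
    return idx

rng = np.random.default_rng(3)
db = rng.normal(size=(1000, 16))
queries = db[[5, 42, 777]] + 1e-6  # near-duplicates of known rows
print(batched_nearest_neighbors(queries, db))  # [5 42 777]
```

    Batching bounds the size of the distance matrix held in memory at once, which is the same consideration that makes the approach viable under the hardware restrictions the abstract refers to.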