
    Deep Autoencoder for Combined Human Pose Estimation and body Model Upscaling

    We present a method for simultaneously estimating 3D human pose and body shape from a sparse set of wide-baseline camera views. We train a symmetric convolutional autoencoder with a dual loss that enforces learning of a latent representation that encodes skeletal joint positions, and at the same time learns a deep representation of volumetric body shape. We harness the latter to up-scale input volumetric data by a factor of 4×, whilst recovering a 3D estimate of joint positions with equal or greater accuracy than the state of the art. Inference runs in real-time (25 fps) and has the potential for passive human behaviour monitoring where there is a requirement for high fidelity estimation of human body shape and pose.

    Light Field Super-Resolution Via Graph-Based Regularization

    Light field cameras capture the 3D information in a scene with a single exposure. This special feature makes light field cameras very appealing for a variety of applications: from post-capture refocus, to depth estimation and image-based rendering. However, light field cameras suffer by design from strong limitations in their spatial resolution, which should therefore be augmented by computational methods. On the one hand, off-the-shelf single-frame and multi-frame super-resolution algorithms are not ideal for light field data, as they do not consider its particular structure. On the other hand, the few super-resolution algorithms explicitly tailored for light field data exhibit significant limitations, such as the need to estimate an explicit disparity map at each view. In this work we propose a new light field super-resolution algorithm meant to address these limitations. We adopt a multi-frame-like super-resolution approach, where the complementary information in the different light field views is used to augment the spatial resolution of the whole light field. We show that coupling the multi-frame approach with a graph regularizer that enforces the light field structure via nonlocal self-similarities makes it possible to avoid the costly and challenging disparity estimation step for all the views. Extensive experiments show that the new algorithm compares favorably to the other state-of-the-art methods for light field super-resolution, both in terms of PSNR and visual quality. Comment: This new version includes more material. In particular, we added: a new section on the computational complexity of the proposed algorithm, experimental comparisons with a CNN-based super-resolution algorithm, and new experiments on a third dataset.

    A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community

    In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, namely computer vision (CV), speech recognition, natural language processing, etc. Whereas remote sensing (RS) possesses a number of unique challenges, primarily related to sensors and applications, inevitably RS draws from many of the same theories as CV; e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent new developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing the DL. Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensing.

    Computational Methods for Matrix/Tensor Factorization and Deep Learning Image Denoising

    Feature learning is a technique to automatically extract features from raw data. It is widely used in areas such as computer vision, image processing, data mining and natural language processing. In this thesis, we are interested in the computational aspects of feature learning. We focus on rank matrix and tensor factorization and deep neural network models for image denoising. With respect to matrix and tensor factorization, we first present a technique to speed up alternating least squares (ALS) and gradient descent (GD), two commonly used strategies for tensor factorization. We introduce an efficient, scalable and distributed algorithm that addresses the data explosion problem. Instead of a computationally challenging sub-step of ALS and GD, we implement the algorithm on parallel machines by using only two sparse matrix-vector products. Not only is the algorithm scalable but it is also on average 4 to 10 times faster than competing algorithms on various data sets. Next, we discuss our results of non-negative matrix factorization for hyperspectral image data in the presence of noise. We introduce a spectral total variation regularization and derive four variants of the alternating direction method of multipliers algorithm. While all four methods belong to the same family of algorithms, some perform better than others. Thus, we compare the algorithms on stimulated Raman spectroscopic images. For deep neural network models, we focus on their application to image denoising. We first demonstrate how an optimal procedure leveraging deep neural networks and convex optimization can combine a given set of denoisers to produce an overall better result. The proposed framework estimates the mean squared error (MSE) of individual denoised outputs using a deep neural network; optimally combines the denoised outputs via convex optimization; and recovers lost details of the combined images using another deep neural network. The framework consistently improves denoising performance for both deterministic denoisers and neural network denoisers. Next, we apply the deep neural network to solve the image reconstruction issues of the Quanta Image Sensor (QIS), which is a single-photon image sensor that oversamples the light field to generate binary measures
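
    The "combine denoised outputs using their estimated MSEs" step described above can be illustrated with a much simpler stand-in. The sketch below is not the thesis's method: it assumes the denoisers' errors are zero-mean and uncorrelated, in which case the convex combination that minimizes expected MSE has weights proportional to the inverse of each estimated MSE. The function names, the synthetic 1D "image", and the noise levels are all hypothetical, and the true noise variances stand in for the MSE values that the thesis estimates with a neural network.

```python
import random


def inverse_mse_weights(estimated_mses):
    """Convex weights (non-negative, summing to 1) proportional to 1/MSE.

    Assumption: errors of the individual outputs are zero-mean and
    uncorrelated; under that assumption these weights minimize the
    expected MSE of the weighted average.
    """
    inverse = [1.0 / m for m in estimated_mses]
    total = sum(inverse)
    return [v / total for v in inverse]


def combine(outputs, weights):
    """Pixel-wise weighted average of several denoised outputs."""
    n = len(outputs[0])
    return [sum(w * out[i] for w, out in zip(weights, outputs)) for i in range(n)]


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Synthetic demo: a flattened "clean image" and three hypothetical
# denoiser outputs with different residual noise levels.
rng = random.Random(0)
clean = [rng.random() for _ in range(5000)]
sigmas = [0.1, 0.2, 0.3]
outputs = [[c + rng.gauss(0.0, s) for c in clean] for s in sigmas]

# Stand-in for the learned MSE estimator: use the true noise variances.
estimated_mses = [s ** 2 for s in sigmas]

weights = inverse_mse_weights(estimated_mses)
combined = combine(outputs, weights)

print("weights:", [round(w, 3) for w in weights])
print("individual MSEs:", [round(mse(o, clean), 5) for o in outputs])
print("combined MSE:", round(mse(combined, clean), 5))
```

    When the errors are correlated, the inverse-MSE rule is no longer optimal and a constrained quadratic program over the error covariance would be the more faithful formulation.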

    AnchorNet: A Weakly Supervised Network to Learn Geometry-sensitive Features For Semantic Matching

    Despite significant progress of deep learning in recent years, state-of-the-art semantic matching methods still rely on legacy features such as SIFT or HoG. We argue that the strong invariance properties that are key to the success of recent deep architectures on the classification task make them unfit for dense correspondence tasks, unless a large amount of supervision is used. In this work, we propose a deep network, termed AnchorNet, that produces image representations that are well-suited for semantic matching. It relies on a set of filters whose response is geometrically consistent across different object instances, even in the presence of strong intra-class, scale, or viewpoint variations. Trained only with weak image-level labels, the final representation successfully captures information about the object structure and improves results of state-of-the-art semantic matching methods such as the deformable spatial pyramid or the proposal flow methods. We show positive results on the cross-instance matching task, where different instances of the same object category are matched, as well as on a new cross-category semantic matching task aligning pairs of instances each from a different object class. Comment: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 201