24,384 research outputs found

    Reversible Recursive Instance-level Object Segmentation

    Full text link
    In this work, we propose a novel Reversible Recursive Instance-level Object Segmentation (R2-IOS) framework to address the challenging instance-level object segmentation task. R2-IOS consists of a reversible proposal refinement sub-network that predicts bounding box offsets for refining the object proposal locations, and an instance-level segmentation sub-network that generates the foreground mask of the dominant object instance in each proposal. By being recursive, R2-IOS iteratively optimizes the two sub-networks during joint training, in which the refined object proposals and improved segmentation predictions are alternately fed into each other to progressively increase the network capabilities. By being reversible, the proposal refinement sub-network adaptively determines an optimal number of refinement iterations required for each proposal during both training and testing. Furthermore, to handle multiple overlapped instances within a proposal, an instance-aware denoising autoencoder is introduced into the segmentation sub-network to distinguish the dominant object from other distracting instances. Extensive experiments on the challenging PASCAL VOC 2012 benchmark well demonstrate the superiority of R2-IOS over other state-of-the-art methods. In particular, the APr\text{AP}^r over 2020 classes at 0.50.5 IoU achieves 66.7%66.7\%, which significantly outperforms the results of 58.7%58.7\% by PFN~\cite{PFN} and 46.3%46.3\% by~\cite{liu2015multi}.Comment: 9 page

    Coupled Depth Learning

    Full text link
    In this paper we propose a method for estimating depth from a single image using a coarse to fine approach. We argue that modeling the fine depth details is easier after a coarse depth map has been computed. We express a global (coarse) depth map of an image as a linear combination of a depth basis learned from training examples. The depth basis captures spatial and statistical regularities and reduces the problem of global depth estimation to the task of predicting the input-specific coefficients in the linear combination. This is formulated as a regression problem from a holistic representation of the image. Crucially, the depth basis and the regression function are {\bf coupled} and jointly optimized by our learning scheme. We demonstrate that this results in a significant improvement in accuracy compared to direct regression of depth pixel values or approaches learning the depth basis disjointly from the regression function. The global depth estimate is then used as a guidance by a local refinement method that introduces depth details that were not captured at the global level. Experiments on the NYUv2 and KITTI datasets show that our method outperforms the existing state-of-the-art at a considerably lower computational cost for both training and testing.Comment: 10 pages, 3 Figures, 4 Tables with quantitative evaluation

    Scene Graph Generation with External Knowledge and Image Reconstruction

    Full text link
    Scene graph generation has received growing attention with the advancements in image understanding tasks such as object detection, attributes and relationship prediction,~\etc. However, existing datasets are biased in terms of object and relationship labels, or often come with noisy and missing annotations, which makes the development of a reliable scene graph prediction model very challenging. In this paper, we propose a novel scene graph generation algorithm with external knowledge and image reconstruction loss to overcome these dataset issues. In particular, we extract commonsense knowledge from the external knowledge base to refine object and phrase features for improving generalizability in scene graph generation. To address the bias of noisy object annotations, we introduce an auxiliary image reconstruction path to regularize the scene graph generation network. Extensive experiments show that our framework can generate better scene graphs, achieving the state-of-the-art performance on two benchmark datasets: Visual Relationship Detection and Visual Genome datasets.Comment: 10 pages, 5 figures, Accepted in CVPR 201

    Joint coarse-and-fine reasoning for deep optical flow

    Get PDF
    © 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.We propose a novel representation for dense pixel-wise estimation tasks using CNNs that boosts accuracy and reduces training time, by explicitly exploiting joint coarse-and-fine reasoning. The coarse reasoning is performed over a discrete classification space to obtain a general rough solution, while the fine details of the solution are obtained over a continuous regression space. In our approach both components are jointly estimated, which proved to be beneficial for improving estimation accuracy. Additionally, we propose a new network architecture, which combines coarse and fine components by treating the fine estimation as a refinement built on top of the coarse solution, and therefore adding details to the general prediction. We apply our approach to the challenging problem of optical flow estimation and empirically validate it against state-of-the-art CNN-based solutions trained from scratch and tested on large optical flow datasets.Peer ReviewedPostprint (author's final draft
    corecore