Dense Piecewise Planar RGB-D SLAM for Indoor Environments
The paper exploits weak Manhattan constraints to parse the structure of
indoor environments from RGB-D video sequences in an online setting. We extend
the previous approach for single view parsing of indoor scenes to video
sequences and formulate the problem of recovering the floor plan of the
environment as an optimal labeling problem solved using dynamic programming.
The temporal continuity is enforced in a recursive setting, where labeling from
previous frames is used as a prior term in the objective function. In addition
to recovery of piecewise planar weak Manhattan structure of the extended
environment, the orthogonality constraints are also exploited by visual
odometry and pose graph optimization. This yields reliable estimates in the
presence of large motions and absence of distinctive features to track. We
evaluate our method on several challenging indoor sequences, demonstrating
accurate SLAM and dense mapping of low-texture environments. On the existing
TUM benchmark we achieve results competitive with alternative approaches,
which fail in our environments.

Comment: International Conference on Intelligent Robots and Systems (IROS)
201
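The optimal-labeling formulation above can be illustrated with a minimal sketch: a Viterbi-style dynamic program over per-column labels, where the labeling from previous frames enters as a weighted prior cost. All names and the cost structure here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def dp_labeling(unary, pairwise, prior, prior_weight=0.5):
    """Optimal labeling of a 1D sequence via dynamic programming.

    unary:    (T, L) data cost of assigning label l at position t
    pairwise: (L, L) transition cost between adjacent positions
    prior:    (T, L) cost derived from the previous frame's labeling
    """
    T, L = unary.shape
    cost = unary + prior_weight * prior  # prior term in the objective
    best = np.zeros((T, L))              # best cost ending in each label
    back = np.zeros((T, L), dtype=int)   # backpointers for recovery
    best[0] = cost[0]
    for t in range(1, T):
        trans = best[t - 1][:, None] + pairwise  # (prev label, cur label)
        back[t] = trans.argmin(axis=0)
        best[t] = trans.min(axis=0) + cost[t]
    labels = np.zeros(T, dtype=int)
    labels[-1] = best[-1].argmin()
    for t in range(T - 2, -1, -1):       # trace back the optimal path
        labels[t] = back[t + 1, labels[t + 1]]
    return labels
```

With a strong pairwise switching cost the program prefers a temporally smooth labeling even against the unary evidence, which is the effect the recursive prior term exploits.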
Efficient Structured Prediction with Latent Variables for General Graphical Models
In this paper we propose a unified framework for structured prediction with
latent variables which includes hidden conditional random fields and latent
structured support vector machines as special cases. We describe a local
entropy approximation for this general formulation using duality, and derive an
efficient message passing algorithm that is guaranteed to converge. We
demonstrate its effectiveness in the tasks of image segmentation as well as 3D
indoor scene understanding from single images, showing that our approach is
superior to latent structured support vector machines and hidden conditional
random fields.

Comment: Appears in Proceedings of the 29th International Conference on
Machine Learning (ICML 2012)
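The paper's convergent algorithm passes messages on a local entropy approximation with latent variables; as a much simpler illustration of the message-passing primitive it builds on, here is standard sum-product (forward-backward) on a chain MRF. This is a sketch of the basic mechanism only, not the paper's algorithm.

```python
import numpy as np

def chain_marginals(unary, pairwise):
    """Exact sum-product marginals on a chain MRF.

    unary:    (T, L) non-negative node potentials
    pairwise: (L, L) non-negative edge potentials shared across the chain
    """
    T, L = unary.shape
    fwd = np.zeros((T, L))
    bwd = np.zeros((T, L))
    fwd[0] = unary[0]
    for t in range(1, T):                 # forward messages
        fwd[t] = unary[t] * (fwd[t - 1] @ pairwise)
    bwd[-1] = 1.0
    for t in range(T - 2, -1, -1):        # backward messages
        bwd[t] = pairwise @ (unary[t + 1] * bwd[t + 1])
    marg = fwd * bwd
    return marg / marg.sum(axis=1, keepdims=True)
```

On trees this converges in one forward and one backward sweep; the paper's contribution is a convergent scheme for the general (loopy, latent-variable) case.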
Floor-SP: Inverse CAD for Floorplans by Sequential Room-wise Shortest Path
This paper proposes a new approach for automated floorplan reconstruction
from RGBD scans, a major milestone in indoor mapping research. The approach,
dubbed Floor-SP, formulates a novel optimization problem, where room-wise
coordinate descent sequentially solves dynamic programming to optimize the
floorplan graph structure. The objective function consists of data terms guided
by deep neural networks, consistency terms encouraging adjacent rooms to share
corners and walls, and a model complexity term. Unlike most other methods, the
approach does not require thresholded corner/edge detection. We have
evaluated our system on production-quality RGBD scans of 527 apartments or
houses, including many units with non-Manhattan structures. Qualitative and
quantitative evaluations demonstrate a significant performance boost over the
current state of the art. Please refer to our project website
http://jcchen.me/floor-sp/ for code and data.

Comment: 10 pages, 9 figures, accepted to ICCV 201
Basic level scene understanding: categories, attributes and structures
A longstanding goal of computer vision is to build a system that can automatically understand a 3D scene from a single image. This requires extracting semantic concepts and 3D information from 2D images, which can depict an enormous variety of environments that comprise our visual world. This paper summarizes our recent efforts toward these goals. First, we describe the richly annotated SUN database, a collection of annotated images spanning 908 different scene categories with object, attribute, and geometric labels for many scenes. This database allows us to systematically study the space of scenes and to establish a benchmark for scene and object recognition. We augment the categorical SUN database with 102 scene attributes for every image and explore attribute recognition. Finally, we present an integrated system to extract the 3D structure of the scene and objects depicted in an image.

Funding: Google U.S./Canada Ph.D. Fellowship in Computer Vision; National Science Foundation (U.S.) (grant 1016862); Google Faculty Research Award; National Science Foundation (U.S.) (Career Award 1149853); National Science Foundation (U.S.) (Career Award 0747120); United States Office of Naval Research, Multidisciplinary University Research Initiative (N000141010933)
Blending Learning and Inference in Structured Prediction
In this paper we derive an efficient algorithm to learn the parameters of
structured predictors in general graphical models. This algorithm blends the
learning and inference tasks, which results in a significant speedup over
traditional approaches, such as conditional random fields and structured
support vector machines. For this purpose we utilize the structures of the
predictors to describe a low dimensional structured prediction task which
encourages local consistencies within the different structures while learning
the parameters of the model. Convexity of the learning task provides the means
to enforce the consistencies between the different parts. The
inference-learning blending algorithm that we propose is guaranteed to converge
to the optimum of the low-dimensional primal and dual programs. Unlike many
existing approaches, the inference-learning blending allows us to efficiently
learn high-order graphical models over regions of any size and with very
large numbers of parameters. We demonstrate the effectiveness of our approach
by presenting state-of-the-art results in stereo estimation, semantic
segmentation, shape reconstruction, and indoor scene understanding.
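The core idea of blending is to interleave cheap, partial inference with parameter updates rather than running inference to convergence before every update. The toy sketch below illustrates only that interleaving pattern with a perceptron-style update and a single scoring pass as the "partial inference" step; it is not the paper's convex primal-dual algorithm, and every name in it is an assumption for illustration.

```python
import numpy as np

def blended_learning(features, labels, n_labels, epochs=20, lr=0.1):
    """Interleave one cheap inference pass with each parameter update.

    features: (N, d) array of per-example feature vectors
    labels:   length-N sequence of integer labels in [0, n_labels)
    """
    d = features.shape[-1]
    w = np.zeros((n_labels, d))
    for _ in range(epochs):
        for x, y in zip(features, labels):
            # partial inference: a single scoring pass, not full inference
            pred = int(np.argmax(w @ x))
            if pred != y:
                # perceptron-style correction toward the true label
                w[y] += lr * x
                w[pred] -= lr * x
    return w
```

The speedup claimed in the paper comes from exactly this kind of amortization: inference work done inside one update is reused rather than discarded, with convexity guaranteeing the interleaved iterates still converge.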
3D indoor layout estimation from images
This work develops a method for identifying the structural edges of rooms from a single image. Many works have addressed this problem over the last decade. These methods are generally based on generating different layout-model hypotheses from purely geometric reasoning or, more recently, on deep learning techniques (used either to support the geometry-based hypotheses or to make hypotheses based solely on deep learning). The main limitation of hypotheses based on geometric reasoning is that in images with many occlusions the principal directions can be very difficult to detect, while entrusting the hypotheses solely to deep learning techniques is not entirely effective, since their use for this purpose is still under development and does not yet reach the desired accuracy. The main novelty of this work is the combination of two types of hypotheses, one based on geometric computer-vision reasoning and the other purely on deep learning techniques, with the ability to select the best solution in each case. We report layout reconstruction results on images from the public LSUN (Large-scale Scene Understanding Challenge) database used by other state-of-the-art works. With them we demonstrate the effectiveness of the method with respect to existing works, with our first experiments placing us at the head of the state of the art.
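Selecting between geometric and deep-learning layout hypotheses can be sketched as scoring each candidate against a predicted edge-probability map and keeping the best. This is a hedged illustration of the selection idea only; the scoring function, names, and data layout are assumptions, not the work's actual criterion.

```python
import numpy as np

def edge_support(edge_prob, layout_edges):
    """Mean predicted edge probability sampled along a hypothesis's
    structural edges, given as a list of (row, col) pixel coordinates."""
    return float(np.mean([edge_prob[r, c] for r, c in layout_edges]))

def select_layout(edge_prob, hypotheses):
    """hypotheses: dict mapping a source name (e.g. 'geometric', 'deep')
    to that hypothesis's edge pixels; returns the best-supported source."""
    return max(hypotheses, key=lambda k: edge_support(edge_prob, hypotheses[k]))
```

Because both hypothesis families are scored on the same map, the selector can fall back to the deep-learning proposal precisely in the occluded images where the principal directions are hard to detect geometrically.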