Extracting real estate values of rental apartment floor plans using graph convolutional networks
Access graphs, which encode the adjacency relationships between rooms from the
perspective of flow lines, are automatically extracted from a large number of
floor plan images of a family-oriented rental apartment complex in Osaka
Prefecture, Japan, using a recently proposed access graph extraction method
with slight modifications. We define and implement a graph convolutional
network (GCN) for access graphs and propose a model that estimates the real
estate value of an access graph as the floor plan value. This model, which
combines the floor plan value with a hedonic model based on other general
explanatory variables, is used to estimate rents, and its estimation accuracy
is compared with that of conventional models. In addition, the floor plan
features that explain the rent are analyzed from the learned convolutional
network. In this way, a new model for comprehensively estimating the value of
real estate floor plans is proposed and validated. The results show that the
proposed method significantly improves rent-estimation accuracy over
conventional models, and that the specific spatial configuration rules that
influence a floor plan's value can be understood by analyzing the learned GCN.
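The approach can be illustrated with a minimal graph-convolution sketch over a toy access graph. The room set, one-hot feature encoding, layer sizes, and random weights below are illustrative assumptions, not the authors' trained model:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer: symmetrically normalized adjacency
    (with self-loops), a linear transform, then a ReLU."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(0.0, d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W)

# Toy access graph: entrance, corridor, living room, bedroom; edges encode
# flow-line adjacency, and node features are hypothetical one-hot room types.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
H = np.eye(4)
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))   # untrained weights, for shape illustration only
W2 = rng.normal(size=8)

graph_embedding = gcn_layer(A, H, W1).mean(axis=0)   # mean-pool node states
floor_plan_value = float(graph_embedding @ W2)       # scalar "floor plan value"
```

In the paper's model this scalar would enter a hedonic regression alongside the other explanatory variables; here it only shows the graph-to-value shape of the pipeline.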
Deep Learning based 3D Segmentation: A Survey
3D object segmentation is a fundamental and challenging problem in computer
vision with applications in autonomous driving, robotics, augmented reality and
medical image analysis. It has received significant attention from the computer
vision, graphics and machine learning communities. Traditionally, 3D
segmentation was performed with hand-crafted features and engineered methods
which failed to achieve acceptable accuracy and could not generalize to
large-scale data. Driven by their great success in 2D computer vision, deep
learning techniques have recently become the tool of choice for 3D segmentation
tasks as well. This has led to an influx of a large number of methods in the
literature that have been evaluated on different benchmark datasets. This paper
provides a comprehensive survey of recent progress in deep learning based 3D
segmentation covering over 150 papers. It summarizes the most commonly used
pipelines, discusses their highlights and shortcomings, and analyzes the
competitive results of these segmentation methods. Based on the analysis, it
also provides promising research directions for the future.
Comment: Under review at ACM Computing Surveys, 36 pages, 10 tables, 9 figures
Two-stage visual navigation by deep neural networks and multi-goal reinforcement learning
In this paper, we propose a two-stage learning framework for visual navigation in which the experience the agent gathers while exploring one goal is shared to learn to navigate to other goals. We train a deep neural network to estimate the robot's position in the environment using ground-truth information provided by a classical localization and mapping approach. In the second stage, a simpler multi-goal Q-function learns to traverse the environment using the provided discretized map. Transfer learning is applied to the multi-goal Q-function from a maze structure to a 2D simulator, and it is finally deployed in a 3D simulator where the robot uses the locations estimated by the position-estimator network. In the experiments, we first compare different architectures to select the best deep network for location estimation, and then compare the effects of the multi-goal reinforcement learning method against traditional reinforcement learning. The results show a significant improvement when multi-goal reinforcement learning is used. Furthermore, the results of the location estimator show that a deep network can learn and generalize in different environments using camera images, with high accuracy in both position and orientation.
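The experience-sharing idea in the second stage can be sketched with tabular multi-goal Q-learning on a discretized map: every transition updates the Q-function for all goals at once. The grid, rewards, and hyperparameters below are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np

N, ACTIONS = 5, 4                              # 5x5 discretized map, 4 moves
moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right
goal_cells = [(4, 4), (0, 4)]                  # two hypothetical goals
Q = np.zeros((N * N, len(goal_cells), ACTIONS))
alpha, gamma = 0.5, 0.9

def idx(cell):
    return cell[0] * N + cell[1]

def step(cell, a):
    r, c = cell
    dr, dc = moves[a]
    return (min(max(r + dr, 0), N - 1), min(max(c + dc, 0), N - 1))

rng = np.random.default_rng(0)
for episode in range(300):
    cell = (0, 0)
    for _ in range(50):
        a = int(rng.integers(ACTIONS))         # random exploration policy
        nxt = step(cell, a)
        # One transition updates the Q-function for every goal, so
        # experience gathered while exploring one goal helps the others.
        for g, gc in enumerate(goal_cells):
            reward = 1.0 if nxt == gc else 0.0
            Q[idx(cell), g, a] += alpha * (
                reward + gamma * Q[idx(nxt), g].max() - Q[idx(cell), g, a])
        cell = nxt
```

After training, `Q[idx(cell), g].argmax()` gives a greedy action toward goal `g`; in the paper, `cell` would come from the deep position estimator rather than ground truth.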
Machine Learning in Robotic Navigation: Deep Visual Localization and Adaptive Control
The work conducted in this thesis contributes to the robotic navigation field by focusing on different machine learning solutions: supervised learning with (deep) neural networks, unsupervised learning, and reinforcement learning. First, we propose a semi-supervised machine learning approach that can dynamically update the robot controller's parameters using situational analysis through feature extraction and unsupervised clustering. The results show that the robot can adapt to changes in its surroundings, resulting in a thirty percent improvement in navigation speed and stability. Then, we train multiple deep neural networks for estimating the robot's position in the environment using ground-truth information provided by a classical localization and mapping approach. We prepare two image-based localization datasets in 3D simulation and compare the results of a traditional multilayer perceptron, a stacked denoising autoencoder, and a convolutional neural network (CNN). The experimental results show that our proposed inception-based CNNs without pooling layers perform very well in all the environments. Finally, we propose a two-stage learning framework for visual navigation in which the experience the agent gathers while exploring one goal is shared to learn to navigate to other goals. The multi-goal Q-function learns to traverse the environment using the provided discretized map. Transfer learning is applied to the multi-goal Q-function from a maze structure to a 2D simulator, and it is finally deployed in a 3D simulator where the robot uses the locations estimated by the position-estimator deep CNNs. The results show a significant improvement when multi-goal reinforcement learning is used.
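The first contribution can be illustrated with a toy sketch: cluster situational feature vectors with k-means and look up controller gains per cluster. The two-dimensional features, cluster count, and gain values are hypothetical, not the thesis's actual setup:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means: random init from the data, then assign/update loops."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return centers

# Two synthetic "situations", e.g. open-corridor vs. cluttered-room features.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(3.0, 0.1, (50, 2))])
centers = kmeans(X, k=2)

# Hypothetical per-situation controller gains.
gains = {0: {"speed": 0.8, "turn": 0.3}, 1: {"speed": 0.3, "turn": 0.7}}

def controller_params(features):
    """Pick the gain set of the nearest situation cluster."""
    cluster = int(np.argmin(((features - centers) ** 2).sum(-1)))
    return gains[cluster]
```

The point of the sketch is the mapping from unsupervised situation clusters to controller parameters; the thesis's feature extraction and adaptation mechanism are more elaborate.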
Kimera: from SLAM to Spatial Perception with 3D Dynamic Scene Graphs
Humans are able to form a complex mental model of the environment they move
in. This mental model captures geometric and semantic aspects of the scene,
describes the environment at multiple levels of abstraction (e.g., objects,
rooms, buildings), and includes static and dynamic entities and their relations
(e.g., a person being in a room at a given time). In contrast, current robots'
internal representations still provide a partial and fragmented understanding
of the environment, either in the form of a sparse or dense set of geometric
primitives (e.g., points, lines, planes, voxels) or as a collection of objects.
This paper attempts to reduce the gap between robot and human perception by
introducing a novel representation, a 3D Dynamic Scene Graph (DSG), that
seamlessly captures metric and semantic aspects of a dynamic environment. A DSG
is a layered graph where nodes represent spatial concepts at different levels
of abstraction, and edges represent spatio-temporal relations among nodes. Our
second contribution is Kimera, the first fully automatic method to build a DSG
from visual-inertial data. Kimera includes state-of-the-art techniques for
visual-inertial SLAM, metric-semantic 3D reconstruction, object localization,
human pose and shape estimation, and scene parsing. Our third contribution is a
comprehensive evaluation of Kimera in real-life datasets and photo-realistic
simulations, including a newly released dataset, uHumans2, which simulates a
collection of crowded indoor and outdoor scenes. Our evaluation shows that
Kimera achieves state-of-the-art performance in visual-inertial SLAM, estimates
an accurate 3D metric-semantic mesh model in real-time, and builds a DSG of a
complex indoor environment with tens of objects and humans in minutes. Our
final contribution shows how to use a DSG for real-time hierarchical semantic
path-planning. The core modules in Kimera are open-source.
Comment: 34 pages, 25 figures, 9 tables. arXiv admin note: text overlap with arXiv:2002.0628
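The layered-graph structure described in the abstract can be sketched as a small data structure; the layer names follow the abstract, but this API is an illustrative assumption, not Kimera's actual interface:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    layer: str                    # e.g. "object", "room", "building", "agent"
    attributes: dict = field(default_factory=dict)

@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (src, dst, relation, time)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def relate(self, src, dst, relation, t=None):
        """Add a spatio-temporal relation edge between two nodes."""
        self.edges.append((src, dst, relation, t))

    def layer(self, name):
        """All spatial concepts at one level of abstraction."""
        return [n for n in self.nodes.values() if n.layer == name]

dsg = SceneGraph()
dsg.add_node(Node("kitchen", "room"))
dsg.add_node(Node("mug", "object", {"position": (1.2, 0.4, 0.9)}))
dsg.add_node(Node("alice", "agent"))
dsg.relate("mug", "kitchen", "contained_in")
dsg.relate("alice", "kitchen", "in_room", t=12.5)  # "a person is in a room at a given time"
```

Hierarchical queries such as "which objects are in which room" reduce to walking edges between layers, which is what makes the representation useful for hierarchical semantic path-planning.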
Learning to plan with uncertain topological maps
We train an agent to navigate in 3D environments using a hierarchical
strategy including a high-level graph-based planner and a local policy. Our
main contribution is a data-driven, learning-based approach for planning under
uncertainty in topological maps, requiring an estimate of shortest paths in
valued graphs with a probabilistic structure. Whereas classical symbolic
algorithms achieve optimal results on noise-less topologies, or optimal results
in a probabilistic sense on graphs with probabilistic structure, we aim to show
that machine learning can overcome missing information in the graph by taking
into account rich high-dimensional node features, for instance visual
information available at each location of the map. Compared to purely learned
neural white box algorithms, we structure our neural model with an inductive
bias for dynamic programming based shortest path algorithms, and we show that a
particular parameterization of our neural model corresponds to the Bellman-Ford
algorithm. By performing an empirical analysis of our method in simulated
photo-realistic 3D environments, we demonstrate that the inclusion of visual
features in the learned neural planner outperforms classical symbolic solutions
for graph-based planning.
Comment: ECCV 202
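The abstract notes that one parameterization of the learned planner corresponds to the Bellman-Ford algorithm. The classical relaxation scheme that this inductive bias mirrors can be sketched as follows; the learned model replaces these hard min/sum operations with neural counterparts over node features, and the toy graph is illustrative:

```python
import math

def bellman_ford(n, edges, source):
    """Classical Bellman-Ford shortest paths on a valued graph.
    edges: list of (u, v, weight); returns distances from source."""
    dist = [math.inf] * n
    dist[source] = 0.0
    for _ in range(n - 1):                 # n-1 rounds of relaxation
        for u, v, w in edges:
            if dist[u] + w < dist[v]:      # relax edge (u, v)
                dist[v] = dist[u] + w
    return dist

# Toy topological map with valued edges (e.g. traversal-cost estimates).
edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 4.0), (2, 3, 1.0)]
dist = bellman_ford(4, edges, source=0)    # shortest path 0-1-2-3 costs 4.0
```

In the paper's setting the edge values are uncertain and each node carries high-dimensional visual features, which is why a learned relaxation can outperform this purely symbolic version.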
Selective Spatio-Temporal Aggregation Based Pose Refinement System: Towards Understanding Human Activities in Real-World Videos
Taking advantage of human pose data for understanding human activities has
attracted much attention these days. However, state-of-the-art pose estimators
struggle in obtaining high-quality 2D or 3D pose data due to occlusion,
truncation, and low resolution in real-world unannotated videos. Hence, in this
work, we propose 1) a Selective Spatio-Temporal Aggregation mechanism, named
SST-A, that refines and smooths the keypoint locations extracted by multiple
expert pose estimators, 2) an effective weakly-supervised self-training
framework which leverages the aggregated poses as pseudo ground-truth instead
of handcrafted annotations for real-world pose estimation. Extensive
experiments are conducted for evaluating not only the upstream pose refinement
but also the downstream action recognition performance on four datasets, Toyota
Smarthome, NTU-RGB+D, Charades, and Kinetics-50. We demonstrate that the
skeleton data refined by our Pose-Refinement system (SSTA-PRS) is effective at
boosting various existing action recognition models, which achieves competitive
or state-of-the-art performance.
Comment: WACV202
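As a rough illustration of the aggregation idea (not the actual SST-A mechanism), keypoints from several expert estimators can be fused by a per-joint confidence-weighted average and then smoothed over time; the joint count, confidences, and smoothing window below are assumptions:

```python
import numpy as np

def aggregate_keypoints(expert_poses, expert_conf):
    """Confidence-weighted average of keypoints from several expert estimators.
    expert_poses: (E, J, 2); expert_conf: (E, J) per-joint confidences."""
    w = expert_conf / expert_conf.sum(axis=0, keepdims=True)
    return (expert_poses * w[..., None]).sum(axis=0)

def temporal_smooth(track, window=3):
    """Moving-average smoothing of a (T, J, 2) keypoint track over time."""
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda x: np.convolve(x, kernel, mode="same"), 0, track)

rng = np.random.default_rng(0)
poses = rng.normal(size=(3, 17, 2))        # 3 experts, 17 COCO-style joints
conf = rng.uniform(0.1, 1.0, size=(3, 17))
fused = aggregate_keypoints(poses, conf)   # (17, 2) refined keypoints
track = rng.normal(size=(10, 17, 2))       # 10 frames of fused keypoints
smoothed = temporal_smooth(track)
```

In the paper's pipeline, refined poses of this kind serve as pseudo ground-truth for weakly supervised self-training of a pose estimator.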