Search CORE

5,148 research outputs found

Unsupervised Discovery of Parts, Structure, and Dynamics

Author: Freeman William T.
Liu Zhijian
Murphy Kevin
Sun Chen
Tenenbaum Joshua B.
Wu Jiajun
Xu Zhenjia
Publication venue
Publication date: 12/03/2019
Field of study

Humans easily recognize object parts and their hierarchical structure by watching how they move; they can then predict how each part moves in the future. In this paper, we propose a novel formulation that simultaneously learns a hierarchical, disentangled object representation and a dynamics model for object parts from unlabeled videos. Our Parts, Structure, and Dynamics (PSD) model learns to, first, recognize the object parts via a layered image representation; second, predict hierarchy via a structural descriptor that composes low-level concepts into a hierarchical structure; and third, model the system dynamics by predicting the future. Experiments on multiple real and synthetic datasets demonstrate that our PSD model works well on all three tasks: segmenting object parts, building their hierarchical structure, and capturing their motion distributions.Comment: ICLR 2019. The first two authors contributed equally to this wor

arXiv.org e-Print Archive

DSpace@MIT

A Unified Framework for Mutual Improvement of SLAM and Semantic Segmentation

Author: Han Liming
Hua Minjie
Huang Bill
Lian Shiguo
Lin Yimin
Wang Kai
Wang Luowei
Wang Xiang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 22/03/2019
Field of study

This paper presents a novel framework for simultaneously implementing localization and segmentation, which are two of the most important vision-based tasks for robotics. While the goals and techniques used for them were considered to be different previously, we show that by making use of the intermediate results of the two modules, their performance can be enhanced at the same time. Our framework is able to handle both the instantaneous motion and long-term changes of instances in localization with the help of the segmentation result, which also benefits from the refined 3D pose information. We conduct experiments on various datasets, and prove that our framework works effectively on improving the precision and robustness of the two tasks and outperforms existing localization and segmentation algorithms.Comment: 7 pages, 5 figures.This work has been accepted by ICRA 2019. The demo video can be found at https://youtu.be/Bkt53dAehj

arXiv.org e-Print Archive

Crossref

Co-Fusion: Real-time Segmentation, Tracking and Fusion of Multiple Objects

Author: Agapito Lourdes
Rünz Martin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 20/06/2017
Field of study

In this paper we introduce Co-Fusion, a dense SLAM system that takes a live stream of RGB-D images as input and segments the scene into different objects (using either motion or semantic cues) while simultaneously tracking and reconstructing their 3D shape in real time. We use a multiple model fitting approach where each object can move independently from the background and still be effectively tracked and its shape fused over time using only the information from pixels associated with that object label. Previous attempts to deal with dynamic scenes have typically considered moving regions as outliers, and consequently do not model their shape or track their motion over time. In contrast, we enable the robot to maintain 3D models for each of the segmented objects and to improve them over time through fusion. As a result, our system can enable a robot to maintain a scene description at the object level which has the potential to allow interactions with its working environment; even in the case of dynamic scenes.Comment: International Conference on Robotics and Automation (ICRA) 2017, http://visual.cs.ucl.ac.uk/pubs/cofusion, https://github.com/martinruenz/co-fusio

arXiv.org e-Print Archive

Crossref

UCL Discovery

The Whole World in Your Hand: Active and Interactive Segmentation

Author: Arsenio Artur
Fitzpatrick Paul
Kemp Charles C.
Metta Giorgio
Publication venue: Lund University Cognitive Studies
Publication date: 01/01/2003
Field of study

Object segmentation is a fundamental problem in computer vision and a powerful resource for development. This paper presents three embodied approaches to the visual segmentation of objects. Each approach to segmentation is aided by the presence of a hand or arm in the proximity of the object to be segmented. The first approach is suitable for a robotic system, where the robot can use its arm to evoke object motion. The second method operates on a wearable system, viewing the world from a human's perspective, with instrumentation to help detect and segment objects that are held in the wearer's hand. The third method operates when observing a human teacher, locating periodic motion (finger/arm/object waving or tapping) and using it as a seed for segmentation. We show that object segmentation can serve as a key resource for development by demonstrating methods that exploit high-quality object segmentations to develop both low-level vision capabilities (specialized feature detectors) and high-level vision capabilities (object recognition and localization)

CiteSeerX

CogPrints Cognitive Sciences Eprint Archive

Recommended from our members

Motion Segmentation - Segmentation of Independently Moving Objects in Video

Author: Bideau Pia Katalin
Publication venue: ScholarWorks@UMass Amherst
Publication date: 24/03/2020
Field of study

The ability to recognize motion is one of the most important functions of our visual system. Motion allows us both to recognize objects and to get a better understanding of the 3D world in which we are moving. Because of its importance, motion is used to answer a wide variety of fundamental questions in computer vision such as: (1) Which objects are moving independently in the world? (2) Which objects are close and which objects are far away? (3) How is the camera moving?My work addresses the problem of moving object segmentation in unconstrained videos. I developed a probabilistic approach to segment independently moving objects in a video sequence, connecting aspects of camera motion estimation, relative depth and flow statistics. My work consists of three major parts: Modeling motion using a simple (rigid) motion model strictly following the principles of perspective projection and segmenting the video into its different motion components by assigning each pixel to its most likely motion model in a Bayesian fashion. Combining piecewise rigid motions to more complex, deformable and articulated objects, guided by learned semantic object segmentations. Learning highly variable motion patterns using a neural network trained on synthetic (unlimited) training data. Training data is automatically generated strictly following the principles of perspective projection. In this way well-known geometric constraints are precisely characterized during training to learn the principles of motion segmentation rather than identifying well-known structures that are likely to move. This work shows that a careful analysis of the motion field not only leads to a consistent segmentation of moving objects in a video sequence, but also helps us understand the scene geometry of the world we are moving in

ScholarWorks@UMass Amherst

Learning Video Object Segmentation with Visual Memory

Author: Alahari Karteek
Schmid Cordelia
Tokmakov Pavel
Publication venue
Publication date: 12/07/2017
Field of study

This paper addresses the task of segmenting moving objects in unconstrained videos. We introduce a novel two-stream neural network with an explicit memory module to achieve this. The two streams of the network encode spatial and temporal features in a video sequence respectively, while the memory module captures the evolution of objects over time. The module to build a "visual memory" in video, i.e., a joint representation of all the video frames, is realized with a convolutional recurrent unit learned from a small number of training video sequences. Given a video frame as input, our approach assigns each pixel an object or background label based on the learned spatio-temporal features as well as the "visual memory" specific to the video, acquired automatically without any manually-annotated frames. The visual memory is implemented with convolutional gated recurrent units, which allows to propagate spatial information over time. We evaluate our method extensively on two benchmarks, DAVIS and Freiburg-Berkeley motion segmentation datasets, and show state-of-the-art results. For example, our approach outperforms the top method on the DAVIS dataset by nearly 6%. We also provide an extensive ablative analysis to investigate the influence of each component in the proposed framework

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server