RGB-D datasets using microsoft kinect or similar sensors: a survey
RGB-D data has turned out to be a very useful representation of an indoor scene for solving fundamental computer vision problems. It combines the advantages of the color image, which provides appearance information about an object, with those of the depth image, which is immune to variations in color, illumination, rotation angle and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, which are of great importance for benchmarking the state of the art. In this paper, we systematically survey popular RGB-D datasets for different applications including object recognition, scene classification, hand gesture recognition, 3D simultaneous localization and mapping, and pose estimation. We provide insights into the characteristics of each important dataset, and compare the popularity and the difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description of the available RGB-D datasets and thus to guide researchers in the selection of suitable datasets for evaluating their algorithms.
A decision forest based feature selection framework for action recognition from RGB-Depth cameras
In this paper, we present an action recognition framework
leveraging data mining capabilities of random decision forests trained on
kinematic features. We describe human motion via a rich collection of
kinematic feature time-series computed from the skeletal representation
of the body in motion. We discriminatively optimize a random decision
forest model over this collection to identify the most effective subset
of features, localized both in time and space. Later, we train a support
vector machine classifier on the selected features. This approach improves
upon the baseline performance obtained using the whole feature set while
using significantly fewer features (one tenth of the original). On the
MSRC-12 dataset (12 classes), our method achieves 94% accuracy. On
the WorkoutSU-10 dataset, collected by our group (10 physical exercise
classes), the accuracy is 98%. The approach can also be used to provide
insights on the spatiotemporal dynamics of human actions.
Sampling-Based Methods for Factored Task and Motion Planning
This paper presents a general-purpose formulation of a large class of
discrete-time planning problems, with hybrid state and control-spaces, as
factored transition systems. Factoring allows state transitions to be described
as the intersection of several constraints each affecting a subset of the state
and control variables. Robotic manipulation problems with many movable objects
involve constraints that only affect several variables at a time and therefore
exhibit large amounts of factoring. We develop a theoretical framework for
solving factored transition systems with sampling-based algorithms. The
framework characterizes conditions on the submanifold in which solutions lie,
leading to a characterization of robust feasibility that incorporates
dimensionality-reducing constraints. It then connects those conditions to
corresponding conditional samplers that can be composed to produce values on
this submanifold. We present two domain-independent, probabilistically complete
planning algorithms that take, as input, a set of conditional samplers. We
demonstrate the empirical efficiency of these algorithms on a set of
challenging task and motion planning problems involving picking, placing, and
pushing.
Robust recognition and segmentation of human actions using HMMs with missing observations
This paper describes the integration of missing observation data with hidden Markov models to create a framework that is able to segment and classify individual actions from a stream of human motion using an incomplete 3D human pose estimation. Based on this framework, a model is trained to automatically segment and classify an activity sequence into its constituent subactions during inference. This is achieved by introducing action labels into the observation vector and setting these labels as missing data during inference, thus forcing the system to infer the probability of each action label. Additionally, missing data provides recognition-level support for occlusions and imperfect silhouette segmentation, permitting the use of a fast (real-time) pose estimation that delegates the burden of handling undetected limbs to the action recognition system. Findings show that the use of missing data to segment activities is an accurate and elegant approach. Furthermore, action recognition can be accurate even when almost half of the pose feature data is missing due to occlusions, since not all of the pose data is important all of the time.
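The idea of treating an observation as missing in a hidden Markov model can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a small discrete HMM, and the function name `forward_with_missing`, the two-state model, and all probabilities below are hypothetical values chosen for illustration. A missing observation (marked `None`) is marginalized out, so the belief is only propagated through the transition model at that step.

```python
def forward_with_missing(init, trans, emit, observations):
    """Filtered state posteriors for a discrete HMM.

    init[i]      : P(state_0 = i)
    trans[i][j]  : P(state_{t+1} = j | state_t = i)
    emit[i][o]   : P(obs = o | state = i)
    observations : list of symbol indices; None marks a missing observation
    """
    n = len(init)
    belief = list(init)
    posteriors = []
    for t, obs in enumerate(observations):
        if t > 0:
            # Prediction step: push the belief through the transition model.
            belief = [sum(belief[i] * trans[i][j] for i in range(n))
                      for j in range(n)]
        if obs is not None:
            # Update step: weight each state by its emission likelihood.
            belief = [belief[i] * emit[i][obs] for i in range(n)]
        # A missing observation skips the update (its likelihood sums to 1).
        total = sum(belief)
        belief = [b / total for b in belief]
        posteriors.append(belief)
    return posteriors


# Hypothetical two-state model with "sticky" transitions.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.8, 0.2], [0.2, 0.8]]

# The middle observation is missing.
posteriors = forward_with_missing(init, trans, emit, [0, None, 0])
for step, p in enumerate(posteriors):
    print(step, [round(x, 4) for x in p])
```

At the step with the missing observation the posterior relaxes toward the transition model's prediction, and it sharpens again once the next observation arrives.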
Object-Oriented Dynamics Learning through Multi-Level Abstraction
Object-based approaches for learning action-conditioned dynamics have
demonstrated promise for generalization and interpretability. However, existing
approaches suffer from structural limitations and optimization difficulties for
common environments with multiple dynamic objects. In this paper, we present a
novel self-supervised learning framework, called Multi-level Abstraction
Object-oriented Predictor (MAOP), which employs a three-level learning
architecture that enables efficient object-based dynamics learning from raw
visual observations. We also design a spatial-temporal relational reasoning
mechanism for MAOP to support instance-level dynamics learning and handle
partial observability. Our results show that MAOP significantly outperforms
previous methods in terms of sample efficiency and generalization over novel
environments for learning environment models. We also demonstrate that learned
dynamics models enable efficient planning in unseen environments, comparable to
true environment models. In addition, MAOP learns semantically and visually
interpretable disentangled representations.
Comment: Accepted to the Thirty-Fourth AAAI Conference on Artificial
Intelligence (AAAI), 202
The THUMOS Challenge on Action Recognition for Videos "in the Wild"
Automatically recognizing and localizing wide ranges of human actions has
crucial importance for video understanding. Towards this goal, the THUMOS
challenge was introduced in 2013 to serve as a benchmark for action
recognition. Until then, video action recognition, including the THUMOS challenge,
had focused primarily on the classification of pre-segmented (i.e., trimmed)
videos, which is an artificial task. In THUMOS 2014, we elevated action
recognition to a more practical level by introducing temporally untrimmed
videos. These also include 'background videos' which share similar scenes and
backgrounds with action videos, but are devoid of the specific actions. The three
editions of the challenge organized in 2013-2015 have made THUMOS a common
benchmark for action classification and detection and the annual challenge is
widely attended by teams from around the world.
In this paper we describe the THUMOS benchmark in detail and give an overview
of data collection and annotation procedures. We present the evaluation
protocols used to quantify results in the two THUMOS tasks of action
classification and temporal detection. We also present results of submissions
to the THUMOS 2015 challenge and review the participating approaches.
Additionally, we include a comprehensive empirical study evaluating the
differences in action recognition between trimmed and untrimmed videos, and how
well methods trained on trimmed videos generalize to untrimmed videos. We
conclude by proposing several directions and improvements for future THUMOS
challenges.
Comment: Preprint submitted to Computer Vision and Image Understandin
PDDLStream: Integrating Symbolic Planners and Blackbox Samplers via Optimistic Adaptive Planning
Many planning applications involve complex relationships defined on
high-dimensional, continuous variables. For example, robotic manipulation
requires planning with kinematic, collision, visibility, and motion constraints
involving robot configurations, object poses, and robot trajectories. These
constraints typically require specialized procedures to sample satisfying
values. We extend PDDL to support a generic, declarative specification for
these procedures that treats their implementation as black boxes. We provide
domain-independent algorithms that reduce PDDLStream problems to a sequence of
finite PDDL problems. We also introduce an algorithm that dynamically balances
exploring new candidate plans and exploiting existing ones. This enables the
algorithm to greedily search the space of parameter bindings to more quickly
solve tightly-constrained problems as well as locally optimize to produce
low-cost solutions. We evaluate our algorithms on three simulated robotic
planning domains as well as several real-world robotic tasks.
Comment: International Conference on Automated Planning and Scheduling (ICAPS)
202