DecideNet: Counting Varying Density Crowds Through Attention Guided Detection and Density Estimation
In real-world crowd counting applications, the crowd densities vary greatly
in spatial and temporal domains. A detection based counting method estimates
crowds accurately in low-density scenes, but its reliability degrades in
congested areas. A regression based approach, on the other hand,
captures the general density information in crowded regions. Without knowing
the location of each person, it tends to overestimate the count in low density
areas. Thus, exclusively using either one of them is not sufficient to handle
all kinds of scenes with varying densities. To address this issue, a novel
end-to-end crowd counting framework, named DecideNet (DEteCtIon and Density
Estimation Network), is proposed. It adaptively selects the appropriate
counting mode for each image location based on the local density
conditions. DecideNet starts with estimating the crowd density by generating
detection and regression based density maps separately. To handle the
inevitable variation in densities, it incorporates an attention module that
adaptively assesses the reliability of the two types of estimates. The final
crowd counts are obtained with the guidance of the attention module to adopt
suitable estimations from the two kinds of density maps. Experimental results
show that our method achieves state-of-the-art performance on three challenging
crowd counting datasets.
Comment: CVPR 201
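The attention-guided fusion described above can be sketched as a per-pixel convex combination of the two density maps (a minimal illustration with assumed array names and a toy attention mask, not the authors' learned module):

```python
import numpy as np

def fuse_density_maps(det_map, reg_map, attention):
    """Blend detection- and regression-based density maps.

    attention holds per-pixel weights in [0, 1]: 1 trusts the
    detection estimate (low-density regions), 0 trusts the
    regression estimate (congested regions).
    """
    return attention * det_map + (1.0 - attention) * reg_map

# toy example: detection trusted on the left half, regression on the right
det = np.full((4, 4), 0.1)
reg = np.full((4, 4), 0.3)
attn = np.zeros((4, 4))
attn[:, :2] = 1.0

fused = fuse_density_maps(det, reg, attn)
count = fused.sum()  # integrating a density map gives the crowd count
```

In DecideNet the attention weights themselves are predicted by a learned module; here the mask is fixed only to make the blending behavior visible.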
Learning Action Maps of Large Environments via First-Person Vision
When people observe and interact with physical spaces, they are able to
associate functionality to regions in the environment. Our goal is to automate
dense functional understanding of large spaces by leveraging sparse activity
demonstrations recorded from an ego-centric viewpoint. The method we describe
enables functionality estimation in large scenes where people have behaved, as
well as novel scenes where no behaviors are observed. Our method learns and
predicts "Action Maps", which encode the ability for a user to perform
activities at various locations. By using an egocentric camera to
observe human activities, our method scales with the size of the scene without
needing to mount multiple static surveillance cameras, and is well-suited
to the task of observing activities up-close. We demonstrate that by capturing
appearance-based attributes of the environment and associating these attributes
with activity demonstrations, our proposed mathematical framework allows for
the prediction of Action Maps in new environments. Additionally, we offer a
preliminary glance of the applicability of Action Maps by demonstrating a
proof-of-concept application in which they are used in concert with activity
detections to perform localization.
Comment: To appear at CVPR 201
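The core idea, scoring unobserved locations by their appearance similarity to locations where an activity was demonstrated, can be roughly sketched as follows (the authors' framework is a learned model; the kernel-weighted average and all names below are illustrative assumptions):

```python
import numpy as np

def predict_action_map(scene_feats, demo_feats, demo_actions, sigma=1.0):
    """Score each scene location's affordance for one activity.

    scene_feats:  (M, D) appearance features for scene locations
    demo_feats:   (N, D) features at sparsely demonstrated locations
    demo_actions: (N,) indicator of the activity (1 = performed there)
    Returns an (M,) score: a Gaussian-kernel-weighted average of the
    demonstrations, so visually similar locations get similar scores.
    """
    # pairwise squared distances between scene and demo features
    d2 = ((scene_feats[:, None, :] - demo_feats[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    return (w * demo_actions).sum(axis=1) / (w.sum(axis=1) + 1e-8)
```

A location whose appearance matches a demonstrated activity site scores near 1; one that matches only non-activity sites scores near 0, which mirrors how appearance attributes transfer predictions to novel scenes.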
LabelFusion: A Pipeline for Generating Ground Truth Labels for Real RGBD Data of Cluttered Scenes
Deep neural network (DNN) architectures have been shown to outperform
traditional pipelines for object segmentation and pose estimation using RGBD
data, but the performance of these DNN pipelines is directly tied to how
representative the training data is of the true data. Hence a key requirement
for employing these methods in practice is to have a large set of labeled data
for your specific robotic manipulation task, a requirement that is not
generally satisfied by existing datasets. In this paper we develop a pipeline
to rapidly generate high quality RGBD data with pixelwise labels and object
poses. We use an RGBD camera to collect video of a scene from multiple
viewpoints and leverage existing reconstruction techniques to produce a 3D
dense reconstruction. We label the 3D reconstruction using human-assisted
ICP fitting of object meshes. By reprojecting the results of labeling the 3D
scene we can produce labels for each RGBD image of the scene. This pipeline
enabled us to collect over 1,000,000 labeled object instances in just a few
days. We use this dataset to answer questions related to how much training data
is required, and of what quality the data must be, to achieve high performance
from a DNN architecture.
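The reprojection step can be sketched with a standard pinhole camera model (assumed function and variable names; the paper's pipeline renders full object meshes rather than projecting sparse points):

```python
import numpy as np

def project_labels(points_world, labels, K, T_cam_world, image_shape):
    """Reproject labeled 3D points into one RGBD frame.

    points_world: (N, 3) labeled points from the fused reconstruction
    labels:       (N,) integer object ids (0 = background)
    K:            3x3 camera intrinsics
    T_cam_world:  4x4 world-to-camera extrinsics for this frame
    Returns an (H, W) per-pixel label image.
    """
    h, w = image_shape
    label_img = np.zeros((h, w), dtype=np.int32)
    # transform points into the camera frame
    pts_h = np.hstack([points_world, np.ones((len(points_world), 1))])
    pts_cam = (T_cam_world @ pts_h.T).T[:, :3]
    in_front = pts_cam[:, 2] > 0  # keep points in front of the camera
    # pinhole projection to pixel coordinates
    uvw = (K @ pts_cam[in_front].T).T
    uv = (uvw[:, :2] / uvw[:, 2:3]).astype(int)
    for (u, v), lab in zip(uv, labels[in_front]):
        if 0 <= u < w and 0 <= v < h:
            label_img[v, u] = lab
    return label_img
```

Because the scene is labeled once in 3D and the camera pose is known for every frame, this projection yields pixelwise labels for each RGBD image at essentially no extra annotation cost, which is what makes the million-instance scale feasible.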