Learning Action Maps of Large Environments via First-Person Vision
When people observe and interact with physical spaces, they are able to
associate functionality with regions in the environment. Our goal is to automate
dense functional understanding of large spaces by leveraging sparse activity
demonstrations recorded from an ego-centric viewpoint. The method we describe
enables functionality estimation in large scenes where people have behaved, as
well as novel scenes where no behaviors are observed. Our method learns and
predicts "Action Maps", which encode the ability for a user to perform
activities at various locations. By using an egocentric camera to
observe human activities, our method scales with the size of the scene without
the need for mounting multiple static surveillance cameras and is well-suited
to the task of observing activities up-close. We demonstrate that by capturing
appearance-based attributes of the environment and associating these attributes
with activity demonstrations, our proposed mathematical framework allows for
the prediction of Action Maps in new environments. Additionally, we offer a
preliminary look at the applicability of Action Maps by demonstrating a
proof-of-concept application in which they are used in concert with activity
detections to perform localization.
Comment: To appear at CVPR 201
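The idea of associating appearance attributes with activity demonstrations, then predicting at unobserved locations, can be illustrated with a small sketch. It is not the paper's mathematical framework, which the abstract does not specify; it swaps in a plain kernel-weighted average (Nadaraya-Watson style), and the names `rbf`, `predict_action_score`, and `gamma` are hypothetical.

```python
import math

def rbf(u, v, gamma=1.0):
    """Similarity between two appearance feature vectors (RBF kernel)."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * d2)

def predict_action_score(query_feature, demos, gamma=1.0):
    """Estimate how well an activity is supported at a location.

    demos is a list of (appearance_feature, score) pairs, where score is
    1.0 if the activity was demonstrated at that location and 0.0 if not.
    This is an assumed toy estimator, not the method from the paper.
    """
    weights = [rbf(query_feature, feature, gamma) for feature, _ in demos]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(w * score for w, (_, score) in zip(weights, demos)) / total

# Toy data: feature (1, 0) looks like a chair, feature (0, 1) like a wall.
demos = [((1.0, 0.0), 1.0), ((0.0, 1.0), 0.0)]
sit_score = predict_action_score((0.9, 0.1), demos)
```

A location whose appearance resembles a place where the activity was demonstrated receives a high score, even if it lies in a scene with no recorded behavior.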
Pattern Matching and Discourse Processing in Information Extraction from Japanese Text
Information extraction is the task of automatically picking up information of
interest from an unconstrained text. Information of interest is usually
extracted in two steps. First, sentence level processing locates relevant
pieces of information scattered throughout the text; second, discourse
processing merges coreferential information to generate the output. In the
first step, pieces of information are locally identified without recognizing
any relationships among them. A key word search or simple pattern search can
achieve this purpose. The second step requires deeper knowledge in order to
understand relationships among separately identified pieces of information.
Previous information extraction systems focused on the first step, partly
because they were not required to link up each piece of information with other
pieces. To link the extracted pieces of information and map them onto a
structured output format, complex discourse processing is essential. This paper
reports on a Japanese information extraction system that merges information
using a pattern matcher and discourse processor. Evaluation results show a high
level of system performance that approaches human performance.
Comment: See http://www.jair.org/ for any accompanying file
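The two-step process described above can be sketched in a few lines. This is a toy illustration on English text, not the Japanese system the paper reports: the patterns, the slot names, and the "most recent company" merging rule are all assumptions made for the example.

```python
import re

# Hypothetical patterns; the real system's patterns are not given in the abstract.
PATTERNS = {
    "company": re.compile(r"\b([A-Z][a-z]+ (?:Corp|Inc))\b"),
    "amount": re.compile(r"\$(\d+) million"),
}

def extract_local(sentence):
    """Step 1: find pieces of information in one sentence, with no linking."""
    return [
        (slot, match.group(1))
        for slot, pattern in PATTERNS.items()
        for match in pattern.finditer(sentence)
    ]

def merge_discourse(sentences):
    """Step 2 (toy rule): attach each amount to the most recently named company."""
    records, current = {}, None
    for sentence in sentences:
        pieces = extract_local(sentence)
        for slot, value in pieces:
            if slot == "company":
                current = value
                records.setdefault(current, [])
        for slot, value in pieces:
            if slot == "amount" and current is not None:
                records[current].append(int(value))
    return records

text = [
    "Acme Corp announced a venture.",
    "The company will invest $30 million.",
    "Beta Inc declined to comment.",
    "It later spent $5 million.",
]
merged = merge_discourse(text)
```

Step 1 alone would return the amounts with no owner; the merging pass is what links "The company" and "It" back to a named entity, which is the role the abstract assigns to discourse processing.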
Going Deeper into First-Person Activity Recognition
We bring together ideas from recent work on feature design for egocentric
action recognition under one framework by exploring the use of deep
convolutional neural networks (CNN). Recent work has shown that features such
as hand appearance, object attributes, local hand motion and camera ego-motion
are important for characterizing first-person actions. To integrate these ideas
under one framework, we propose a twin stream network architecture, where one
stream analyzes appearance information and the other stream analyzes motion
information. Our appearance stream encodes prior knowledge of the egocentric
paradigm by explicitly training the network to segment hands and localize
objects. By visualizing certain neuron activations in our network, we show that
our proposed architecture naturally learns features that capture object
attributes and hand-object configurations. Our extensive experiments on
benchmark egocentric action datasets show that our deep architecture enables
recognition rates that significantly outperform state-of-the-art techniques --
an average increase in accuracy over all datasets. Furthermore, by
learning to recognize objects, actions and activities jointly, the performance
of individual recognition tasks also increases by (actions) and
(objects). We also include the results of extensive ablative analysis to
highlight the importance of network design decisions.
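As a rough illustration of combining an appearance stream with a motion stream, the sketch below averages the class probabilities of the two streams. The abstract does not say how the streams are joined, so this late-fusion rule is an assumption, and `fuse`, `predict`, and `w_appearance` are hypothetical names. The per-stream scores stand in for network outputs.

```python
import math

def softmax(scores):
    """Convert raw class scores to probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(appearance_scores, motion_scores, w_appearance=0.5):
    """Weighted average of the two streams' class probabilities (assumed rule)."""
    pa = softmax(appearance_scores)
    pm = softmax(motion_scores)
    return [w_appearance * a + (1.0 - w_appearance) * b for a, b in zip(pa, pm)]

def predict(appearance_scores, motion_scores, labels, w_appearance=0.5):
    """Return the label with the highest fused probability."""
    fused = fuse(appearance_scores, motion_scores, w_appearance)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best]

labels = ["take", "pour", "stir"]
action = predict([2.0, 1.0, 0.0], [0.0, 3.0, 0.0], labels)
```

Here the appearance stream alone favors "take", but the motion stream's strong evidence for "pour" dominates once the two are combined.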