Vision and Learning for Deliberative Monocular Cluttered Flight
Cameras provide a rich source of information while being passive, cheap and
lightweight for small and medium Unmanned Aerial Vehicles (UAVs). In this work
we present the first implementation of receding horizon control, which is
widely used in ground vehicles, with monocular vision as the only sensing mode
for autonomous UAV flight in dense clutter. We make this feasible on UAVs
through a number of contributions: a novel coupling of perception and control
via multiple relevant and diverse interpretations of the scene around the
robot; leveraging recent advances in machine learning for anytime, budgeted,
cost-sensitive feature selection; and fast non-linear regression for monocular
depth prediction. We empirically demonstrate the efficacy of our novel
pipeline via real-world experiments covering more than 2 km of flight through
dense trees with a quadrotor built from off-the-shelf parts. Moreover, our
pipeline is designed to incorporate information from other modalities, such as
stereo and lidar, when available.
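To make the control loop concrete, here is a minimal Python sketch of the receding-horizon principle the abstract describes: predict depth from the current monocular frame, score a fixed library of candidate trajectories against it, execute only the first step of the winner, and replan on the next frame. All helper functions (predict_depth, candidate_trajectories, collision_cost, goal_cost) are hypothetical stand-ins, not the authors' implementation.

```python
"""Minimal sketch of monocular receding-horizon control; the helpers below
are toy placeholders under stated assumptions, not the paper's code."""
import numpy as np

def predict_depth(image):
    # Stand-in for the paper's fast non-linear depth regressor:
    # here, a constant 10 m depth map of the same spatial size.
    return np.full(image.shape[:2], 10.0)

def candidate_trajectories(n=11, horizon=20):
    # Hypothetical library of straight-line arcs fanned over heading angles.
    angles = np.linspace(-0.5, 0.5, n)  # radians
    return [[(t * np.cos(a), t * np.sin(a)) for t in range(1, horizon + 1)]
            for a in angles]

def collision_cost(traj, depth_map):
    # Toy proxy: penalize trajectories that reach beyond the closest
    # predicted obstacle depth.
    reach = max(np.hypot(x, y) for x, y in traj)
    return max(0.0, reach - depth_map.min())

def goal_cost(traj, goal):
    # Distance from the trajectory endpoint to the goal.
    gx, gy = goal
    x, y = traj[-1]
    return np.hypot(gx - x, gy - y)

def receding_horizon_step(image, goal, alpha=0.1):
    # Score every candidate against predicted depth plus goal progress,
    # then execute only the first waypoint of the best trajectory and
    # replan on the next frame (the receding-horizon principle).
    depth = predict_depth(image)
    best = min(candidate_trajectories(),
               key=lambda t: collision_cost(t, depth) + alpha * goal_cost(t, goal))
    return best[0]

frame = np.zeros((480, 640, 3))  # placeholder camera frame
print(receding_horizon_step(frame, goal=(100.0, 0.0)))
```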
Going Deeper into First-Person Activity Recognition
We bring together ideas from recent work on feature design for egocentric
action recognition under one framework by exploring the use of deep
convolutional neural networks (CNN). Recent work has shown that features such
as hand appearance, object attributes, local hand motion and camera ego-motion
are important for characterizing first-person actions. To integrate these ideas
under one framework, we propose a twin stream network architecture, where one
stream analyzes appearance information and the other stream analyzes motion
information. Our appearance stream encodes prior knowledge of the egocentric
paradigm by explicitly training the network to segment hands and localize
objects. By visualizing certain neuron activations of our network, we show that
our proposed architecture naturally learns features that capture object
attributes and hand-object configurations. Our extensive experiments on
benchmark egocentric action datasets show that our deep architecture enables
recognition rates that significantly outperform state-of-the-art techniques,
with an average increase in accuracy across all datasets. Furthermore, by
learning to recognize objects, actions and activities jointly, the performance
of the individual recognition tasks also improves for both actions and
objects. We also include the results of an extensive ablative analysis to
highlight the importance of network design decisions.
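As an illustration of the twin-stream idea, the following PyTorch sketch fuses an appearance stream (RGB frames) with a motion stream (stacked optical flow) before classification. The backbones and layer sizes are toy assumptions, and the paper's auxiliary hand-segmentation and object-localization training is omitted.

```python
"""Illustrative two-stream sketch in PyTorch; sizes are placeholders,
not the paper's exact architecture."""
import torch
import torch.nn as nn

class StreamBackbone(nn.Module):
    """Small CNN standing in for one stream's backbone."""
    def __init__(self, in_channels):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
    def forward(self, x):
        return self.features(x)  # (N, 64) descriptor

class TwoStreamNet(nn.Module):
    """Appearance stream (RGB) + motion stream (stacked optical flow),
    fused by concatenation before the action classifier."""
    def __init__(self, num_actions, flow_stack=10):
        super().__init__()
        self.appearance = StreamBackbone(in_channels=3)
        self.motion = StreamBackbone(in_channels=2 * flow_stack)
        self.classifier = nn.Linear(64 + 64, num_actions)
    def forward(self, rgb, flow):
        fused = torch.cat([self.appearance(rgb), self.motion(flow)], dim=1)
        return self.classifier(fused)

net = TwoStreamNet(num_actions=20)
rgb = torch.randn(4, 3, 224, 224)    # appearance input
flow = torch.randn(4, 20, 224, 224)  # 10 stacked (u, v) flow fields
print(net(rgb, flow).shape)          # torch.Size([4, 20])
```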
Deep Selection: A Fully Supervised Camera Selection Network for Surgery Recordings
Recording surgery in operating rooms is an essential task for education and
evaluation of medical treatment. However, recording the desired targets, such
as the surgery field, surgical tools, or doctor's hands, is difficult because
the targets are heavily occluded during surgery. We use a recording system in
which multiple cameras are embedded in the surgical lamp, and we assume that at
least one camera is recording the target without occlusion at any given time.
As the embedded cameras obtain multiple video sequences, we address the task of
selecting the camera with the best view of the surgery. Unlike the conventional
method, which selects the camera based on the area size of the surgery field,
we propose a deep neural network that predicts the camera selection probability
from multiple video sequences by learning from the supervision of expert
annotations. We created a dataset in which six different types of plastic
surgery are recorded, and we provided annotations of camera switching. Our
experiments show that our approach successfully switched between cameras and
outperformed three baseline methods.
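As a sketch of the selection formulation, the following PyTorch snippet scores each synchronized camera view with a shared encoder and outputs a softmax selection probability per camera; the architecture and sizes are illustrative assumptions, not the paper's model. Training against the expert switching annotation would then reduce to cross-entropy over the camera index.

```python
"""Hedged sketch of frame-level camera selection, assuming a shared
per-camera encoder; not the paper's exact network."""
import torch
import torch.nn as nn

class CameraSelector(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared encoder scores each camera's view independently.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1),
        )
    def forward(self, views):
        # views: (N, C, 3, H, W) with C synchronized camera streams.
        n, c = views.shape[:2]
        scores = self.encoder(views.flatten(0, 1)).view(n, c)
        # Selection probability per camera; argmax gives the switch decision.
        return scores.softmax(dim=1)

model = CameraSelector()
views = torch.randn(2, 5, 3, 128, 128)   # batch of 2, 5 lamp cameras
probs = model(views)
print(probs.shape, probs.argmax(dim=1))  # torch.Size([2, 5]), chosen camera
```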