Vision and Learning for Deliberative Monocular Cluttered Flight
Cameras provide a rich source of information while being passive, cheap and
lightweight for small and medium Unmanned Aerial Vehicles (UAVs). In this work
we present the first implementation of receding horizon control, which is
widely used in ground vehicles, with monocular vision as the only sensing mode
for autonomous UAV flight in dense clutter. We make this feasible on UAVs
through a number of contributions: a novel coupling of perception and control
via multiple relevant and diverse interpretations of the scene around the
robot; leveraging recent advances in machine learning for anytime, budgeted,
cost-sensitive feature selection; and fast non-linear regression for monocular
depth prediction. We empirically demonstrate the efficacy of our novel
pipeline via real-world experiments covering more than 2 km of flight through
dense trees with a quadrotor built from off-the-shelf parts. Moreover, our
pipeline is designed to incorporate information from other modalities, such as
stereo and lidar, when available.
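To make the control loop concrete, here is a minimal Python sketch of the receding-horizon principle the abstract describes: predict depth from the current monocular frame, score a fixed library of candidate trajectories against it, execute only the first step of the winner, and replan on the next frame. All helper functions (predict_depth, candidate_trajectories, collision_cost, goal_cost) are hypothetical stand-ins, not the authors' implementation.

```python
"""Minimal sketch of monocular receding-horizon control; the helpers below
are toy placeholders under stated assumptions, not the paper's code."""
import numpy as np

def predict_depth(image):
    # Stand-in for the paper's fast non-linear depth regressor:
    # here, a constant 10 m depth map of the same spatial size.
    return np.full(image.shape[:2], 10.0)

def candidate_trajectories(n=11, horizon=20):
    # Hypothetical library of straight-line arcs fanned over heading angles.
    angles = np.linspace(-0.5, 0.5, n)  # radians
    return [[(t * np.cos(a), t * np.sin(a)) for t in range(1, horizon + 1)]
            for a in angles]

def collision_cost(traj, depth_map):
    # Toy proxy: penalize trajectories that reach beyond the closest
    # predicted obstacle depth.
    reach = max(np.hypot(x, y) for x, y in traj)
    return max(0.0, reach - depth_map.min())

def goal_cost(traj, goal):
    # Distance from the trajectory endpoint to the goal.
    gx, gy = goal
    x, y = traj[-1]
    return np.hypot(gx - x, gy - y)

def receding_horizon_step(image, goal, alpha=0.1):
    # Score every candidate against predicted depth plus goal progress,
    # then execute only the first waypoint of the best trajectory and
    # replan on the next frame (the receding-horizon principle).
    depth = predict_depth(image)
    best = min(candidate_trajectories(),
               key=lambda t: collision_cost(t, depth) + alpha * goal_cost(t, goal))
    return best[0]

frame = np.zeros((480, 640, 3))  # placeholder camera frame
print(receding_horizon_step(frame, goal=(100.0, 0.0)))
```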
Going Deeper into First-Person Activity Recognition
We bring together ideas from recent work on feature design for egocentric
action recognition under one framework by exploring the use of deep
convolutional neural networks (CNN). Recent work has shown that features such
as hand appearance, object attributes, local hand motion and camera ego-motion
are important for characterizing first-person actions. To integrate these ideas
under one framework, we propose a twin stream network architecture, where one
stream analyzes appearance information and the other stream analyzes motion
information. Our appearance stream encodes prior knowledge of the egocentric
paradigm by explicitly training the network to segment hands and localize
objects. By visualizing certain neuron activations of our network, we show that
our proposed architecture naturally learns features that capture object
attributes and hand-object configurations. Our extensive experiments on
benchmark egocentric action datasets show that our deep architecture enables
recognition rates that significantly outperform state-of-the-art techniques,
with an average increase in accuracy across all datasets. Furthermore, by
learning to recognize objects, actions and activities jointly, the performance
of the individual recognition tasks also improves for both actions and
objects. We also include the results of an extensive ablative analysis to
highlight the importance of network design decisions.
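As an illustration of the twin-stream idea, the following PyTorch sketch fuses an appearance stream (RGB frames) with a motion stream (stacked optical flow) before classification. The backbones and layer sizes are toy assumptions, and the paper's auxiliary hand-segmentation and object-localization training is omitted.

```python
"""Illustrative two-stream sketch in PyTorch; sizes are placeholders,
not the paper's exact architecture."""
import torch
import torch.nn as nn

class StreamBackbone(nn.Module):
    """Small CNN standing in for one stream's backbone."""
    def __init__(self, in_channels):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
    def forward(self, x):
        return self.features(x)  # (N, 64) descriptor

class TwoStreamNet(nn.Module):
    """Appearance stream (RGB) + motion stream (stacked optical flow),
    fused by concatenation before the action classifier."""
    def __init__(self, num_actions, flow_stack=10):
        super().__init__()
        self.appearance = StreamBackbone(in_channels=3)
        self.motion = StreamBackbone(in_channels=2 * flow_stack)
        self.classifier = nn.Linear(64 + 64, num_actions)
    def forward(self, rgb, flow):
        fused = torch.cat([self.appearance(rgb), self.motion(flow)], dim=1)
        return self.classifier(fused)

net = TwoStreamNet(num_actions=20)
rgb = torch.randn(4, 3, 224, 224)    # appearance input
flow = torch.randn(4, 20, 224, 224)  # 10 stacked (u, v) flow fields
print(net(rgb, flow).shape)          # torch.Size([4, 20])
```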
Deep Selection: A Fully Supervised Camera Selection Network for Surgery Recordings
Recording surgery in operating rooms is an essential task for education and
evaluation of medical treatment. However, recording the desired targets, such
as the surgery field, surgical tools, or doctor's hands, is difficult because
the targets are heavily occluded during surgery. We use a recording system in
which multiple cameras are embedded in the surgical lamp, and we assume that at
least one camera is recording the target without occlusion at any given time.
As the embedded cameras obtain multiple video sequences, we address the task of
selecting the camera with the best view of the surgery. Unlike the conventional
method, which selects the camera based on the area size of the surgery field,
we propose a deep neural network that predicts the camera selection probability
from multiple video sequences by learning from the supervision of expert
annotations. We created a dataset in which six different types of plastic
surgery are recorded, and we provided annotations of camera switching. Our
experiments show that our approach successfully switched between cameras and
outperformed three baseline methods.
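As a sketch of the selection formulation, the following PyTorch snippet scores each synchronized camera view with a shared encoder and outputs a softmax selection probability per camera; the architecture and sizes are illustrative assumptions, not the paper's model. Training against the expert switching annotation would then reduce to cross-entropy over the camera index.

```python
"""Hedged sketch of frame-level camera selection, assuming a shared
per-camera encoder; not the paper's exact network."""
import torch
import torch.nn as nn

class CameraSelector(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared encoder scores each camera's view independently.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1),
        )
    def forward(self, views):
        # views: (N, C, 3, H, W) with C synchronized camera streams.
        n, c = views.shape[:2]
        scores = self.encoder(views.flatten(0, 1)).view(n, c)
        # Selection probability per camera; argmax gives the switch decision.
        return scores.softmax(dim=1)

model = CameraSelector()
views = torch.randn(2, 5, 3, 128, 128)   # batch of 2, 5 lamp cameras
probs = model(views)
print(probs.shape, probs.argmax(dim=1))  # torch.Size([2, 5]), chosen camera
```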