Bridging the Gap Between Events and Frames Through Unsupervised Domain Adaptation
Reliable perception during fast motion maneuvers or in high dynamic range environments is crucial for robotic systems. Since event cameras are robust to these challenging conditions, they have great potential to increase the reliability of robot vision. However, event-based vision has been held back by the shortage of labeled datasets due to the novelty of event cameras. To overcome this drawback, we propose a task transfer method to train models directly with labeled images and unlabeled event data. Compared to previous approaches, (i) our method transfers from single images to events instead of high frame rate videos, and (ii) does not rely on paired sensor data. To achieve this, we leverage the generative event model to split event features into content and motion features. This split enables efficient matching between latent spaces for events and images, which is crucial for successful task transfer. Thus, our approach unlocks the vast amount of existing image datasets for the training of event-based neural networks. Our task transfer method consistently outperforms methods targeting Unsupervised Domain Adaptation, improving object detection by 0.26 mAP (a 93% increase) and classification by 2.7% accuracy.
Real-Time 6DOF Pose Relocalization for Event Cameras with Stacked Spatial LSTM Networks
We present a new method to relocalize the 6DOF pose of an event camera solely
based on the event stream. Our method first creates the event image from a list
of events that occur in a very short time interval, then a Stacked Spatial
LSTM Network (SP-LSTM) is used to learn the camera pose. Our SP-LSTM is
composed of a CNN to learn deep features from the event images and a stack of
LSTMs to learn spatial dependencies in the image feature space. We show that the
spatial dependency plays an important role in the relocalization task and the
SP-LSTM can effectively learn this information. The experimental results on a
publicly available dataset show that our approach generalizes well and
outperforms recent methods by a substantial margin. Overall, our proposed
method reduces by approx. 6 times the position error and 3 times the
orientation error compared to the current state of the art. The source code and
trained models will be released. Comment: 7 pages, 5 figures.
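The first step the abstract describes, building an event image from the events in a short time window, can be sketched as a signed accumulation of event polarities into a 2D grid. This is a minimal illustrative scheme, not necessarily the exact representation the paper uses:

```python
def events_to_image(events, width, height, t_start, t_end):
    """Accumulate events from a short time window into a 2D image.

    Each event is (x, y, timestamp, polarity), with polarity +1 or -1.
    Illustrative accumulation only; the paper's exact encoding may differ.
    """
    img = [[0.0] * width for _ in range(height)]
    for x, y, t, p in events:
        if t_start <= t < t_end:
            img[y][x] += p  # signed accumulation of polarities
    return img

# Toy example: three events inside the window, one outside it.
evs = [(1, 0, 0.001, +1), (1, 0, 0.002, -1), (3, 1, 0.003, +1), (2, 1, 0.5, +1)]
frame = events_to_image(evs, width=4, height=2, t_start=0.0, t_end=0.01)
```

The resulting `frame` can then be fed to a CNN like any ordinary image, which is what lets the SP-LSTM reuse standard image-feature machinery.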
Temporal Segmentation of Surgical Sub-tasks through Deep Learning with Multiple Data Sources
Many tasks in robot-assisted surgeries (RAS) can be represented by finite-state machines (FSMs), where each state represents either an action (such as picking up a needle) or an observation (such as bleeding). A crucial step towards the automation of such surgical tasks is the temporal perception of the current surgical scene, which requires a real-time estimation of the states in the FSMs. The objective of this work is to estimate the current state of the surgical task based on the actions performed or events that have occurred as the task progresses. We propose Fusion-KVE, a unified surgical state estimation model that incorporates multiple data sources including the Kinematics, Vision, and system Events. Additionally, we examine the strengths and weaknesses of different state estimation models in segmenting states with different representative features or levels of granularity. We evaluate our model on the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), as well as a more complex dataset involving robotic intra-operative ultrasound (RIOUS) imaging, created using the da Vinci® Xi surgical system. Our model achieves a superior frame-wise state estimation accuracy of up to 89.4%, improving on state-of-the-art surgical state estimation models on both the JIGSAWS suturing dataset and our RIOUS dataset.
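The FSM framing of a surgical task can be sketched as a transition table: the machine stays in its current state until an event that has a defined transition occurs. The state and event names below are invented for illustration and are not drawn from JIGSAWS or RIOUS:

```python
# Hypothetical suturing FSM: (current_state, event) -> next_state.
TRANSITIONS = {
    ("pick_needle", "needle_grasped"): "insert_needle",
    ("insert_needle", "needle_through"): "pull_suture",
    ("pull_suture", "suture_done"): "pick_needle",
}

def step(state, event):
    # Remain in the current state unless the event triggers a transition.
    return TRANSITIONS.get((state, event), state)

state = "pick_needle"
for ev in ["needle_grasped", "camera_moved", "needle_through"]:
    state = step(state, ev)
# Irrelevant events ("camera_moved") leave the state unchanged.
```

Frame-wise state estimation then amounts to predicting, at each time step, which of these states the task is currently in, using kinematics, vision, and system-event cues.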
Event-based control system on FPGA applied to the pencil balancer robotic platform
An event-based motor controller design is presented.
The system is designed to solve the classic inverted
pendulum problem by using a robotic platform and a totally
neuro-inspired event-based mechanism. Specifically, DVS retinas
provide feedback and an FPGA implements control. The robotic
platform used is the so-called 'pencil balancer'. The retinas
provide visual information to the FPGA, which processes it and
obtains the center of mass of the pencil. Once this center of
mass is averaged over time, it is used jointly with the cart position
provided by a flat potentiometer bar to compute the angle of
the pencil from the vertical. The angle is delivered to an event-based
Proportional-Derivative (PD) controller that drives the DC
motor using Pulse Frequency Modulation (PFM) to accomplish
the control objective. The results show an accurate, real-time, and
efficient controller design.
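The control path the abstract describes (center of mass and cart position to angle, then PD control mapped to a pulse frequency) can be sketched as two small functions. The gains, the geometry, and the clamp are illustrative assumptions, not values from the paper; a real PFM stage would emit discrete pulses at this frequency rather than return a number:

```python
import math

def pencil_angle(com_x, cart_x, height):
    """Angle of the pencil from vertical, given the time-averaged
    center of mass (from the DVS retinas) and the cart position
    (from the potentiometer bar). `height` is an assumed lever arm."""
    return math.atan2(com_x - cart_x, height)

def pd_to_pulse_freq(angle, prev_angle, dt, kp=50.0, kd=5.0, f_max=1000.0):
    """Map a PD control effort to a signed motor pulse frequency (PFM).
    Gains and the frequency limit are illustrative, not from the paper."""
    effort = kp * angle + kd * (angle - prev_angle) / dt
    return max(-f_max, min(f_max, effort))
```

A pencil perfectly above the cart yields a zero angle and hence a zero pulse frequency; a tilt produces a signed frequency whose magnitude is clamped to the motor's limit.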
Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age
Simultaneous Localization and Mapping (SLAM) consists of the concurrent
construction of a model of the environment (the map) and the estimation of the
state of the robot moving within it. The SLAM community has made astonishing
progress over the last 30 years, enabling large-scale real-world applications,
and witnessing a steady transition of this technology to industry. We survey
the current state of SLAM. We start by presenting what is now the de-facto
standard formulation for SLAM. We then review related work, covering a broad
set of topics including robustness and scalability in long-term mapping, metric
and semantic representations for mapping, theoretical performance guarantees,
active SLAM and exploration, and other new frontiers. This paper simultaneously
serves as a position paper and tutorial to those who are users of SLAM. By
looking at the published research with a critical eye, we delineate open
challenges and new research issues that still deserve careful scientific
investigation. The paper also contains the authors' take on two questions that
often animate discussions during robotics conferences: Do robots need SLAM? and
Is SLAM solved?
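The de-facto standard formulation mentioned above is maximum-a-posteriori estimation over a factor graph; in its common form (the notation below follows the usual convention in the SLAM literature, not a quote from this survey):

\[
X^{\star} \;=\; \arg\max_{X}\, p(X \mid Z) \;=\; \arg\min_{X} \sum_{k} \bigl\lVert h_k(X_k) - z_k \bigr\rVert^{2}_{\Omega_k}
\]

where $X$ collects the robot poses and landmark positions, each $z_k$ is a measurement with measurement model $h_k$ acting on the subset of variables $X_k$, and $\Omega_k$ is the information matrix of the corresponding noise, assumed Gaussian.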
Translating Videos to Commands for Robotic Manipulation with Deep Recurrent Neural Networks
We present a new method to translate videos to commands for robotic
manipulation using Deep Recurrent Neural Networks (RNN). Our framework first
extracts deep features from the input video frames with a deep Convolutional
Neural Network (CNN). Two RNN layers with an encoder-decoder architecture are
then used to encode the visual features and sequentially generate the output
words as the command. We demonstrate that the translation accuracy can be
improved by allowing a smooth transition between the two RNN layers and using a
state-of-the-art feature extractor. The experimental results on our new
challenging dataset show that our approach outperforms recent methods by a fair
margin. Furthermore, we combine the proposed translation module with the vision
and planning system to let a robot perform various manipulation tasks. Finally,
we demonstrate the effectiveness of our framework on the full-size humanoid robot
WALK-MAN.
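The pipeline shape described above (CNN features, then an encoder that folds the frame sequence into a state, then a decoder that greedily emits command words until an end token) can be caricatured in a few lines. Every function here is a toy stand-in; the real system uses a CNN and two RNN layers, none of which are reproduced:

```python
def extract_features(frames):
    # Stand-in for the CNN: summarize each frame by its mean pixel value.
    return [sum(f) / len(f) for f in frames]

def encode(features):
    # Stand-in for the encoder RNN: fold the sequence into one scalar state.
    state = 0.0
    for f in features:
        state = 0.5 * state + f
    return state

def decode(state, vocab, max_len=4):
    # Stand-in for the decoder RNN: greedy word selection from a toy
    # scoring rule, stopping at the end token.
    words = []
    for _ in range(max_len):
        word = vocab[int(state) % len(vocab)]
        if word == "<end>":
            break
        words.append(word)
        state += 1
    return words

vocab = ["pick", "up", "the", "<end>"]
command = decode(encode(extract_features([[1, 1], [2, 2]])), vocab)
```

The point of the sketch is only the dataflow: the decoder conditions every emitted word on a state derived from the whole input sequence, which is what makes the "smooth transition between the two RNN layers" matter.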