We propose a general strategy for autonomous guidance and insertion of a
needle into a retinal blood vessel. The main challenges in this task are
accurately placing the needle tip on the target vein and executing a careful
insertion maneuver that avoids double-puncturing the vein, all under
restrictive kinematic constraints and depth-estimation uncertainty.
Mirroring how surgeons perform this task using only visual feedback, we
develop a system that relies solely on \emph{monocular} visual cues,
combining data-driven kinematic and contact estimation, visual servoing, and
model-based optimal control. By leveraging both known kinematic models and
deep-learning-based perception modules, the system localizes the surgical
needle tip and detects needle-tissue interactions and venipuncture events. The
outputs from these perception modules are then combined with a motion planning
framework that uses visual-servoing and optimal control to cannulate the target
vein, while respecting kinematic constraints that consider the safety of the
procedure. We demonstrate that we can reliably and consistently perform needle
insertion in the domain of retinal surgery, specifically in performing retinal
vein cannulation. Using cadaveric pig eyes, we demonstrate that our system can
navigate to target veins within 22μm XY accuracy and perform the entire
procedure in less than 35 seconds on average, and all 24 trials performed on 4
pig eyes were successful. Preliminary comparison study against a human operator
show that our system is consistently more accurate and safer, especially during
safety-critical needle-tissue interactions. To the best of the authors'
knowledge, this work is the first demonstration of autonomous retinal vein
cannulation in a clinically relevant setting using animal tissue.