54 research outputs found
Caveats on the first-generation da Vinci Research Kit: latent technical constraints and essential calibrations
Telesurgical robotic systems provide a well established form of assistance in
the operating theater, with evidence of growing uptake in recent years. Until
now, the da Vinci surgical system (Intuitive Surgical Inc, Sunnyvale,
California) has been the most widely adopted robot of this kind, with more than
6,700 systems in current clinical use worldwide [1]. To accelerate research on
robotic-assisted surgery, the retired first-generation da Vinci robots have
been redeployed for research use as "da Vinci Research Kits" (dVRKs), which
have been distributed to research institutions around the world to support both
training and research in the sector. In the past ten years, a great amount of
research on the dVRK has been carried out across a vast range of research
topics. During this extensive and distributed process, common technical issues
have been identified that are buried deep within the dVRK research and
development architecture and that recur in dVRK user feedback,
regardless of the breadth and disparity of the research directions pursued. This
paper gathers and analyzes the most significant of these, with a focus on the
technical constraints of the first-generation dVRK, which both existing and
prospective users should be aware of before embarking on dVRK-related
research. The hope is that this review will aid users in identifying and
addressing common limitations of the systems promptly, thus helping to
accelerate progress in the field. Comment: 15 pages, 7 figures
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
In this work we address the task of semantic image segmentation with Deep
Learning and make three main contributions that are experimentally shown to
have substantial practical merit. First, we highlight convolution with
upsampled filters, or 'atrous convolution', as a powerful tool in dense
prediction tasks. Atrous convolution allows us to explicitly control the
resolution at which feature responses are computed within Deep Convolutional
Neural Networks. It also allows us to effectively enlarge the field of view of
filters to incorporate larger context without increasing the number of
parameters or the amount of computation. Second, we propose atrous spatial
pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP
probes an incoming convolutional feature layer with filters at multiple
sampling rates and effective fields-of-view, thus capturing objects as well as
image context at multiple scales. Third, we improve the localization of object
boundaries by combining methods from DCNNs and probabilistic graphical models.
The commonly deployed combination of max-pooling and downsampling in DCNNs
achieves invariance but takes a toll on localization accuracy. We overcome this
by combining the responses at the final DCNN layer with a fully connected
Conditional Random Field (CRF), which is shown both qualitatively and
quantitatively to improve localization performance. Our proposed "DeepLab"
system sets a new state of the art on the PASCAL VOC-2012 semantic image
segmentation task, reaching 79.7% mIOU in the test set, and advances the
results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and
Cityscapes. All of our code is made publicly available online. Comment: Accepted by TPAMI
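The idea of atrous convolution described in this abstract can be illustrated with a minimal one-dimensional sketch. It is not the DeepLab implementation: the function names `atrous_conv1d` and `field_of_view` are hypothetical, and the example assumes a "valid" (no padding) correlation on a plain Python list. It shows that spacing the kernel taps `rate` samples apart enlarges the field of view while the number of weights stays the same.

```python
def field_of_view(kernel_size, rate):
    """Effective extent covered by a kernel whose taps are `rate` samples apart."""
    return (kernel_size - 1) * rate + 1


def atrous_conv1d(signal, kernel, rate):
    """'Valid' 1-D atrous (dilated) correlation.

    The kernel keeps the same number of weights for every rate; only the
    spacing between the samples it reads changes.
    """
    span = field_of_view(len(kernel), rate)
    out = []
    for start in range(len(signal) - span + 1):
        # Read every `rate`-th sample starting at `start`.
        out.append(sum(w * signal[start + k * rate] for k, w in enumerate(kernel)))
    return out


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]  # three weights, regardless of rate

dense = atrous_conv1d(signal, kernel, rate=1)    # field of view of 3 samples
dilated = atrous_conv1d(signal, kernel, rate=2)  # field of view of 5 samples
```

With `rate=1` this reduces to an ordinary correlation; with `rate=2` the same three weights span five input samples.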
Biologically inspired composite image sensor for deep field target tracking
The use of nonuniform image sensors in mobile-based computer vision applications can be an effective solution when computational burden is problematic. Nonuniform image sensors are still in their infancy and as such have not been fully investigated for their unique qualities, nor have they been extensively applied in practice. In this dissertation a system has been developed that can perform vision tasks in both the far field and the near field. In order to accomplish this, a novel image sensor system has been developed. Inspired by the biological aspects of the visual systems found in both falcons and primates, a composite multi-camera sensor was constructed. The sensor provides for expandable visual range and excellent depth of field, and produces a single compact output image based on the log-polar retinal-cortical mapping that occurs in primates. This mapping provides for scale- and rotation-tolerant processing which, in turn, supports the mitigation of perspective distortion found in strict Cartesian-based sensor systems. Furthermore, the scale-tolerant representation of objects moving on trajectories parallel to the sensor's optical axis allows for fast acquisition and tracking of objects moving at high rates of speed. In order to investigate how effective this combination would be for object detection and tracking at both near and far field, the system was tuned for the application of vehicle detection and tracking from a moving platform. Finally, it was shown that the capturing of license plate information in an autonomous fashion could easily be accomplished from the extraction of information contained in the mapped log-polar representation space.
The novel composite log-polar deep-field image sensor opens new horizons for computer vision. The current work demonstrates features that can benefit applications beyond high-speed vehicle tracking for driver assistance and license plate capture. Some of the future applications envisioned include obstacle detection for high-speed trains, computer-assisted aircraft landing, and computer-assisted spacecraft docking.
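The scale and rotation tolerance attributed to the log-polar mapping can be seen in a small sketch. This is only the textbook coordinate transform, not the dissertation's sensor pipeline; the function name `to_log_polar` and the choice of the origin as the fixation point are assumptions made for illustration. Scaling a point about the centre shifts only the log-radius coordinate, and rotating it shifts only the angle coordinate.

```python
import math


def to_log_polar(x, y, cx=0.0, cy=0.0):
    """Map a Cartesian point to (log radius, angle) about the centre (cx, cy).

    The centre itself has radius 0 and cannot be mapped (log of zero).
    """
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    if r == 0.0:
        raise ValueError("the centre point has no log-polar image")
    return math.log(r), math.atan2(dy, dx)


rho, theta = to_log_polar(3.0, 4.0)

# Scaling by 2 about the centre: only rho changes, by log(2).
rho_scaled, theta_scaled = to_log_polar(6.0, 8.0)

# Rotating by 90 degrees about the centre: only theta changes, by pi/2.
rho_rot, theta_rot = to_log_polar(-4.0, 3.0)
```

In the mapped space, both transformations become translations, which is why matching in log-polar coordinates tolerates changes in object size and orientation.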
HDR Video Reconstruction with a Large Dynamic Dataset in Raw and sRGB Domains
High dynamic range (HDR) video reconstruction is attracting more and more
attention due to its superior visual quality compared with that of low dynamic
range (LDR) videos. The availability of LDR-HDR training pairs is essential for
the HDR reconstruction quality. However, there are still no real LDR-HDR pairs
for dynamic scenes due to the difficulty in capturing LDR-HDR frames
simultaneously. In this work, we propose to utilize a staggered sensor to
capture two alternate exposure images simultaneously, which are then fused into
an HDR frame in both raw and sRGB domains. In this way, we build a large scale
LDR-HDR video dataset with 85 scenes and each scene contains 60 frames. Based
on this dataset, we further propose a Raw-HDRNet, which utilizes the raw LDR
frames as inputs. We propose a pyramid flow-guided deformation convolution to
align neighboring frames. Experimental results demonstrate that 1) the proposed
dataset can improve the HDR reconstruction performance on real scenes for three
benchmark networks; 2) Compared with sRGB inputs, utilizing raw inputs can
further improve the reconstruction quality and our proposed Raw-HDRNet is a
strong baseline for raw HDR reconstruction. Our dataset and code will be
released after the acceptance of this paper.
Development of Fast Motion Estimation Algorithms for Video Compression
With the increasing popularity of technologies such as Internet streaming video and video conferencing, video compression has become an essential component of broadcast and entertainment media. Motion Estimation (ME) and compensation techniques, which can effectively eliminate temporal redundancy between adjacent frames, have been widely applied in popular video compression coding standards such as MPEG-2 and MPEG-4. Traditional fast block matching algorithms are easily trapped in local minima, which degrades video quality to some extent after decoding. Since Evolutionary Computing Techniques are suitable for finding a globally optimal solution, these techniques are applied to the Motion Estimation procedure in this thesis. Zero Motion prejudgement is also included; it aims to find static macroblocks (MBs) that do not need to undergo the remaining search, and thus reduces the computational cost. Simulation results show that the proposed Clonal Particle Swarm Optimization algorithm gives a very good improvement in reducing the computational overhead and achieves very good Peak Signal to Noise Ratio (PSNR) values, which makes the technique more efficient than conventional search algorithms. To reduce the motion vector overhead in Bidirectional frame prediction, a novel Bidirectional Motion Estimation algorithm based on PSO is also proposed in this thesis, and results show that the proposed method can significantly reduce the computational complexity involved in Bidirectional frame prediction while also achieving the least prediction error in all video sequences.
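The notions of block matching, zero-motion prejudgement, and PSNR mentioned in this abstract can be sketched as follows. This is not the thesis's Clonal Particle Swarm Optimization method: the sketch substitutes a plain exhaustive search so that it stays short, and the names `sad`, `estimate_motion`, `psnr`, and the parameter `zero_threshold` are hypothetical. Frames are assumed to be lists of rows of grey-level integers.

```python
import math


def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
        for j in range(n)
        for i in range(n)
    )


def estimate_motion(cur, ref, bx, by, n, search, zero_threshold):
    """Return ((dx, dy), cost) for one block using exhaustive search.

    Zero-motion prejudgement: if the co-located block already matches well
    enough (cost <= zero_threshold, an assumed tuning parameter), the block
    is treated as static and the remaining search is skipped.
    """
    best, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    if best_cost <= zero_threshold:
        return best, best_cost
    height, width = len(ref), len(ref[0])
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if bx + dx < 0 or by + dy < 0 or bx + dx + n > width or by + dy + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


def psnr(a, b, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    diffs = [(pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    mse = sum(diffs) / len(diffs)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


# Toy frames: one bright pixel that sits at (x=4, y=3) in the reference
# frame and at (x=2, y=2) in the current frame.
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
ref[3][4] = 200
cur[2][2] = 200

vector, cost = estimate_motion(cur, ref, bx=2, by=2, n=2, search=2, zero_threshold=0)
```

A fast search such as the PSO variant in the thesis would evaluate only a subset of the candidate displacements that this exhaustive loop visits.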
Two-path network with feedback connections for pan-sharpening in remote sensing
High-resolution multi-spectral images are desired for applications in remote sensing. However, optical remote sensing satellites can only provide multi-spectral images at low resolution. Pan-sharpening aims to generate high-resolution multi-spectral (MS) images from a panchromatic (PAN) image and its low-resolution MS counterpart. Conventional deep-learning-based pan-sharpening methods process the panchromatic and the low-resolution images in a feedforward manner, so shallow layers cannot access useful information from deep layers. To make full use of the powerful deep features that have strong representation ability, we propose a two-path network with feedback connections, through which the deep features can be rerouted to refine the shallow features in a feedback manner. Specifically, we leverage the structure of a recurrent neural network to pass the feedback information. In addition, a power feature extraction block with multiple projection pairs is designed to handle the feedback information and to produce powerful deep features. Extensive experimental results show the effectiveness of our proposed method.