54 research outputs found

    Caveats on the first-generation da Vinci Research Kit: latent technical constraints and essential calibrations

    Telesurgical robotic systems provide a well-established form of assistance in the operating theater, with evidence of growing uptake in recent years. Until now, the da Vinci surgical system (Intuitive Surgical Inc, Sunnyvale, California) has been the most widely adopted robot of this kind, with more than 6,700 systems in current clinical use worldwide [1]. To accelerate research on robotic-assisted surgery, the retired first-generation da Vinci robots have been redeployed for research use as "da Vinci Research Kits" (dVRKs), which have been distributed to research institutions around the world to support both training and research in the sector. In the past ten years, a great amount of research on the dVRK has been carried out across a vast range of research topics. During this extensive and distributed process, common technical issues have been identified that are buried deep within the dVRK research and development architecture, and were found to be common among dVRK user feedback, regardless of the breadth and disparity of research directions identified. This paper gathers and analyzes the most significant of these, with a focus on the technical constraints of the first-generation dVRK, which both existing and prospective users should be aware of before embarking on dVRK-related research. The hope is that this review will aid users in identifying and addressing common limitations of the systems promptly, thus helping to accelerate progress in the field. Comment: 15 pages, 7 figures

    DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

    In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or 'atrous convolution', as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields of view, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but has a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed "DeepLab" system sets the new state of the art at the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7% mIOU in the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online. Comment: Accepted by TPAMI

    Biologically inspired composite image sensor for deep field target tracking

    The use of nonuniform image sensors in mobile-based computer vision applications can be an effective solution when computational burden is problematic. Nonuniform image sensors are still in their infancy and as such have not been fully investigated for their unique qualities, nor have they been extensively applied in practice. In this dissertation a system has been developed that can perform vision tasks in both the far field and the near field. To accomplish this, a novel image sensor system has been developed. Inspired by the biological aspects of the visual systems found in both falcons and primates, a composite multi-camera sensor was constructed. The sensor provides expandable visual range and excellent depth of field, and produces a single compact output image based on the log-polar retinal-cortical mapping that occurs in primates. This mapping provides scale- and rotation-tolerant processing which, in turn, supports the mitigation of perspective distortion found in strict Cartesian-based sensor systems. Furthermore, the scale-tolerant representation of objects moving on trajectories parallel to the sensor's optical axis allows for fast acquisition and tracking of objects moving at high speed. To investigate how effective this combination would be for object detection and tracking at both near and far field, the system was tuned for the application of vehicle detection and tracking from a moving platform. Finally, it was shown that license plate information could be captured autonomously by extracting information contained in the mapped log-polar representation space. The novel composite log-polar deep-field image sensor opens new horizons for computer vision. This current work demonstrates features that can benefit applications beyond high-speed vehicle tracking for driver assistance and license plate capture. Some of the future applications envisioned include obstacle detection for high-speed trains, computer-assisted aircraft landing, and computer-assisted spacecraft docking.

    On the design of multimedia architectures : proceedings of a one-day workshop, Eindhoven, December 18, 2003


    HDR Video Reconstruction with a Large Dynamic Dataset in Raw and sRGB Domains

    High dynamic range (HDR) video reconstruction is attracting more and more attention due to its superior visual quality compared with that of low dynamic range (LDR) videos. The availability of LDR-HDR training pairs is essential for the HDR reconstruction quality. However, there are still no real LDR-HDR pairs for dynamic scenes due to the difficulty in capturing LDR-HDR frames simultaneously. In this work, we propose to utilize a staggered sensor to capture two alternate exposure images simultaneously, which are then fused into an HDR frame in both raw and sRGB domains. In this way, we build a large-scale LDR-HDR video dataset with 85 scenes, each containing 60 frames. Based on this dataset, we further propose a Raw-HDRNet, which utilizes the raw LDR frames as inputs. We propose a pyramid flow-guided deformation convolution to align neighboring frames. Experimental results demonstrate that 1) the proposed dataset can improve the HDR reconstruction performance on real scenes for three benchmark networks; 2) compared with sRGB inputs, utilizing raw inputs can further improve the reconstruction quality, and our proposed Raw-HDRNet is a strong baseline for raw HDR reconstruction. Our dataset and code will be released after the acceptance of this paper.

    Development of Fast Motion Estimation Algorithms for Video Compression

    With the increasing popularity of technologies such as Internet streaming video and video conferencing, video compression has become an essential component of broadcast and entertainment media. Motion Estimation (ME) and compensation techniques, which can effectively eliminate temporal redundancy between adjacent frames, have been widely applied in popular video compression coding standards such as MPEG-2 and MPEG-4. Traditional fast block matching algorithms are easily trapped in local minima, resulting in some degradation of video quality after decoding. Since Evolutionary Computing Techniques are suitable for finding globally optimal solutions, these techniques are applied to the Motion Estimation procedure in this thesis. Zero Motion prejudgement is also included; it aims to find static macroblocks (MB) that do not need the remaining search, thus reducing the computational cost. Simulation results show that the proposed Clonal Particle Swarm Optimization algorithm gives a very good reduction in computational overhead and achieves very good Peak Signal to Noise Ratio (PSNR) values, which makes the technique more efficient than conventional search algorithms. To reduce the motion vector overhead in Bidirectional frame prediction, a novel Bidirectional Motion Estimation algorithm based on PSO is also proposed in this thesis, and results show that the proposed method can significantly reduce the computational complexity involved in Bidirectional frame prediction while also giving the least prediction error in all video sequences.

    Two-path network with feedback connections for pan-sharpening in remote sensing

    High-resolution multi-spectral images are desired for applications in remote sensing. However, multi-spectral images can only be provided in low resolutions by optical remote sensing satellites. The technique of pan-sharpening aims to generate high-resolution multi-spectral (MS) images based on a panchromatic (PAN) image and the low-resolution counterpart. Conventional deep learning based pan-sharpening methods process the panchromatic and the low-resolution image in a feedforward manner, where shallow layers fail to access useful information from deep layers. To make full use of the powerful deep features that have strong representation ability, we propose a two-path network with feedback connections, through which the deep features can be rerouted for refining the shallow features in a feedback manner. Specifically, we leverage the structure of a recurrent neural network to pass the feedback information. Besides, a power feature extraction block with multiple projection pairs is designed to handle the feedback information and to produce power deep features. Extensive experimental results show the effectiveness of our proposed method.