78 research outputs found

    Lifting GIS Maps into Strong Geometric Context for Scene Understanding

    Contextual information can have a substantial impact on the performance of visual tasks such as semantic segmentation, object detection, and geometric estimation. Data stored in Geographic Information Systems (GIS) offers a rich source of contextual information that has been largely untapped by computer vision. We propose to leverage such information for scene understanding by combining GIS resources with large sets of unorganized photographs using Structure from Motion (SfM) techniques. We present a pipeline to quickly generate strong 3D geometric priors from 2D GIS data using SfM models aligned with minimal user input. Given an image resectioned against this model, we generate robust predictions of depth, surface normals, and semantic labels. We show that the predicted geometry is substantially more accurate than that of other single-image depth estimation methods. We then demonstrate the utility of these contextual constraints for re-scoring pedestrian detections, and use these GIS contextual features alongside object detection score maps to improve a CRF-based semantic segmentation framework, boosting accuracy over baseline models.
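    As an illustration of how such a rendered depth prior could be used for re-scoring, the following minimal Python sketch down-weights pedestrian boxes whose pixel height disagrees with the height a roughly 1.7 m tall person would project to at the prior's depth. The function name, the Gaussian scoring rule, and all parameters are illustrative assumptions, not the paper's actual re-scoring model.

    import numpy as np

    def rescore_pedestrians(boxes, scores, depth_prior, fy, person_height_m=1.7, sigma=0.5):
        """Hypothetical re-scoring: compare each box's pixel height with the height a
        ~1.7 m pedestrian would have at the GIS/SfM depth prior sampled at its feet."""
        rescored = []
        for (x1, y1, x2, y2), s in zip(boxes, scores):
            u = int(round((x1 + x2) / 2))                  # column at the box centre
            v = int(round(y2)) - 1                         # row where the feet meet the ground
            z = depth_prior[np.clip(v, 0, depth_prior.shape[0] - 1),
                            np.clip(u, 0, depth_prior.shape[1] - 1)]
            if not np.isfinite(z) or z <= 0:
                rescored.append(s)                         # no prior available: keep the score
                continue
            expected_h = fy * person_height_m / z          # pinhole projection of 1.7 m at depth z
            mismatch = np.log((y2 - y1) / expected_h)      # symmetric size error in log space
            rescored.append(s * np.exp(-mismatch ** 2 / (2 * sigma ** 2)))
        return np.array(rescored)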

    Parallelized computational 3D video microscopy of freely moving organisms at multiple gigapixels per second

    To study the behavior of freely moving model organisms such as zebrafish (Danio rerio) and fruit flies (Drosophila) across multiple spatial scales, it would be ideal to use a light microscope that can resolve 3D information over a wide field of view (FOV) at high speed and high spatial resolution. However, it is challenging to design an optical instrument to achieve all of these properties simultaneously. Existing techniques for large-FOV microscopic imaging and for 3D image measurement typically require many sequential image snapshots, thus compromising speed and throughput. Here, we present 3D-RAPID, a computational microscope based on a synchronized array of 54 cameras that can capture high-speed 3D topographic videos over a 135-cm^2 area, achieving up to 230 frames per second at throughputs exceeding 5 gigapixels (GPs) per second. 3D-RAPID features a 3D reconstruction algorithm that, for each synchronized temporal snapshot, simultaneously fuses all 54 images seamlessly into a globally-consistent composite that includes a coregistered 3D height map. The self-supervised 3D reconstruction algorithm itself trains a spatiotemporally-compressed convolutional neural network (CNN) that maps raw photometric images to 3D topography, using stereo overlap redundancy and ray-propagation physics as the only supervision mechanism. As a result, our end-to-end 3D reconstruction algorithm is robust to generalization errors and scales to arbitrarily long videos from arbitrarily sized camera arrays. The scalable hardware and software design of 3D-RAPID addresses a longstanding problem in the field of behavioral imaging, enabling parallelized 3D observation of large collections of freely moving organisms at high spatiotemporal throughputs, which we demonstrate in ants (Pogonomyrmex barbatus), fruit flies, and zebrafish larvae.
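    The stereo-overlap supervision can be illustrated with a small PyTorch-style consistency term: height maps predicted independently for two neighboring cameras are resampled into a common frame and penalized where they disagree. This is a toy sketch under assumed tensor shapes and a precomputed sampling grid, not the 3D-RAPID training code.

    import torch
    import torch.nn.functional as F

    def overlap_consistency_loss(height_a, height_b, grid_b_to_a):
        """height_a, height_b: (N, 1, H, W) height maps from the shared CNN.
        grid_b_to_a: (N, H, W, 2) normalized grid mapping camera-A pixels into camera B
        (assumed known from the array's calibrated geometry)."""
        height_b_in_a = F.grid_sample(height_b, grid_b_to_a, align_corners=True)
        # Correspondences that fall outside camera B's image contribute nothing.
        valid = (grid_b_to_a.abs() <= 1).all(dim=-1).unsqueeze(1).float()
        return (valid * (height_a - height_b_in_a).abs()).sum() / valid.sum().clamp(min=1)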

    Ground Plane Matters: Picking Up Ground Plane Prior in Monocular 3D Object Detection

    The ground plane prior is a very informative geometric cue in monocular 3D object detection (M3OD), yet it has been neglected by most mainstream methods. In this paper, we identify two key factors that limit the applicability of the ground plane prior: the projection point localization issue and the ground plane tilt issue. To pick up the ground plane prior for M3OD, we propose a Ground Plane Enhanced Network (GPENet) which resolves both issues in one go. For the projection point localization issue, instead of using the bottom vertices or bottom center of the 3D bounding box (BBox), we leverage the object's ground contact points, which are explicit pixels in the image and easy for the neural network to detect. For the ground plane tilt problem, GPENet estimates the horizon line in the image and derives a novel mathematical expression to accurately estimate the ground plane equation. An unsupervised vertical edge mining algorithm is also proposed to address occlusion of the horizon line. Furthermore, we design a novel 3D bounding box deduction method based on a dynamic back projection algorithm, which takes advantage of the accurate contact points and the ground plane equation. Additionally, using only M3OD labels, contact point and horizon line pseudo labels can be generated with no extra data collection or annotation cost. Extensive experiments on the popular KITTI benchmark show that GPENet outperforms other methods and achieves state-of-the-art performance, demonstrating the effectiveness and superiority of the proposed approach. Moreover, GPENet performs better than other methods in cross-dataset evaluation on the nuScenes dataset. Our code and models will be published. Comment: 13 pages, 10 figures.
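    The horizon-to-ground-plane step has a textbook counterpart that can be sketched briefly: the vanishing line l of a plane with unit normal n satisfies l ~ K^{-T} n, so n ~ K^T l, and a contact pixel is lifted to 3D by intersecting its viewing ray with that plane. The NumPy sketch below assumes a known camera height and standard camera coordinates; it is not GPENet's derivation or code, and all names are illustrative.

    import numpy as np

    def ground_plane_from_horizon(K, horizon_line, cam_height):
        """Plane n . X = d in camera coordinates (x right, y down, z forward),
        from the horizon line (vanishing line of the ground) and the camera height."""
        n = K.T @ np.asarray(horizon_line, dtype=float)
        n /= np.linalg.norm(n)
        if n[1] < 0:            # orient the normal toward the ground (+y is down)
            n = -n
        return n, cam_height    # ground points X satisfy n . X = cam_height

    def backproject_contact_point(K, uv, n, d):
        """Intersect the viewing ray of a ground-contact pixel with the ground plane."""
        ray = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
        t = d / (n @ ray)       # solve n . (t * ray) = d for the ray depth t
        return t * ray          # 3D contact point in camera coordinates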

    Freehand 2D Ultrasound Probe Calibration for Image Fusion with 3D MRI/CT

    The aim of this work is to implement a simple freehand ultrasound (US) probe calibration technique. This will enable us to visualize US image data during surgical procedures using augmented reality. The performance of the system was evaluated in several experiments using two different pose estimation techniques. Near-millimeter accuracy can be achieved with the proposed approach. The developed system is cost-effective, simple, and rapid, with low calibration error.

    Constant Velocity Constraints for Self-Supervised Monocular Depth Estimation

    We present a new method for self-supervised monocular depth estimation. Contemporary monocular depth estimation methods use a triplet of consecutive video frames to estimate the central depth image. We make the assumption that the ego-centric view progresses linearly in the scene, based on the kinematic and physical properties of the camera. During the training phase, we can exploit this assumption to create a depth estimation for each image in the triplet. We then apply a new geometry constraint that supports novel synthetic views, thus providing a strong supervisory signal. Our contribution is simple to implement, requires no additional trainable parameters, and produces competitive results when compared with other state-of-the-art methods on the popular KITTI corpus.
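    One way to write the constant-velocity assumption down is as a penalty that forces the predicted relative pose from frame t-1 to t to match the pose from t to t+1. The PyTorch sketch below is a hedged illustration of that idea, not the paper's exact loss; in the paper the assumption is used to synthesize additional views for supervision rather than as a direct pose penalty.

    import torch

    def constant_velocity_loss(T_prev_to_cur, T_cur_to_next):
        """T_prev_to_cur, T_cur_to_next: (N, 4, 4) SE(3) poses from a pose network.
        Under constant velocity the two relative motions should be identical."""
        diff = torch.linalg.inv(T_prev_to_cur) @ T_cur_to_next   # identity if velocities match
        eye = torch.eye(4, device=diff.device).expand_as(diff)
        return (diff - eye).abs().mean()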

    Image motion estimation for 3D model based video conferencing.

    Cheung Man-kin. Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. Includes bibliographical references (leaves 116-120). Abstracts in English and Chinese.
    Table of contents:
    1) Introduction (p.1)
        1.1) Building of the 3D Wireframe and Facial Model (p.2)
        1.2) Description of 3D Model Based Video Conferencing (p.3)
        1.3) Wireframe Model Fitting or Conformation (p.6)
        1.4) Pose Estimation (p.8)
        1.5) Facial Motion Estimation and Synthesis (p.9)
        1.6) Thesis Outline (p.10)
    2) Wireframe Model Fitting (p.11)
        2.1) Algorithm of WFM Fitting (p.12): 2.1.1) Global Deformation (p.14): a) Scaling (p.14), b) Shifting (p.15); 2.1.2) Local Deformation (p.15): a) Shifting (p.16), b) Scaling (p.17); 2.1.3) Fine Updating (p.17)
        2.2) Steps of Fitting (p.18)
        2.3) Functions of Different Deformation (p.18)
        2.4) Experimental Results (p.19): 2.4.1) Output wireframe in each step (p.19); 2.4.2) Examples of mis-fitted wireframe with incoming image (p.22); 2.4.3) Fitted 3D facial wireframe (p.23); 2.4.4) Effect of mis-fitted wireframe after compensation of motion (p.24)
        2.5) Summary (p.26)
    3) Epipolar Geometry (p.27)
        3.1) Pinhole Camera Model and Perspective Projection (p.28)
        3.2) Concepts in Epipolar Geometry (p.31): 3.2.1) Working with normalized image coordinates (p.33); 3.2.2) Working with pixel image coordinates (p.35); 3.2.3) Summary (p.37)
        3.3) 8-point Algorithm (Essential and Fundamental Matrix) (p.38): 3.3.1) Outline of the 8-point algorithm (p.38); 3.3.2) Modification on obtained Fundamental Matrix (p.39); 3.3.3) Transformation of Image Coordinates (p.40): a) Translation to mean of points (p.40), b) Normalizing transformation (p.41); 3.3.4) Summary of 8-point algorithm (p.41)
        3.4) Estimation of Object Position by Decomposition of Essential Matrix (p.43): 3.4.1) Algorithm Derivation (p.43); 3.4.2) Algorithm Outline (p.46)
        3.5) Noise Sensitivity (p.48): 3.5.1) Rotation vector of model (p.48); 3.5.2) The projection of rotated model (p.49); 3.5.3) Noisy image (p.51); 3.5.4) Summary (p.51)
    4) Pose Estimation (p.54)
        4.1) Linear Method (p.55): 4.1.1) Theory (p.55); 4.1.2) Normalization (p.57); 4.1.3) Experimental Results (p.58): a) Synthesized image by linear method without normalization (p.58), b) Performance of linear method with and without normalization (p.60), c) Performance of linear method under quantization noise with different transformation components (p.62), d) Performance of normalized case without transformation in z-component (p.63); 4.1.4) Summary (p.64)
        4.2) Two Stage Algorithm (p.66): 4.2.1) Introduction (p.66); 4.2.2) The Two Stage Algorithm (p.67): a) Stage 1 (Iterative Method) (p.68), b) Stage 2 (Non-linear Optimization) (p.71); 4.2.3) Summary of the Two Stage Algorithm (p.72); 4.2.4) Experimental Results (p.72); 4.2.5) Summary (p.80)
    5) Facial Motion Estimation and Synthesis (p.81)
        5.1) Facial Expression based on face muscles (p.83): 5.1.1) Review of Action Unit Approach (p.83); 5.1.2) Distribution of Motion Unit (p.85); 5.1.3) Algorithm (p.89): a) For Unidirectional Motion Unit (p.89), b) For Circular Motion Unit (eyes) (p.90), c) For Another Circular Motion Unit (mouth) (p.90); 5.1.4) Experimental Results (p.91); 5.1.5) Summary (p.95)
        5.2) Detection of Facial Expression by Muscle-based Approach (p.96): 5.2.1) Theory (p.96); 5.2.2) Algorithm (p.97): a) For Sheet Muscle (p.97), b) For Circular Muscle (p.98), c) For Mouth Muscle (p.99); 5.2.3) Steps of Algorithm (p.100); 5.2.4) Experimental Results (p.101); 5.2.5) Summary (p.103)
    6) Conclusion (p.104)
        6.1) WFM Fitting (p.104)
        6.2) Pose Estimation (p.105)
        6.3) Facial Estimation and Synthesis (p.106)
        6.4) Discussion on Future Improvements (p.107): 6.4.1) WFM Fitting (p.107); 6.4.2) Pose Estimation (p.109); 6.4.3) Facial Motion Estimation and Synthesis (p.110)
    7) Appendix (p.111)
        7.1) Newton's Method or Newton-Raphson Method (p.111)
        7.2) H.261 (p.113)
        7.3) 3D Measurement (p.114)
    Bibliography (p.116)
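    Since Chapter 3 of this outline walks through the normalized 8-point algorithm (coordinate normalization, the rank-2 correction of the fundamental matrix, and denormalization), a compact NumPy sketch of that standard procedure is included here for reference; it follows the textbook formulation rather than the thesis's own implementation.

    import numpy as np

    def normalize_points(pts):
        """Translate points to their centroid and scale so the mean distance is sqrt(2)."""
        centroid = pts.mean(axis=0)
        scale = np.sqrt(2) / np.linalg.norm(pts - centroid, axis=1).mean()
        T = np.array([[scale, 0, -scale * centroid[0]],
                      [0, scale, -scale * centroid[1]],
                      [0, 0, 1.0]])
        pts_h = np.column_stack([pts, np.ones(len(pts))])
        return (T @ pts_h.T).T, T

    def eight_point(pts1, pts2):
        """Normalized 8-point estimate of the fundamental matrix from >= 8 correspondences."""
        x1, T1 = normalize_points(pts1)
        x2, T2 = normalize_points(pts2)
        # Each correspondence x2^T F x1 = 0 gives one row of the linear system A f = 0.
        A = np.column_stack([x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
                             x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
                             x1[:, 0], x1[:, 1], np.ones(len(x1))])
        F = np.linalg.svd(A)[2][-1].reshape(3, 3)
        U, S, Vt = np.linalg.svd(F)          # enforce rank 2
        F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
        F = T2.T @ F @ T1                    # undo the normalizing transformations
        return F / F[2, 2]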