78 research outputs found
Lifting GIS Maps into Strong Geometric Context for Scene Understanding
Contextual information can have a substantial impact on the performance of
visual tasks such as semantic segmentation, object detection, and geometric
estimation. Data stored in Geographic Information Systems (GIS) offers a rich
source of contextual information that has been largely untapped by computer
vision. We propose to leverage such information for scene understanding by
combining GIS resources with large sets of unorganized photographs using
Structure from Motion (SfM) techniques. We present a pipeline to quickly
generate strong 3D geometric priors from 2D GIS data using SfM models aligned
with minimal user input. Given an image resectioned against this model, we
generate robust predictions of depth, surface normals, and semantic labels. We
show that the predicted geometry is substantially more accurate than that of
other single-image depth estimation methods. We then demonstrate the
utility of these contextual constraints for re-scoring pedestrian detections,
and use these GIS contextual features alongside object detection score maps to
improve a CRF-based semantic segmentation framework, boosting accuracy over
baseline models.
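The core operation behind such geometric priors is rendering the aligned 3D model into a resectioned camera to obtain per-pixel depth and surface normals. Below is a minimal illustrative sketch (not the paper's pipeline; all names are assumptions) that z-buffers projected model points:

```python
import numpy as np

def render_depth_prior(points, normals, K, R, t, hw):
    """Project 3D model points (with per-point normals) into a camera with
    intrinsics K and pose (R, t) to build z-buffered depth and normal priors.
    Illustrative only: a real pipeline would rasterize mesh faces, not points."""
    h, w = hw
    depth = np.full((h, w), np.inf)
    normal = np.zeros((h, w, 3))
    cam = (R @ points.T + t[:, None]).T          # world -> camera coordinates
    z = cam[:, 2]
    valid = z > 0                                # keep points in front of camera
    uvw = (K @ cam[valid].T).T
    uv = (uvw[:, :2] / uvw[:, 2:3]).astype(int)  # perspective divide
    for (u, v), d, n in zip(uv, z[valid], normals[valid]):
        if 0 <= u < w and 0 <= v < h and d < depth[v, u]:
            depth[v, u] = d                      # nearest surface wins
            normal[v, u] = n
    return depth, normal
```

Such rendered depth/normal maps can then serve as contextual features for detection re-scoring or as unary potentials in a CRF.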
Parallelized computational 3D video microscopy of freely moving organisms at multiple gigapixels per second
To study the behavior of freely moving model organisms such as zebrafish
(Danio rerio) and fruit flies (Drosophila) across multiple spatial scales, it
would be ideal to use a light microscope that can resolve 3D information over a
wide field of view (FOV) at high speed and high spatial resolution. However, it
is challenging to design an optical instrument to achieve all of these
properties simultaneously. Existing techniques for large-FOV microscopic
imaging and for 3D image measurement typically require many sequential image
snapshots, thus compromising speed and throughput. Here, we present 3D-RAPID, a
computational microscope based on a synchronized array of 54 cameras that can
capture high-speed 3D topographic videos over a 135-cm^2 area, achieving up to
230 frames per second at throughputs exceeding 5 gigapixels (GPs) per second.
3D-RAPID features a 3D reconstruction algorithm that, for each synchronized
temporal snapshot, simultaneously fuses all 54 images seamlessly into a
globally-consistent composite that includes a coregistered 3D height map. The
self-supervised 3D reconstruction algorithm itself trains a
spatiotemporally-compressed convolutional neural network (CNN) that maps raw
photometric images to 3D topography, using stereo overlap redundancy and
ray-propagation physics as the only supervision mechanism. As a result, our
end-to-end 3D reconstruction algorithm is robust to generalization errors and
scales to arbitrarily long videos from arbitrarily sized camera arrays. The
scalable hardware and software design of 3D-RAPID addresses a longstanding
problem in the field of behavioral imaging, enabling parallelized 3D
observation of large collections of freely moving organisms at high
spatiotemporal throughputs, which we demonstrate in ants (Pogonomyrmex
barbatus), fruit flies, and zebrafish larvae.
Ground Plane Matters: Picking Up Ground Plane Prior in Monocular 3D Object Detection
The ground plane prior is a very informative geometry clue in monocular 3D
object detection (M3OD). However, it has been neglected by most mainstream
methods. In this paper, we identify two key factors that limit the
applicability of ground plane prior: the projection point localization issue
and the ground plane tilt issue. To pick up the ground plane prior for M3OD, we
propose a Ground Plane Enhanced Network (GPENet) which resolves both issues at
one go. For the projection point localization issue, instead of using the
bottom vertices or bottom center of the 3D bounding box (BBox), we leverage the
object's ground contact points, which are explicit pixels in the image and easy
for the neural network to detect. For the ground plane tilt problem, our GPENet
estimates the horizon line in the image and derives a novel mathematical
expression to accurately estimate the ground plane equation. An unsupervised
vertical edge mining algorithm is also proposed to address the occlusion of the
horizon line. Furthermore, we design a novel 3D bounding box deduction method
based on a dynamic back projection algorithm, which could take advantage of the
accurate contact points and the ground plane equation. Additionally, using only
M3OD labels, contact point and horizon line pseudo labels can be easily
generated with NO extra data collection and label annotation cost. Extensive
experiments on the popular KITTI benchmark show that our GPENet can outperform
other methods and achieve state-of-the-art performance, well demonstrating the
effectiveness and the superiority of the proposed approach. Moreover, our
GPENet works better than other methods in cross-dataset evaluation on the
nuScenes dataset. Our code and models will be published. Comment: 13 pages, 10 figures
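The key geometric step this abstract describes is back-projecting a detected ground contact pixel onto the estimated ground plane. A generic sketch of that intersection (not GPENet's exact dynamic back projection algorithm; the plane parametrization is an assumption) looks like this:

```python
import numpy as np

def backproject_to_ground(u, v, K, plane):
    """Intersect the viewing ray of pixel (u, v) with the ground plane.
    `plane` = (n, d) with n . X + d = 0 in camera coordinates.
    The ray is X = s * K^-1 [u, v, 1]^T, so solving n . X + d = 0
    gives s = -d / (n . K^-1 [u, v, 1]^T)."""
    n, d = plane
    ray = np.linalg.solve(K, np.array([u, v, 1.0]))  # pixel -> ray direction
    s = -d / (n @ ray)                               # ray-plane intersection
    return s * ray                                   # 3D ground contact point
```

With an accurate plane equation (derived here from the estimated horizon line), each contact pixel yields a metric 3D point, anchoring the 3D bounding box without regressing depth directly.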
Freehand 2D Ultrasound Probe Calibration for Image Fusion with 3D MRI/CT
The aim of this work is to implement a simple freehand ultrasound (US) probe
calibration technique. This will enable us to visualize US image data during
surgical procedures using augmented reality. The performance of the system was
evaluated with different experiments using two different pose estimation
techniques. A near-millimeter accuracy can be achieved with the proposed
approach. The developed system is cost-effective, simple and rapid with low
calibration error.
Constant Velocity Constraints for Self-Supervised Monocular Depth Estimation
We present a new method for self-supervised monocular depth estimation. Contemporary monocular depth estimation methods use a triplet of consecutive video frames to estimate the central depth image. We make the assumption that the ego-centric view progresses linearly in the scene, based on the kinematic and physical properties of the camera. During the training phase, we can exploit this assumption to create a depth estimation for each image in the triplet. We then apply a new geometry constraint that supports novel synthetic views, thus providing a strong supervisory signal. Our contribution is simple to implement, requires no additional trainable parameters, and produces competitive results when compared with other state-of-the-art methods on the popular KITTI corpus.
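The constant-velocity assumption on a frame triplet can be made concrete: the relative motion from frame t-1 to t is replayed to predict the pose at t+1. A minimal sketch under that assumption (4x4 homogeneous pose matrices; this is illustrative, not the paper's exact formulation) is:

```python
import numpy as np

def constant_velocity_pose(T_prev, T_curr):
    """Predict the next camera pose from two consecutive poses, assuming
    constant velocity: the relative transform over one frame step
    (t-1 -> t) is applied again to obtain the pose at t+1.
    T_prev, T_curr are 4x4 homogeneous camera-pose matrices."""
    rel = np.linalg.inv(T_prev) @ T_curr   # motion over one frame interval
    return T_curr @ rel                    # replay it for the next interval
```

During training, the extrapolated pose supplies an extra synthetic viewpoint for view-synthesis losses at no additional parameter cost.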
Image motion estimation for 3D model based video conferencing.
Cheung Man-kin. Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. Includes bibliographical references (leaves 116-120). Abstracts in English and Chinese.
Contents:
Chapter 1: Introduction
  1.1 Building of the 3D Wireframe and Facial Model
  1.2 Description of 3D Model Based Video Conferencing
  1.3 Wireframe Model Fitting or Conformation
  1.4 Pose Estimation
  1.5 Facial Motion Estimation and Synthesis
  1.6 Thesis Outline
Chapter 2: Wireframe Model Fitting
  2.1 Algorithm of WFM Fitting
    2.1.1 Global Deformation (a. Scaling; b. Shifting)
    2.1.2 Local Deformation (a. Shifting; b. Scaling)
    2.1.3 Fine Updating
  2.2 Steps of Fitting
  2.3 Functions of Different Deformation
  2.4 Experimental Results
    2.4.1 Output wireframe in each step
    2.4.2 Examples of mis-fitted wireframe with incoming image
    2.4.3 Fitted 3D facial wireframe
    2.4.4 Effect of mis-fitted wireframe after compensation of motion
  2.5 Summary
Chapter 3: Epipolar Geometry
  3.1 Pinhole Camera Model and Perspective Projection
  3.2 Concepts in Epipolar Geometry
    3.2.1 Working with normalized image coordinates
    3.2.2 Working with pixel image coordinates
    3.2.3 Summary
  3.3 8-point Algorithm (Essential and Fundamental Matrix)
    3.3.1 Outline of the 8-point algorithm
    3.3.2 Modification on obtained Fundamental Matrix
    3.3.3 Transformation of Image Coordinates (a. Translation to mean of points; b. Normalizing transformation)
    3.3.4 Summary of 8-point algorithm
  3.4 Estimation of Object Position by Decomposition of Essential Matrix
    3.4.1 Algorithm Derivation
    3.4.2 Algorithm Outline
  3.5 Noise Sensitivity
    3.5.1 Rotation vector of model
    3.5.2 The projection of rotated model
    3.5.3 Noisy image
    3.5.4 Summary
Chapter 4: Pose Estimation
  4.1 Linear Method
    4.1.1 Theory
    4.1.2 Normalization
    4.1.3 Experimental Results (a. Synthesized image by linear method without normalization; b. Performance between linear method with and without normalization; c. Performance of linear method under quantization noise with different transformation components; d. Performance of normalized case without transformation in z-component)
    4.1.4 Summary
  4.2 Two Stage Algorithm
    4.2.1 Introduction
    4.2.2 The Two Stage Algorithm (a. Stage 1: Iterative Method; b. Stage 2: Non-linear Optimization)
    4.2.3 Summary of the Two Stage Algorithm
    4.2.4 Experimental Results
    4.2.5 Summary
Chapter 5: Facial Motion Estimation and Synthesis
  5.1 Facial Expression based on face muscles
    5.1.1 Review of Action Unit Approach
    5.1.2 Distribution of Motion Unit
    5.1.3 Algorithm (a. For Unidirectional Motion Unit; b. For Circular Motion Unit (eyes); c. For Another Circular Motion Unit (mouth))
    5.1.4 Experimental Results
    5.1.5 Summary
  5.2 Detection of Facial Expression by Muscle-based Approach
    5.2.1 Theory
    5.2.2 Algorithm (a. For Sheet Muscle; b. For Circular Muscle; c. For Mouth Muscle)
    5.2.3 Steps of Algorithm
    5.2.4 Experimental Results
    5.2.5 Summary
Chapter 6: Conclusion
  6.1 WFM fitting
  6.2 Pose Estimation
  6.3 Facial Estimation and Synthesis
  6.4 Discussion on Future Improvements (6.4.1 WFM Fitting; 6.4.2 Pose Estimation; 6.4.3 Facial Motion Estimation and Synthesis)
Chapter 7: Appendix
  7.1 Newton's Method or Newton-Raphson Method
  7.2 H.261
  7.3 3D Measurement
Bibliography