253 research outputs found
Revisiting Rolling Shutter Bundle Adjustment: Toward Accurate and Fast Solution
We propose a robust and fast bundle adjustment solution that estimates the
6-DoF pose of the camera and the geometry of the environment based on
measurements from a rolling shutter (RS) camera. This addresses the
shortcomings of existing works, namely reliance on additional sensors,
high-frame-rate video input, restrictive assumptions on camera motion and
readout direction, and poor efficiency. To this end, we first investigate the
influence of image point normalization on RSBA performance and show that it
yields a better approximation of the real 6-DoF camera motion. We then present
a novel analytical model for the visual residual covariance, which can be used
to standardize the reprojection error during optimization and consequently
improve overall accuracy. More importantly, the combination of normalization
and covariance standardization weighting in RSBA (NW-RSBA) avoids the common
planar degeneracy without constraining the filming manner. In addition, we
propose an acceleration strategy for NW-RSBA based on the sparsity of its
Jacobian matrix and the Schur complement. Extensive synthetic- and real-data
experiments verify the effectiveness and efficiency of the proposed solution
over state-of-the-art works. We also demonstrate that the proposed method can
easily be implemented and plugged into popular GSSfM and GSSLAM systems as
complete RSSfM and RSSLAM solutions.
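The Schur-complement acceleration mentioned in the abstract follows the standard sparse bundle-adjustment trick: because the point block of the normal equations is block-diagonal, it can be eliminated cheaply, leaving a small reduced system over the camera parameters. A minimal numpy sketch of this generic technique (toy matrices, not the authors' NW-RSBA implementation):

```python
import numpy as np

# Toy Gauss-Newton normal equations with BA block structure:
# [B  E ] [dx_cam]   [v]
# [E^T C] [dx_pts] = [w],  where C is block-diagonal over 3D points.
rng = np.random.default_rng(0)
n_cam, n_pts = 2, 4                          # 6 params/camera, 3/point
B = np.eye(6 * n_cam) * 4.0                  # camera block
C = np.kron(np.eye(n_pts), np.eye(3) * 2.0)  # block-diagonal point block
E = rng.normal(size=(6 * n_cam, 3 * n_pts)) * 0.1
v = rng.normal(size=6 * n_cam)
w = rng.normal(size=3 * n_pts)

# Schur complement: solve the small camera system first,
# then back-substitute for the points.
C_inv = np.linalg.inv(C)                 # cheap: independent 3x3 blocks
S = B - E @ C_inv @ E.T                  # reduced camera matrix
dx_cam = np.linalg.solve(S, v - E @ C_inv @ w)
dx_pts = C_inv @ (w - E.T @ dx_cam)

# Sanity check: matches solving the full system directly.
full = np.block([[B, E], [E.T, C]])
ref = np.linalg.solve(full, np.concatenate([v, w]))
assert np.allclose(np.concatenate([dx_cam, dx_pts]), ref)
```

The payoff is that `S` is only `6*n_cam` square, while the full system grows with the (typically much larger) number of points; the paper additionally exploits the specific sparsity pattern of the RS Jacobian.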
Motion stereo at sea: Dense 3D reconstruction from image sequences monitoring conveyor systems on board fishing vessels
A system that reconstructs 3D models from a single camera monitoring fish transported on a conveyor system is investigated. Models are subsequently used for training a species classifier and for improving estimates of discarded biomass. It is demonstrated that a monocular camera, combined with a conveyor's linear motion, produces a constrained form of multi-view structure from motion that allows the 3D scene to be reconstructed using a conventional stereo pipeline analogous to that of a binocular camera. Although motion stereo was proposed several decades ago, the present work is the first to compare the accuracy and precision of monocular and binocular stereo cameras monitoring conveyors and to operationally deploy such a system. The system exploits Convolutional Neural Networks (CNNs) for foreground segmentation and stereo matching. Results from a laboratory model show that when the camera is mounted 750 mm above the conveyor, a median accuracy of <5 mm can be achieved with an equivalent baseline of 62 mm. The precision is largely limited by error in determining the equivalent baseline (i.e. the distance travelled by the conveyor belt). When ArUco markers are placed on the belt, the interquartile range (IQR) of error in z (depth) near the optical centre was found to be ±4 mm.
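The "equivalent baseline" idea above is that the belt's linear travel between two frames plays the role of the baseline of a virtual binocular rig, so depth follows standard rectified-stereo triangulation. A sketch with illustrative numbers (the focal length and disparity are assumptions, not the paper's calibration; the 62 mm baseline and 750 mm mount height come from the abstract):

```python
# Motion stereo on a conveyor: belt travel between frames = baseline b,
# so for a rectified pair, depth z = f * b / d (f in pixels, d = disparity).
def depth_from_disparity(focal_px: float, baseline_mm: float,
                         disparity_px: float) -> float:
    """Depth in mm from rectified-stereo disparity."""
    return focal_px * baseline_mm / disparity_px

f_px = 1400.0   # assumed focal length in pixels (hypothetical)
b_mm = 62.0     # equivalent baseline: belt travel between the two frames
d_px = 120.0    # assumed disparity of a matched point (hypothetical)
z_mm = depth_from_disparity(f_px, b_mm, d_px)
print(round(z_mm, 1))  # ≈ 723.3 mm, on the order of the 750 mm mount height
```

Because z scales linearly with b, any error in measuring the belt travel propagates directly into depth, which is why the abstract identifies baseline estimation as the dominant error source and uses ArUco markers to pin it down.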
Exploring Sparse, Unstructured Video Collections of Places
The abundance of mobile devices and digital cameras with video capture makes it easy to obtain large collections of video clips that contain the same location, environment, or event. However, such an unstructured collection is difficult to comprehend and explore. We propose a system that analyses collections of unstructured but related video data to create a Videoscape: a data structure that enables interactive exploration of video collections by visually navigating, spatially and/or temporally, between different clips. We automatically identify transition opportunities, or portals. From these portals, we construct the Videoscape, a graph whose edges are video clips and whose nodes are portals between clips. Once structured, the videos can be interactively explored by walking the graph or via a geographic map. Given this system, we gauge preference for different video transition styles in a user study and derive heuristics that automatically choose an appropriate transition style. We evaluate our system in three further user studies, which allow us to conclude that Videoscapes provide significant benefits over related methods. Our system enables previously unseen ways of interactive spatio-temporal exploration of casually captured videos, and we demonstrate this on several video collections.
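The Videoscape inverts the layout one might expect: portals (matching moments across clips) are the nodes, and the traversable clip segments between them are the edges. A hypothetical minimal structure illustrating that inversion (names and IDs are invented, not the authors' implementation):

```python
from collections import defaultdict

class Videoscape:
    """Graph where nodes are portals and edges are video clip segments."""

    def __init__(self):
        # portal -> list of (clip_segment_id, neighbouring portal)
        self.adj = defaultdict(list)

    def add_clip_segment(self, portal_a: str, portal_b: str, clip_id: str):
        """Register a clip segment traversable between two portals."""
        self.adj[portal_a].append((clip_id, portal_b))
        self.adj[portal_b].append((clip_id, portal_a))

    def walks_from(self, portal: str):
        """Clip segments a viewer can follow from a portal."""
        return self.adj[portal]

# Hypothetical collection: two clips share an 'archway' portal.
vs = Videoscape()
vs.add_clip_segment("fountain", "archway", clip_id="clip_03[12s-47s]")
vs.add_clip_segment("archway", "plaza", clip_id="clip_11[0s-20s]")
print(vs.walks_from("archway"))
```

Interactive exploration then amounts to walking this graph, with the transition-style heuristics from the user study applied at each portal crossing.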
End to End Learning in Autonomous Driving Systems
Convolutional neural networks have advanced visual perception significantly in recent years. Two major ingredients that enable this success are the composition of simple modules into a complex network and end-to-end optimization. However, this success has not yet revolutionized robotics as much as vision, even though robotics suffers from a similar problem as traditional computer vision, namely the imperfection of manually designed system pipelines. This thesis investigates end-to-end learning for autonomous driving, a concrete robotic application. End-to-end learning can produce reasonable driving behaviors, even in complex urban driving scenarios. Representation learning in end-to-end driving models is crucial, and auxiliary vision tasks such as semantic segmentation can help to form a more informative driving representation, especially when training data is limited. Naive convolutional neural networks are usually capable only of reactive control and cannot carry out complex reasoning in a particular scenario. This thesis also studies how to handle scene-conditioned driving behavior, which goes beyond the capability of reactive control. Alongside the end-to-end structure, learning methods also play a critical role. Imitation learning can acquire meaningful behaviors, but the robot usually cannot master the skill. Reinforcement learning, on the contrary, either barely learns anything if the environment is too complex, or masters the skill otherwise. To get the best of both worlds, this thesis proposes an algorithmically unified method to learn from both demonstration data and the environment.
Light-scattering reconstruction of transparent shapes using neural networks
We propose a cheap, non-intrusive, high-resolution method of visualising
transparent or translucent objects that may translate, rotate, and
shape-shift. We propose a method of reconstructing a strongly deformed,
time-evolving surface from a time series of noisy point clouds using a
lightweight neural network. We benchmark the method against three different
geometries and varying levels of noise and find that the Gaussian curvature
is accurately recovered when the noise level is below a small fraction of the
diameter of the surface and the data from distinct regions of the surface do
not overlap.