Formalization of the General Video Temporal Synchronization Problem
In this work, we present a theoretical formalization of the temporal synchronization problem and a method to temporally synchronize multiple stationary video cameras with overlapping views of the same scene. The method uses a two-stage approach that first approximates the synchronization by tracking moving objects and identifying curvature points. It then refines the estimate using a consensus-based matching heuristic to find the frames that best agree with camera geometries pre-computed from stationary background image features. By using the fundamental matrix and the trifocal tensor in the second, refinement stage, we improve the estimate of the first stage and handle a broader, more generic range of input scenarios and camera configurations. The method is relatively simple compared to current techniques: it is no harder than feature tracking in stage one and computing accurate geometries in stage two. We also provide a robust method to assist synchronization in the presence of inaccurate geometry computation, and a theoretical limit on the accuracy that can be expected from any synchronization system.
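To make the consensus stage concrete, here is a minimal sketch (not the authors' code) of how candidate frame offsets between two cameras could be scored against a pre-computed fundamental matrix: the offset whose tracked correspondences best satisfy the epipolar constraint wins. The function names, the dictionary-of-tracks input format, and the `max_shift` search window are illustrative assumptions.

```python
# Illustrative sketch: consensus over frame offsets via symmetric epipolar distance.
import numpy as np

def epipolar_error(F, x1, x2):
    """Symmetric epipolar distance for homogeneous points x1 (camera 1)
    and x2 (camera 2), each of shape (3, N)."""
    l2 = F @ x1                      # epipolar lines in image 2
    l1 = F.T @ x2                    # epipolar lines in image 1
    d2 = np.abs(np.sum(x2 * l2, axis=0)) / np.hypot(l2[0], l2[1])
    d1 = np.abs(np.sum(x1 * l1, axis=0)) / np.hypot(l1[0], l1[1])
    return d1 + d2

def best_offset(F, tracks1, tracks2, max_shift=30):
    """tracks1/tracks2 map a frame index to (3, N) homogeneous tracked points
    (assumed to correspond point-for-point). Returns the integer frame offset
    whose matched frames best agree with F."""
    scores = {}
    for dt in range(-max_shift, max_shift + 1):
        errs = [np.median(epipolar_error(F, x1, tracks2[t + dt]))
                for t, x1 in tracks1.items()
                if t + dt in tracks2 and tracks2[t + dt].shape[1] == x1.shape[1]]
        if errs:
            scores[dt] = np.median(errs)   # robust consensus score for this offset
    return min(scores, key=scores.get)
```

A trifocal-tensor variant would score point transfer across three views instead of pairwise epipolar distance, but it follows the same consensus pattern.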
What You See Is What You Detect: Towards better Object Densification in 3D detection
Recent works have demonstrated the importance of object completion in 3D perception from lidar signals. Several methods have been proposed in which modules densify the point clouds produced by laser scanners, leading to better recall and more accurate results. Pursuing that direction, we present, in this work, a counter-intuitive perspective: the widely used full-shape completion approach actually leads to a higher error upper bound, especially for faraway objects and small objects such as pedestrians. Based on this observation, we introduce a visible-part completion method that requires only 11.3% of the prediction points that previous methods generate. To recover the dense representation, we propose a mesh-deformation-based method to augment the point set associated with visible foreground objects. Since our approach focuses only on the visible part of the foreground objects to achieve accurate 3D detection, we name our method What You See Is What You Detect (WYSIWYD). It is a detector-independent model that consists of two parts: an Intra-Frustum Segmentation Transformer (IFST) and a Mesh Depth Completion Network (MDCNet) that predicts the foreground depth from mesh deformation. This way, our model does not require the time-consuming full-depth completion task used by most pseudo-lidar-based methods. Our experimental evaluation shows that our approach provides up to 12.2% performance improvements over most of the public baseline models on the KITTI and NuScenes datasets, bringing the state of the art to a new level. The code will be available at https://github.com/Orbis36/WYSIWYD.
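As a rough illustration of the intra-frustum setup that a module like IFST operates on (our sketch, not the released code), the following gathers the lidar points that fall inside the frustum of a 2D detection box. The function name, the camera-coordinate assumption, and the (3, 4) projection matrix `P` are assumptions for the example; the transformer itself is not shown.

```python
# Illustrative sketch: gather lidar points inside a 2D detection box's frustum.
import numpy as np

def points_in_frustum(points, P, box2d):
    """points: (N, 3) lidar points already expressed in camera coordinates.
    P: (3, 4) camera projection matrix; box2d: (x1, y1, x2, y2) pixel box.
    Returns the points whose image projection lands inside the box."""
    pts_h = np.hstack([points, np.ones((len(points), 1))])  # homogeneous (N, 4)
    proj = (P @ pts_h.T).T                                   # (N, 3)
    in_front = proj[:, 2] > 0                                # keep points ahead of the camera
    uv = proj[:, :2] / np.clip(proj[:, 2:3], 1e-6, None)     # perspective divide
    x1, y1, x2, y2 = box2d
    inside = ((uv[:, 0] >= x1) & (uv[:, 0] <= x2) &
              (uv[:, 1] >= y1) & (uv[:, 1] <= y2))
    return points[in_front & inside]
```

A segmenter would then separate visible foreground points from clutter within this subset before the mesh-based depth completion step.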
Feature Based Cut Detection with Automatic Threshold Selection
There has been much work in recent years on creating accurate shot boundary detection algorithms. However, a truly accurate method of cut detection still eludes researchers in general. In this work, we present a scheme based on stable feature tracking for inter-frame differencing. Furthermore, we present a method to stabilize the differences and automatically select a global threshold to achieve a high detection rate. We compare our scheme against other cut detection techniques on a variety of data sources that were specifically selected for the difficulties they present due to quick motion, highly edited sequences, and computer-generated effects.
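As a minimal sketch of the overall recipe (feature-track differencing plus a threshold derived from the data; the paper's exact stabilization and threshold rule may differ), one could track corners with pyramidal Lucas-Kanade and flag transitions where an unusually large fraction of tracks is lost:

```python
# Illustrative sketch: feature-tracking cut detection with an automatic threshold.
import cv2
import numpy as np

def cut_scores(frames):
    """frames: list of grayscale images. Returns one dissimilarity score per
    consecutive frame pair: the fraction of tracked features that were lost."""
    scores = []
    for prev, curr in zip(frames, frames[1:]):
        pts = cv2.goodFeaturesToTrack(prev, maxCorners=200,
                                      qualityLevel=0.01, minDistance=8)
        if pts is None:                      # featureless frame: treat as maximal change
            scores.append(1.0)
            continue
        _, status, _ = cv2.calcOpticalFlowPyrLK(prev, curr, pts, None)
        scores.append(1.0 - float(status.mean()))
    return np.array(scores)

def detect_cuts(frames, k=3.0):
    """Flag transitions whose score exceeds mean + k * std of all scores,
    one simple way to pick a global threshold from the data itself."""
    s = cut_scores(frames)
    return np.flatnonzero(s > s.mean() + k * s.std())
```

The mean-plus-k-standard-deviations rule is only one way to set the threshold automatically; the key point is that it adapts to the statistics of the sequence rather than being hand-tuned.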