Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency
We present an end-to-end joint training framework that explicitly models
6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular
camera setup without supervision. Our technical contributions are three-fold.
First, we highlight the fundamental difference between inverse and forward
projection while modeling the individual motion of each rigid object, and
propose a geometrically correct projection pipeline using a neural forward
projection module. Second, we design a unified instance-aware photometric and
geometric consistency loss that holistically imposes self-supervisory signals
for every background and object region. Lastly, we introduce a general-purpose
auto-annotation scheme using any off-the-shelf instance segmentation and
optical flow models to produce video instance segmentation maps that will be
utilized as input to our training pipeline. These proposed elements are
validated in a detailed ablation study. Through extensive experiments conducted
on the KITTI and Cityscapes datasets, our framework is shown to outperform
state-of-the-art depth and motion estimation methods. Our code, dataset, and
models are available at https://github.com/SeokjuLee/Insta-DM.
Comment: Accepted to AAAI 2021. arXiv admin note: substantial text
overlap with arXiv:1912.0935
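The difference between inverse and forward projection mentioned in this abstract can be illustrated with a toy 1D example. This is only a sketch under stated assumptions, not the paper's neural forward projection module: the function names `inverse_warp` and `forward_warp`, the integer motion field, and the five-pixel "image" are all hypothetical. The motion is defined at source pixels, which is the natural place to describe an object's own motion.

```python
def inverse_warp(source, flow):
    """Each target pixel i pulls its value from source[i + flow[i]]."""
    out = []
    for i, d in enumerate(flow):
        j = i + d
        out.append(source[j] if 0 <= j < len(source) else None)
    return out


def forward_warp(source, flow):
    """Each source pixel i pushes its value to target[i + flow[i]]."""
    out = [None] * len(source)  # None marks a hole (no source pixel landed here)
    for i, d in enumerate(flow):
        j = i + d
        if 0 <= j < len(out):
            out[j] = source[i]  # later writes overwrite earlier ones (no depth test)
    return out


# Hypothetical scene: the pixel at index 3 moves two steps left, the rest is static.
source = [10, 20, 30, 40, 50]
flow_at_source = [0, 0, 0, -2, 0]

pushed = forward_warp(source, flow_at_source)  # [10, 40, 30, None, 50]
pulled = inverse_warp(source, flow_at_source)  # [10, 20, 30, 20, 50]
```

Pushing moves the object to index 1 and leaves a hole where it used to be, while pulling with the same field reads the wrong pixel at index 3 and leaves the object in place. The two operations agree only when the motion field is expressed in the coordinate frame each one expects.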
Fast Multi-frame Stereo Scene Flow with Motion Segmentation
We propose a new multi-frame method for efficiently computing scene flow
(dense depth and optical flow) and camera ego-motion for a dynamic scene
observed from a moving stereo camera rig. Our technique also segments out
moving objects from the rigid scene. In our method, we first estimate the
disparity map and the 6-DOF camera motion using stereo matching and visual
odometry. We then identify regions inconsistent with the estimated camera
motion and compute per-pixel optical flow only at these regions. This flow
proposal is fused with the camera motion-based flow proposal using fusion moves
to obtain the final optical flow and motion segmentation. This unified
framework benefits all four tasks (stereo, optical flow, visual odometry, and
motion segmentation), leading to higher overall accuracy and efficiency. Our
method is currently ranked third on the KITTI 2015 scene flow benchmark.
Furthermore, our CPU implementation runs in 2-3 seconds per frame, which is 1-3
orders of magnitude faster than the top six methods. We also report a thorough
evaluation on challenging Sintel sequences with fast camera and object motion,
where our method consistently outperforms OSF [Menze and Geiger, 2015], which
is currently ranked second on the KITTI benchmark.
Comment: 15 pages. To appear at IEEE Conference on Computer Vision and Pattern
Recognition (CVPR 2017). Our results were submitted to KITTI 2015 Stereo
Scene Flow Benchmark in November 201
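The step of identifying regions inconsistent with the estimated camera motion can be sketched as follows. The abstract does not say how the inconsistency test is implemented, so this assumes a simple one: predict the flow a static point would have from its depth and the camera motion, then flag pixels whose observed flow differs by more than a threshold. The pinhole intrinsics, the function names `rigid_flow` and `is_inconsistent`, and the 1-pixel threshold are all assumptions made for illustration.

```python
import math


def rigid_flow(u, v, depth, fx, fy, cx, cy, rotation, translation):
    """Flow of a static 3D point seen at pixel (u, v) under camera motion (R, t)."""
    # Back-project the pixel to a 3D point in the first camera frame.
    point = [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]
    # Express the point in the second camera frame: q = R * p + t.
    q = [sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
         for r in range(3)]
    # Project back to the image plane of the second frame.
    u2 = fx * q[0] / q[2] + cx
    v2 = fy * q[1] / q[2] + cy
    return (u2 - u, v2 - v)


def is_inconsistent(observed, predicted, threshold=1.0):
    """True when the observed flow deviates from the rigid prediction."""
    return math.hypot(observed[0] - predicted[0],
                      observed[1] - predicted[1]) > threshold


# Hypothetical setup: camera moves 1 unit forward, so static points come closer.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
forward_motion = [0.0, 0.0, -1.0]
predicted = rigid_flow(420.0, 240.0, 10.0, 500.0, 500.0, 320.0, 240.0,
                       identity, forward_motion)

static_pixel = is_inconsistent((11.1, 0.0), predicted)  # matches camera motion
moving_pixel = is_inconsistent((25.0, 0.0), predicted)  # candidate moving object
```

Under this assumed test, per-pixel optical flow would only need to be computed where the flag is true, which is consistent with the efficiency argument made in the abstract.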
Normalized Cuts and Image Segmentation
We propose a novel approach for solving the perceptual grouping problem in vision. Rather than focusing on local features and their consistencies in the image data, our approach aims at extracting the global impression of an image. We treat image segmentation as a graph partitioning problem and propose a novel global criterion, the normalized cut, for segmenting the graph. The normalized cut criterion measures both the total dissimilarity between the different groups and the total similarity within the groups. We show that an efficient computational technique based on a generalized eigenvalue problem can be used to optimize this criterion. We have applied this approach to segmenting static images, as well as motion sequences, and found the results to be very encouraging.
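The normalized cut criterion can be made concrete on a tiny graph. The sketch below evaluates Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V) and finds the best bipartition by brute force. Note that brute force is swapped in here for clarity and is not the generalized-eigenvalue technique the abstract describes; the six-node weight matrix and the function names `ncut_value` and `best_bipartition` are hypothetical.

```python
from itertools import combinations


def ncut_value(weights, group_a):
    """Normalized cut of the bipartition (A, V \\ A) of a weighted graph."""
    n = len(weights)
    a = set(group_a)
    b = set(range(n)) - a
    # Total weight of edges crossing the partition.
    cut = sum(weights[i][j] for i in a for j in b)
    # Total connection from each group to all nodes in the graph.
    assoc_a = sum(weights[i][j] for i in a for j in range(n))
    assoc_b = sum(weights[i][j] for i in b for j in range(n))
    return cut / assoc_a + cut / assoc_b


def best_bipartition(weights):
    """Exhaustively search all bipartitions (feasible only for tiny graphs)."""
    n = len(weights)
    best = None
    for size in range(1, n // 2 + 1):
        for group in combinations(range(n), size):
            value = ncut_value(weights, group)
            if best is None or value < best[0]:
                best = (value, set(group))
    return best


# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by one weak edge.
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.5, 0, 0],
    [0, 0, 0.5, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
value, group = best_bipartition(W)  # group is {0, 1, 2}, value is about 0.154
```

Cutting off a single node would cut less total weight in some graphs, but the division by each group's association penalizes such unbalanced splits, which is why the criterion prefers separating the two triangles here.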
A blind deconvolution approach to recover effective connectivity brain networks from resting state fMRI data
Effective connectivity analysis, in which the flow of information between
even remote brain regions is inferred from the parameters of a predictive
dynamical model, can greatly improve the insight into brain function gained
from fMRI data. As opposed to biologically inspired models, some techniques
such as Granger causality (GC) are purely data-driven and rely on
statistical prediction and temporal precedence. While powerful and widely
applicable, this approach can suffer from two main limitations when applied
to BOLD fMRI data: the confounding effect of the hemodynamic response function
(HRF) and conditioning on a large number of variables in the presence of short
time series. For task-related fMRI, neural population dynamics can be captured by
modeling signal dynamics with explicit exogenous inputs; for resting-state fMRI,
on the other hand, the absence of explicit inputs makes this task more
difficult, unless relying on some specific prior physiological hypothesis. In
order to overcome these issues and to allow a more general approach, here we
present a simple and novel blind-deconvolution technique for the BOLD-fMRI signal.
Regarding the second limitation, a fully multivariate conditioning with short
and noisy data leads to computational problems due to overfitting. Furthermore,
conceptual issues arise in the presence of redundancy. We thus apply partial
conditioning to a limited subset of variables in the framework of information
theory, as recently proposed. Combining these two improvements, we compare the
differences between BOLD and deconvolved BOLD level effective networks and draw
some conclusions.
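The idea behind Granger causality, statistical prediction plus temporal precedence, can be sketched with a bivariate lag-1 example. This is a minimal illustration on synthetic data, not the paper's pipeline: it includes neither the blind deconvolution nor the partial conditioning, and the function name `granger_order1`, the model order, and the simulated series are all assumptions.

```python
import math
import random


def _demean(series):
    mean = sum(series) / len(series)
    return [value - mean for value in series]


def granger_order1(source, target):
    """ln(RSS_restricted / RSS_full) for predicting target with lag-1 models."""
    x = _demean(source)
    y = _demean(target)
    y_now, y_past, x_past = y[1:], y[:-1], x[:-1]

    # Restricted model: y_t = a * y_{t-1} (target's own past only).
    syy = sum(p * p for p in y_past)
    ty = sum(c * p for c, p in zip(y_now, y_past))
    a_r = ty / syy
    rss_r = sum((c - a_r * p) ** 2 for c, p in zip(y_now, y_past))

    # Full model: y_t = a * y_{t-1} + b * x_{t-1}, via 2x2 normal equations.
    sxx = sum(q * q for q in x_past)
    sxy = sum(p * q for p, q in zip(y_past, x_past))
    tx = sum(c * q for c, q in zip(y_now, x_past))
    det = syy * sxx - sxy * sxy
    a_f = (ty * sxx - tx * sxy) / det
    b_f = (tx * syy - ty * sxy) / det
    rss_f = sum((c - a_f * p - b_f * q) ** 2
                for c, p, q in zip(y_now, y_past, x_past))
    return math.log(rss_r / rss_f)


# Synthetic data: x is white noise and drives y with a one-step delay.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0.0, 1.0) for t in range(1, 500)]

gc_x_to_y = granger_order1(x, y)  # large: the past of x helps predict y
gc_y_to_x = granger_order1(y, x)  # near zero: the past of y does not help predict x
```

With real BOLD signals the picture is less clean, since, as the abstract notes, the hemodynamic response can confound temporal precedence; that is the motivation given for deconvolving the signal before applying such a measure.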