Prediction and Tracking of Moving Objects in Image Sequences
We employ a prediction model for moving-object velocity and location estimation derived from Bayesian theory. The optical flow of a given moving object depends on the history of its previous values. A joint optical flow estimation and moving object segmentation algorithm is used to initialize the tracking algorithm. The segmentation of the moving objects is determined by appropriately classifying the unlabeled and the occluding regions. Segmentation and optical flow tracking are then used to predict future frames.
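The abstract does not spell out the model, but a minimal sketch of Bayesian prediction of object location and velocity, assuming a constant-velocity Kalman filter (the motion model and the F, H, Q, R values below are illustrative, not the authors'):

import numpy as np

# State x = [px, py, vx, vy]: image location plus velocity, dt = 1 frame.
F = np.array([[1., 0., 1., 0.],   # position advances by velocity
              [0., 1., 0., 1.],
              [0., 0., 1., 0.],
              [0., 0., 0., 1.]])
H = np.eye(2, 4)                  # we observe location only (e.g. region centroid)
Q = 0.01 * np.eye(4)              # process noise (illustrative)
R = 1.0 * np.eye(2)               # measurement noise (illustrative)

def predict(x, P):
    """Propagate the state estimate and its covariance one frame ahead."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    """Fuse a measured object location z = [px, py] into the estimate."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return x + K @ (z - H @ x), (np.eye(4) - K @ H) @ P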
Every Frame Counts: Joint Learning of Video Segmentation and Optical Flow
A major challenge for video semantic segmentation is the lack of labeled
data. In most benchmark datasets, only one frame of a video clip is annotated,
which makes most supervised methods fail to utilize information from the rest
of the frames. To exploit the spatio-temporal information in videos, many
previous works use pre-computed optical flows, which encode the temporal
consistency to improve the video segmentation. However, video segmentation
and optical flow estimation are still treated as two separate tasks. In this
paper, we propose a novel framework for joint video semantic segmentation and
optical flow estimation. Semantic segmentation brings semantic information to
handle occlusion for more robust optical flow estimation, while the
non-occluded optical flow provides accurate pixel-level temporal
correspondences to guarantee the temporal consistency of the segmentation.
Moreover, our framework is able to utilize both labeled and unlabeled frames in
the video through joint training, while no additional calculation is required
in inference. Extensive experiments show that the proposed model makes the
video semantic segmentation and optical flow estimation benefit from each other
and outperforms existing methods under the same settings in both tasks. Comment: Published in AAAI 2020
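As a rough illustration of how joint training can use the single annotated frame together with the unlabeled ones (hypothetical names and loss weighting, not the paper's actual modules):

import torch
import torch.nn.functional as F

def joint_loss(seg_t, seg_t1, flow_t_to_t1, labels_t=None, lam=0.5):
    """Supervised cross-entropy on the one labeled frame, plus a temporal
    consistency term that warps the frame-(t+1) prediction back to frame t."""
    sup = seg_t.new_zeros(())
    if labels_t is not None:          # only one frame per clip is annotated
        sup = F.cross_entropy(seg_t, labels_t)
    n, _, h, w = seg_t1.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), -1).float().to(flow_t_to_t1.device)
    grid = grid + flow_t_to_t1.permute(0, 2, 3, 1)   # follow the flow
    grid[..., 0] = 2 * grid[..., 0] / (w - 1) - 1    # grid_sample wants [-1, 1]
    grid[..., 1] = 2 * grid[..., 1] / (h - 1) - 1
    warped = F.grid_sample(seg_t1, grid, align_corners=True)
    consistency = F.kl_div(F.log_softmax(seg_t, 1),
                           F.softmax(warped, 1), reduction="batchmean")
    return sup + lam * consistency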
Joint Optical Flow and Temporally Consistent Semantic Segmentation
The importance and demands of visual scene understanding have been steadily
increasing along with the active development of autonomous systems.
Consequently, there has been a large amount of research dedicated to semantic
segmentation and dense motion estimation. In this paper, we propose a method
for jointly estimating optical flow and temporally consistent semantic
segmentation, closely connecting these two problem domains so that each can
leverage the other. Semantic segmentation provides information on plausible physical
motion to its associated pixels, and accurate pixel-level temporal
correspondences enhance the accuracy of semantic segmentation in the temporal
domain. We demonstrate the benefits of our approach on the KITTI benchmark,
where we observe performance gains for flow and segmentation. We achieve
state-of-the-art optical flow results, and outperform all published algorithms
by a large margin on challenging but crucial dynamic objects. Comment: 14 pages, accepted for the CVRSUAD workshop at ECCV 2016
Three dimensional transparent structure segmentation and multiple 3D motion estimation from monocular perspective image sequences
A three-dimensional scene can be segmented using different cues, such as boundaries, texture, motion, discontinuities of the optical flow, stereo, models for structure, etc. We investigate segmentation based upon one of these cues, namely three-dimensional motion. If the scene contains transparent objects, the two-dimensional (local) cues are inconsistent, since neighboring points with similar optical flow can correspond to different objects. We present a method for performing three-dimensional motion-based segmentation of (possibly) transparent scenes, together with recursive estimation of the motion of each independent rigid object, from monocular perspective images. Our algorithm is based on a recently proposed method for rigid motion reconstruction and a validation test which allows us to initialize the scheme and detect outliers during the motion estimation procedure. The scheme is tested on challenging real and synthetic image sequences. Segmentation is performed for Ullman's experiment of two transparent cylinders rotating about the same axis in opposite directions.
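A minimal sketch of the kind of validation test described, not the authors' exact procedure: each track is checked against every rigid-motion hypothesis (R, T) by reprojection, assuming intrinsics K and a per-track depth from the recursive estimator:

import numpy as np

def predict_next(p, depth, R, T, K):
    """Back-project an image point with its estimated depth, apply the rigid
    motion hypothesis (R, T), and reproject into the next frame."""
    X = depth * (np.linalg.inv(K) @ np.array([p[0], p[1], 1.0]))
    x2 = K @ (R @ X + T)
    return x2[:2] / x2[2]

def assign_to_motion(p_t, p_t1, depth, hypotheses, K, tol=2.0):
    """A track supports the hypothesis whose prediction lands within tol pixels
    of its observed next-frame position; supporting none marks it an outlier."""
    errs = [np.linalg.norm(predict_next(p_t, depth, R, T, K) - p_t1)
            for (R, T) in hypotheses]
    best = int(np.argmin(errs))
    return best if errs[best] < tol else -1   # -1 flags an outlier / new object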
Optical Flow in Mostly Rigid Scenes
The optical flow of natural scenes is a combination of the motion of the
observer and the independent motion of objects. Existing algorithms typically
focus on either recovering motion and structure under the assumption of a
purely static world or optical flow for general unconstrained scenes. We
combine these approaches in an optical flow algorithm that estimates an
explicit segmentation of moving objects from appearance and physical
constraints. In static regions we take advantage of strong constraints to
jointly estimate the camera motion and the 3D structure of the scene over
multiple frames. This allows us to also regularize the structure instead of the
motion. Our formulation uses a Plane+Parallax framework, which works even under
small baselines, and reduces the motion estimation to a one-dimensional search
problem, resulting in more accurate estimation. In moving regions the flow is
treated as unconstrained, and computed with an existing optical flow method.
The resulting Mostly-Rigid Flow (MR-Flow) method achieves state-of-the-art
results on both the MPI-Sintel and KITTI-2015 benchmarks. Comment: 15 pages, 10 figures; accepted for publication at CVPR 2017
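For reference, the one-dimensional search comes from the standard Plane+Parallax decomposition (generic form, not necessarily the paper's exact notation): after warping one frame by the homography induced by a reference plane, each pixel's residual motion lies on the line through the warped point and the epipole,

\[ p' \;\simeq\; H_\pi\, p \;+\; \gamma\, e', \qquad \gamma \propto \frac{h}{Z}, \]

where $H_\pi$ is the homography of the reference plane $\pi$, $e'$ the epipole in the second image, $h$ the point's distance from $\pi$, and $Z$ its depth; estimating the single scalar $\gamma$ per pixel is the one-dimensional problem.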
Detection and segmentation of moving objects in video using optical vector flow estimation
The objective of this thesis is to detect and identify moving objects in a video sequence. The currently available techniques for motion estimation can be broadly categorized into two main classes: block matching methods and optical flow methods. This thesis investigates the different motion estimation algorithms used for video processing applications. Among the available motion estimation methods, the Lucas-Kanade optical flow algorithm has been used in this thesis for the detection of moving objects in a video sequence. Derivatives of image brightness with respect to the x-direction, y-direction and time t are calculated to solve the optical flow constraint equation. The algorithm produces results in the form of the horizontal and vertical components of the optical flow velocity, u and v respectively. This optical flow velocity is measured in the form of vectors and has been used to segment the moving objects from the video sequence. The algorithm has been applied to different sets of synthetic and real video sequences.

This method has been modified to include parameters such as neighborhood size and Gaussian pyramid filtering, which improve the motion estimation process. The concept of Gaussian pyramids has been used to simplify complex video sequences, and the optical flow algorithm has been applied to the different levels of the pyramids. The estimated motion, derived from the difference between the optical flow vectors of moving objects and of the stationary background, has been used to segment the moving objects in the video sequences. A combination of erosion and dilation techniques is then used to improve the quality of the already segmented content.

The Lucas-Kanade optical flow algorithm, along with the other considered parameters, produces encouraging motion estimation and segmentation results. The consistency of the algorithm has been tested by the use of different types of motion and video sequences. Other contributions of this thesis include a comparative analysis of the optical flow algorithm against other existing motion estimation and segmentation techniques. The comparison shows that a balance between accuracy and computational speed must be achieved for the implementation of any motion estimation algorithm in real time for video surveillance.
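A minimal dense Lucas-Kanade sketch of the constraint-equation solve described above, at a single pyramid level (window size and smoothing are illustrative; pyramidal, iterative variants such as OpenCV's cv2.calcOpticalFlowPyrLK build on the same 2x2 solve):

import numpy as np
from scipy.ndimage import gaussian_filter, sobel, uniform_filter

def lucas_kanade(frame1, frame2, win=7, sigma=1.0, eps=1e-6):
    """Solve Ix*u + Iy*v + It = 0 by least squares over a win x win
    neighborhood around every pixel (closed-form 2x2 normal equations)."""
    f1 = gaussian_filter(frame1.astype(float), sigma)  # pre-smoothing
    f2 = gaussian_filter(frame2.astype(float), sigma)
    Ix = sobel(f1, axis=1)   # brightness derivative in x
    Iy = sobel(f1, axis=0)   # brightness derivative in y
    It = f2 - f1             # brightness derivative in time
    # Windowed sums of the normal-equation entries.
    Sxx = uniform_filter(Ix * Ix, win); Sxy = uniform_filter(Ix * Iy, win)
    Syy = uniform_filter(Iy * Iy, win); Sxt = uniform_filter(Ix * It, win)
    Syt = uniform_filter(Iy * It, win)
    det = Sxx * Syy - Sxy ** 2 + eps    # eps guards textureless, singular regions
    u = (-Syy * Sxt + Sxy * Syt) / det  # Cramer's rule
    v = (Sxy * Sxt - Sxx * Syt) / det
    return u, v                         # per-pixel flow components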
SENSE: a Shared Encoder Network for Scene-flow Estimation
We introduce a compact network for holistic scene flow estimation, called
SENSE, which shares common encoder features among four closely-related tasks:
optical flow estimation, disparity estimation from stereo, occlusion
estimation, and semantic segmentation. Our key insight is that sharing features
makes the network more compact, induces better feature representations, and can
better exploit interactions among these tasks to handle partially labeled data.
With a shared encoder, we can flexibly add decoders for different tasks during
training. This modular design leads to a compact and efficient model at
inference time. Exploiting the interactions among these tasks allows us to
introduce distillation and self-supervised losses in addition to supervised
losses, which can better handle partially labeled real-world data. SENSE
achieves state-of-the-art results on several optical flow benchmarks and runs
as fast as networks specifically designed for optical flow. It also compares
favorably against the state of the art on stereo and scene flow, while
consuming much less memory. Comment: ICCV 2019 Oral
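A toy illustration of the shared-encoder, per-task-decoder layout (layer sizes and heads are placeholders, not SENSE's architecture):

import torch
import torch.nn as nn

class SharedEncoderNet(nn.Module):
    """One encoder feeds four task decoders; decoders can be added or dropped
    per training setup, as the abstract describes."""
    def __init__(self, feat=64, n_classes=19):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU())
        self.decoders = nn.ModuleDict({
            "flow":      nn.Conv2d(2 * feat, 2, 3, padding=1),  # uses both frames
            "disparity": nn.Conv2d(2 * feat, 1, 3, padding=1),
            "occlusion": nn.Conv2d(2 * feat, 1, 3, padding=1),
            "semseg":    nn.Conv2d(feat, n_classes, 3, padding=1)})

    def forward(self, im1, im2):
        f1, f2 = self.encoder(im1), self.encoder(im2)
        pair = torch.cat([f1, f2], dim=1)
        return {"flow": self.decoders["flow"](pair),
                "disparity": self.decoders["disparity"](pair),
                "occlusion": self.decoders["occlusion"](pair),
                "semseg": self.decoders["semseg"](f1)}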
SAMFlow: Eliminating Any Fragmentation in Optical Flow with Segment Anything Model
Optical Flow Estimation aims to find the 2D dense motion field between two
frames. Due to the limitations of model structures and training datasets,
existing methods often rely too heavily on local cues and ignore the integrity of
objects, resulting in fragmented motion estimation. Through theoretical
analysis, we find that pre-trained large vision models are helpful for optical
flow estimation, and we notice that the recent Segment Anything Model
(SAM) demonstrates a strong ability to segment complete objects, which is
suitable for solving the fragmentation problem. We thus propose a solution to
embed the frozen SAM image encoder into FlowFormer to enhance object
perception. To address the challenge of utilizing SAM in depth for
non-segmentation tasks such as optical flow estimation, we propose an Optical Flow
Task-Specific Adaption scheme, including a Context Fusion Module to fuse the
SAM encoder with the optical flow context encoder, and a Context Adaption
Module to adapt the SAM features for optical flow task with Learned
Task-Specific Embedding. Our proposed SAMFlow model reaches 0.86/2.10
clean/final EPE and 3.55/12.32 EPE/F1-all on the Sintel and KITTI-15 training
sets, surpassing FlowFormer by 8.5%/9.9% and 13.2%/16.3%. Furthermore, our
model achieves state-of-the-art performance on the Sintel and KITTI-15
benchmarks, ranking #1 among all two-frame methods on the Sintel clean pass.
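A generic sketch of fusing a frozen foundation-model image encoder with a flow context encoder (a gated-concat design for illustration; SAMFlow's actual Context Fusion and Context Adaption modules differ in detail):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextFusion(nn.Module):
    """Project frozen SAM-style features to the context width, resize them to
    the flow encoder's resolution, and inject them through a learned gate."""
    def __init__(self, c_sam=256, c_ctx=128):
        super().__init__()
        self.proj = nn.Conv2d(c_sam, c_ctx, 1)
        self.gate = nn.Sequential(
            nn.Conv2d(2 * c_ctx, c_ctx, 3, padding=1), nn.Sigmoid())

    def forward(self, sam_feat, ctx_feat):
        sam_feat = self.proj(sam_feat)
        if sam_feat.shape[-2:] != ctx_feat.shape[-2:]:
            sam_feat = F.interpolate(sam_feat, size=ctx_feat.shape[-2:],
                                     mode="bilinear", align_corners=False)
        g = self.gate(torch.cat([sam_feat, ctx_feat], dim=1))
        return ctx_feat + g * sam_feat   # object-aware context features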
MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features
Self-supervised learning of visual representations has focused on learning
content features, which identify and differentiate objects in images and
videos but do not capture their motion or location. Optical flow estimation,
on the other hand, is a task that does not involve understanding the content
of the images on which it is estimated. We unify the two approaches and
introduce MC-JEPA, a joint-embedding predictive architecture and
self-supervised learning approach that jointly learns optical flow and content
features within a shared encoder, demonstrating that the two associated
objectives, the optical flow estimation objective and the self-supervised
learning objective, benefit from each other and thus learn content features
that incorporate motion information. The proposed approach achieves
performance on par with existing unsupervised optical flow methods on standard
benchmarks, as well as with common self-supervised learning approaches on
downstream tasks such as semantic segmentation of images and videos.
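A schematic of the two-objective idea (hypothetical helper names; ssl_loss_fn stands in for a VICReg-style content criterion, not MC-JEPA's exact losses):

import torch
import torch.nn.functional as F

def warp(feat, flow):
    """Backward-warp a feature map with a dense flow field via grid_sample."""
    n, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), -1).float().to(feat.device)
    grid = grid + flow.permute(0, 2, 3, 1)
    grid[..., 0] = 2 * grid[..., 0] / (w - 1) - 1
    grid[..., 1] = 2 * grid[..., 1] / (h - 1) - 1
    return F.grid_sample(feat, grid, align_corners=True)

def joint_objective(feat1, feat2, flow, ssl_loss_fn, lam=1.0):
    """Motion term: frame-1 features should match flow-warped frame-2 features.
    Content term: any standard self-supervised criterion on the same encoder."""
    motion = F.smooth_l1_loss(feat1, warp(feat2, flow))
    content = ssl_loss_fn(feat1, feat2)
    return motion + lam * content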