Robust Motion Segmentation from Pairwise Matches
In this paper we address a classification problem that has not been considered before, namely motion segmentation given pairwise matches only. Our contribution to this unexplored task is a novel formulation of motion segmentation as a two-step process. First, motion segmentation is performed on image pairs independently. Second, we robustly combine the independent pairwise segmentation results into a final, globally consistent segmentation. Our approach is inspired by the success of averaging methods. We demonstrate in both simulated and real experiments that our method is highly effective in reducing errors in the pairwise motion segmentation and can cope with a large number of mismatches.
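The two-step pipeline described above can be illustrated with a toy sketch (our own construction, not the paper's actual algorithm): noisy pairwise "same motion" votes are averaged into an affinity matrix, which is then split spectrally to obtain a globally consistent two-motion segmentation.

```python
import numpy as np

# Toy sketch (our construction, not the paper's method): fuse noisy pairwise
# "same motion" votes into one global segmentation by averaging them into an
# affinity matrix and splitting it with the Laplacian's Fiedler vector.
def fuse_pairwise_segmentations(votes):
    """votes: list of (n, n) matrices with entries in [0, 1]."""
    affinity = np.mean(votes, axis=0)
    affinity = (affinity + affinity.T) / 2           # enforce symmetry
    lap = np.diag(affinity.sum(axis=1)) - affinity   # graph Laplacian
    _, vecs = np.linalg.eigh(lap)
    return (vecs[:, 1] > 0).astype(int)              # sign of the Fiedler vector

# Six points, two true motions; five noisy pairwise vote matrices.
truth = np.array([0, 0, 0, 1, 1, 1])
clean = (truth[:, None] == truth[None, :]).astype(float)
rng = np.random.default_rng(0)
votes = [np.clip(clean + 0.2 * rng.standard_normal((6, 6)), 0, 1)
         for _ in range(5)]
labels = fuse_pairwise_segmentations(votes)
print(labels)  # splits {0, 1, 2} from {3, 4, 5}, up to a label swap
```

Averaging the votes suppresses the independent pairwise errors before the spectral split, which is the intuition behind the robustness claim above.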
DUST: dual union of spatio-temporal subspaces for monocular multiple object 3D reconstruction
We present an approach to reconstruct the 3D shape of multiple deforming objects from incomplete 2D trajectories acquired by a single camera. Additionally, we simultaneously provide spatial segmentation (i.e., we identify each of the objects in every frame) and temporal clustering (i.e., we split the sequence into primitive actions). This advances existing work, which only tackled the problem for a single object and non-occluded tracks. In order to handle several objects at a time from partial observations, we model point trajectories as a union of spatial and temporal subspaces, and optimize the parameters of both modalities, the non-observed point tracks and the 3D shape via augmented Lagrange multipliers. The algorithm is fully unsupervised and results in a formulation which does not need initialization. We thoroughly validate the method on challenging scenarios with several human subjects performing different activities which involve complex motions and close interaction. We show our approach achieves state-of-the-art 3D reconstruction results, while also providing spatial and temporal segmentation.
Learning Dense 3D Models from Monocular Video
Reconstructing dense, detailed, 3D shape of dynamic scenes from monocular sequences is a challenging problem in computer vision. While robust and even real-time solutions exist to this problem if the observed scene is static, for non-rigid dense shape capture current systems are typically restricted to the use of complex multi-camera rigs, taking advantage of the additional depth channel available in RGB-D cameras, or dealing with specific shapes such as faces or planar surfaces. In this thesis, we present two pieces of work for reconstructing dense generic shapes from monocular sequences. In the first work, we propose an unsupervised approach to the challenging problem of simultaneously segmenting the scene into its constituent objects and reconstructing a 3D model of the scene. The strength of our approach comes from the ability to deal with real-world dynamic scenes and to handle seamlessly different types of motion: rigid, articulated and non-rigid. We formulate the problem as a hierarchical graph-cuts based segmentation where we decompose the whole scene into background and foreground objects and model the complex motion of non-rigid or articulated objects as a set of overlapping rigid parts. To validate the capability of our approach to deal with real-world scenes, we provide 3D reconstructions of challenging videos from the YouTube Objects and KITTI datasets. In the second work, we propose a direct approach for capturing the dense, detailed 3D geometry of generic, complex non-rigid meshes using a single camera. Our method makes use of a single RGB video as input; it can capture the deformations of generic shapes; and the depth estimation is dense, per-pixel and direct. We first reconstruct a dense 3D template of the shape of the object, using a short rigid sequence, and subsequently perform online reconstruction of the non-rigid mesh as it evolves over time.
In our experimental evaluation, we show a range of qualitative results on novel datasets and quantitative comparison results with stereo reconstruction.
Robust motion segmentation with subspace constraints
Motion segmentation is an important task in computer vision with
many applications such as dynamic scene understanding and
multi-body structure from motion. When the point correspondences
across frames are given, motion segmentation can be addressed as
a subspace clustering problem under an affine camera model. In
the first two parts of this thesis, we target the general
subspace clustering problem and propose two novel methods, namely
Efficient Dense Subspace Clustering (EDSC) and the Robust Shape
Interaction Matrix (RSIM) method.
Instead of following the standard compressive sensing approach,
in EDSC we formulate subspace clustering as a Frobenius norm
minimization problem, which inherently yields denser connections
between data points. While in the noise-free case we rely on the
self-expressiveness of the observations, in the presence of noise
we recover a clean dictionary to represent the data. Our
formulation lets us solve the subspace clustering problem
efficiently. More specifically, for outlier-free observations,
the solution can be obtained in closed-form, and in the presence
of outliers, we solve the problem by performing a series of
linear operations. Furthermore, we show that our Frobenius norm
formulation shares the same solution as the popular nuclear norm
minimization approach when the data is free of any noise.
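As a rough illustration of the closed-form solution mentioned above (a sketch in our own notation, not the authors' code): with a Frobenius-norm penalty, the self-expression problem min_C ||X - XC||_F^2 + lam ||C||_F^2 is a ridge regression and is solved with a single linear system.

```python
import numpy as np

# Sketch of a Frobenius-norm self-expression step (our notation, not the
# authors' code): min_C ||X - X C||_F^2 + lam ||C||_F^2 is ridge regression
# with the closed-form solution C = (X^T X + lam I)^{-1} X^T X.
def frobenius_self_expression(X, lam=0.1):
    """X: (d, n) data matrix, columns are points. Returns (n, n) coefficients."""
    n = X.shape[1]
    G = X.T @ X                                    # Gram matrix of the points
    return np.linalg.solve(G + lam * np.eye(n), G)

# Points drawn from two one-dimensional subspaces (lines) in R^3.
rng = np.random.default_rng(1)
b1, b2 = rng.standard_normal((3, 1)), rng.standard_normal((3, 1))
X = np.hstack([b1 @ rng.standard_normal((1, 4)),
               b2 @ rng.standard_normal((1, 4))])
C = frobenius_self_expression(X)
# C satisfies the normal equations (X^T X + lam I) C = X^T X.
print(np.allclose((X.T @ X + 0.1 * np.eye(8)) @ C, X.T @ X))  # True
```

In a subspace clustering pipeline, the symmetrized magnitude |C| + |C|^T would then serve as the affinity for spectral clustering; in the noise-free constrained case, the minimum-Frobenius-norm solution reduces to pinv(X) @ X, consistent with the nuclear-norm equivalence noted above.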
In RSIM, we revisit the Shape Interaction Matrix (SIM) method,
one of the earliest approaches for motion segmentation (or
subspace clustering), and reveal its connections to several
recent subspace clustering methods. We derive a simple, yet
effective algorithm to robustify the SIM method and make it
applicable to real-world scenarios where the data is corrupted by
noise. We validate the proposed method by intuitive examples and
justify it with the matrix perturbation theory. Moreover, we show
that RSIM can be extended to handle missing data with a
Grassmannian gradient descent method.
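To make the Shape Interaction Matrix concrete, here is a minimal sketch of the classical SIM construction (not the RSIM robustification itself): Q = V_r V_r^T from the rank-r SVD of the data matrix, which is provably block-diagonal for independent subspaces and noise-free data.

```python
import numpy as np

# Minimal sketch of the classical Shape Interaction Matrix (SIM), not the
# RSIM robustification itself: Q = V_r V_r^T from the rank-r SVD of the
# data matrix. For independent subspaces and noise-free data, Q_ij = 0
# whenever points i and j lie in different subspaces.
def shape_interaction_matrix(X, rank):
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Vr = Vt[:rank].T                  # (n, rank) right singular vectors
    return Vr @ Vr.T

# Two independent lines in R^3, four noise-free points on each.
rng = np.random.default_rng(2)
b1, b2 = rng.standard_normal((3, 1)), rng.standard_normal((3, 1))
X = np.hstack([b1 @ rng.standard_normal((1, 4)),
               b2 @ rng.standard_normal((1, 4))])
Q = shape_interaction_matrix(X, rank=2)
# The cross-subspace block vanishes (up to numerical precision).
print(np.abs(Q[:4, 4:]).max() < 1e-8)  # True
```

Under noise this zero pattern degrades quickly, which is exactly the sensitivity that the robustified RSIM method described above is designed to address.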
The above subspace clustering methods work well for motion
segmentation, yet they require that point trajectories across
frames are known a priori. However, finding point
correspondences is in itself a challenging task. Existing
approaches tackle the correspondence estimation and motion
segmentation problems separately. In the third part of this
thesis, given a set of feature points detected in each frame of
the sequence, we develop an approach which simultaneously
performs motion segmentation and finds point correspondences
across the frames. We formulate this problem in terms of Partial
Permutation Matrices (PPMs) and aim to match feature descriptors
while simultaneously encouraging point trajectories to satisfy
subspace constraints. This lets us handle outliers in both point
locations and feature appearance. The resulting optimization
problem is solved via the Alternating Direction Method of
Multipliers (ADMM), where each subproblem has an efficient
solution. In particular, we show that most of the subproblems can
be solved in closed-form, and one binary assignment subproblem
can be solved by the Hungarian algorithm.
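The binary assignment subproblem mentioned above can be illustrated in isolation (a hedged sketch of just that step; the full method alternates it with the subspace-constrained updates inside ADMM): matching feature descriptors between two frames with the Hungarian algorithm.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Sketch of only the binary assignment subproblem described above: match
# feature descriptors between two frames with the Hungarian algorithm
# (scipy's linear_sum_assignment). The full method additionally enforces
# subspace constraints on the resulting trajectories inside ADMM.
def match_descriptors(desc_a, desc_b):
    """desc_a: (n, d), desc_b: (m, d). Returns index pairs minimizing total cost."""
    # Cost = squared Euclidean distance between descriptors.
    cost = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(axis=2)
    rows, cols = linear_sum_assignment(cost)
    return [(int(r), int(c)) for r, c in zip(rows, cols)]

desc_a = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
desc_b = np.array([[0.0, 1.1], [0.1, 0.0], [1.0, 0.1]])  # permuted + noise
print(match_descriptors(desc_a, desc_b))  # [(0, 1), (1, 2), (2, 0)]
```

The assignment here is a full permutation; the partial permutation matrices of the thesis additionally allow unmatched points, which is how outliers in location and appearance are absorbed.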
Obtaining reliable feature tracks in a frame-by-frame manner is
desirable in applications such as online motion segmentation. In
the final part of the thesis, we introduce a novel multi-body
feature tracker that exploits a multi-body rigidity assumption to
improve tracking robustness under a general perspective camera
model. A conventional approach to addressing this problem would
consist of alternating between solving two subtasks: motion
segmentation and feature tracking under rigidity constraints for
each segment. This approach, however, requires knowing the number
of motions, as well as assigning points to motion groups, which
is typically sensitive to motion estimates. By contrast, we
introduce a segmentation-free solution to multi-body feature
tracking that bypasses the motion assignment step and reduces to
solving a series of subproblems with closed-form solutions.
In summary, in this thesis, we exploit the powerful subspace
constraints and develop robust motion segmentation methods in
different challenging scenarios where the trajectories are either
given as input, or unknown beforehand. We also present a general
robust multi-body feature tracker which can be used as the first
step of motion segmentation to obtain reliable trajectories.
People detection and tracking in crowded scenes
People are often a central element of visual scenes, particularly in real-world street scenes. Thus it has been a long-standing goal in Computer Vision to develop methods for analyzing humans in visual data. Due to the complexity of real-world scenes, visual understanding of people remains challenging for machine perception. In this thesis we focus on advancing the techniques for people detection and tracking in crowded street scenes. We also propose new models for human pose estimation and motion segmentation in realistic images and videos. First, we propose detection models that are jointly trained to detect a single person as well as pairs of people under varying degrees of occlusion. The learning algorithm of our joint detector facilitates a tight integration of tracking and detection, because it is designed to address common failure cases during tracking due to long-term inter-object occlusions. Second, we propose novel multi-person tracking models that formulate tracking as a graph partitioning problem. Our models jointly cluster detection hypotheses in space and time, eliminating the need for heuristic non-maximum suppression. Furthermore, for crowded scenes, our tracking model encodes long-range person re-identification information into the detection clustering process in a unified and rigorous manner. Third, we explore the visual tracking task at different levels of granularity. We present a tracking model that simultaneously clusters object bounding boxes and pixel-level trajectories over time. This approach provides a rich understanding of the motion of objects in the scene. Last, we extend our tracking model to the multi-person pose estimation task. We introduce a joint subset partitioning and labelling model with which we simultaneously estimate the poses of all the people in the scene. In summary, this thesis addresses a number of diverse tasks that aim to enable vision systems to analyze people in realistic images and videos.
In particular, the thesis proposes several novel ideas and rigorous mathematical formulations, pushes the boundaries of the state of the art, and achieves superior performance.
Perspective motion segmentation via collaborative clustering
DOI: 10.1109/ICCV.2013.173. Proceedings of the IEEE International Conference on Computer Vision, pp. 1369-1376.