Robust and Optimal Methods for Geometric Sensor Data Alignment
Geometric sensor data alignment - the problem of finding the
rigid transformation that correctly aligns two sets of sensor
data without prior knowledge of how the data correspond - is a
fundamental task in computer vision and robotics. Unfortunately,
outliers and non-convexity are inherent to the problem and present
significant challenges for alignment
algorithms. Outliers are highly prevalent in sets of sensor data,
particularly when the sets overlap incompletely. Despite this,
many alignment objective functions are not robust to outliers,
leading to erroneous alignments. In addition, alignment problems
are highly non-convex, a property arising from the objective
function and the transformation. While finding a local optimum
may not be difficult, finding the global optimum is a hard
optimisation problem. These key challenges have not been fully
and jointly resolved in the existing literature, and so there is
a need for robust and optimal solutions to alignment problems.
Hence the objective of this thesis is to develop tractable
algorithms for geometric sensor data alignment that are robust to
outliers and not susceptible to spurious local optima.
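To make the outlier problem concrete, here is a minimal sketch that assumes a toy 2D point set and uses a simple truncated squared distance as the robust measure. The function names and the threshold `tau` are illustrative only. This is a generic example of bounding outlier influence, not the objective used in the thesis.

```python
import math

def transform(points, theta, tx, ty):
    """Rotate 2D points by theta (radians), then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def nearest_sq_dist(p, targets):
    """Squared distance from p to its closest target point (correspondences unknown)."""
    return min((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 for q in targets)

def least_squares_cost(source, target, pose):
    """Non-robust objective: every residual counts in full, so one outlier can dominate."""
    return sum(nearest_sq_dist(p, target) for p in transform(source, *pose))

def truncated_cost(source, target, pose, tau=0.5):
    """Robust objective: each residual is capped at tau**2, bounding an outlier's influence."""
    return sum(min(nearest_sq_dist(p, target), tau ** 2)
               for p in transform(source, *pose))

# Toy data: the source is the target square plus one far-away outlier.
target = [(0, 0), (1, 0), (0, 1), (1, 1)]
source = target + [(10, 10)]
good, bad = (0.0, 0.0, 0.0), (0.0, -1.0, -1.0)   # (theta, tx, ty)

print(least_squares_cost(source, target, good), least_squares_cost(source, target, bad))  # 162.0 132.0
print(truncated_cost(source, target, good), truncated_cost(source, target, bad))          # 0.25 1.0
```

On this toy data the plain squared-distance cost prefers the wrong pose because of the single outlier, while the truncated cost ranks the correct pose first. Both costs use nearest-neighbour distances because the correspondences are unknown.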
This thesis makes several significant contributions to the
geometric alignment literature, founded on new insights into
robust alignment and the geometry of transformations. Firstly, a
novel discriminative sensor data representation is proposed that
has better viewpoint invariance than generative models and is
time and memory efficient without sacrificing model fidelity.
Secondly, a novel local optimisation algorithm is developed for
nD-nD geometric alignment under a robust distance measure. It
manifests a wider region of convergence and a greater robustness
to outliers and sampling artefacts than other local optimisation
algorithms. Thirdly, the first optimal solution for 3D-3D
geometric alignment with an inherently robust objective function
is proposed. It outperforms other geometric alignment algorithms
on challenging datasets due to its guaranteed optimality and
outlier robustness, and has an efficient parallel implementation.
Fourthly, the first optimal solution for 2D-3D geometric
alignment with an inherently robust objective function is
proposed. It outperforms existing approaches on challenging
datasets, reliably finding the global optimum, and has an
efficient parallel implementation. Finally, another optimal
solution is developed for 2D-3D geometric alignment, using a
robust surface alignment measure.
Ultimately, robust and optimal methods, such as those in this
thesis, are necessary to reliably find accurate solutions to
geometric sensor data alignment problems.
Online Multi-Camera Multi-Object Tracking Using Dynamic Generation of Maximum Weighted Clique Problems
Dissertation (Ph.D.), Seoul National University Graduate School, Department of Electrical and Computer Engineering, August 2016. Advisor: 최진영 (Jin Young Choi).
In this dissertation, we propose an online, real-time algorithm for tracking multiple targets with multiple cameras that have overlapping fields of view. Because of its broad applicability, multiple target tracking with visual sensors has been studied intensively over recent decades. In particular, algorithms using multiple overlapping cameras have been proposed to overcome target occlusion and missed detections, which cannot be resolved with a single camera. Since the multiple camera multiple target tracking (MCMTT) problem is more complicated than the single camera multiple target tracking (SCMTT) problem, most MCMTT algorithms are based on batch processing, which considers the whole sequence at once. Although batch-based algorithms have achieved robust performance, their usability is limited because many practical applications need instantaneous results. The objective of this dissertation is to develop an online MCMTT algorithm whose tracking performance is comparable to that of batch-based algorithms while requiring only a small amount of computation.
The proposed algorithm generates track hypotheses (or simply "tracks") from all possible data associations between object detections from multiple cameras across frames. It then picks the set of tracks that best describes the targets. To identify good tracks, the quality of each track is measured by our score function, and the tracking solution is the set of tracks with the maximum total score. To obtain this solution, we formulate the problem of finding the track set as a maximum weighted clique problem (MWCP), a widely adopted formulation for combinatorial problems with pairwise compatibility relationships among the variables. MWCP is a well-known NP-complete problem, and its worst-case computation time grows exponentially with the number of tracks. Solving a single MWCP is therefore intractable, because the number of candidate tracks grows exponentially as tracking progresses. To alleviate this huge computational load, we propose an online scheme that dynamically formulates multiple MWCPs over small subsets of candidate tracks in every frame. The scheme is motivated by the observation that the tracking solutions of consecutive frames are very similar, because the state of each target does not change abruptly from one frame to the next. If we assume that a specific track set is the actual solution for the previous frame, only a small number of tracks can become solution tracks in the current frame, so the candidate track set can be narrowed down using the previous solution. However, propagating only the best solution of each frame can cause irrecoverable errors when a wrong track set is chosen as the solution because of tracking ambiguity. To hedge against this risk, we find multiple good solutions at each frame and propagate the K best of them to the next frame instead of a single solution.
When the candidate tracks are updated and generated with the newly obtained detections at the next frame, we generate multiple subsets of the entire candidate track set using the K-best previous solutions. Each subset consists of the candidate solution tracks with respect to one of the previous solutions, and a small MWCP is formulated with that subset. Our algorithm then finds multiple solutions for each MWCP and repeats the above procedure until tracking terminates. Even though the proposed algorithm solves multiple MWCPs, it has lower computational complexity than solving a single MWCP over all candidate tracks, because the overall computational load is dominated by the size of the largest MWCP. Moreover, when an instantaneous result is demanded, our algorithm finds better solutions than solving a single large MWCP, because it explores more diverse solutions within a limited solving time.
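The MWCP view can be illustrated on a tiny instance: tracks are nodes, an edge joins two tracks that can coexist (for example, because they share no detection), and a clique is a consistent set of tracks whose total score is the clique weight. The sketch below assumes hypothetical track names and scores and simply enumerates every clique to return the K best, which is only feasible for very small candidate subsets; the dissertation instead solves each MWCP with BLS.

```python
from itertools import combinations

def k_best_cliques(weights, compatible, k):
    """Return the k heaviest cliques of a small track-compatibility graph.

    weights: dict mapping a (hypothetical) track id to its score.
    compatible: set of frozenset({a, b}) pairs of tracks that can co-exist.
    Exhaustive enumeration is exponential in the number of tracks.
    """
    tracks = sorted(weights)
    cliques = []
    for size in range(1, len(tracks) + 1):
        for subset in combinations(tracks, size):
            # a subset is a clique if every pair of its tracks is compatible
            if all(frozenset(pair) in compatible for pair in combinations(subset, 2)):
                cliques.append((sum(weights[t] for t in subset), subset))
    cliques.sort(key=lambda c: (-c[0], c[1]))   # heaviest first, ties broken by name
    return cliques[:k]

# Hypothetical example: track C conflicts with A and with B (shared detections).
weights = {"A": 3, "B": 2, "C": 4, "D": 1}
compatible = {frozenset(p) for p in [("A", "B"), ("A", "D"), ("B", "D"), ("C", "D")]}
print(k_best_cliques(weights, compatible, 3))
# [(6, ('A', 'B', 'D')), (5, ('A', 'B')), (5, ('C', 'D'))]
```

The enumeration cost grows exponentially with the number of tracks, which is the motivation the abstract gives for keeping each MWCP small.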
Although our dynamic formulation considerably reduces the overall computational complexity, meeting the real-time requirement of the tracking system is still challenging. We therefore apply three further strategies to reduce computation time. First, we generate tracklets, robust fragments of a target's trajectory, at each camera, and build candidate tracks from these tracklets instead of from individual detections. This prevents the generation of many implausible tracks. Second, we adopt a heuristic algorithm called breakout local search (BLS) to solve each MWCP; with BLS, multiple suboptimal solutions can be found efficiently within a short time. Finally, we prune the candidate tracks using a probability computed from the K-best solutions. This probability represents the quality of each track with respect to the overall tracking situation rather than the track in isolation, so using it ensures proper pruning of candidate tracks.
In experiments on a public benchmark dataset, our algorithm shows performance comparable to state-of-the-art batch-based MCMTT algorithms. Moreover, it demonstrates real-time capability by achieving satisfactory performance within a reasonable computation time. We also conduct a self-comparison to verify our dynamic MWCP formulation with respect to tracking performance and solving time. When a sufficient number of solutions is propagated, our algorithm performs better and runs faster than solving a single MWCP over all candidate tracks.
Chapter 1 Introduction
1.1 Background
1.2 Related Works
1.2.1 Reconstruction-and-tracking methods
1.2.2 Tracking-and-reconstruction methods
1.2.3 Unified frameworks
1.3 Contents of the Research
1.4 Thesis Organization
Chapter 2 Preliminaries
2.1 Bayesian Tracking
2.1.1 Recursive Bayesian Tracking
2.1.2 Bayesian Tracking for Multiple Targets
2.1.3 Multiple Hypothesis Tracking (MHT)
2.2 Maximum Weighted Clique Problem (MWCP)
2.2.1 Clique Problems
2.2.2 Solving MWCP
2.3 Breakout Local Search (BLS)
2.3.1 Solution exploration
2.3.2 Perturbation Strategies
2.3.3 Initial Solution and Termination Condition
Chapter 3 Proposed Approach
3.1 Problem Statements
3.2 Tracklet Generation
3.2.1 Detection-to-tracklet Matching
3.2.2 Matching Score with Motion Estimation
3.2.3 Matching Validation
3.3 Track Hypothesis
3.3.1 Tracklet Association
3.3.2 Online Generation of Association Sets
3.3.3 Track Generation
3.3.4 Track Score
3.4 Global Hypothesis
3.4.1 MWCP for MCMTT
3.4.2 BLS for MCMTT
3.5 Pruning
3.5.1 Approximated Global Track Probability
3.5.2 Track Pruning Scheme
Chapter 4 Experiments
4.1 Comparison with the State-of-the-art Methods
4.2 Influence of Parameters
4.3 Score Function Analysis
4.4 Solving Scheme Analysis
4.5 Qualitative Results
Chapter 5 Concluding Remarks
5.1 Conclusions
5.2 Future Works
Abstract (in Korean)
Energy Minimization for Multiple Object Tracking
Multiple target tracking aims at reconstructing trajectories of several
moving targets in a dynamic scene, and is of significant relevance for a
large number of applications. For example, predicting a pedestrian’s
action may be employed to warn an inattentive driver and reduce road
accidents; understanding a dynamic environment will facilitate
autonomous robot navigation; and analyzing crowded scenes can prevent
fatalities in mass panics.
The task of multiple target tracking is challenging for various reasons:
First of all, visual data is often ambiguous. For example, the objects
to be tracked can remain undetected due to low contrast and occlusion.
At the same time, background clutter can cause spurious measurements
that distract the tracking algorithm. A second challenge arises when
multiple measurements appear close to one another. Resolving
correspondence ambiguities leads to a combinatorial problem that quickly
becomes more complex with every time step. Moreover, a realistic model
of multi-target tracking should take physical constraints into account.
This is not only important at the level of individual targets but also
regarding interactions between them, which adds to the complexity of the
problem.
In this work the challenges described above are addressed by means of
energy minimization. Given a set of object detections, an energy
function describing the problem at hand is minimized with the goal of
finding a plausible solution for a batch of consecutive frames. Such
offline tracking-by-detection approaches have substantially advanced the
performance of multi-target tracking. Building on these ideas, this
dissertation introduces three novel techniques for multi-target tracking
that extend the state of the art as follows: The first approach
formulates the energy in discrete space, building on the work of Berclaz
et al. (2009). All possible target locations are reduced to a regular
lattice and tracking is posed as an integer linear program (ILP),
enabling (near) global optimality. Unlike prior work, however, the
proposed formulation includes a dynamic model and additional constraints
that enable performing non-maxima suppression (NMS) at the level of
trajectories. These contributions improve the performance both
qualitatively and quantitatively with respect to annotated ground truth.
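As a heavily simplified illustration of tracking on a discretized lattice, consider a single target on a 1D grid with no interaction constraints. In that special case the minimum-cost trajectory is a shortest path through a time-expanded graph and can be found by dynamic programming, as sketched below. The cost values and the `max_step` motion limit are assumptions for illustration; the dissertation's ILP, with its dynamic model, multi-target formulation and trajectory-level NMS, is not reproduced here.

```python
def best_lattice_path(costs, max_step=1):
    """Cheapest single-target path through a 1D occupancy lattice (Viterbi-style DP).

    costs[t][i]: cost (e.g. negative log-likelihood) of the target being in cell i
    at frame t. Between frames the target may move at most max_step cells.
    Returns (total_cost, [cell index per frame]).
    """
    n_cells = len(costs[0])
    best = list(costs[0])          # best[i]: cheapest cost of a path ending in cell i
    back = []                      # back[t - 1][i]: predecessor of cell i at frame t
    for frame in costs[1:]:
        new_best, pointers = [], []
        for i in range(n_cells):
            lo, hi = max(0, i - max_step), min(n_cells, i + max_step + 1)
            j = min(range(lo, hi), key=lambda k: best[k])
            new_best.append(best[j] + frame[i])
            pointers.append(j)
        best, back = new_best, back + [pointers]
    i = min(range(n_cells), key=lambda k: best[k])
    total, path = best[i], [i]
    for pointers in reversed(back):   # follow predecessors back to frame 0
        i = pointers[i]
        path.append(i)
    return total, path[::-1]

costs = [[0, 5, 5], [5, 0, 5], [5, 5, 0]]
print(best_lattice_path(costs))   # (0, [0, 1, 2])
```

With several targets, shared cells and exclusion constraints couple the paths, which is why the full problem is posed as an ILP rather than solved path by path.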
The second technical contribution is a continuous energy function for
multiple target tracking that overcomes the limitations imposed by
spatial discretization. The continuous formulation is able to capture
important aspects of the problem, such as target localization or motion
estimation, more accurately. More precisely, the data term as well as
all phenomena including mutual exclusion and occlusion, appearance,
dynamics and target persistence are modeled by continuous differentiable
functions. The resulting non-convex optimization problem is minimized
locally by standard conjugate gradient descent in combination with
custom discontinuous jumps. The more accurate representation of the
problem leads to a powerful and robust multi-target tracking approach,
which shows encouraging results on particularly challenging video
sequences.
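A much reduced version of the continuous approach can be sketched for one 1D trajectory, assuming a quadratic data term and a second-difference (constant-velocity) dynamics term with a hypothetical weight `lam`. Because this toy energy is convex and quadratic, setting its gradient to zero gives a symmetric positive-definite linear system that linear conjugate gradient solves directly. The dissertation's energy is non-convex and also includes exclusion, appearance and persistence terms plus discontinuous jumps, none of which appear here.

```python
import math

def second_diff(x):
    """D x: discrete accelerations x[t-1] - 2*x[t] + x[t+1]."""
    return [x[t - 1] - 2 * x[t] + x[t + 1] for t in range(1, len(x) - 1)]

def second_diff_T(r, n):
    """D^T r: adjoint of second_diff for a trajectory of length n."""
    y = [0.0] * n
    for k, v in enumerate(r):
        y[k] += v
        y[k + 1] -= 2 * v
        y[k + 2] += v
    return y

def smooth_trajectory(d, lam, tol=1e-10, max_iter=None):
    """Minimise E(x) = sum((x - d)**2) + lam * sum((D x)**2) by linear conjugate gradient.

    grad E = 0 gives (I + lam * D^T D) x = d, a symmetric positive-definite system.
    """
    n = len(d)

    def matvec(v):
        dtd = second_diff_T(second_diff(v), n)
        return [vi + lam * wi for vi, wi in zip(v, dtd)]

    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    x = [0.0] * n
    r = [float(v) for v in d]        # residual d - A x at x = 0
    p = list(r)                      # first search direction
    rs = dot(r, r)
    for _ in range(max_iter or 10 * n):
        if math.sqrt(rs) < tol:
            break
        ap = matvec(p)
        alpha = rs / dot(p, ap)      # exact line search along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]   # next A-conjugate direction
        rs = rs_new
    return x

detections = [0, 1, 2, 10, 4, 5, 6]   # a spurious spike at frame 3
print([round(v, 2) for v in smooth_trajectory(detections, lam=10.0)])
```

The spike at frame 3 is pulled toward its neighbours rather than removed, which is the expected behaviour of a quadratic, non-robust data term.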
Both previous methods concentrate on reconstructing trajectories, while
disregarding the target-to-measurement assignment problem. To unify both
data association and trajectory estimation into a single optimization
framework, a discrete-continuous energy is presented in Part III of this
dissertation. Leveraging recent advances in discrete optimization
(Delong et al., 2012), it is possible to formulate multi-target tracking
as a model-fitting approach, where discrete assignments and continuous
trajectory representations are combined into a single objective
function. To enable efficient optimization, the energy is minimized
locally by alternating between the discrete and the continuous set of
variables.
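The alternation between discrete and continuous variables can be illustrated with a toy model, assuming 1D positions, straight-line trajectories `x = a + b*t`, and a squared-residual assignment cost. Each round labels every detection with its closest trajectory, then refits each trajectory to its detections by least squares; neither step can increase the total squared residual. Label costs, occlusion and the model-selection machinery of Delong et al. (2012) are deliberately omitted, and all names are hypothetical.

```python
def fit_line(points):
    """Least-squares fit of x = a + b*t to (t, x) pairs; returns (a, b)."""
    n = len(points)
    mt = sum(t for t, _ in points) / n
    mx = sum(x for _, x in points) / n
    stt = sum((t - mt) ** 2 for t, _ in points)
    b = 0.0 if stt == 0 else sum((t - mt) * (x - mx) for t, x in points) / stt
    return mx - b * mt, b

def alternate(detections, trajectories, n_iter=10):
    """Alternate a discrete step (label each detection with its closest trajectory)
    and a continuous step (refit each trajectory to its labelled detections)."""
    labels = []
    for _ in range(n_iter):
        # discrete step: trajectories fixed, choose labels minimising squared residuals
        labels = [min(range(len(trajectories)),
                      key=lambda k: (x - (trajectories[k][0] + trajectories[k][1] * t)) ** 2)
                  for t, x in detections]
        # continuous step: labels fixed, refit each trajectory by least squares
        refit = []
        for k, traj in enumerate(trajectories):
            mine = [det for det, lab in zip(detections, labels) if lab == k]
            refit.append(fit_line(mine) if mine else traj)
        if refit == trajectories:
            break
        trajectories = refit
    return trajectories, labels

# Two hypothetical targets in 1D: x = t and x = 10 - t, over frames t = 0..3.
dets = [(0, 0), (1, 1), (2, 2), (3, 3), (0, 10), (1, 9), (2, 8), (3, 7)]
print(alternate(dets, [(1.0, 0.0), (8.0, 0.0)]))
# ([(0.0, 1.0), (10.0, -1.0)], [0, 0, 0, 0, 1, 1, 1, 1])
```

Like any alternating scheme of this kind, the result depends on the initial trajectories and only reaches a local minimum of the joint objective.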
The final contribution of this dissertation is an extensive discussion
on performance evaluation and comparison of tracking algorithms, which
points out important practical issues that ought not to be ignored.
A Matter of Perspective - Three-dimensional Placement of Multiple Cameras to Maximize their Coverage
Combinatorial Solutions for Shape Optimization in Computer Vision
This thesis aims at solving so-called shape optimization problems, i.e. problems where the shape of some real-world entity is sought, by applying combinatorial algorithms. I present several advances in this field, all of them based on energy minimization. The addressed problems become more intricate in the course of the thesis, starting with problems that are solved globally and then turning to problems for which no global solutions are known so far. The first two chapters treat segmentation problems where the grouping criterion is derived directly from the image data. That is, the respective data terms do not involve any parameters to estimate. These problems will be solved globally. The first of these chapters treats unsupervised image segmentation, where there is no user input apart from the image. Here I focus on a contour-based method and show how to integrate curvature regularity into a ratio-based optimization framework. The arising optimization problem is reduced to optimizing over the cycles in a product graph. This problem can be solved globally in polynomial, effectively linear time. As a consequence, the method does not depend on initialization, and translational invariance is achieved. This is joint work with Daniel Cremers and Simon Masnou. I then proceed to integrating shape knowledge into the framework while keeping translational invariance. This problem is again reduced to cycle-finding in a product graph. Being based on the alignment of shape points, the method uses a more sophisticated shape measure than most local approaches and still provides global optima. It readily extends to tracking problems and can solve some of them in real time. I will also present an extension to highly deformable shape models that can be included in the global optimization framework. This method simultaneously decomposes a shape into a set of deformable parts, based only on the input images.
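Ratio objectives over cycles, as in the contour-based segmentation described above, can be handled generically by the classic minimum ratio cycle technique: binary search on a ratio `lam`, testing at each step whether the edge weights `c - lam*w` admit a negative cycle. The sketch below applies this textbook method to an explicit toy graph and assumes every edge weight `w` is positive; it is not the thesis's product-graph construction or its effectively linear-time solver.

```python
def has_negative_cycle(n, edges, lam):
    """Bellman-Ford from a virtual source: does some cycle have sum(c - lam*w) < 0?"""
    dist = [0.0] * n                  # virtual source linked to every node at cost 0
    for _ in range(n):
        changed = False
        for u, v, c, w in edges:
            cand = dist[u] + c - lam * w
            if cand < dist[v] - 1e-12:
                dist[v] = cand
                changed = True
        if not changed:
            return False              # distances settled, so no negative cycle
    return True                       # still relaxing after n passes

def min_ratio_cycle(n, edges, lo, hi, iters=60):
    """Binary search for min over cycles of sum(c) / sum(w), assuming every w > 0.

    A cycle with ratio below lam exists iff the weights c - lam*w contain a
    negative cycle, so the answer is the threshold above which one appears.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if has_negative_cycle(n, edges, mid):
            hi = mid
        else:
            lo = mid
    return hi

# Toy graph; edges are (u, v, cost, weight).
# Cycle 0-1-0 has ratio 8/2 = 4; cycle 1-2-1 has ratio 3/3 = 1.
edges = [(0, 1, 4, 1), (1, 0, 4, 1), (1, 2, 1, 1), (2, 1, 2, 2)]
print(round(min_ratio_cycle(3, edges, 0.0, 10.0), 6))   # 1.0
```

Because the optimum is global over all cycles, no initialization is involved, which mirrors the initialization independence claimed for the ratio-based framework.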
This is joint work with Daniel Cremers. In the second part, segmentation is combined with so-called correspondence problems, i.e. the underlying grouping criterion is now based on correspondences that have to be inferred simultaneously. That is, in addition to inferring the shapes of objects, one now also tries to put points in several images into correspondence. The arising problems become more intricate and are no longer optimized globally. This part is divided into two chapters. The first chapter treats real-time motion segmentation, where objects are identified based on the observation that their points in the video move coherently. Rather than pre-estimating motion, a single energy functional is minimized via alternating optimization. The main novelty lies in the real-time capability, which is achieved by exploiting a fast combinatorial segmentation algorithm. The results are further improved by employing a probabilistic data term. This is joint work with Daniel Cremers. The final chapter presents a method for high-resolution motion layer decomposition and was developed jointly with Daniel Cremers and Thomas Pock. Layer decomposition methods support the notion of a scene model, which makes it possible to model occlusion and enforce temporal consistency. The contributions are twofold. From a practical point of view, the proposed method recovers fine-detailed layer images by minimizing a single energy; this is achieved by integrating a super-resolution method into the layer decomposition framework. From a theoretical viewpoint, the proposed method introduces layer-based regularity terms as well as a graph cut-based scheme to solve for the layer domains. The latter is combined with powerful continuous convex optimization techniques into an alternating minimization scheme.
Lastly, I want to mention that a significant part of this thesis is devoted to the recent trend of exploiting parallel architectures, in particular graphics cards: many combinatorial algorithms are easily parallelized. In Chapter 3 we will see a case where the standard algorithm is hard to parallelize in general but easy to parallelize for the respective problem instances.
Applications of a Graph Theoretic Based Clustering Framework in Computer Vision and Pattern Recognition
Recently, several clustering algorithms have been used to solve a variety of
problems from different disciplines. This dissertation aims to address different
challenging tasks in computer vision and pattern recognition by casting them
as clustering problems. We propose novel approaches to solve
multi-target tracking, visual geo-localization and outlier detection problems
using a unified underlying clustering framework, i.e., dominant set clustering
and its extensions, and present results superior to those of several
state-of-the-art approaches. Comment: doctoral dissertation
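Dominant set clustering, as introduced by Pavan and Pelillo, extracts a cluster as a local maximizer of `x^T A x` over the standard simplex, and a common way to find one is discrete-time replicator dynamics. The following minimal sketch assumes a small hand-made affinity matrix; the dissertation's extensions and its application-specific affinities are not shown.

```python
def dominant_set(A, max_iter=1000, tol=1e-12):
    """Replicator dynamics x_i <- x_i * (A x)_i / (x^T A x) on the standard simplex.

    A: symmetric, non-negative affinity matrix with zero diagonal (list of lists).
    Returns the final weight vector; its support (non-negligible entries) is the cluster.
    """
    n = len(A)
    x = [1.0 / n] * n                              # start at the simplex barycentre
    for _ in range(max_iter):
        ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        xax = sum(xi * axi for xi, axi in zip(x, ax))
        if xax == 0:
            break                                  # no affinities left to exploit
        new = [xi * axi / xax for xi, axi in zip(x, ax)]
        done = sum(abs(a - b) for a, b in zip(new, x)) < tol
        x = new
        if done:
            break
    return x

# Hypothetical affinities: items 0-2 are mutually similar, items 3-4 only weakly so.
A = [[0.0, 1.0, 1.0, 0.1, 0.1],
     [1.0, 0.0, 1.0, 0.1, 0.1],
     [1.0, 1.0, 0.0, 0.1, 0.1],
     [0.1, 0.1, 0.1, 0.0, 0.5],
     [0.1, 0.1, 0.1, 0.5, 0.0]]
x = dominant_set(A)
print([i for i, v in enumerate(x) if v > 1e-4])   # [0, 1, 2]
```

In the usual peeling scheme, the items in the support are removed and the dynamics are run again on the remaining items to extract the next cluster.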
3D Sensor Placement and Embedded Processing for People Detection in an Industrial Environment
Papers I, II and III are extracted from the dissertation and uploaded as separate documents to meet post-publication requirements for self-archiving of IEEE conference papers.
At a time when autonomy is being introduced in more and more areas, computer vision plays a very important role. In an industrial environment, the ability to create a real-time virtual version of a volume of interest provides a broad range of possibilities, including safety-related systems such as vision-based anti-collision and personnel tracking. In an offshore environment, where such systems are not common, the task is challenging due to rough weather and environmental conditions, but introducing such safety systems could potentially save lives, as personnel work close to heavy, huge, and often poorly instrumented moving machinery and equipment. This thesis presents research on important topics related to enabling computer vision systems in industrial and offshore environments, including a review of the most important technologies and methods. A prototype 3D sensor package is developed, consisting of different sensors and a powerful embedded computer. This, together with a novel, highly scalable point cloud compression and sensor fusion scheme, makes it possible to create a real-time 3D map of an industrial area. The question of where to place the sensor packages in an environment where occlusions are present is also investigated. The result is a set of algorithms for automatic sensor placement optimisation, where the goal is to place sensors so as to maximise the covered volume of interest, with as few occluded zones as possible. The method also includes redundancy constraints, where important sub-volumes can be required to be viewed by more than one sensor. Lastly, a people detection scheme that takes as input a merged point cloud from six different sensor packages is developed.
Using a combination of point cloud clustering, flattening and convolutional neural networks, the system successfully detects multiple people in an outdoor industrial environment, providing real-time 3D positions. The sensor packages and methods are tested and verified at the Industrial Robotics Lab at the University of Agder, and the people detection method is also tested in a relevant outdoor industrial testing facility. The experiments and results are presented in the papers attached to this thesis.
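The abstract does not say which optimiser drives the sensor placement, so the sketch below substitutes a plain greedy maximum-coverage heuristic, assuming that the set of voxels visible from each candidate pose (after occlusion) has already been computed. It also shows one simple way to express a redundancy constraint: voxels in a designated sub-volume count only once they are seen by at least `min_views` sensors. Poses, voxels and parameter names are hypothetical.

```python
def greedy_placement(candidates, volume, k, redundant=frozenset(), min_views=2):
    """Greedily pick up to k sensor poses that maximise the number of covered voxels.

    candidates: dict pose -> set of voxels visible from that pose (occlusion assumed
    already accounted for). Voxels in `redundant` count as covered only when at least
    `min_views` chosen sensors see them; all other voxels need one view.
    """
    def n_covered(views):
        return sum(1 for v in volume
                   if views.get(v, 0) >= (min_views if v in redundant else 1))

    chosen, views = [], {}
    for _ in range(k):
        best_pose, best_score = None, -1
        for pose, visible in candidates.items():
            if pose in chosen:
                continue
            trial = dict(views)
            for v in visible:
                trial[v] = trial.get(v, 0) + 1
            score = n_covered(trial)
            if score > best_score:                 # ties keep the earlier pose
                best_pose, best_score = pose, score
        if best_pose is None:
            break                                  # no unused poses left
        chosen.append(best_pose)
        for v in candidates[best_pose]:
            views[v] = views.get(v, 0) + 1
    return chosen, n_covered(views)

cands = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6}, "D": {1, 2}}
volume = {1, 2, 3, 4, 5, 6}
print(greedy_placement(cands, volume, 2))                            # (['A', 'C'], 6)
print(greedy_placement(cands, volume, 3, redundant=frozenset({3})))  # (['C', 'A', 'B'], 6)
```

Greedy selection is myopic: a redundant voxel adds nothing until its second view, so the heuristic may undervalue poses that only pay off in combination.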