    Robust and Optimal Methods for Geometric Sensor Data Alignment

    Geometric sensor data alignment - the problem of finding the rigid transformation that correctly aligns two sets of sensor data without prior knowledge of how the data correspond - is a fundamental task in computer vision and robotics. It is inconvenient then that outliers and non-convexity are inherent to the problem and present significant challenges for alignment algorithms. Outliers are highly prevalent in sets of sensor data, particularly when the sets overlap incompletely. Despite this, many alignment objective functions are not robust to outliers, leading to erroneous alignments. In addition, alignment problems are highly non-convex, a property arising from the objective function and the transformation. While finding a local optimum may not be difficult, finding the global optimum is a hard optimisation problem. These key challenges have not been fully and jointly resolved in the existing literature, and so there is a need for robust and optimal solutions to alignment problems. Hence the objective of this thesis is to develop tractable algorithms for geometric sensor data alignment that are robust to outliers and not susceptible to spurious local optima. This thesis makes several significant contributions to the geometric alignment literature, founded on new insights into robust alignment and the geometry of transformations. Firstly, a novel discriminative sensor data representation is proposed that has better viewpoint invariance than generative models and is time and memory efficient without sacrificing model fidelity. Secondly, a novel local optimisation algorithm is developed for nD-nD geometric alignment under a robust distance measure. It manifests a wider region of convergence and a greater robustness to outliers and sampling artefacts than other local optimisation algorithms. Thirdly, the first optimal solution for 3D-3D geometric alignment with an inherently robust objective function is proposed. 
It outperforms other geometric alignment algorithms on challenging datasets due to its guaranteed optimality and outlier robustness, and has an efficient parallel implementation. Fourthly, the first optimal solution for 2D-3D geometric alignment with an inherently robust objective function is proposed. It outperforms existing approaches on challenging datasets, reliably finding the global optimum, and has an efficient parallel implementation. Finally, another optimal solution is developed for 2D-3D geometric alignment, using a robust surface alignment measure. Ultimately, robust and optimal methods, such as those in this thesis, are necessary to reliably find accurate solutions to geometric sensor data alignment problems.

    Online Multi-Camera Multi-Object Tracking Using Dynamic Formulation of the Maximum Weighted Clique Problem

    Doctoral dissertation, Seoul National University Graduate School, Department of Electrical and Computer Engineering, August 2016. Jin Young Choi (최진영). In this dissertation, we propose an online, real-time algorithm for tracking multiple targets with multiple cameras that have overlapping fields of view. Because of its applicability, multiple target tracking with a visual sensor has been studied intensively in recent decades. In particular, algorithms using multiple overlapping cameras have been proposed to overcome the occlusion and missed-detection problems that cannot be resolved by a single camera. Since the multiple camera multiple target tracking (MCMTT) problem is more complicated than the single camera multiple target tracking (SCMTT) problem, most MCMTT algorithms are based on a batch process that considers a whole sequence at a time. Although batch-based algorithms have achieved robust performance, their usability is limited because many practical applications need an instantaneous result. The objective of this dissertation is to develop an online MCMTT algorithm whose tracking performance is comparable to that of batch-based algorithms but which requires only a small amount of computation. The proposed algorithm generates track hypotheses (simply called `tracks') from all possible data associations between object detections from multiple cameras across frames. Then, it picks the set of tracks that best describes the tracking of targets. To identify a good track, the quality of each track is measured by our score function. The tracking solution is then the set of tracks that has the maximum total score. To obtain the solution, we formulate the problem of finding that track set as the maximum weighted clique problem (MWCP), one of the widely adopted formulations for combinatorial problems with pairwise compatibility relationships among the variables. MWCP is a well-known NP-complete problem, and its worst-case computation time grows exponentially with the number of tracks. 
Thus, solving MWCP is intractable because the number of candidate tracks increases exponentially as tracking progresses. To alleviate the huge computational load, we propose an online scheme that dynamically formulates multiple MWCPs with small subsets of candidate tracks in every frame. The scheme is motivated by the observation that the tracking solutions from consecutive frames are very similar, because the status of each target does not change abruptly from one frame to the next. When we assume that a specific track set is the actual solution of the previous frame, only a small number of tracks can become solution tracks of the current frame. Thus, we can narrow down the candidate track set with the previous solution. However, propagating only the best solution of each frame can cause irreducible error when a wrong track set is chosen as the solution because of tracking ambiguity. To hedge against this risk, we find multiple good solutions at each frame and propagate the K-best solutions among them to the next frame instead of a single solution. When the candidate tracks are updated and generated with newly obtained detections at the next frame, we generate multiple subsets of the entire candidate track set with the K-best previous solutions. Each subset consists of candidate solution tracks with respect to one of the previous solutions, and a small MWCP is formulated with the subset. Then, our algorithm finds multiple solutions from each MWCP and repeats the above procedure until tracking terminates. Even though the proposed algorithm solves multiple MWCPs, it has lower computational complexity than solving a single MWCP with the entire candidate track set, because the overall computational load is mainly determined by the size of the largest MWCP. Moreover, when an instantaneous result is demanded, our algorithm finds a better solution than solving a single large MWCP because it finds more diverse solutions under a limited solving time. 
Although our dynamic formulation considerably reduces the overall computational complexity, it is still challenging to satisfy the real-time requirement of the tracking system. Thus, we apply three more strategies to reduce the computation time. First, we generate tracklets, robust fragments of a target's trajectory, at each camera and generate candidate tracks with those tracklets instead of detections. This prevents the generation of many absurd tracks. Second, we adopt a heuristic algorithm called breakout local search (BLS) to solve each MWCP. With BLS, multiple suboptimal solutions can be found efficiently within a short time. Last, we prune the candidate tracks with a probability that is calculated from the K-best solutions. The probability represents the quality of each track with respect to the overall tracking situation rather than the individual track. Thus, utilizing this probability ensures proper pruning of candidate tracks. In experiments with a public benchmark dataset, our algorithm shows performance comparable to the state-of-the-art batch-based MCMTT algorithms. Moreover, our algorithm demonstrates real-time capability by achieving satisfactory performance within a reasonable computation time. We also conduct a self-comparison to verify our dynamic MWCP formulation with respect to tracking performance and solving time. 
When a sufficient number of solutions are propagated, our algorithm performs better and takes less time than solving a single MWCP that considers the entire candidate track set.
    Contents:
    Chapter 1 Introduction: 1.1 Background; 1.2 Related Works [1.2.1 Reconstruction-and-tracking methods; 1.2.2 Tracking-and-reconstruction methods; 1.2.3 Unified frameworks]; 1.3 Contents of the Research; 1.4 Thesis Organization
    Chapter 2 Preliminaries: 2.1 Bayesian Tracking [2.1.1 Recursive Bayesian Tracking; 2.1.2 Bayesian Tracking for Multiple Targets; 2.1.3 Multiple Hypothesis Tracking (MHT)]; 2.2 Maximum Weighted Clique Problem (MWCP) [2.2.1 Clique Problems; 2.2.2 Solving MWCP]; 2.3 Breakout Local Search (BLS) [2.3.1 Solution exploration; 2.3.2 Perturbation Strategies; 2.3.3 Initial Solution and Termination Condition]
    Chapter 3 Proposed Approach: 3.1 Problem Statements; 3.2 Tracklet Generation [3.2.1 Detection-to-tracklet Matching; 3.2.2 Matching Score with Motion Estimation; 3.2.3 Matching Validation]; 3.3 Track Hypothesis [3.3.1 Tracklet Association; 3.3.2 Online Generation of Association Sets; 3.3.3 Track Generation; 3.3.4 Track Score]; 3.4 Global Hypothesis [3.4.1 MWCP for MCMTT; 3.4.2 BLS for MCMTT]; 3.5 Pruning [3.5.1 Approximated Global Track Probability; 3.5.2 Track Pruning Scheme]
    Chapter 4 Experiments: 4.1 Comparison with the State-of-the-art Methods; 4.2 Influence of Parameters; 4.3 Score Function Analysis; 4.4 Solving Scheme Analysis; 4.5 Qualitative Results
    Chapter 5 Concluding Remarks: 5.1 Conclusions; 5.2 Future Works
    Abstract (in Korean)
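
The track-selection step described in this abstract can be illustrated with a toy example. The sketch below is not the dissertation's method: it uses exhaustive search rather than breakout local search, and the track names, scores and compatibility pairs are hypothetical values chosen for illustration. It only shows what "pick the set of mutually compatible tracks with the maximum total score" means as a maximum weighted clique problem.

```python
from itertools import combinations


def max_weight_clique(scores, compatible):
    """Exhaustively find the highest-scoring set of mutually compatible tracks.

    scores: dict mapping a track id to its score (hypothetical values).
    compatible: set of frozenset pairs of track ids that may coexist.
    The search is exponential in the number of tracks, so it only suits
    tiny examples; it stands in for a heuristic solver such as BLS.
    """
    tracks = sorted(scores)
    best_set, best_score = (), 0.0
    for size in range(1, len(tracks) + 1):
        for subset in combinations(tracks, size):
            # A subset is a clique if every pair of its tracks is compatible.
            if all(frozenset(p) in compatible for p in combinations(subset, 2)):
                total = sum(scores[t] for t in subset)
                if total > best_score:
                    best_set, best_score = subset, total
    return set(best_set), best_score


# Hypothetical candidate tracks: A and B conflict (e.g. share a detection).
scores = {"A": 3.0, "B": 2.5, "C": 2.0, "D": 1.0}
compatible = {
    frozenset(p) for p in [("A", "C"), ("A", "D"), ("C", "D"), ("B", "D")]
}
selected, total = max_weight_clique(scores, compatible)
print(selected, total)
```

Because the search enumerates every subset, its cost doubles with each added track, which is the growth that motivates solving several small problems per frame instead of one large one.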

    Energy Minimization for Multiple Object Tracking

    Multiple target tracking aims at reconstructing trajectories of several moving targets in a dynamic scene, and is of significant relevance for a large number of applications. For example, predicting a pedestrian’s action may be employed to warn an inattentive driver and reduce road accidents; understanding a dynamic environment will facilitate autonomous robot navigation; and analyzing crowded scenes can prevent fatalities in mass panics. The task of multiple target tracking is challenging for various reasons: First of all, visual data is often ambiguous. For example, the objects to be tracked can remain undetected due to low contrast and occlusion. At the same time, background clutter can cause spurious measurements that distract the tracking algorithm. A second challenge arises when multiple measurements appear close to one another. Resolving correspondence ambiguities leads to a combinatorial problem that quickly becomes more complex with every time step. Moreover, a realistic model of multi-target tracking should take physical constraints into account. This is not only important at the level of individual targets but also regarding interactions between them, which adds to the complexity of the problem. In this work the challenges described above are addressed by means of energy minimization. Given a set of object detections, an energy function describing the problem at hand is minimized with the goal of finding a plausible solution for a batch of consecutive frames. Such offline tracking-by-detection approaches have substantially advanced the performance of multi-target tracking. Building on these ideas, this dissertation introduces three novel techniques for multi-target tracking that extend the state of the art as follows: The first approach formulates the energy in discrete space, building on the work of Berclaz et al. (2009). 
All possible target locations are reduced to a regular lattice and tracking is posed as an integer linear program (ILP), enabling (near) global optimality. Unlike prior work, however, the proposed formulation includes a dynamic model and additional constraints that enable performing non-maxima suppression (NMS) at the level of trajectories. These contributions improve the performance both qualitatively and quantitatively with respect to annotated ground truth. The second technical contribution is a continuous energy function for multiple target tracking that overcomes the limitations imposed by spatial discretization. The continuous formulation is able to capture important aspects of the problem, such as target localization or motion estimation, more accurately. More precisely, the data term as well as all phenomena including mutual exclusion and occlusion, appearance, dynamics and target persistence are modeled by continuous differentiable functions. The resulting non-convex optimization problem is minimized locally by standard conjugate gradient descent in combination with custom discontinuous jumps. The more accurate representation of the problem leads to a powerful and robust multi-target tracking approach, which shows encouraging results on particularly challenging video sequences. Both previous methods concentrate on reconstructing trajectories, while disregarding the target-to-measurement assignment problem. To unify both data association and trajectory estimation into a single optimization framework, a discrete-continuous energy is presented in Part III of this dissertation. Leveraging recent advances in discrete optimization (Delong et al., 2012), it is possible to formulate multi-target tracking as a model-fitting approach, where discrete assignments and continuous trajectory representations are combined into a single objective function. 
To enable efficient optimization, the energy is minimized locally by alternating between the discrete and the continuous set of variables. The final contribution of this dissertation is an extensive discussion on performance evaluation and comparison of tracking algorithms, which points out important practical issues that ought not to be ignored.
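
The idea of alternating between discrete assignments and continuous trajectory parameters can be sketched on a one-dimensional toy problem. This is an illustration under strong assumptions, not the energy from the dissertation: trajectories are straight lines x = a + b*t, the discrete step is a plain nearest-residual assignment rather than the label-cost optimization cited in the abstract, and all detections and initial models are made up.

```python
def fit_line(points):
    """Least-squares fit of x = a + b*t to a list of (t, x) points."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_x = sum(x for _, x in points) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    cov = sum((t - mean_t) * (x - mean_x) for t, x in points)
    slope = cov / var_t if var_t > 0 else 0.0
    return mean_x - slope * mean_t, slope


def alternate(detections, models, iterations=10):
    """Alternate a discrete assignment step and a continuous refitting step."""
    labels = []
    for _ in range(iterations):
        # Discrete step: assign each detection to the closest trajectory.
        labels = [
            min(
                range(len(models)),
                key=lambda k: abs(x - (models[k][0] + models[k][1] * t)),
            )
            for t, x in detections
        ]
        # Continuous step: refit each trajectory to its assigned detections.
        for k in range(len(models)):
            assigned = [d for d, lab in zip(detections, labels) if lab == k]
            if len(assigned) >= 2:
                models[k] = fit_line(assigned)
    return labels, models


# Hypothetical detections from two targets moving towards each other.
detections = [(t, float(t)) for t in range(5)] + [(t, 10.0 - t) for t in range(5)]
labels, models = alternate(detections, [(0.0, 0.0), (10.0, 0.0)])
print(labels, models)
```

Each step can only lower (or keep) the summed residual given the other set of variables, so the procedure converges to a local minimum that depends on the initial models.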

    You'll never walk alone: Modeling social behavior for multi-target tracking

    Combinatorial Solutions for Shape Optimization in Computer Vision

    This thesis aims at solving so-called shape optimization problems, i.e. problems where the shape of some real-world entity is sought, by applying combinatorial algorithms. I present several advances in this field, all of them based on energy minimization. The addressed problems become more intricate in the course of the thesis, starting from problems that are solved globally, then turning to problems where so far no global solutions are known. The first two chapters treat segmentation problems where the considered grouping criterion is directly derived from the image data. That is, the respective data terms do not involve any parameters to estimate. These problems will be solved globally. The first of these chapters treats the problem of unsupervised image segmentation where apart from the image there is no other user input. Here I will focus on a contour-based method and show how to integrate curvature regularity into a ratio-based optimization framework. The arising optimization problem is reduced to optimizing over the cycles in a product graph. This problem can be solved globally in polynomial, effectively linear time. As a consequence, the method does not depend on initialization and translational invariance is achieved. This is joint work with Daniel Cremers and Simon Masnou. I will then proceed to the integration of shape knowledge into the framework, while keeping translational invariance. This problem is again reduced to cycle-finding in a product graph. Being based on the alignment of shape points, the method actually uses a more sophisticated shape measure than most local approaches and still provides global optima. It readily extends to tracking problems and allows some of them to be solved in real time. I will present an extension to highly deformable shape models which can be included in the global optimization framework. This method simultaneously allows a shape to be decomposed into a set of deformable parts, based only on the input images. 
This is joint work with Daniel Cremers. In the second part segmentation is combined with so-called correspondence problems, i.e. the underlying grouping criterion is now based on correspondences that have to be inferred simultaneously. That is, in addition to inferring the shapes of objects, one now also tries to put into correspondence the points in several images. The arising problems become more intricate and are no longer optimized globally. This part is divided into two chapters. The first chapter treats the topic of real-time motion segmentation where objects are identified based on the observation that the respective points in the video will move coherently. Rather than pre-estimating motion, a single energy functional is minimized via alternating optimization. The main novelty lies in the real-time capability, which is achieved by exploiting a fast combinatorial segmentation algorithm. The results are furthermore improved by employing a probabilistic data term. This is joint work with Daniel Cremers. The final chapter presents a method for high resolution motion layer decomposition and was developed in collaboration with Daniel Cremers and Thomas Pock. Layer decomposition methods support the notion of a scene model, which makes it possible to model occlusion and enforce temporal consistency. The contributions are twofold: from a practical point of view the proposed method allows fine-detailed layer images to be recovered by minimizing a single energy. This is achieved by integrating a super-resolution method into the layer decomposition framework. From a theoretical viewpoint the proposed method introduces layer-based regularity terms as well as a graph cut-based scheme to solve for the layer domains. The latter is combined with powerful continuous convex optimization techniques into an alternating minimization scheme. 
Lastly, I want to mention that a significant part of this thesis is devoted to the recent trend of exploiting parallel architectures, in particular graphics cards: many combinatorial algorithms are easily parallelized. In Chapter 3 we will see a case where the standard algorithm is hard to parallelize, but easy for the respective problem instances.

    Applications of a Graph Theoretic Based Clustering Framework in Computer Vision and Pattern Recognition

    Recently, several clustering algorithms have been used to solve a variety of problems from different disciplines. This dissertation aims to address different challenging tasks in computer vision and pattern recognition by casting them as clustering problems. We propose novel approaches to solve multi-target tracking, visual geo-localization and outlier detection problems using a unified underlying clustering framework, i.e., dominant set clustering and its extensions, and present results superior to several state-of-the-art approaches. Comment: doctoral dissertation
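
For readers unfamiliar with dominant set clustering, a common way to extract one dominant set is to run discrete replicator dynamics on a non-negative, symmetric affinity matrix with a zero diagonal. The sketch below is a minimal illustration of that standard iteration, not the dissertation's extensions; the affinity values, iteration count and membership threshold are assumptions chosen for the example.

```python
def dominant_set(affinity, iterations=200, threshold=1e-4):
    """Extract one dominant set with discrete replicator dynamics.

    affinity: symmetric list-of-lists with non-negative entries and a
    zero diagonal (hypothetical similarity values).
    Returns the member indices and the final weight vector.
    """
    n = len(affinity)
    x = [1.0 / n] * n  # start at the barycenter of the simplex
    for _ in range(iterations):
        ax = [sum(affinity[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(x[i] * ax[i] for i in range(n))
        if total == 0:
            break
        # Replicator update: weights grow where support exceeds the average.
        x = [x[i] * ax[i] / total for i in range(n)]
    members = [i for i in range(n) if x[i] > threshold]
    return members, x


# Items 0-2 are mutually similar; items 3 and 4 are weakly related to all.
affinity = [
    [0.0, 0.9, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.9, 0.1, 0.1],
    [0.9, 0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.0, 0.2],
    [0.1, 0.1, 0.1, 0.2, 0.0],
]
members, weights = dominant_set(affinity)
print(members)
```

Items whose weight decays towards zero are left out of the cluster, which is how the same iteration can also flag outliers.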

    3D Sensor Placement and Embedded Processing for People Detection in an Industrial Environment

    Papers I, II and III are extracted from the dissertation and uploaded as separate documents to meet post-publication requirements for self-archiving of IEEE conference papers. At a time when autonomy is being introduced in more and more areas, computer vision plays a very important role. In an industrial environment, the ability to create a real-time virtual version of a volume of interest provides a broad range of possibilities, including safety-related systems such as vision-based anti-collision and personnel tracking. In an offshore environment, where such systems are not common, the task is challenging due to rough weather and environmental conditions, but the result of introducing such safety systems could potentially be lifesaving, as personnel work close to heavy, huge, and often poorly instrumented moving machinery and equipment. This thesis presents research on important topics related to enabling computer vision systems in industrial and offshore environments, including a review of the most important technologies and methods. A prototype 3D sensor package is developed, consisting of different sensors and a powerful embedded computer. This, together with a novel, highly scalable point cloud compression and sensor fusion scheme, makes it possible to create a real-time 3D map of an industrial area. The question of where to place the sensor packages in an environment where occlusions are present is also investigated. The result is a set of algorithms for automatic sensor placement optimisation, where the goal is to place sensors in a way that maximises the volume of interest that is covered, with as few occluded zones as possible. The method also includes redundancy constraints where important sub-volumes can be defined to be viewed by more than one sensor. Lastly, a people detection scheme using a merged point cloud from six different sensor packages as input is developed. 
Using a combination of point cloud clustering, flattening and convolutional neural networks, the system successfully detects multiple people in an outdoor industrial environment, providing real-time 3D positions. The sensor packages and methods are tested and verified at the Industrial Robotics Lab at the University of Agder, and the people detection method is also tested in a relevant outdoor, industrial testing facility. The experiments and results are presented in the papers attached to this thesis.
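
The coverage goal of the sensor placement problem can be illustrated with a greedy heuristic. The abstract does not say which optimiser the thesis uses, so the greedy rule below is a stand-in, and the candidate positions and visible cell ids are hypothetical. Redundancy constraints are left out of this sketch.

```python
def greedy_placement(candidates, num_sensors):
    """Pick sensor positions one at a time by largest newly covered volume.

    candidates: dict mapping a candidate position name to the set of
    volume cells visible from it (hypothetical, occlusions pre-computed).
    Returns the chosen positions in order and the set of covered cells.
    """
    chosen, covered = [], set()
    remaining = dict(candidates)
    for _ in range(num_sensors):
        if not remaining:
            break
        # Ties are broken alphabetically because names are sorted first.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        if not remaining[best] - covered:
            break  # no candidate adds any new coverage
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


# Hypothetical candidate positions and the cells each one can see.
candidates = {
    "north": {1, 2, 3, 4},
    "east": {3, 4, 5},
    "south": {5, 6, 7},
    "west": {1, 7},
}
chosen, covered = greedy_placement(candidates, num_sensors=2)
print(chosen, sorted(covered))
```

A greedy choice is not guaranteed to be optimal, but it makes the trade-off visible: each additional sensor is only worth placing where it sees cells that earlier sensors miss.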