    Parsing Occluded People by Flexible Compositions

    This paper presents an approach to parsing humans when there is significant occlusion. We model humans using a graphical model which has a tree structure building on recent work [32, 6] and exploit the connectivity prior that, even in presence of occlusion, the visible nodes form a connected subtree of the graphical model. We call each connected subtree a flexible composition of object parts. This involves a novel method for learning occlusion cues. During inference we need to search over a mixture of different flexible models. By exploiting part sharing, we show that this inference can be done extremely efficiently requiring only twice as many computations as searching for the entire object (i.e., not modeling occlusion). We evaluate our model on the standard benchmarked "We Are Family" Stickmen dataset and obtain significant performance improvements over the best alternative algorithms.
    Comment: CVPR 15 Camera Read
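The connectivity prior in the abstract above says that the visible parts always form a connected subtree of the tree-structured model. As a rough illustration of how many such flexible compositions a tree admits, the following minimal sketch counts the connected subtrees of a tree with a bottom-up dynamic program. The function name `count_connected_subtrees` and the toy part names are assumptions made for illustration only; this is not the paper's inference procedure or its body model.

```python
from collections import defaultdict


def count_connected_subtrees(edges, root):
    """Count the non-empty connected subtrees of a tree given as an edge list."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Iterative depth-first traversal: record each node's parent and a
    # visiting order in which every node appears after its parent.
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        for nb in adj[node]:
            if nb != parent[node]:
                parent[nb] = node
                stack.append(nb)

    # rooted[v] = number of connected subtrees whose topmost node is v.
    # Each child is either left out (1 way) or contributes one of its own
    # rooted subtrees (rooted[child] ways), independently of its siblings.
    rooted = {}
    for node in reversed(order):
        ways = 1
        for nb in adj[node]:
            if nb != parent[node]:
                ways *= 1 + rooted[nb]
        rooted[node] = ways

    # Every connected subtree has exactly one topmost node, so summing works.
    return sum(rooted.values())


# Hypothetical five-part toy model (not the model used in the paper).
toy_edges = [
    ("torso", "head"),
    ("torso", "left_arm"),
    ("torso", "right_arm"),
    ("torso", "legs"),
]
print(count_connected_subtrees(toy_edges, "torso"))  # 20
```

The count grows quickly with the number of parts, which gives some intuition for why sharing computation between compositions, as the abstract describes, matters for efficient inference.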

    A Combinatorial Solution to Non-Rigid 3D Shape-to-Image Matching

    We propose a combinatorial solution for the problem of non-rigidly matching a 3D shape to 3D image data. To this end, we model the shape as a triangular mesh and allow each triangle of this mesh to be rigidly transformed to achieve a suitable matching to the image. By penalising the distance and the relative rotation between neighbouring triangles our matching compromises between image and shape information. In this paper, we resolve two major challenges: Firstly, we address the resulting large and NP-hard combinatorial problem with a suitable graph-theoretic approach. Secondly, we propose an efficient discretisation of the unbounded 6-dimensional Lie group SE(3). To our knowledge this is the first combinatorial formulation for non-rigid 3D shape-to-image matching. In contrast to existing local (gradient descent) optimisation methods, we obtain solutions that do not require a good initialisation and that are within a bound of the optimal solution. We evaluate the proposed method on the two problems of non-rigid 3D shape-to-shape and non-rigid 3D shape-to-image registration and demonstrate that it provides promising results.
    Comment: 10 pages, 7 figure

    Towards human interaction analysis

    Modeling and recognizing human behaviors in visual surveillance tasks is receiving increasing attention from computer vision and machine learning researchers. Such a system should deal in particular with detecting when interactions between people occur and classifying the type of interaction. In this work we study a flexible model for detecting human interactions. This is done by detecting the people in the scene and retrieving their corresponding pose and position sequentially in each frame of the video. To achieve this goal, our work relies on a robust object detection algorithm, based on discriminatively trained part-based models, to detect human bodies in videos. We apply a Gaussian Mixture Model based method for background subtraction and human segmentation. The output of the segmentation method, the labeled human body, is combined with the background subtraction output to obtain a bounding box around each person in the image, which improves human body pose detection. To obtain more precise pose detection models, we trained the algorithm on a large, challenging but reliable dataset (PASCAL 2010). Our method is applied to a home-made database comprising depth data from Kinect sensors. Once the label, pose and position of each person have been obtained in every image sequence, understanding of human motion follows naturally, which is an important step towards human interaction analysis.

    Two Dimensional (2D) Visual Tracking in Construction Scenarios

    The tracking of construction resources (e.g. workforce and equipment) in videos, i.e., two-dimensional (2D) visual tracking, has gained significant interest in the construction industry. Many research studies have relied on 2D visual tracking methods to support the surveillance of construction productivity, safety, and project progress. However, little effort has been put into evaluating the accuracy and robustness of these tracking methods in construction scenarios. Meanwhile, state-of-the-art tracking methods have not shown reliable performance in tracking articulated equipment, such as excavators, backhoes, and dozers. The main objective of this research is to fill these knowledge gaps. First, a total of fifteen (15) 2D visual tracking methods were selected for their excellent performance in the computer vision field. Then, the methods were tested with twenty (20) videos captured at multiple construction job sites by day and at night. The videos contain construction resources including, but not limited to, excavators, backhoes, and compactors. They were also characterized by attributes such as occlusion, scale variation, and background clutter, in order to provide a comprehensive evaluation. The tracking results were evaluated with the sequence overlap score, center error ratio, and tracking length ratio. Based on the quantitative comparison of the tracking methods, two improvements were then made. One is to fuse the tracking results of individual tracking methods using non-maximum suppression. The other is to track articulated equipment by tracking the equipment parts separately.
The test results from this research study indicated that 1) the methods built on local sparse representation were more effective; 2) the generative tracking strategy typically outperformed the discriminative one when adopted to track equipment and workforce in construction scenarios; 3) fusing the results from different tracking methods increased tracking accuracy by 10%; and 4) the part-based tracking methods improved tracking performance in both accuracy and robustness when used to track articulated equipment.
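The evaluation measures and the fusion step named in the abstract above rest on a few standard building blocks: the overlap between a tracked box and a reference box, the distance between box centers, and greedy non-maximum suppression. The sketch below shows common textbook forms of these, assuming boxes are given as `(x, y, width, height)` tuples. The function names, the box format and the 0.5 threshold are assumptions for illustration; the study's exact definitions of its scores and of its fusion scheme are not given in the abstract and may differ.

```python
import math


def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlapping rectangle (zero if disjoint).
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center_error(a, b):
    """Euclidean distance between the centers of two boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return math.hypot((ax + aw / 2) - (bx + bw / 2),
                      (ay + ah / 2) - (by + bh / 2))


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns the indices of kept boxes."""
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

For example, with candidate boxes from several trackers in one frame, `nms(boxes, scores)` would retain the highest-scoring box in each cluster of heavily overlapping candidates; how the scores are assigned to each tracker's output is left open here.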

    Efficient deformable template detection and localization without user initialization

    A novel deformable template is presented which detects the boundary of an open hand in a grayscale image without initialization by the user. A dynamic programming algorithm enhanced by pruning techniques finds the hand contour in the image in as little as 19 seconds on a Pentium 150. The template is translation- and rotation-invariant and accommodates shape deformation, significant occlusion and background clutter, and the presence of multiple hands.
    2 Symbols. Boldface letters, e.g. x, denote vectors. P(x|y) denotes the conditional probability of x given y. √a denotes the square root of a. ∑ denotes summation. ∏ denotes repeated product. ∫ denotes integration. ⊥ denotes “perpendicular to.” <, > denote less than and greater than, respectively. ∇I(x) denotes the gradient of I with respect to x. ∝ denotes “proportional to.” ≈ denotes “approximately equal to.” argmax x f(x) denotes the value of x that maximizes f(x). f ⋆ g denotes the convolution of f with g. π denotes the constant 3.14159... |x| denotes the modulus of x. ∞ denotes infinity.

    Combinatorial Solutions for Shape Optimization in Computer Vision

    This thesis aims at solving so-called shape optimization problems, i.e. problems where the shape of some real-world entity is sought, by applying combinatorial algorithms. I present several advances in this field, all of them based on energy minimization. The addressed problems become more intricate in the course of the thesis, starting from problems that are solved globally, then turning to problems where so far no global solutions are known. The first two chapters treat segmentation problems where the considered grouping criterion is directly derived from the image data. That is, the respective data terms do not involve any parameters to estimate. These problems are solved globally. The first of these chapters treats the problem of unsupervised image segmentation where apart from the image there is no other user input. Here I focus on a contour-based method and show how to integrate curvature regularity into a ratio-based optimization framework. The arising optimization problem is reduced to optimizing over the cycles in a product graph. This problem can be solved globally in polynomial, effectively linear time. As a consequence, the method does not depend on initialization and translational invariance is achieved. This is joint work with Daniel Cremers and Simon Masnou. I then proceed to the integration of shape knowledge into the framework, while keeping translational invariance. This problem is again reduced to cycle-finding in a product graph. Being based on the alignment of shape points, the method actually uses a more sophisticated shape measure than most local approaches and still provides global optima. It readily extends to tracking problems and allows some of them to be solved in real-time. I present an extension to highly deformable shape models which can be included in the global optimization framework. This method simultaneously allows a shape to be decomposed into a set of deformable parts, based only on the input images.
This is joint work with Daniel Cremers. In the second part segmentation is combined with so-called correspondence problems, i.e. the underlying grouping criterion is now based on correspondences that have to be inferred simultaneously. That is, in addition to inferring the shapes of objects, one now also tries to put into correspondence the points in several images. The arising problems become more intricate and are no longer optimized globally. This part is divided into two chapters. The first chapter treats the topic of real-time motion segmentation where objects are identified based on the observation that the respective points in the video move coherently. Rather than pre-estimating motion, a single energy functional is minimized via alternating optimization. The main novelty lies in the real-time capability, which is achieved by exploiting a fast combinatorial segmentation algorithm. The results are furthermore improved by employing a probabilistic data term. This is joint work with Daniel Cremers. The final chapter presents a method for high resolution motion layer decomposition and was developed in collaboration with Daniel Cremers and Thomas Pock. Layer decomposition methods support the notion of a scene model, which allows occlusion to be modelled and temporal consistency to be enforced. The contributions are twofold: from a practical point of view the proposed method allows fine-detailed layer images to be recovered by minimizing a single energy. This is achieved by integrating a super-resolution method into the layer decomposition framework. From a theoretical viewpoint the proposed method introduces layer-based regularity terms as well as a graph cut-based scheme to solve for the layer domains. The latter is combined with powerful continuous convex optimization techniques into an alternating minimization scheme.
Lastly I want to mention that a significant part of this thesis is devoted to the recent trend of exploiting parallel architectures, in particular graphics cards: many combinatorial algorithms are easily parallelized. In Chapter 3 we will see a case where the standard algorithm is hard to parallelize, but easy for the respective problem instances.