10 research outputs found

    Bandwidth selection for kernel estimation in mixed multi-dimensional spaces

    Kernel estimation techniques, such as mean shift, suffer from one major drawback: kernel bandwidth selection. The bandwidth can be fixed for the whole data set or can vary at each point. Automatic bandwidth selection becomes a real challenge in the case of multidimensional heterogeneous features. This paper presents a solution to this problem. It is an extension of [Comaniciu03a], which was based on the fundamental property of normal distributions regarding the bias of the normalized density gradient. The selection is done iteratively for each type of feature, by looking for the stability of local bandwidth estimates across a predefined range of bandwidths. A pseudo balloon mean shift filtering and partitioning are introduced. The validity of the method is demonstrated in the context of color image segmentation based on a 5-dimensional feature space.
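
    The pseudo balloon estimator itself is specific to the paper, but the underlying mean shift iteration with a per-point ("balloon") bandwidth can be pictured with a minimal Python/NumPy sketch. It assumes a Gaussian kernel and a scalar bandwidth h; the names and the fixed-h simplification are ours, not the paper's.

    import numpy as np

    def mean_shift_point(x, data, h, n_iter=50, tol=1e-5):
        """Shift a single point x toward a local mode of `data`.

        x    : (d,) starting point
        data : (n, d) sample points
        h    : bandwidth used for this point (a fixed scalar here;
               the paper selects it per point and per feature type)
        """
        for _ in range(n_iter):
            # Gaussian kernel weights evaluated at the current position
            w = np.exp(-0.5 * np.sum(((data - x) / h) ** 2, axis=1))
            # Weighted mean of the samples = one mean shift step
            x_new = (w[:, None] * data).sum(axis=0) / w.sum()
            if np.linalg.norm(x_new - x) < tol:
                break
            x = x_new
        return x

    Running this from every sample and grouping the points that converge to the same mode yields a mean shift partitioning; the paper's contribution lies in choosing the bandwidth automatically per point and per feature type.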

    Estimation à noyau adaptatif dans des espaces multidimensionnels hétérogènes

    Kernel estimation methods, such as mean shift, have one major drawback: the choice of the kernel bandwidth. Selecting this bandwidth becomes truly difficult in the case of multidimensional, heterogeneous data. We present a solution to this problem. The bandwidth is chosen iteratively for each type of data, by searching a predefined set of bandwidths for the one that locally gives the most stable results. The iterative selection requires the introduction of a new estimator. The method is validated in the context of color image segmentation and motion segmentation.

    Sélection de la taille du noyau pour l'estimation à noyau dans des espaces multidimensionnels hétérogènes.

    Kernel estimation techniques, such as mean shift, suffer from one major drawback: kernel bandwidth selection. This selection becomes a real challenge in the case of multidimensional heterogeneous features. This paper presents a solution to this problem. The selection is done iteratively for each type of feature, by looking for the stability of local bandwidth estimates within a predefined range of bandwidths. A new estimator that permits the iterative computation is introduced. The validity of the method is demonstrated in the context of color image segmentation and motion segmentation.
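
    The stability criterion can be pictured with a small sketch, reusing mean_shift_point from the first abstract above. The difference-based instability score below is a hypothetical stand-in chosen for illustration; the actual test in these papers derives from the bias property of the normalized density gradient for normal distributions.

    import numpy as np

    def select_bandwidth(x, data, candidates):
        """Pick a bandwidth for point x from a predefined candidate range.

        candidates must contain at least three increasing bandwidth values.
        """
        # Converged mode of x under each candidate bandwidth
        modes = np.array([mean_shift_point(x, data, h) for h in candidates])
        # Instability proxy: how far the mode moves between neighboring
        # bandwidths; the most stable bandwidth minimizes this movement.
        diffs = np.linalg.norm(np.diff(modes, axis=0), axis=1)
        scores = diffs[:-1] + diffs[1:]        # local stability over a triplet
        best = 1 + int(np.argmin(scores))      # interior candidate index
        return candidates[best]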

    Detecting Motion through Dynamic Refraction

    Discovering salient objects from videos using spatiotemporal salient region detection

    Detecting salient objects in images and videos has many useful applications in computer vision. In this paper, a novel spatiotemporal salient region detection approach is proposed. It computes spatiotemporal saliency by estimating spatial and temporal saliency separately. The spatial saliency of an image is computed from a color contrast cue and a color distribution cue, whose estimation exploits patch-level and region-level image abstractions in a unified way. These cues are fused into an initial spatial saliency map, which is further refined to emphasize object saliency uniformly and to suppress the saliency of background noise. The final spatial saliency map is computed by integrating the refined map with a center prior map. The temporal saliency is computed from local and global temporal saliency estimates based on patch-level optical flow abstractions, which are fused together. Finally, the spatial and temporal saliencies are integrated to generate a spatiotemporal saliency map. The proposed temporal and spatiotemporal salient region detection approaches are extensively evaluated on challenging salient object detection video datasets. The experimental results show that they outperform several state-of-the-art saliency detection approaches. To accommodate different needs with respect to the speed/accuracy tradeoff, faster variants of the spatial, temporal and spatiotemporal salient region detection approaches are also presented.
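
    As a rough illustration of the final integration step, the sketch below combines two precomputed saliency maps with a weighted sum after normalization. The fusion rule and the parameter alpha are assumptions made for illustration; the paper's actual fusion and refinement pipeline is more elaborate.

    import numpy as np

    def normalize(s):
        """Rescale a saliency map to [0, 1]."""
        s = s.astype(np.float64)
        rng = s.max() - s.min()
        return (s - s.min()) / rng if rng > 0 else np.zeros_like(s)

    def fuse_spatiotemporal(spatial, temporal, alpha=0.5):
        """Combine spatial and temporal saliency maps (alpha is hypothetical)."""
        s, t = normalize(spatial), normalize(temporal)
        # A weighted sum keeps regions salient in either cue; multiplying
        # s * t instead would keep only regions salient in both.
        return normalize(alpha * s + (1 - alpha) * t)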

    Tracking moving objects in surveillance video

    The thesis looks at approaches to the detection and tracking of potential objects of interest in surveillance video. The aim was to investigate and develop methods that might be suitable for eventual application through embedded software, running on a fixed-point processor, in analytics-capable cameras. The work considers common approaches to object detection and representation, seeking out those that offer the necessary computational economy and the potential to cope with constraints such as a low frame rate due to limited processor time, or the weak chromatic content that can occur in some typical surveillance contexts. The aim is probabilistic tracking of objects rather than simple concatenation of frame-by-frame detections. This involves recursive Bayesian estimation. The particle filter is a technique for implementing such a recursion, so it is examined in the context of both single-target and combined multi-target tracking. A detailed examination of the operation of the single-target tracking particle filter shows that objects can be tracked successfully using a relatively simple structured grey-scale histogram representation. It is shown that basic components of the particle filter can be simplified without loss of tracking quality. An analysis brings out the relationships between commonly used target representation distance measures and shows that, in the context of the particle filter, there is little to choose between them. With the correct choice of parameters, the simplest and most computationally economical distance measure performs well; the work shows how to make that choice. Similarly, it is shown that a simple measurement likelihood function can be used in place of the more ubiquitous Gaussian. The important step of target state estimation is examined: the standard weighted-mean approach is rejected, a recently proposed maximum a posteriori approach is shown to be unsuitable in the context of the work, and a practical alternative is developed. Two methods are presented for tracker initialization; one is a simplification of an existing published method, the other a novel approach. The aim is to detect trackable objects as they enter the scene, extract trackable features, then actively follow those features through subsequent frames. The multi-target tracking problem is then posed as one of managing multiple independent trackers.
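
    The recursion described here can be condensed into one predict/update/estimate/resample cycle. The sketch below is a generic single-target particle filter in Python/NumPy, assuming a random-walk motion model, an L1 histogram distance and an exponential likelihood; frame_hist_at, ref_hist, motion_std and lam are hypothetical names, not the thesis' own. Note that the thesis argues against the weighted-mean state estimate used here and develops an alternative.

    import numpy as np

    def pf_step(particles, weights, frame_hist_at, ref_hist,
                motion_std=5.0, lam=20.0):
        """particles: (n, 2) image positions; weights: (n,) normalized.

        frame_hist_at(p) -> grey-scale histogram of the patch at position p
        ref_hist         -> histogram of the tracked target (L1-normalized)
        """
        n = len(particles)
        # Predict: random-walk motion model (an assumption, for illustration)
        particles = particles + np.random.normal(0, motion_std, particles.shape)
        # Update: weight by similarity between patch and reference histograms
        for i, p in enumerate(particles):
            d = np.abs(frame_hist_at(p) - ref_hist).sum()   # L1 distance
            weights[i] = np.exp(-lam * d)
        weights /= weights.sum()
        # State estimate: weighted mean (the thesis rejects this choice)
        estimate = (weights[:, None] * particles).sum(axis=0)
        # Resample: systematic resampling to fight weight degeneracy
        idx = np.searchsorted(np.cumsum(weights),
                              (np.arange(n) + np.random.rand()) / n)
        return particles[idx], np.full(n, 1.0 / n), estimate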

    Image-set, Temporal and Spatiotemporal Representations of Videos for Recognizing, Localizing and Quantifying Actions

    This dissertation addresses the problem of learning video representations, defined here as transforming a video so that its essential structure is made more visible or accessible for action recognition and quantification. In the literature, a video can be represented by a set of images, by modeling motion or temporal dynamics, or by a 3D graph with pixels as nodes. This dissertation contributes a set of models to localize, track, segment, recognize and assess actions: (1) image-set models via aggregating subset features given by regularizing normalized CNNs, (2) image-set models via inter-frame principal recovery and sparse coding of residual actions, (3) temporally local models with spatially global motion estimated by robust feature matching and local motion estimated by action detection with a motion model added, (4) spatiotemporal models (3D graphs and 3D CNNs) that model time as a space dimension, and (5) supervised hashing by jointly learning embedding and quantization. State-of-the-art performance is achieved on tasks such as quantifying facial pain and human diving. The primary conclusions of this dissertation are as follows: (i) image sets can capture facial actions that are about collective representation; (ii) sparse and low-rank representations can untangle expression, identity and pose cues, and can be learned via an image-set model and also a linear model; (iii) norm is related to recognizability, and similarity metrics and loss functions matter; (iv) combining the MIL-based boosting tracker with the particle filter motion model induces a good trade-off between appearance similarity and motion consistency; (v) segmenting an object locally makes it amenable to shape priors, and it is feasible to learn such priors online from Web data with weak supervision; (vi) representing videos as 3D graphs works locally in both space and time, and 3D CNNs work effectively when given temporally meaningful clips; (vii) rich labeled images or videos help to learn better hash functions, after learning binary embedded codes, than random projections. In addition, the models proposed for videos can be adapted to other sequential images, such as volumetric medical images, which are not covered in this dissertation.
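
    The idea in conclusion (vi), that a 3D CNN treats time as a third space dimension, is easy to make concrete. The toy PyTorch snippet below is not a model from the dissertation; it only shows how a single 3D convolution spans both frames and pixels of a clip.

    import torch
    import torch.nn as nn

    # Input clip: batch of 1, 3 color channels, 16 frames, 112x112 pixels
    clip = torch.randn(1, 3, 16, 112, 112)

    # One spatiotemporal layer: the kernel spans 3 frames and 3x3 pixels,
    # so it responds to short local motion patterns as well as appearance.
    conv3d = nn.Conv3d(in_channels=3, out_channels=8,
                       kernel_size=(3, 3, 3), padding=1)
    features = conv3d(clip)
    print(features.shape)   # torch.Size([1, 8, 16, 112, 112])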

    Cell-Based Deformation Monitoring via 3D Point Clouds

    Deformation is one of the most important phenomena in environmental science and engineering. Deformation of artificial and natural objects happens worldwide, in forms such as structural deformation, landslides, subsidence, erosion, and rockfall. Monitoring and assessing such deformation processes is not only scientifically interesting, but also beneficial to hazard/risk control and prediction, as well as to regional planning and development. Deformation monitoring was traditionally driven by geodetic observations, based on the measurement of sparse points in a control network. Recently, with the rapid development of terrestrial LiDAR techniques, millions of points with associated three-dimensional coordinates (known as "3D point clouds") can be captured in a few minutes. Compared to traditional surveying, terrestrial LiDAR offers great potential for deformation monitoring because of advantages such as fast data capture, high data density, and precise 3D object representation. By analysing 3D point clouds, the objective of this thesis is to provide an effective and efficient approach for deformation monitoring. Towards this goal, the thesis designs a new concept of "deformation map" for deformation representation and a novel "cell-based approach" for deformation computation. The main outcome is a novel and rich approach that automatically and incrementally computes a deformation map, enabling a better understanding of structural and natural hazards with heterogeneous deformation characteristics. The work includes the following contributions.

    Hybrid Deformation Modelling. The thesis first provides a comprehensive investigation of the modelling requirements of various deformation phenomena. The requirements concern three main aspects: what is deforming (the deformation object), which type of deformation occurs, and how to describe it. Based on this detailed requirement analysis, we propose a rich, hybrid deformation model composed of meta-deformation, sub-deformation and deformation map, corresponding to the deformation of a small cell, of a partial area, and of the whole object, respectively.

    Cell-based Deformation Computation. To automatically and incrementally extract the heterogeneous deformation of the whole monitored object, we bring the "cell" concept into deformation monitoring. The thesis builds a cell-based deformation computing framework consisting of three key steps: split, detect, and merge. Split divides the space of the object into many cells (uniform or irregular); detect extracts the meta-deformation of individual cells by analysing the enclosed point clouds at two epochs; and merge groups adjacent cells with similar deformation together to form a consistent sub-deformation. As the final result, an informative deformation map is computed that describes the deformation of the whole object.

    Evaluation of Cell-based Approach. To evaluate this hybrid modelling and cell-based deformation computation, the thesis extensively studies both synthetic and real-life point cloud datasets: (1) by imitating a landslide scenario, we generate synthetic data using Matlab programming and practical settings, and compare the cell-based approach with traditional non-cell-based geodetic methods; (2) by analysing two real-life cases of deformation in Switzerland, we further validate our approach and compare the results with third-party sources (e.g., results provided by a surveying company, or computed with commercial software such as 3DReshaper).

    Extension of Cell-based Approach. In the last stages of this thesis work, we focus on several technical extensions that enhance the cell-based deformation monitoring approach: (1) supporting dynamic cells instead of uniform cells when splitting the entire object space, (2) finding cell correspondence for deformation scenarios with large deformation, such as rockfalls, (3) tracking movement with data-driven cells whose irregular shape is determined automatically by the deformation boundary itself, (4) designing an adaptive modelling strategy that selects a suitable model for detecting the meta-deformation of cells, and (5) computing the deformation evolution of a monitored object across more than two epochs of point cloud data.
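
    The split and detect steps of this framework can be sketched schematically. The code below assumes uniform cubic cells and reduces each cell's meta-deformation to a centroid displacement; both are simplifications (the thesis also supports dynamic, data-driven cells and richer per-cell models). A merge step would then group adjacent cells with similar displacement vectors into sub-deformations.

    import numpy as np

    def split_into_cells(points, cell_size):
        """Assign each 3D point to a uniform grid cell (split step)."""
        keys = np.floor(points / cell_size).astype(int)
        cells = {}
        for key, p in zip(map(tuple, keys), points):
            cells.setdefault(key, []).append(p)
        return {k: np.array(v) for k, v in cells.items()}

    def detect_meta_deformation(cells_t0, cells_t1):
        """Per-cell displacement between two epochs (detect step)."""
        deformation_map = {}
        for key, pts0 in cells_t0.items():
            pts1 = cells_t1.get(key)
            if pts1 is not None:
                # Crude meta-deformation: shift of the cell's centroid
                deformation_map[key] = pts1.mean(axis=0) - pts0.mean(axis=0)
        return deformation_map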

    Detection and segmentation of moving objects in highly dynamic scenes

    Detecting and segmenting moving objects in dynamic scenes is a hard but essential task in a number of applications, such as surveillance. Most existing methods give good results only in the case of a persistent or slowly changing background, or when both the objects and the background are rigid. In this paper, we propose a new method for direct detection and segmentation of foreground moving objects in the absence of such constraints. First, groups of pixels having similar motion and photometric features are extracted. For this first step, only a sub-grid of image pixels is used, to reduce computational cost and improve robustness to noise. We introduce the use of p-values to validate optical flow estimates, and of automatic bandwidth selection in the mean shift clustering algorithm. In a second stage, segmentation of the object associated with a given cluster is performed in a MAP/MRF framework. Our method is able to handle a moving camera and several different motions in the background. Experiments on challenging sequences show the performance of the proposed method and its utility for video analysis in complex scenes.
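
    A stripped-down version of the first stage might look as follows, assuming a dense optical flow field is already available and using scikit-learn's MeanShift with a single global bandwidth. The paper's p-value validation of flow estimates and its automatic bandwidth selection are omitted here, and the feature layout is our assumption.

    import numpy as np
    from sklearn.cluster import MeanShift

    def cluster_motion_features(image, flow, step=8, bandwidth=None):
        """Cluster sub-grid pixels by combined photometric/motion features.

        image: (H, W, 3) color image; flow: (H, W, 2) optical flow field.
        """
        h, w = image.shape[:2]
        ys, xs = np.mgrid[0:h:step, 0:w:step]          # sub-grid of pixels
        feats = np.column_stack([
            xs.ravel(), ys.ravel(),                    # position
            image[ys, xs].reshape(-1, 3),              # photometric features
            flow[ys, xs].reshape(-1, 2),               # motion features
        ]).astype(np.float64)
        # bandwidth=None lets scikit-learn estimate a global bandwidth
        labels = MeanShift(bandwidth=bandwidth).fit_predict(feats)
        return xs.ravel(), ys.ravel(), labels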