
    An adaptive training-less framework for anomaly detection in crowd scenes

    Anomaly detection in crowd videos has become a popular area of research in the computer vision community. Several existing methods define an anomaly as a deviation from scene normalcy learned through separate training, with or without labeled information. However, owing to the rare and sparse nature of anomalous events, any such learning can be misleading, as there is no hard boundary between anomalous and non-anomalous events. To address this challenge, we propose an adaptive, training-less system capable of detecting anomalies on the fly. Our pipeline consists of three major components: an adaptive 3D-DCT model for multi-object detection-based association, local motion descriptor generation through an improved saliency-guided optical flow, and anomaly detection based on the Earth mover's distance (EMD). Despite being training-free, the proposed model achieves performance comparable to several state-of-the-art methods on the publicly available UCSD, UMN, CUHK-Avenue and ShanghaiTech datasets.
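
    The abstract does not spell out the scoring step; the following is a minimal sketch, assuming the local motion descriptors are histograms of optical-flow magnitudes, of how an EMD-based anomaly score against a running reference could look. SciPy's wasserstein_distance serves as a one-dimensional EMD; all names and parameters are illustrative, not the authors' implementation.

        import numpy as np
        from scipy.stats import wasserstein_distance  # 1-D Earth mover's distance

        def motion_histogram(flow_magnitudes, bins=16, max_mag=20.0):
            """Histogram of optical-flow magnitudes used as a local motion descriptor."""
            hist, edges = np.histogram(flow_magnitudes, bins=bins, range=(0.0, max_mag))
            hist = hist.astype(float)
            hist /= hist.sum() + 1e-8                 # normalize to a distribution
            centers = 0.5 * (edges[:-1] + edges[1:])  # bin centers act as support points
            return centers, hist

        def emd_anomaly_score(current_mags, reference_mags):
            """EMD between the current frame's motion histogram and a reference one."""
            c_centers, c_hist = motion_histogram(current_mags)
            r_centers, r_hist = motion_histogram(reference_mags)
            return wasserstein_distance(c_centers, r_centers, c_hist, r_hist)

        # Illustrative usage: a burst of fast motion scores higher than normal motion.
        rng = np.random.default_rng(0)
        normal = rng.rayleigh(scale=1.0, size=5000)    # stand-in for typical flow magnitudes
        abnormal = rng.rayleigh(scale=4.0, size=5000)  # stand-in for anomalous fast motion
        print(emd_anomaly_score(normal[:2500], normal[2500:]))  # low score: same motion regime
        print(emd_anomaly_score(abnormal, normal))              # higher score: flagged as anomalous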

    Graph-based Visual Saliency Model using Background Color

    Visual saliency is a concept from cognitive psychology describing how some stimuli in a scene stand out relative to their neighbors and attract our attention. Computing visual saliency is a topic of recent interest. Here, we propose a graph-based method for saliency detection that consists of three stages: pre-processing, initial saliency detection and final saliency detection. The initial saliency map is obtained by applying an adaptive threshold to color differences relative to the background. In the final saliency detection stage, a graph is constructed and a ranking technique is exploited. In the proposed method, the background is suppressed effectively and salient regions are usually selected correctly. Experimental results on the MSRA-1000 database demonstrate excellent performance and low computational complexity in comparison with state-of-the-art methods.
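
    The abstract leaves the graph construction and ranking step unspecified; the sketch below assumes a manifold-ranking-style formulation over region features, a common choice in graph-based saliency but not necessarily the authors', and shows the closed-form ranking f = (D - alpha*W)^{-1} y seeded by an initial saliency map.

        import numpy as np

        def graph_ranking_saliency(features, seed_scores, sigma=0.1, alpha=0.99):
            """Rank graph nodes (image regions) against seed scores.

            features:    (n, d) array of region descriptors (e.g., mean Lab color).
            seed_scores: (n,) query vector, e.g., the thresholded initial saliency map.
            Returns a (n,) saliency score per region via f = (D - alpha*W)^{-1} y.
            """
            d2 = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
            W = np.exp(-d2 / (2 * sigma ** 2))   # dense affinity between regions
            np.fill_diagonal(W, 0.0)             # no self-loops
            D = np.diag(W.sum(axis=1))           # degree matrix
            f = np.linalg.solve(D - alpha * W, seed_scores)
            f -= f.min()
            return f / (f.max() + 1e-8)          # normalize to [0, 1]

        # Illustrative usage with 5 regions; the last two differ strongly from the background color.
        feats = np.array([[0.1, 0.1], [0.12, 0.1], [0.11, 0.09], [0.8, 0.7], [0.82, 0.72]])
        initial = np.array([0.0, 0.0, 0.0, 1.0, 1.0])  # initial saliency from adaptive thresholding
        print(graph_ranking_saliency(feats, initial))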

    Approximate nearest neighbors for recognition of foreground and background in images and video

    Problems in image matching, saliency detection in images, and background detection in video are studied. Algorithms based on approximate nearest-neighbor matching are proposed to solve problems in these related domains. Image patches are quantized into features using a special Walsh-Hadamard transform and inserted into a propagation-assisted kd-tree for indexing and search. Image saliency and background-detection algorithms are then derived by examining patch similarity over space and time.
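
    As a rough illustration of the indexing step, the sketch below projects image patches onto a few Walsh-Hadamard basis vectors and indexes them with a plain kd-tree. SciPy's cKDTree stands in for the propagation-assisted kd-tree described in the thesis; names and parameters are illustrative.

        import numpy as np
        from scipy.linalg import hadamard
        from scipy.spatial import cKDTree

        PATCH = 8                                   # 8x8 patches -> 64-dimensional vectors
        H = hadamard(PATCH * PATCH).astype(float)   # Walsh-Hadamard basis (rows are basis vectors)

        def patch_features(image, n_coeffs=16, step=4):
            """Slide over the image, projecting each patch onto the first WHT coefficients."""
            coords, feats = [], []
            for y in range(0, image.shape[0] - PATCH + 1, step):
                for x in range(0, image.shape[1] - PATCH + 1, step):
                    patch = image[y:y + PATCH, x:x + PATCH].reshape(-1)
                    feats.append(H[:n_coeffs] @ patch)  # low-order WHT coefficients as the feature
                    coords.append((y, x))
            return np.array(coords), np.array(feats)

        # Illustrative usage: match patches of one frame against an index built on another.
        rng = np.random.default_rng(0)
        frame_a = rng.random((64, 64))
        frame_b = frame_a + 0.01 * rng.random((64, 64))   # nearly identical frame
        coords_a, feats_a = patch_features(frame_a)
        coords_b, feats_b = patch_features(frame_b)
        tree = cKDTree(feats_a)                            # index patches of frame A
        dist, idx = tree.query(feats_b, k=1)               # nearest neighbor for each patch of frame B
        print(dist.mean())                                 # small distances -> "background-like" patches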

    Unsupervised Learning from Shallow to Deep

    Machine learning plays a pivotal role in most state-of-the-art systems across many application domains. With the rise of deep learning, massive amounts of labeled data have become the fuel of feature learning, enabling models to learn representations automatically. Unfortunately, a trained deep learning model is hard to adapt to other datasets without fine-tuning, and the applicability of machine learning methods is limited by the amount of available labeled data. Therefore, the aim of this thesis is to alleviate the limitations of supervised learning by exploring algorithms that learn good internal representations and invariant feature hierarchies from unlabelled data.

    Firstly, we extend traditional dictionary learning and sparse coding algorithms to hierarchical image representations in a principled way. So that dictionary atoms capture additional information from extended receptive fields and attain improved descriptive capacity, we present a two-pass multi-resolution cascade framework for dictionary learning and sparse coding. This cascade method allows collaborative reconstructions at different resolutions using only dictionary atoms of the same dimension. The jointly learned dictionary comprises atoms that adapt to the information available at the coarsest layer, where the support of the atoms reaches its maximum range, and to the residual images, where supplementary details progressively refine the reconstruction objective. Our method generates flexible and accurate representations using only a small number of coefficients, and is computationally efficient.

    In the following work, we propose to incorporate the traditional self-expressiveness property into deep learning to explore better representations for subspace clustering. This architecture is built upon deep auto-encoders, which non-linearly map the input data into a latent space. Our key idea is to introduce a novel self-expressive layer between the encoder and the decoder to mimic the "self-expressiveness" property that has proven effective in traditional subspace clustering. Being differentiable, our new self-expressive layer provides a simple but effective way to learn pairwise affinities between all data points through a standard back-propagation procedure. Being nonlinear, our neural-network based method is able to cluster data points having complex (often nonlinear) structures.

    However, subspace clustering algorithms are notorious for their scalability issues, because building and processing large affinity matrices is demanding. We propose two methods to tackle this problem. One is based on k-Subspace Clustering: we introduce a method that simultaneously learns an embedding space along with the subspaces within it so as to minimize a notion of reconstruction error, thus addressing subspace clustering in an end-to-end learning paradigm. This in turn frees us from the need for an affinity matrix to perform clustering. The other approach uses a feed-forward network to replace spectral clustering and learns the affinities of the data from the "self-expressive" layer. We introduce Neural Collaborative Subspace Clustering, which benefits from a classifier that determines whether a pair of points lies on the same subspace, under the supervision of the "self-expressive" layer. Essential to our model is the construction of two affinity matrices, one from the classifier and the other from a notion of subspace self-expressiveness, to supervise training in a collaborative scheme.

    In summary, this thesis makes contributions on how to perform unsupervised learning in several tasks. It starts from the traditional sparse coding and dictionary learning perspective in low-level vision. Then, we explore how to incorporate unsupervised learning into convolutional neural networks without label information and how to scale subspace clustering to large datasets. Furthermore, we also extend the clustering to a dense prediction task (saliency detection).
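
    To make the "self-expressiveness" idea concrete, the sketch below shows the classical (non-deep) formulation that the self-expressive layer mimics: each point is reconstructed as a linear combination of the other points, and the coefficient matrix induces the clustering affinity. A ridge-regularized closed form is used here for brevity; this is an illustration of the underlying property, not the thesis' network.

        import numpy as np

        def self_expressive_coefficients(X, lam=0.1):
            """Solve min_C ||X - X C||_F^2 + lam ||C||_F^2 with diag(C) = 0 (approximately).

            X: (d, n) data matrix with one sample per column.
            Returns the (n, n) coefficient matrix C; |C| + |C|^T is the clustering affinity.
            """
            n = X.shape[1]
            G = X.T @ X                                   # Gram matrix of the samples
            C = np.linalg.solve(G + lam * np.eye(n), G)   # ridge-regression self-expression
            np.fill_diagonal(C, 0.0)                      # forbid trivial self-reconstruction
            return C

        # Illustrative usage: points drawn from two 1-D subspaces of R^3.
        rng = np.random.default_rng(0)
        basis1, basis2 = rng.standard_normal((3, 1)), rng.standard_normal((3, 1))
        X = np.hstack([basis1 @ rng.standard_normal((1, 10)),   # 10 points in subspace 1
                       basis2 @ rng.standard_normal((1, 10))])  # 10 points in subspace 2
        C = self_expressive_coefficients(X)
        A = np.abs(C) + np.abs(C).T                              # symmetric affinity
        print(A[:10, :10].mean(), A[:10, 10:].mean())            # within-subspace affinity dominates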

    Graph Spectral Image Processing

    The recent advent of graph signal processing (GSP) has spurred intensive study of signals that live naturally on irregular data kernels described by graphs (e.g., social networks, wireless sensor networks). Though a digital image contains pixels that reside on a regularly sampled 2D grid, if one can design an appropriate underlying graph connecting pixels with weights that reflect the image structure, then one can interpret the image (or image patch) as a signal on a graph and apply GSP tools to process and analyze the signal in the graph spectral domain. In this article, we overview recent graph spectral techniques in GSP specifically for image/video processing. The topics covered include image compression, image restoration, image filtering and image segmentation.
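
    As a toy illustration of the "image as a graph signal" idea, the sketch below builds a small grid graph with intensity-dependent edge weights, takes the graph Laplacian eigenbasis as the graph Fourier transform, and low-pass filters a patch in that spectral domain. This is a generic GSP exercise under assumed parameters, not a method from the article.

        import numpy as np

        def grid_graph_laplacian(patch, sigma=0.2):
            """4-connected grid graph over a patch; edge weights reflect intensity similarity."""
            h, w = patch.shape
            n = h * w
            W = np.zeros((n, n))
            idx = lambda y, x: y * w + x
            for y in range(h):
                for x in range(w):
                    for dy, dx in ((0, 1), (1, 0)):            # right and down neighbors
                        yy, xx = y + dy, x + dx
                        if yy < h and xx < w:
                            wgt = np.exp(-(patch[y, x] - patch[yy, xx]) ** 2 / (2 * sigma ** 2))
                            W[idx(y, x), idx(yy, xx)] = W[idx(yy, xx), idx(y, x)] = wgt
            return np.diag(W.sum(axis=1)) - W                  # combinatorial Laplacian L = D - W

        def graph_lowpass(patch, keep=16):
            """Project the patch onto the 'keep' lowest graph frequencies and reconstruct."""
            L = grid_graph_laplacian(patch)
            evals, evecs = np.linalg.eigh(L)                   # graph Fourier basis (columns of evecs)
            coeffs = evecs.T @ patch.reshape(-1)               # graph Fourier transform
            coeffs[keep:] = 0.0                                # discard high graph frequencies
            return (evecs @ coeffs).reshape(patch.shape)       # inverse transform

        # Illustrative usage: smooth a noisy 8x8 patch; the edge-aware weights limit blurring.
        rng = np.random.default_rng(0)
        patch = np.kron(np.array([[0.2, 0.8], [0.8, 0.2]]), np.ones((4, 4)))
        patch += 0.05 * rng.standard_normal((8, 8))
        print(np.round(graph_lowpass(patch, keep=8), 2))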

    Fusion of block and keypoints based approaches for effective copy-move image forgery detection

    Keypoint-based and block-based methods are the two main categories of techniques for detecting copy-move forged images, one of the most common digital image forgery schemes. In general, block-based methods suffer from high computational cost due to the large number of image blocks used, and they fail to handle geometric transformations. In contrast, keypoint-based approaches overcome these two drawbacks yet have difficulty dealing with smooth regions. As a result, a fusion of these two approaches is proposed for effective copy-move forgery detection. First, our scheme adaptively determines an appropriate initial region size to segment the image into non-overlapping regions. Feature points are extracted from the image as keypoints using the scale-invariant feature transform (SIFT). The ratio between the number of keypoints and the total number of pixels in a region is used to classify the region as a smooth or non-smooth (keypoint) region. Accordingly, a block-based approach using Zernike moments and a keypoint-based approach using SIFT, along with filtering and post-processing, are applied to these two kinds of regions respectively for effective forgery detection. Experimental results show that the proposed fusion scheme outperforms the keypoint-based method in detection reliability and the block-based method in efficiency.
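
    The region classification step can be pictured as follows: SIFT keypoints are detected once, and each region is routed to the keypoint branch or the block (Zernike) branch according to its keypoint-to-pixel ratio. OpenCV's SIFT is used for illustration; the fixed region grid and threshold are hypothetical stand-ins for the paper's adaptive segmentation, and the Zernike branch is left as a stub.

        import cv2
        import numpy as np

        def classify_regions(gray, region_size=64, ratio_thresh=1e-3):
            """Split the image into square regions and label each as 'keypoint' or 'smooth'.

            A region whose SIFT-keypoint count per pixel exceeds ratio_thresh goes to the
            SIFT matching branch; otherwise it goes to the block-based (Zernike) branch.
            """
            sift = cv2.SIFT_create()
            keypoints = sift.detect(gray, None)
            pts = np.array([kp.pt for kp in keypoints]) if keypoints else np.empty((0, 2))

            labels = {}
            h, w = gray.shape
            for y in range(0, h, region_size):
                for x in range(0, w, region_size):
                    in_region = ((pts[:, 0] >= x) & (pts[:, 0] < x + region_size) &
                                 (pts[:, 1] >= y) & (pts[:, 1] < y + region_size)) if len(pts) else []
                    ratio = np.count_nonzero(in_region) / float(region_size * region_size)
                    labels[(y, x)] = "keypoint" if ratio > ratio_thresh else "smooth"
            return labels

        # Illustrative usage (requires an image on disk; the path is a placeholder):
        # gray = cv2.imread("suspect.png", cv2.IMREAD_GRAYSCALE)
        # for region, branch in classify_regions(gray).items():
        #     pass  # route 'keypoint' regions to SIFT matching, 'smooth' regions to Zernike moments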

    Quality-Driven video analysis for the improvement of foreground segmentation

    Unpublished doctoral thesis, defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Tecnología Electrónica y de las Comunicaciones. Date of defense: 15-06-2018. It was partially supported by the Spanish Government (TEC2014-53176-R, HAVideo