783 research outputs found

    Sub-sampled dictionaries for coarse-to-fine sparse representation-based human action recognition

    Automatic human action recognition is a core functionality of systems for video surveillance and human-object interaction. However, the diverse nature of human actions and the noisy nature of most video content make it difficult to achieve effective human action recognition. To overcome these problems, Sparse Representation (SR) has recently attracted substantial research attention. Although SR-based approaches have proven to be reasonably effective, the computational complexity of the testing stage prohibits their use in applications that require real-time operation and support for a large number of human action classes. In this paper, we propose a novel method for human action recognition, leveraging coarse-to-fine sparse representations that have been obtained through dictionary sub-sampling. Comparative experimental results obtained for the UCF50 dataset demonstrate that the proposed method is able to achieve efficient human action recognition, at no substantial loss in recognition accuracy.
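The abstract above does not spell out the algorithm, so the following is only a minimal sketch of one way a coarse-to-fine scheme over a sub-sampled dictionary could look. Everything here is an assumption made for illustration: the greedy solver (orthogonal matching pursuit), the residual-based class decision, the function names, and the parameters `k`, `step` and `keep`. A coarse pass codes the test sample over every `step`-th atom and shortlists the `keep` classes with the smallest reconstruction error; a fine pass then uses all atoms of the shortlisted classes only.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit: code y with at most k columns (atoms) of D."""
    coef = np.zeros(D.shape[1])
    support, sol = [], None
    residual = y.astype(float)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Refit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    if support:
        coef[support] = sol
    return coef

def class_residuals(D, labels, y, k):
    """Reconstruction error of y when only one class's coefficients are kept."""
    x = omp(D, y, k)
    return {int(c): float(np.linalg.norm(y - D[:, labels == c] @ x[labels == c]))
            for c in np.unique(labels)}

def coarse_to_fine_classify(D, labels, y, k=3, step=2, keep=2):
    """Hypothetical two-stage classifier (illustrative assumption, not the paper's method)."""
    # Coarse pass: keep every `step`-th atom, then shortlist the `keep` best classes.
    idx = np.arange(0, D.shape[1], step)
    coarse = class_residuals(D[:, idx], labels[idx], y, k)
    shortlist = sorted(coarse, key=coarse.get)[:keep]
    # Fine pass: all atoms, but only those of the shortlisted classes.
    mask = np.isin(labels, shortlist)
    fine = class_residuals(D[:, mask], labels[mask], y, k)
    return min(fine, key=fine.get)

def toy_dictionary(n_classes=3, atoms_per_class=4, dim=10, seed=0):
    """Synthetic unit-norm atoms clustered around one axis per class (toy data only)."""
    rng = np.random.default_rng(seed)
    cols, labels = [], []
    for c in range(n_classes):
        for _ in range(atoms_per_class):
            atom = np.eye(dim)[c] + 0.05 * rng.standard_normal(dim)
            cols.append(atom / np.linalg.norm(atom))
            labels.append(c)
    return np.column_stack(cols), np.array(labels)
```

Calling `coarse_to_fine_classify(D, labels, y)` with `D, labels = toy_dictionary()` returns a predicted class label. The potential saving comes from the coarse pass, which searches a dictionary that is smaller by a factor of `step`, and from the fine pass, which ignores atoms of classes that were already ruled out.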

    What Will I Do Next? The Intention from Motion Experiment

    In computer vision, video-based approaches have been widely explored for the early classification and the prediction of actions or activities. However, it remains unclear whether this modality (as compared to 3D kinematics) can still be reliable for the prediction of human intentions, defined as the overarching goal embedded in an action sequence. Since the same action can be performed with different intentions, this problem is more challenging, yet tractable, as shown by quantitative cognitive studies that exploit the 3D kinematics acquired through motion capture systems. In this paper, we bridge cognitive and computer vision studies by demonstrating the effectiveness of video-based approaches for the prediction of human intentions. Precisely, we propose Intention from Motion, a new paradigm where, without using any contextual information, we consider instantaneous grasping motor acts involving a bottle in order to forecast why the bottle itself has been reached (to pass it, to place it in a box, to pour, or to drink the liquid inside). We process only the grasping onsets, casting intention prediction as a classification problem. Leveraging our multimodal acquisition (3D motion capture data and 2D optical videos), we compare the most commonly used 3D descriptors from cognitive studies with state-of-the-art video-based techniques. Since the two analyses achieve an equivalent performance, we demonstrate that computer vision tools are effective in capturing the kinematics and addressing the cognitive problem of human intention prediction. Comment: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshop

    Textural Difference Enhancement based on Image Component Analysis

    In this thesis, we propose a novel image enhancement method to magnify the textural differences in images with respect to human visual characteristics. The method is intended to be a preprocessing step to improve the performance of texture-based image segmentation algorithms. We propose to calculate the six Tamura texture features (coarseness, contrast, directionality, line-likeness, regularity and roughness) using novel measurements. Each feature retains its original interpretation of a given texture characteristic, but is measured by local low-level features, e.g., direction of the local edges, dynamic range of the local pixel intensities, kurtosis and skewness of the local image histogram. A discriminant texture feature selection method based on principal component analysis (PCA) is then proposed to find the most representative characteristics in describing textural differences in the image. We decompose the image into pairwise components representing the texture characteristics strongly and weakly, respectively. A set of wavelet-based soft thresholding methods is proposed as the dictionaries of morphological component analysis (MCA) to sparsely highlight the characteristics strongly and weakly from the image. The wavelet-based thresholding methods are proposed in pairs, so that each of the resulting pairwise components exhibits one characteristic either strongly or weakly. We propose various wavelet-based manipulation methods to enhance the components separately. For each component representing a certain texture characteristic, a non-linear function is proposed to manipulate the wavelet coefficients of the component so that the component is enhanced with the corresponding characteristic accentuated independently while having little effect on other characteristics. Furthermore, the above three methods are combined into a unified framework of image enhancement.
Firstly, the texture characteristics differentiating different textures in the image are found. Secondly, the image is decomposed into components exhibiting these texture characteristics respectively. Thirdly, each component is manipulated to accentuate the corresponding texture characteristics exhibited there. After re-combining these manipulated components, the image is enhanced with the textural differences magnified with respect to the selected texture characteristics. The proposed textural difference enhancement method is used prior to both grayscale and colour image segmentation algorithms. The resulting improvements in the performance of different segmentation algorithms demonstrate the potential of the proposed textural difference enhancement method.
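For readers unfamiliar with the soft thresholding mentioned in this abstract, the standard operator shrinks each coefficient toward zero by a threshold `t` and zeroes anything smaller in magnitude than `t`. The sketch below shows only that generic operator; the thesis's paired thresholding rules, its choice of wavelet and its threshold values are not given in the abstract, so the function name and the sample values are illustrative assumptions.

```python
import numpy as np

def soft_threshold(coeffs, t):
    """Generic soft thresholding: sign(c) * max(|c| - t, 0), applied elementwise."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)

# Hypothetical coefficients: small ones vanish, large ones shrink by t.
shrunk = soft_threshold([-3.0, -0.5, 0.0, 0.5, 3.0], t=1.0)
```

In a wavelet setting the same function would be applied to the detail coefficients of each sub-band before reconstruction, which is what makes the resulting representation sparse.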

    Investigation of new learning methods for visual recognition

    Visual recognition is one of the most difficult and prevailing problems in computer vision and pattern recognition due to the challenges in understanding the semantics and contents of digital images. Two major components of a visual recognition system are discriminative feature representation and efficient and accurate pattern classification. This dissertation therefore focuses on developing new learning methods for visual recognition. Based on the conventional sparse representation, which has shown its robustness for visual recognition problems, a series of new methods is proposed. Specifically, first, a new locally linear K nearest neighbor method, or LLK method, is presented. The LLK method derives a new representation, which is an approximation to the ideal representation, by optimizing an objective function based on a host of criteria for sparsity, locality, and reconstruction. The novel representation is further processed by two new classifiers, namely, an LLK based classifier (LLKc) and a locally linear nearest mean based classifier (LLNc), for visual recognition. The proposed classifiers are shown to connect to the Bayes decision rule for minimum error. Second, a new generative and discriminative sparse representation (GDSR) method is proposed by taking advantage of both a coarse modeling of the generative information and a modeling of the discriminative information. The proposed GDSR method integrates two new criteria, namely, a discriminative criterion and a generative criterion, into the conventional sparse representation criterion. A new generative and discriminative sparse representation based classification (GDSRc) method is then presented based on the derived new representation. Finally, a new Score space based multiple Metric Learning (SML) method is presented for a challenging visual recognition application, namely, recognizing kinship relations or kinship verification.
The proposed SML method, which goes beyond the conventional Mahalanobis distance metric learning, not only learns the distance metric but also models the generative process of features by taking advantage of the score space. The SML method is optimized by solving a constrained, non-negative, and weighted variant of the sparse representation problem. To assess the feasibility of the proposed new learning methods, several visual recognition tasks are considered, such as face recognition, scene recognition, object recognition, computational fine art analysis, action recognition, fine-grained recognition, and kinship verification. The experimental results show that the proposed new learning methods achieve better performance than the other popular methods.

    A Panorama on Multiscale Geometric Representations, Intertwining Spatial, Directional and Frequency Selectivity

    The richness of natural images makes the quest for optimal representations in image processing and computer vision challenging. The latter observation has not prevented the design of image representations, which trade off between efficiency and complexity, while achieving accurate rendering of smooth regions as well as reproducing faithful contours and textures. The most recent ones, proposed in the past decade, share a hybrid heritage highlighting the multiscale and oriented nature of edges and patterns in images. This paper presents a panorama of the aforementioned literature on decompositions in multiscale, multi-orientation bases or dictionaries. They typically exhibit redundancy to improve sparsity in the transformed domain and sometimes its invariance with respect to simple geometric deformations (translation, rotation). Oriented multiscale dictionaries extend traditional wavelet processing and may offer rotation invariance. Highly redundant dictionaries require specific algorithms to simplify the search for an efficient (sparse) representation. We also discuss the extension of multiscale geometric decompositions to non-Euclidean domains such as the sphere or arbitrary meshed surfaces. The etymology of panorama suggests an overview, based on a choice of partially overlapping "pictures". We hope that this paper will contribute to the appreciation and apprehension of a stream of current research directions in image understanding. Comment: 65 pages, 33 figures, 303 references

    REPRESENTATION LEARNING FOR ACTION RECOGNITION

    The objective of this research work is to develop discriminative representations for human actions. The motivation stems from the fact that there are many issues encountered while capturing actions in videos like intra-action variations (due to actors, viewpoints, and duration), inter-action similarity, background motion, and occlusion of actors. Hence, obtaining a representation which can address all the variations in the same action while maintaining discrimination with other actions is a challenging task. In the literature, actions have been represented using either low-level or high-level features. Low-level features describe the motion and appearance in small spatio-temporal volumes extracted from a video. Due to the limited space-time volume used for extracting low-level features, they are not able to account for viewpoint and actor variations or variable length actions. On the other hand, high-level features handle variations in actors, viewpoints, and duration but the resulting representation is often high-dimensional, which introduces the curse of dimensionality. In this thesis, we propose new representations for describing actions by combining the advantages of both low-level and high-level features. Specifically, we investigate various linear and non-linear decomposition techniques to extract meaningful attributes in both high-level and low-level features. In the first approach, the sparsity of high-level feature descriptors is leveraged to build action-specific dictionaries. Each dictionary retains only the discriminative information for a particular action and hence reduces inter-action similarity. Then, a sparsity-based classification method is proposed to classify the low-rank representation of clips obtained using these dictionaries. We show that this representation based on dictionary learning improves the classification performance across actions.
Also, a few of the actions consist of rapid body deformations that hinder the extraction of local features from body movements. Hence, we propose to use a dictionary which is trained on convolutional neural network (CNN) features of the human body in various poses to reliably identify actors from the background. Particularly, we demonstrate the efficacy of sparse representation in the identification of the human body under rapid and substantial deformation. In the first two approaches, sparsity-based representation is developed to improve discriminability using class-specific dictionaries that utilize action labels. However, developing an unsupervised representation of actions is more beneficial as it can be used to both recognize similar actions and localize actions. We propose to exploit inter-action similarity to train a universal attribute model (UAM) in order to learn action attributes (common and distinct) implicitly across all the actions. Using maximum a posteriori (MAP) adaptation, a high-dimensional super action-vector (SAV) for each clip is extracted. As this SAV contains redundant attributes of all other actions, we use factor analysis to extract a novel low-dimensional action-vector representation for each clip. Action-vectors are shown to suppress background motion and highlight actions of interest in both trimmed and untrimmed clips, which contributes to action recognition without the help of any classifiers. It is observed during our experiments that action-vectors cannot effectively discriminate between actions which are visually similar to each other. Hence, we subject action-vectors to supervised linear embedding using linear discriminant analysis (LDA) and probabilistic LDA (PLDA) to enforce discrimination. Particularly, we show that leveraging complementary information across action-vectors using different local features followed by discriminative embedding provides the best classification performance.
Further, we explore non-linear embedding of action-vectors using Siamese networks, especially for fine-grained action recognition. A visualization of the hidden layer output in Siamese networks shows its ability to effectively separate visually similar actions. This leads to better classification performance than linear embedding on fine-grained action recognition. All of the above approaches are presented on large unconstrained datasets with hundreds of examples per action. However, actions in surveillance videos like snatch thefts are difficult to model because of the diverse variety of scenarios in which they occur and very few labeled examples. Hence, we propose to utilize the universal attribute model (UAM) trained on large action datasets to represent such actions. Specifically, we show that there are similarities between certain actions in the large datasets and snatch thefts, which help in extracting a representation for snatch thefts using the attributes from the UAM. This representation is shown to be effective in distinguishing snatch thefts from regular actions with high accuracy. In summary, this thesis proposes both supervised and unsupervised approaches for representing actions which provide better discrimination than existing representations. The first approach presents a dictionary learning based sparse representation for effective discrimination of actions. Also, we propose a sparse representation for the human body based on dictionaries in order to recognize actions with rapid body deformations. In the next approach, a low-dimensional representation called action-vector for unsupervised action recognition is presented. Further, linear and non-linear embedding of action-vectors is proposed for addressing inter-action similarity and fine-grained action recognition, respectively. Finally, we propose a representation for locating snatch thefts among thousands of regular interactions in surveillance videos.

    CELL PATTERN CLASSIFICATION OF INDIRECT IMMUNOFLUORESCENCE IMAGES

    Ph.D. (Doctor of Philosophy)