51 research outputs found

    Video Event Recognition and Anomaly Detection by Combining Gaussian Process and Hierarchical Dirichlet Process Models

    In this paper, we present an unsupervised learning framework for analyzing activities and interactions in surveillance videos. In our framework, a Hierarchical Dirichlet Process (HDP) model connects three levels of video events: low-level visual features, simple atomic activities, and multi-agent interactions. Atomic activities are represented as distributions over low-level features, while complicated interactions are represented as distributions over atomic activities. The learning process is unsupervised. Given a training video sequence, low-level visual features are extracted from optical flow and clustered into atomic activities, and video clips are clustered into interactions. The HDP model automatically decides the number of clusters, i.e., the number of categories of atomic activities and interactions. Based on the learned atomic activities and interactions, a training dataset is generated to train a Gaussian Process (GP) classifier. The trained GP models are then applied to newly captured video to classify interactions and detect abnormal events in real time. Furthermore, the temporal dependencies between video events learned by HDP-Hidden Markov Models (HMM) are integrated into the GP classifier to improve classification accuracy on newly captured videos. Our framework couples the benefits of a generative model (HDP) with those of a discriminative model (GP). We provide detailed experiments showing that our framework performs favorably in real-time video event classification in a crowded traffic scene.
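The claim that the HDP model "automatically decides the number of clusters" rests on the Dirichlet process. As a minimal sketch, the Chinese restaurant process below (the predictive rule of a single, non-hierarchical Dirichlet process) shows how the number of clusters grows with the data instead of being fixed in advance. The function name `crp_assignments` and the concentration value `alpha=2.0` are illustrative assumptions; this is not the paper's HDP inference procedure.

```python
import random

def crp_assignments(n_items, alpha, seed=0):
    """Sample cluster labels for n_items from a Chinese restaurant process.

    Item i joins existing cluster k with probability n_k / (i + alpha)
    and opens a new cluster with probability alpha / (i + alpha).
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of items already in cluster k
    labels = []
    for _ in range(n_items):
        # Unnormalized weights: existing cluster sizes, then alpha for a new cluster.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts

labels, counts = crp_assignments(200, alpha=2.0)
print(len(counts), "clusters for 200 items")
```

Larger `alpha` values tend to open more clusters. In an HDP, a second level of this process lets clusters (here, atomic activities) be shared across groups such as video clips.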

    Extensions to the Latent Dirichlet Allocation Topic Model Using Flexible Priors

    Topic models intrinsically fix their likelihood functions to multinomial distributions, because they operate on count data rather than Gaussian data. As a result, under the Bayesian paradigm their performance ultimately depends on the flexibility of the chosen prior distributions, in contrast to classical approaches such as PLSA (probabilistic latent semantic analysis), unigrams, and mixtures of unigrams, which do not use prior information. The standard LDA (latent Dirichlet allocation) topic model uses a symmetric Dirichlet distribution as a conjugate prior, which has known limitations: its independence structure tends to hinder performance, for instance when modeling correlated topics, including positively correlated data. Compared to classical ML (maximum likelihood) estimators, priors offer another advantage: they smooth the multinomials and thereby improve predictive topic models. In this thesis, we propose a series of flexible priors, such as the generalized Dirichlet (GD) and Beta-Liouville (BL) distributions, for our topic models within the collapsed representation, leading to much improved CVB (collapsed variational Bayes) update equations compared to those of the standard LDA. This is because the flexibility of these priors significantly improves the lower bounds in the corresponding CVB algorithms. We also show the robustness of our proposed CVB inference when the BL and GD priors are used together in hybrid generative-discriminative models, in which the generative stage produces good, heterogeneous topic features that the discriminative stage passes to powerful classifiers such as SVMs (support vector machines); for this stage, we propose efficient probabilistic kernels for classifying documents based on their topic signatures. In doing so, we implicitly cast topic modeling, an unsupervised learning method, as a supervised learning technique. 
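To illustrate what a "flexible" prior means here, the sketch below draws from a generalized Dirichlet (GD) distribution via its stick-breaking (Connor-Mosimann) construction, in which each break V_i follows its own Beta(alpha_i, beta_i). The GD uses 2K parameters for a (K+1)-dimensional probability vector, compared with K+1 for the Dirichlet, which is recovered as the special case beta_i = alpha_{i+1} + ... + alpha_{K+1}. The function names are hypothetical, and neither the Beta-Liouville prior nor the thesis's CVB updates are shown.

```python
import random

def sample_generalized_dirichlet(alphas, betas, rng=None):
    """Draw one point on the (K+1)-simplex from GD(alphas, betas).

    Stick-breaking construction: V_i ~ Beta(alpha_i, beta_i),
    x_i = V_i * prod_{j<i} (1 - V_j), and the last coordinate takes
    the remaining stick prod_{j<=K} (1 - V_j).
    """
    if rng is None:
        rng = random.Random()
    x, remaining = [], 1.0
    for a, b in zip(alphas, betas):
        v = rng.betavariate(a, b)
        x.append(v * remaining)  # break off a fraction v of what is left
        remaining *= 1.0 - v
    x.append(remaining)
    return x

def dirichlet_as_gd(a):
    """GD parameters reproducing Dirichlet(a): alpha_i = a_i, beta_i = sum_{j>i} a_j."""
    alphas = list(a[:-1])
    betas = [sum(a[i + 1:]) for i in range(len(a) - 1)]
    return alphas, betas

theta = sample_generalized_dirichlet([2.0, 1.0], [3.0, 0.5], random.Random(0))
```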
Furthermore, because the CVB algorithm is generally complex (it requires second-order Taylor expansions) despite its flexibility, we propose a much simpler, tractable update equation using a MAP (maximum a posteriori) framework with the standard EM (expectation-maximization) algorithm. As most Bayesian posteriors are intractable for complex models, we propose MAP-LBLA (latent BL allocation), in which we characterize the contributions of asymmetric BL priors relative to the symmetric Dirichlet (Dir). Importantly, the proposed MAP technique offers a point estimate (the posterior mode) with a much more tractable solution. We show that such a point estimate can be easier to implement than a full Bayesian analysis that integrates over the entire parameter space. The MAP approach also exhibits an implicit equivalence with CVB, especially with the zero-order approximation CVB0 and its stochastic version SCVB0. The proposed method improves performance in information retrieval for text document analysis. We show that parametric topic models (being finite-dimensional methods) have a much smaller hypothesis space and generally suffer from the model selection problem. We therefore propose a Bayesian nonparametric (BNP) technique that uses the hierarchical Dirichlet process (HDP) as a conjugate prior for the document multinomial distributions, where the asymmetric BL serves as a diffuse (probability) base measure that provides the global atoms (topics) shared among documents. The heterogeneity of the topic structure offers an alternative to model selection, because the nonparametric topic model (which is infinite-dimensional, with a much larger hypothesis space) can prune irrelevant topics based on their associated probability masses and retain only the most relevant ones. 
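The MAP point estimate is the posterior mode. For the simplest conjugate case (a multinomial with a Dirichlet prior, not the Beta-Liouville prior of MAP-LBLA), the mode has the closed form theta_k = (n_k + alpha_k - 1) / (N + sum_j alpha_j - K), where n_k are the counts, N is their sum, and K is the number of components. A minimal sketch, assuming every n_k + alpha_k >= 1; the function name is hypothetical:

```python
def dirichlet_map_estimate(counts, alphas):
    """Posterior mode of multinomial proportions under a Dirichlet(alphas) prior.

    theta_k = (n_k + alpha_k - 1) / (N + sum(alphas) - K); the shifted counts
    must be non-negative (n_k + alpha_k >= 1) for the mode to exist.
    """
    if len(counts) != len(alphas):
        raise ValueError("counts and alphas must have the same length")
    shifted = [n + a - 1.0 for n, a in zip(counts, alphas)]
    if any(s < 0 for s in shifted):
        raise ValueError("need n_k + alpha_k >= 1 for every k")
    total = sum(shifted)  # equals N + sum(alphas) - K
    if total == 0:
        raise ValueError("mode is undefined when all shifted counts are zero")
    return [s / total for s in shifted]
```

With alpha_k = 1 for every k the prior is flat and the estimate reduces to the maximum-likelihood proportions n_k / N; larger alpha_k pull the estimate away from zero counts, which is one way to see how priors smooth the multinomials.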
We also show that, for large-scale applications, stochastic optimization using natural gradients of the objective functions performs well when both data and parameters are learned rapidly in an online (streaming) fashion. We use both predictive likelihood and perplexity as evaluation measures to assess the robustness of our proposed topic models, since in our Bayesian framework probability is the way we quantify uncertainty. Through the flexibility of our prior distributions in the collapsed space, we improve inference for object categorization. We also improve information retrieval with the MAP-LBLA and HDP-LBLA topic models, which extend the standard LDA. These two applications demonstrate the potential of topic models for enhancing a search engine.
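Perplexity, one of the two evaluation measures named above, is the exponentiated negative average log-likelihood of held-out tokens. A short sketch, assuming the per-token log predictive probabilities have already been computed by some trained topic model (the function name is hypothetical):

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(-(1/N) * sum_i log p(w_i)) over N held-out tokens.

    token_log_probs holds the natural-log predictive probability the model
    assigns to each held-out token; lower perplexity is better.
    """
    n = len(token_log_probs)
    if n == 0:
        raise ValueError("need at least one held-out token")
    return math.exp(-sum(token_log_probs) / n)

# A model that spreads its mass uniformly over a 1,000-word vocabulary
# has perplexity 1,000 on any held-out text.
uniform = [math.log(1 / 1000)] * 50
print(round(perplexity(uniform)))  # → 1000
```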

    Joint segmentation and activity discovery using semantic and temporal priors


    Explain what you see: argumentation-based learning and robotic vision

    In this thesis, we introduce new techniques for the problems of open-ended learning, online incremental learning, and explainable learning. These methods have applications in the classification of tabular data, 3D object category recognition, and 3D object part segmentation. We use argumentation theory and probability theory to develop these methods. The first proposed open-ended online incremental learning approach is Argumentation-Based online incremental Learning (ABL). ABL works with tabular data and can learn from a small number of training instances using an abstract argumentation framework and a bipolar argumentation framework. It has a higher learning speed than state-of-the-art online incremental techniques, but it has high computational complexity. We address this problem by introducing Accelerated Argumentation-Based Learning (AABL). AABL uses only an abstract argumentation framework and applies two strategies to accelerate the learning process and reduce complexity. The second proposed open-ended online incremental learning approach is the Local Hierarchical Dirichlet Process (Local-HDP). Local-HDP addresses two problems: open-ended category recognition of 3D objects and segmentation of 3D object parts. We combine Local-HDP, used for object part segmentation, with AABL to obtain an interpretable model that explains why a given 3D object belongs to a given category. The model's explanations tell a user that an object has specific parts that resemble the typical parts of certain categories. Moreover, integrating AABL and Local-HDP leads to a model that can handle a high degree of occlusion.
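ABL builds on abstract argumentation frameworks, in which arguments attack one another and a semantics selects which arguments are accepted. As a generic illustration only (the abstract does not say which semantics ABL or AABL uses, or how arguments are built from tabular data), the sketch below computes Dung's grounded extension by iterating the characteristic function from the empty set. The function name and the toy arguments are hypothetical.

```python
def grounded_extension(arguments, attacks):
    """Grounded extension of a Dung abstract argumentation framework.

    arguments: iterable of argument names.
    attacks: set of (attacker, target) pairs.
    Starts from the empty set and repeatedly applies the characteristic
    function F(S) = {a : every attacker of a is attacked by some member of S}
    until a fixed point is reached.
    """
    args = set(arguments)
    attackers = {a: {x for (x, y) in attacks if y == a} for a in args}
    extension = set()
    while True:
        # Arguments all of whose attackers are counter-attacked by the current set.
        defended = {
            a for a in args
            if all(any((s, b) in attacks for s in extension) for b in attackers[a])
        }
        if defended == extension:
            return extension
        extension = defended

# a attacks b, b attacks c: a is unattacked and defends c against b.
print(sorted(grounded_extension({"a", "b", "c"}, {("a", "b"), ("b", "c")})))
```

Because the characteristic function is monotone, the iteration reaches its least fixed point; two arguments that attack each other, with nothing else in the framework, yield an empty grounded extension.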

    Graphical models for visual object recognition and tracking

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 277-301).
We develop statistical methods which allow effective visual detection, categorization, and tracking of objects in complex scenes. Such computer vision systems must be robust to wide variations in object appearance, the often small size of training databases, and ambiguities induced by articulated or partially occluded objects. Graphical models provide a powerful framework for encoding the statistical structure of visual scenes, and developing corresponding learning and inference algorithms. In this thesis, we describe several models which integrate graphical representations with nonparametric statistical methods. This approach leads to inference algorithms which tractably recover high-dimensional, continuous object pose variations, and learning procedures which transfer knowledge among related recognition tasks. Motivated by visual tracking problems, we first develop a nonparametric extension of the belief propagation (BP) algorithm. Using Monte Carlo methods, we provide general procedures for recursively updating particle-based approximations of continuous sufficient statistics. Efficient multiscale sampling methods then allow this nonparametric BP algorithm to be flexibly adapted to many different applications. As a particular example, we consider a graphical model describing the hand's three-dimensional (3D) structure, kinematics, and dynamics. This graph encodes global hand pose via the 3D position and orientation of several rigid components, and thus exposes local structure in a high-dimensional articulated model. Applying nonparametric BP, we recover a hand tracking algorithm which is robust to outliers and local visual ambiguities. 
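In nonparametric BP, messages are continuous densities, commonly represented as Gaussian mixtures. The sketch below, assuming one-dimensional mixtures given as (weight, mean, variance) triples, computes the exact product of two such mixtures. The number of components multiplies with every product, which is why nonparametric BP samples from message products rather than forming them exactly. The function names are hypothetical, and this is not the thesis's multiscale sampler.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def product_of_mixtures(mix_a, mix_b):
    """Exact product of two 1-D Gaussian mixtures, renormalized.

    Each mixture is a list of (weight, mean, variance). The product of
    N(mu1, v1) and N(mu2, v2) is proportional to N(mu, v) with
    v = 1 / (1/v1 + 1/v2) and mu = v * (mu1/v1 + mu2/v2), scaled by
    N(mu1; mu2, v1 + v2). A K1- and a K2-component mixture therefore
    give K1 * K2 components.
    """
    components = []
    for w1, m1, v1 in mix_a:
        for w2, m2, v2 in mix_b:
            v = 1.0 / (1.0 / v1 + 1.0 / v2)
            m = v * (m1 / v1 + m2 / v2)
            w = w1 * w2 * gaussian_pdf(m1, m2, v1 + v2)
            components.append((w, m, v))
    total = sum(w for w, _, _ in components)
    return [(w / total, m, v) for w, m, v in components]
```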
Via a set of latent occupancy masks, we also extend our approach to consistently infer occlusion events in a distributed fashion. In the second half of this thesis, we develop methods for learning hierarchical models of objects, the parts composing them, and the scenes surrounding them. Our approach couples topic models originally developed for text analysis with spatial transformations, and thus consistently accounts for geometric constraints. By building integrated scene models, we may discover contextual relationships, and better exploit partially labeled training images. We first consider images of isolated objects, and show that sharing parts among object categories improves accuracy when learning from few examples. Turning to multiple object scenes, we propose nonparametric models which use Dirichlet processes to automatically learn the number of parts underlying each object category, and objects composing each scene. Adapting these transformed Dirichlet processes to images taken with a binocular stereo camera, we learn integrated, 3D models of object geometry and appearance. This leads to a Monte Carlo algorithm which automatically infers 3D scene structure from the predictable geometry of known object categories.
by Erik B. Sudderth. Ph.D.

    Using topic models to detect behaviour patterns for healthcare monitoring

    Healthcare systems worldwide are facing growing demands on their resources due to an ageing population and an increasing prevalence of chronic diseases. Innovative residential healthcare monitoring systems using a variety of sensors are being developed to help address these needs. Interpreting the vast amount of data they generate is key to fully exploiting the benefits a monitoring system offers. This thesis presents the application of topic models, a class of machine learning algorithms, to detect behaviour patterns in the different types of data produced by a monitoring system. Latent Dirichlet Allocation was applied to real-world activity data with corresponding ground-truth labels of daily routines. The results on an existing dataset and on a novel dataset collected using a custom mobile phone app demonstrated that the patterns found are equivalent to routines. Long-term monitoring can identify changes that could indicate an alteration in health status. Dynamic topic models were applied to simulated long-term activity datasets to detect changes in the structure of daily routines, and the changes occurring in the simulated data were successfully detected. This result suggests that dynamic topic models have the potential to identify changes in routines that could aid early diagnosis of chronic diseases. Furthermore, chronic conditions such as diabetes and obesity are related to diet quality. Current research findings on the association between eating behaviours, especially snacking, and their impact on diet quality and health are often conflicting. One problem is the lack of consistent definitions for different types of eating events. The novel application of Latent Dirichlet Allocation to three nutrition datasets is described. The results demonstrated that combinations of food groups representative of eating event types can be detected. Moreover, labels assigned to these combinations showed good agreement with alternative methods for labelling eating event types.
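As a concrete sketch of how LDA can be applied to activity data, the collapsed Gibbs sampler below treats each day as a "document" and each discretized activity or sensor event as a "word". That encoding, the toy data, and the hyperparameters `alpha=0.1` and `beta=0.01` are illustrative assumptions; the abstract does not state how the thesis tokenized its data or which inference method it used.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (doc_topic, topic_word) count matrices after the final sweep.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(n_topics)]
    topic_total = [0] * n_topics
    z = []  # z[d][i] = current topic of token i in document d
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)  # random initial assignment
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        z.append(z_d)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) ∝ (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word

# Hypothetical days: ids 0-2 are morning events, ids 3-5 evening events.
days = [[0, 1, 2, 0, 1], [3, 4, 5, 3], [0, 2, 1, 3, 4]]
doc_topic, topic_word = lda_gibbs(days, n_topics=2, vocab_size=6)
```

Each row of `topic_word` (after adding `beta` and normalizing) is a candidate "routine", and each row of `doc_topic` shows how much of a day that routine explains.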

    Dataset shift in land-use classification for optical remote sensing

    This study addresses multimodal dataset shifts, consisting of both concept and covariate shifts, to improve texture-based land-use classification accuracy for optical panchromatic and multispectral remote sensing. Multitemporal and multisensor variations between training and test data, caused by atmospheric, phenological, sensor, illumination, and viewing-geometry differences, lead to inaccurate supervised classification. The first dataset shift reduction strategy modifies the input through shadow removal before feature extraction with gray-level co-occurrence matrix and local binary pattern features. Components of a Rayleigh quotient-based manifold alignment framework are investigated to reduce multimodal dataset shift at the classifier input level through unsupervised classification, followed by manifold matching that transfers classification labels by finding cross-domain cluster correspondences. The ability of weighted hierarchical agglomerative clustering to partition poorly separated feature spaces is explored, and weight-generalized internal validation is used to determine the number of clusters without supervision. Manifold matching solves an assignment problem with the Hungarian algorithm, using a cost matrix of geometric similarity measurements that assume the intrinsic structure is preserved across the dataset shift. Local neighborhood geometric co-occurrence frequency information is recovered, and a novel way of integrating it is shown to improve matching accuracy. A final strategy for addressing multimodal dataset shift is multiscale feature learning within a convolutional neural network, which obtains optimal hierarchical feature representations in place of engineered texture features that may be suboptimal. Feature learning is shown to produce features that are robust to multimodal acquisition differences on a benchmark land-use classification dataset. 
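Label transfer by manifold matching reduces to a linear assignment problem: given a cost matrix between source and target clusters, find the one-to-one correspondence with minimum total cost, then copy each source cluster's class label to its matched target cluster. The sketch below swaps the Hungarian algorithm for exhaustive search over permutations, which finds the same optimum but only scales to a few clusters; `scipy.optimize.linear_sum_assignment` solves the same problem efficiently. The cost matrix, the land-use labels, and the function names are hypothetical, and the thesis's geometric similarity measures are not reproduced.

```python
from itertools import permutations

def best_cluster_correspondence(cost):
    """Minimum-cost one-to-one matching for a square cost matrix.

    cost[i][j] is the dissimilarity between source cluster i and target
    cluster j. Exhaustive search gives the same optimum as the Hungarian
    algorithm but takes O(n!) time, so it suits only a handful of clusters.
    """
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return list(best_perm), best_cost

def transfer_labels(source_cluster_labels, matching):
    """Give target cluster matching[i] the class label of source cluster i."""
    target_labels = [None] * len(matching)
    for i, j in enumerate(matching):
        target_labels[j] = source_cluster_labels[i]
    return target_labels

cost = [[4, 1, 3], [2, 0, 5], [3, 2, 2]]
matching, total = best_cluster_correspondence(cost)
print(matching, total)  # → [1, 0, 2] 5
```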
A novel multiscale input strategy is proposed for an optimized convolutional neural network; it improves classification accuracy to a competitive level on the UC Merced benchmark dataset and outperforms single-scale input methods. All the proposed strategies for addressing multimodal dataset shift in land-use image classification resulted in significant accuracy improvements on various multitemporal and multimodal datasets.
Thesis (PhD)--University of Pretoria, 2016. National Research Foundation (NRF); University of Pretoria (UP). Electrical, Electronic and Computer Engineering. PhD. Unrestricted

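    Unsupervised machine learning in event categorization for the utilization of business intelligence (Ohjaamaton koneoppiminen tapahtumakategorisoinnissa liiketoimintatiedon hyödyntämisessä)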

    The data and information available for business intelligence purposes are increasing rapidly worldwide. Data quality and quantity are important for making correct business decisions, but the amount of data is becoming difficult to process. Machine learning methods are becoming an increasingly powerful tool for dealing with this volume of data. One such approach is the automatic annotation and localization of business-intelligence-relevant actions and events in news data. While studying the literature of this field, however, it became clear that there is little standardization or objectivity regarding the categories into which these events and actions are sorted; categorization was often done in subjective, arduous ways. The goal of this thesis is to provide information and recommendations on how to create more objective, less time-consuming initial categorizations of actions and events by studying common unsupervised learning methods for this task. The literature and theory needed to understand the research and methodology are reviewed. The context and evolution of business intelligence up to the present are considered, with special attention to its relationship to today's big data problem. This in turn relates to the fields of machine learning, artificial intelligence, and especially natural language processing, whose relevant methods are covered to explain the steps taken toward the goal of this thesis. All approaches aided in understanding the behaviour of unsupervised learning methods and how this behaviour should be taken into account when creating categorizations. Different natural language preprocessing steps are combined with different text vectorization methods. Specifically, three text tokenization methods (plain, N-gram, and chunk tokenization) are tested with two popular vectorization methods: bag-of-words and term frequency-inverse document frequency (TF-IDF). 
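The tokenization and vectorization combinations described above can be sketched in a few lines. The version below assumes already-tokenized documents, builds word N-grams by joining adjacent tokens, and weights terms by raw term frequency times idf = log(N / df). Libraries differ in the exact weighting (scikit-learn's `TfidfVectorizer`, for example, smooths idf and L2-normalizes rows by default), chunk tokenization is not shown, and the function names and snippets are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Word N-grams: each item joins n adjacent tokens with a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf(docs):
    """TF-IDF vectors for tokenized documents, as sparse dicts term -> weight.

    tf is the raw count of the term in the document, and
    idf = log(N / df), where df is the number of documents containing the term.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors

# Hypothetical news snippets, already lower-cased and tokenized.
plain = [["bank", "announces", "merger"], ["bank", "cuts", "jobs"]]
bigram_docs = [ngrams(doc, 2) for doc in plain]
plain_vecs = tfidf(plain)
bigram_vecs = tfidf(bigram_docs)
```

A bag-of-words vector is simply the `Counter` of a document, without the idf factor; terms that occur in every document, such as "bank" here, receive zero TF-IDF weight.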
Two types of unsupervised methods are tested on these vectorizations: clustering, a more traditional data subcategorization process, and topic modelling, a fuzzy, probability-based method for the same task. Three different algorithms from these two types of methods are studied for the interpretability and categorization value of the top terms representing each cluster or topic. These top-term representations are also compared with the actual contents of the topics or clusters via content analysis. Of the studied methods, plain and chunk tokenization yielded the most comprehensible results to a human reader. Vectorization made no major difference to top-term interpretability or to the congruence between top terms and contents. K-means clustering and Latent Dirichlet Allocation were deemed the most useful for creating event and action categorizations. K-means clustering created a good basis for an initial categorization framework, with top terms congruent with the contents of the clusters, and Latent Dirichlet Allocation found latent topics in the text documents that provided serendipitous, fruitful insights for a category creator to take into account.
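K-means, one of the two methods the thesis found most useful, alternates between assigning each vector to its nearest centroid and moving each centroid to the mean of its members. The sketch below assumes small dense vectors (for example, rows of a TF-IDF matrix) and caller-supplied initial centroids; reading the largest centroid coordinates off as "top terms" mirrors how clusters are described above, but the helper names and data are hypothetical.

```python
def kmeans(points, init_centroids, n_iters=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update.

    points and centroids are tuples of floats; returns (centroids, labels).
    """
    centroids = [tuple(c) for c in init_centroids]
    labels = None
    for _ in range(n_iters):
        # Assign each point to the centroid with the smallest squared distance.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(pt, centroids[k])))
            for pt in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, labels

def top_terms(centroid, vocab, n=3):
    """Terms with the largest centroid weights, used to describe a cluster."""
    order = sorted(range(len(vocab)), key=lambda i: centroid[i], reverse=True)
    return [vocab[i] for i in order[:n]]

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, labels = kmeans(points, init_centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)  # → [0, 0, 1, 1]
```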

    Riemannian Flows for Supervised and Unsupervised Geometric Image Labeling

    In this thesis we focus on the image labeling problem, which is used as a subroutine in many image processing applications. Our work is based on the assignment flow, which was recently introduced as a novel geometric approach to the image labeling problem. This flow evolves over time on the manifold of row-stochastic matrices, whose elements represent label assignments as assignment probabilities. The strict separation of assignment manifold and feature space allows the data to lie in any metric space, while a smoothing operation on the assignment manifold results in an unbiased and spatially regularized labeling. The first part of this work focuses on theoretical statements about the asymptotic behavior of the assignment flow. We show, under weak assumptions on the parameters, that the assignment flow for data in general position converges towards integral probabilities and thus ensures unique assignment decisions. Furthermore, we investigate the stability of possible limit points depending on the input data and parameters. For stable limits, we derive conditions that give early evidence of convergence towards these limits and thus provide convergence guarantees. In the second part, we extend the assignment flow approach to impose global convex constraints on the labeling results based on linear filter statistics of the assignments. The corresponding filters are learned from examples using an eigendecomposition. The effectiveness of the approach is demonstrated numerically in several academic labeling scenarios. In the last part of this thesis, we consider the situation in which no labels are given, so that the prototypical elements (labels) also have to be determined from the data. To this end, we introduce an additional flow on the feature manifold, which is coupled to the assignment flow. The resulting flow adapts the prototypes over time to the assignment probabilities. 
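The assignment flow is commonly written in replicator form, dW/dt = R_W(S(W)), where each row of W is a label distribution on the probability simplex and S(W) is a spatially smoothed similarity matrix. As a heavily simplified sketch, assuming a single pixel, no spatial smoothing, and a fixed fitness f_j = -d_j / rho computed from hypothetical feature-to-prototype distances d_j, the Euler-integrated replicator equation below starts at the barycenter and converges to an integral (one-hot) assignment, the kind of asymptotic behavior analyzed in the first part of the thesis. It is not the thesis's geometric integrator, and the function name and parameter values are assumptions.

```python
def replicator_flow_row(distances, rho=1.0, dt=0.1, n_steps=500):
    """Euler integration of the replicator equation for a single pixel.

    The assignment vector w starts at the barycenter of the simplex and
    evolves as dw_j/dt = w_j * (f_j - sum_k w_k f_k) with fitness
    f_j = -distances[j] / rho. Without spatial coupling the flow
    converges to the vertex (label) with the smallest distance.
    """
    n = len(distances)
    fitness = [-d / rho for d in distances]
    w = [1.0 / n] * n
    for _ in range(n_steps):
        mean_fit = sum(wj * fj for wj, fj in zip(w, fitness))
        w = [wj + dt * wj * (fj - mean_fit) for wj, fj in zip(w, fitness)]
        total = sum(w)  # Euler steps preserve the sum; renormalize against round-off
        w = [wj / total for wj in w]
    return w

w = replicator_flow_row([1.0, 2.0, 3.0])
```

The step size must keep every factor 1 + dt * (f_j - mean_fit) positive so that w stays on the simplex; with the values above the factor never drops below 0.8.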
The simultaneous adaptation and assignment of prototypes not only provides suitable prototypes but also improves the resulting image segmentation, as demonstrated by experiments. For this approach, the data are assumed to lie on a Riemannian manifold. We elaborate the approach for a range of manifolds that occur in applications and evaluate the resulting methods in numerical experiments.

    The analysis of bodily gestures in response to music: methods for embodied music cognition based on machine learning
