61 research outputs found

    Statistical and Dynamical Modeling of Riemannian Trajectories with Application to Human Movement Analysis

    Get PDF
    abstract: The data explosion in the past decade is in part due to the widespread use of rich sensors that measure various physical phenomenon -- gyroscopes that measure orientation in phones and fitness devices, the Microsoft Kinect which measures depth information, etc. A typical application requires inferring the underlying physical phenomenon from data, which is done using machine learning. A fundamental assumption in training models is that the data is Euclidean, i.e. the metric is the standard Euclidean distance governed by the L-2 norm. However in many cases this assumption is violated, when the data lies on non Euclidean spaces such as Riemannian manifolds. While the underlying geometry accounts for the non-linearity, accurate analysis of human activity also requires temporal information to be taken into account. Human movement has a natural interpretation as a trajectory on the underlying feature manifold, as it evolves smoothly in time. A commonly occurring theme in many emerging problems is the need to \emph{represent, compare, and manipulate} such trajectories in a manner that respects the geometric constraints. This dissertation is a comprehensive treatise on modeling Riemannian trajectories to understand and exploit their statistical and dynamical properties. Such properties allow us to formulate novel representations for Riemannian trajectories. For example, the physical constraints on human movement are rarely considered, which results in an unnecessarily large space of features, making search, classification and other applications more complicated. Exploiting statistical properties can help us understand the \emph{true} space of such trajectories. In applications such as stroke rehabilitation where there is a need to differentiate between very similar kinds of movement, dynamical properties can be much more effective. In this regard, we propose a generalization to the Lyapunov exponent to Riemannian manifolds and show its effectiveness for human activity analysis. The theory developed in this thesis naturally leads to several benefits in areas such as data mining, compression, dimensionality reduction, classification, and regression.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201

    Discriminant feature pursuit: from statistical learning to informative learning.

    Get PDF
    Lin Dahua.Thesis (M.Phil.)--Chinese University of Hong Kong, 2006.Includes bibliographical references (leaves 233-250).Abstracts in English and Chinese.Abstract --- p.iAcknowledgement --- p.iiiChapter 1 --- Introduction --- p.1Chapter 1.1 --- The Problem We are Facing --- p.1Chapter 1.2 --- Generative vs. Discriminative Models --- p.2Chapter 1.3 --- Statistical Feature Extraction: Success and Challenge --- p.3Chapter 1.4 --- Overview of Our Works --- p.5Chapter 1.4.1 --- New Linear Discriminant Methods: Generalized LDA Formulation and Performance-Driven Sub space Learning --- p.5Chapter 1.4.2 --- Coupled Learning Models: Coupled Space Learning and Inter Modality Recognition --- p.6Chapter 1.4.3 --- Informative Learning Approaches: Conditional Infomax Learning and Information Chan- nel Model --- p.6Chapter 1.5 --- Organization of the Thesis --- p.8Chapter I --- History and Background --- p.10Chapter 2 --- Statistical Pattern Recognition --- p.11Chapter 2.1 --- Patterns and Classifiers --- p.11Chapter 2.2 --- Bayes Theory --- p.12Chapter 2.3 --- Statistical Modeling --- p.14Chapter 2.3.1 --- Maximum Likelihood Estimation --- p.14Chapter 2.3.2 --- Gaussian Model --- p.15Chapter 2.3.3 --- Expectation-Maximization --- p.17Chapter 2.3.4 --- Finite Mixture Model --- p.18Chapter 2.3.5 --- A Nonparametric Technique: Parzen Windows --- p.21Chapter 3 --- Statistical Learning Theory --- p.24Chapter 3.1 --- Formulation of Learning Model --- p.24Chapter 3.1.1 --- Learning: Functional Estimation Model --- p.24Chapter 3.1.2 --- Representative Learning Problems --- p.25Chapter 3.1.3 --- Empirical Risk Minimization --- p.26Chapter 3.2 --- Consistency and Convergence of Learning --- p.27Chapter 3.2.1 --- Concept of Consistency --- p.27Chapter 3.2.2 --- The Key Theorem of Learning Theory --- p.28Chapter 3.2.3 --- VC Entropy --- p.29Chapter 3.2.4 --- Bounds on Convergence --- p.30Chapter 3.2.5 --- VC Dimension --- p.35Chapter 4 --- History of Statistical Feature Extraction --- p.38Chapter 4.1 --- Linear Feature Extraction --- p.38Chapter 4.1.1 --- Principal Component Analysis (PCA) --- p.38Chapter 4.1.2 --- Linear Discriminant Analysis (LDA) --- p.41Chapter 4.1.3 --- Other Linear Feature Extraction Methods --- p.46Chapter 4.1.4 --- Comparison of Different Methods --- p.48Chapter 4.2 --- Enhanced Models --- p.49Chapter 4.2.1 --- Stochastic Discrimination and Random Subspace --- p.49Chapter 4.2.2 --- Hierarchical Feature Extraction --- p.51Chapter 4.2.3 --- Multilinear Analysis and Tensor-based Representation --- p.52Chapter 4.3 --- Nonlinear Feature Extraction --- p.54Chapter 4.3.1 --- Kernelization --- p.54Chapter 4.3.2 --- Dimension reduction by Manifold Embedding --- p.56Chapter 5 --- Related Works in Feature Extraction --- p.59Chapter 5.1 --- Dimension Reduction --- p.59Chapter 5.1.1 --- Feature Selection --- p.60Chapter 5.1.2 --- Feature Extraction --- p.60Chapter 5.2 --- Kernel Learning --- p.61Chapter 5.2.1 --- Basic Concepts of Kernel --- p.61Chapter 5.2.2 --- The Reproducing Kernel Map --- p.62Chapter 5.2.3 --- The Mercer Kernel Map --- p.64Chapter 5.2.4 --- The Empirical Kernel Map --- p.65Chapter 5.2.5 --- Kernel Trick and Kernelized Feature Extraction --- p.66Chapter 5.3 --- Subspace Analysis --- p.68Chapter 5.3.1 --- Basis and Subspace --- p.68Chapter 5.3.2 --- Orthogonal Projection --- p.69Chapter 5.3.3 --- Orthonormal Basis --- p.70Chapter 5.3.4 --- Subspace Decomposition --- p.70Chapter 5.4 --- Principal Component Analysis --- p.73Chapter 5.4.1 --- PCA Formulation --- p.73Chapter 5.4.2 --- Solution to PCA --- p.75Chapter 5.4.3 --- Energy Structure of PCA --- p.76Chapter 5.4.4 --- Probabilistic Principal Component Analysis --- p.78Chapter 5.4.5 --- Kernel Principal Component Analysis --- p.81Chapter 5.5 --- Independent Component Analysis --- p.83Chapter 5.5.1 --- ICA Formulation --- p.83Chapter 5.5.2 --- Measurement of Statistical Independence --- p.84Chapter 5.6 --- Linear Discriminant Analysis --- p.85Chapter 5.6.1 --- Fisher's Linear Discriminant Analysis --- p.85Chapter 5.6.2 --- Improved Algorithms for Small Sample Size Problem . --- p.89Chapter 5.6.3 --- Kernel Discriminant Analysis --- p.92Chapter II --- Improvement in Linear Discriminant Analysis --- p.100Chapter 6 --- Generalized LDA --- p.101Chapter 6.1 --- Regularized LDA --- p.101Chapter 6.1.1 --- Generalized LDA Implementation Procedure --- p.101Chapter 6.1.2 --- Optimal Nonsingular Approximation --- p.103Chapter 6.1.3 --- Regularized LDA algorithm --- p.104Chapter 6.2 --- A Statistical View: When is LDA optimal? --- p.105Chapter 6.2.1 --- Two-class Gaussian Case --- p.106Chapter 6.2.2 --- Multi-class Cases --- p.107Chapter 6.3 --- Generalized LDA Formulation --- p.108Chapter 6.3.1 --- Mathematical Preparation --- p.108Chapter 6.3.2 --- Generalized Formulation --- p.110Chapter 7 --- Dynamic Feedback Generalized LDA --- p.112Chapter 7.1 --- Basic Principle --- p.112Chapter 7.2 --- Dynamic Feedback Framework --- p.113Chapter 7.2.1 --- Initialization: K-Nearest Construction --- p.113Chapter 7.2.2 --- Dynamic Procedure --- p.115Chapter 7.3 --- Experiments --- p.115Chapter 7.3.1 --- Performance in Training Stage --- p.116Chapter 7.3.2 --- Performance on Testing set --- p.118Chapter 8 --- Performance-Driven Subspace Learning --- p.119Chapter 8.1 --- Motivation and Principle --- p.119Chapter 8.2 --- Performance-Based Criteria --- p.121Chapter 8.2.1 --- The Verification Problem and Generalized Average Margin --- p.122Chapter 8.2.2 --- Performance Driven Criteria based on Generalized Average Margin --- p.123Chapter 8.3 --- Optimal Subspace Pursuit --- p.125Chapter 8.3.1 --- Optimal threshold --- p.125Chapter 8.3.2 --- Optimal projection matrix --- p.125Chapter 8.3.3 --- Overall procedure --- p.129Chapter 8.3.4 --- Discussion of the Algorithm --- p.129Chapter 8.4 --- Optimal Classifier Fusion --- p.130Chapter 8.5 --- Experiments --- p.131Chapter 8.5.1 --- Performance Measurement --- p.131Chapter 8.5.2 --- Experiment Setting --- p.131Chapter 8.5.3 --- Experiment Results --- p.133Chapter 8.5.4 --- Discussion --- p.139Chapter III --- Coupled Learning of Feature Transforms --- p.140Chapter 9 --- Coupled Space Learning --- p.141Chapter 9.1 --- Introduction --- p.142Chapter 9.1.1 --- What is Image Style Transform --- p.142Chapter 9.1.2 --- Overview of our Framework --- p.143Chapter 9.2 --- Coupled Space Learning --- p.143Chapter 9.2.1 --- Framework of Coupled Modelling --- p.143Chapter 9.2.2 --- Correlative Component Analysis --- p.145Chapter 9.2.3 --- Coupled Bidirectional Transform --- p.148Chapter 9.2.4 --- Procedure of Coupled Space Learning --- p.151Chapter 9.3 --- Generalization to Mixture Model --- p.152Chapter 9.3.1 --- Coupled Gaussian Mixture Model --- p.152Chapter 9.3.2 --- Optimization by EM Algorithm --- p.152Chapter 9.4 --- Integrated Framework for Image Style Transform --- p.154Chapter 9.5 --- Experiments --- p.156Chapter 9.5.1 --- Face Super-resolution --- p.156Chapter 9.5.2 --- Portrait Style Transforms --- p.157Chapter 10 --- Inter-Modality Recognition --- p.162Chapter 10.1 --- Introduction to the Inter-Modality Recognition Problem . . . --- p.163Chapter 10.1.1 --- What is Inter-Modality Recognition --- p.163Chapter 10.1.2 --- Overview of Our Feature Extraction Framework . . . . --- p.163Chapter 10.2 --- Common Discriminant Feature Extraction --- p.165Chapter 10.2.1 --- Formulation of the Learning Problem --- p.165Chapter 10.2.2 --- Matrix-Form of the Objective --- p.168Chapter 10.2.3 --- Solving the Linear Transforms --- p.169Chapter 10.3 --- Kernelized Common Discriminant Feature Extraction --- p.170Chapter 10.4 --- Multi-Mode Framework --- p.172Chapter 10.4.1 --- Multi-Mode Formulation --- p.172Chapter 10.4.2 --- Optimization Scheme --- p.174Chapter 10.5 --- Experiments --- p.176Chapter 10.5.1 --- Experiment Settings --- p.176Chapter 10.5.2 --- Experiment Results --- p.177Chapter IV --- A New Perspective: Informative Learning --- p.180Chapter 11 --- Toward Information Theory --- p.181Chapter 11.1 --- Entropy and Mutual Information --- p.181Chapter 11.1.1 --- Entropy --- p.182Chapter 11.1.2 --- Relative Entropy (Kullback Leibler Divergence) --- p.184Chapter 11.2 --- Mutual Information --- p.184Chapter 11.2.1 --- Definition of Mutual Information --- p.184Chapter 11.2.2 --- Chain rules --- p.186Chapter 11.2.3 --- Information in Data Processing --- p.188Chapter 11.3 --- Differential Entropy --- p.189Chapter 11.3.1 --- Differential Entropy of Continuous Random Variable . --- p.189Chapter 11.3.2 --- Mutual Information of Continuous Random Variable . --- p.190Chapter 12 --- Conditional Infomax Learning --- p.191Chapter 12.1 --- An Overview --- p.192Chapter 12.2 --- Conditional Informative Feature Extraction --- p.193Chapter 12.2.1 --- Problem Formulation and Features --- p.193Chapter 12.2.2 --- The Information Maximization Principle --- p.194Chapter 12.2.3 --- The Information Decomposition and the Conditional Objective --- p.195Chapter 12.3 --- The Efficient Optimization --- p.197Chapter 12.3.1 --- Discrete Approximation Based on AEP --- p.197Chapter 12.3.2 --- Analysis of Terms and Their Derivatives --- p.198Chapter 12.3.3 --- Local Active Region Method --- p.200Chapter 12.4 --- Bayesian Feature Fusion with Sparse Prior --- p.201Chapter 12.5 --- The Integrated Framework for Feature Learning --- p.202Chapter 12.6 --- Experiments --- p.203Chapter 12.6.1 --- A Toy Problem --- p.203Chapter 12.6.2 --- Face Recognition --- p.204Chapter 13 --- Channel-based Maximum Effective Information --- p.209Chapter 13.1 --- Motivation and Overview --- p.209Chapter 13.2 --- Maximizing Effective Information --- p.211Chapter 13.2.1 --- Relation between Mutual Information and Classification --- p.211Chapter 13.2.2 --- Linear Projection and Metric --- p.212Chapter 13.2.3 --- Channel Model and Effective Information --- p.213Chapter 13.2.4 --- Parzen Window Approximation --- p.216Chapter 13.3 --- Parameter Optimization on Grassmann Manifold --- p.217Chapter 13.3.1 --- Grassmann Manifold --- p.217Chapter 13.3.2 --- Conjugate Gradient Optimization on Grassmann Manifold --- p.219Chapter 13.3.3 --- Computation of Gradient --- p.221Chapter 13.4 --- Experiments --- p.222Chapter 13.4.1 --- A Toy Problem --- p.222Chapter 13.4.2 --- Face Recognition --- p.223Chapter 14 --- Conclusion --- p.23

    Action analysis and video summarisation to efficiently manage and interpret video data

    Get PDF

    Riemannian Multi-Manifold Modeling

    Full text link
    This paper advocates a novel framework for segmenting a dataset in a Riemannian manifold MM into clusters lying around low-dimensional submanifolds of MM. Important examples of MM, for which the proposed clustering algorithm is computationally efficient, are the sphere, the set of positive definite matrices, and the Grassmannian. The clustering problem with these examples of MM is already useful for numerous application domains such as action identification in video sequences, dynamic texture clustering, brain fiber segmentation in medical imaging, and clustering of deformed images. The proposed clustering algorithm constructs a data-affinity matrix by thoroughly exploiting the intrinsic geometry and then applies spectral clustering. The intrinsic local geometry is encoded by local sparse coding and more importantly by directional information of local tangent spaces and geodesics. Theoretical guarantees are established for a simplified variant of the algorithm even when the clusters intersect. To avoid complication, these guarantees assume that the underlying submanifolds are geodesic. Extensive validation on synthetic and real data demonstrates the resiliency of the proposed method against deviations from the theoretical model as well as its superior performance over state-of-the-art techniques

    Improving Representation Learning for Deep Clustering and Few-shot Learning

    Get PDF
    The amounts of data in the world have increased dramatically in recent years, and it is quickly becoming infeasible for humans to label all these data. It is therefore crucial that modern machine learning systems can operate with few or no labels. The introduction of deep learning and deep neural networks has led to impressive advancements in several areas of machine learning. These advancements are largely due to the unprecedented ability of deep neural networks to learn powerful representations from a wide range of complex input signals. This ability is especially important when labeled data is limited, as the absence of a strong supervisory signal forces models to rely more on intrinsic properties of the data and its representations. This thesis focuses on two key concepts in deep learning with few or no labels. First, we aim to improve representation quality in deep clustering - both for single-view and multi-view data. Current models for deep clustering face challenges related to properly representing semantic similarities, which is crucial for the models to discover meaningful clusterings. This is especially challenging with multi-view data, since the information required for successful clustering might be scattered across many views. Second, we focus on few-shot learning, and how geometrical properties of representations influence few-shot classification performance. We find that a large number of recent methods for few-shot learning embed representations on the hypersphere. Hence, we seek to understand what makes the hypersphere a particularly suitable embedding space for few-shot learning. Our work on single-view deep clustering addresses the susceptibility of deep clustering models to find trivial solutions with non-meaningful representations. To address this issue, we present a new auxiliary objective that - when compared to the popular autoencoder-based approach - better aligns with the main clustering objective, resulting in improved clustering performance. Similarly, our work on multi-view clustering focuses on how representations can be learned from multi-view data, in order to make the representations suitable for the clustering objective. Where recent methods for deep multi-view clustering have focused on aligning view-specific representations, we find that this alignment procedure might actually be detrimental to representation quality. We investigate the effects of representation alignment, and provide novel insights on when alignment is beneficial, and when it is not. Based on our findings, we present several new methods for deep multi-view clustering - both alignment and non-alignment-based - that out-perform current state-of-the-art methods. Our first work on few-shot learning aims to tackle the hubness problem, which has been shown to have negative effects on few-shot classification performance. To this end, we present two new methods to embed representations on the hypersphere for few-shot learning. Further, we provide both theoretical and experimental evidence indicating that embedding representations as uniformly as possible on the hypersphere reduces hubness, and improves classification accuracy. Furthermore, based on our findings on hyperspherical embeddings for few-shot learning, we seek to improve the understanding of representation norms. In particular, we ask what type of information the norm carries, and why it is often beneficial to discard the norm in classification models. We answer this question by presenting a novel hypothesis on the relationship between representation norm and the number of a certain class of objects in the image. We then analyze our hypothesis both theoretically and experimentally, presenting promising results that corroborate the hypothesis

    A framework of face recognition with set of testing images

    Get PDF
    We propose a novel framework to solve the face recognition problem base on set of testing images. Our framework can handle the case that no pose overlap between training set and query set. The main techniques used in this framework are manifold alignment, face normalization and discriminant learning. Experiments on different databases show our system outperforms some state of the art methods

    Adaptation in Deep Learning Models: Algorithms and Applications

    Get PDF
    Artificial intelligence has been successful to match or even surpass human abilities e.g., recognizing images, playing games, and understanding languages. At the current state, powerful machine learning models learn from data under a stationary environment while humans are capable of learning in dynamic, changing, and sequential conditions. In pursuing the idea of open-ended learning for machine intelligence, we contribute to provide algorithms and analyses for generally capable models via adaptation. In this thesis, the model adaptation problem is defined as the impediment of intelligent machines to learn to modify their behaviors for new purposes or new uses. The ultimate goal is to develop machine intelligence that has the ability to adapt itself by not only following at our behest but also understanding the environment. Our works populate in the area of deep neural networks and transfer learning. Throughout our works, developing adaptive models is divided into four major problems: (1) few-shot learning, (2) fast model adaptation, (3) continual learning, and (4) architecture search. In few-shot learning, a model is expected to change its behavior when facing a new context or an unseen task with limited data. Another important problem within few-shot learning is to adapt quickly from a few data. In the problem of continual learning, the model needs to adapt sequentially depending on the given task. In architecture search, we look for a high-performing configuration for connecting among nodes in a model. To approach the problem of few-shot learning, we opt to use the strategy in transfer learning with a pretrained Convolutional Neural Network (CNN) for novel tasks with limited-data annotations. Inspired by the success of subspace methods for visual recognition, we develop a classifier using subspaces to improve the generalization capability to novel concepts. We also investigate few-shot learning in multi-label classification, and propose a multi-label propagation technique by constructing a graph from the representations of support samples. In pursuing fast model adaptation, we use the idea of preconditioners in optimization. Specifically, the problem revolves in \textit{meta-learning}, where the agent needs to learn a family of tasks and adapt quickly to a new task. Our algorithm uses a non-linear function to generate the preconditioner for modulating the gradient when updating the model. Our experiments show that the model converges more quickly than other types of preconditioners in the same problem. In the problem of continual learning, the model needs to sequentially learn and adapt the network parameters for new tasks without forgetting the previously learned tasks. To this end, we investigate the knowledge distillation approach, where the old model guides the current model to find the balance between the current task and the prior tasks. Our approach models the smoothness between two tasks using the geodesic flow, and the objective is to maximize similarity of the projected responses along the geodesic flow. In neural architecture search, the optimal architecture depends on the task objectives. We observe that searching for an optimal architecture is not trivial while the data annotations is noisy. The study investigates the impact of label noise in obtaining the best performance when optimizing a neural architecture, while also reducing the performance deterioration because of overfitting to noisy labels. We use the mutual information bottleneck to design a noise injection module that can alleviate the impact of learning under label noise. In summary, our works in this thesis address some major problems in model adaptation e.g., few-shot learning, meta-learning, continual learning, and neural architecture search. The solutions are expected to contribute to the arsenal of model adaptation algorithms and the analyses shed light on the essential aspects in adaptation strategies

    NON-LINEAR AND SPARSE REPRESENTATIONS FOR MULTI-MODAL RECOGNITION

    Get PDF
    In the first part of this dissertation, we address the problem of representing 2D and 3D shapes. In particular, we introduce a novel implicit shape representation based on Support Vector Machine (SVM) theory. Each shape is represented by an analytic decision function obtained by training an SVM, with a Radial Basis Function (RBF) kernel, so that the interior shape points are given higher values. This empowers support vector shape (SVS) with multifold advantages. First, the representation uses a sparse subset of feature points determined by the support vectors, which significantly improves the discriminative power against noise, fragmentation and other artifacts that often come with the data. Second, the use of the RBF kernel provides scale, rotation, and translation invariant features, and allows a shape to be represented accurately regardless of its complexity. Finally, the decision function can be used to select reliable feature points. These features are described using gradients computed from highly consistent decision functions instead of conventional edges. Our experiments on 2D and 3D shapes demonstrate promising results. The availability of inexpensive 3D sensors like Kinect necessitates the design of new representation for this type of data. We present a 3D feature descriptor that represents local topologies within a set of folded concentric rings by distances from local points to a projection plane. This feature, called as Concentric Ring Signature (CORS), possesses similar computational advantages to point signatures yet provides more accurate matches. CORS produces compact and discriminative descriptors, which makes it more robust to noise and occlusions. It is also well-known to computer vision researchers that there is no universal representation that is optimal for all types of data or tasks. Sparsity has proved to be a good criterion for working with natural images. This motivates us to develop efficient sparse and non-linear learning techniques for automatically extracting useful information from visual data. Specifically, we present dictionary learning methods for sparse and redundant representations in a high-dimensional feature space. Using the kernel method, we describe how the well-known dictionary learning approaches such as the method of optimal directions and KSVD can be made non-linear. We analyse their kernel constructions and demonstrate their effectiveness through several experiments on classification problems. It is shown that non-linear dictionary learning approaches can provide significantly better discrimination compared to their linear counterparts and kernel PCA, especially when the data is corrupted by different types of degradations. Visual descriptors are often high dimensional. This results in high computational complexity for sparse learning algorithms. Motivated by this observation, we introduce a novel framework, called sparse embedding (SE), for simultaneous dimensionality reduction and dictionary learning. We formulate an optimization problem for learning a transformation from the original signal domain to a lower-dimensional one in a way that preserves the sparse structure of data. We propose an efficient optimization algorithm and present its non-linear extension based on the kernel methods. One of the key features of our method is that it is computationally efficient as the learning is done in the lower-dimensional space and it discards the irrelevant part of the signal that derails the dictionary learning process. Various experiments show that our method is able to capture the meaningful structure of data and can perform significantly better than many competitive algorithms on signal recovery and object classification tasks. In many practical applications, we are often confronted with the situation where the data that we use to train our models are different from that presented during the testing. In the final part of this dissertation, we present a novel framework for domain adaptation using a sparse and hierarchical network (DASH-N), which makes use of the old data to improve the performance of a system operating on a new domain. Our network jointly learns a hierarchy of features together with transformations that rectify the mismatch between different domains. The building block of DASH-N is the latent sparse representation. It employs a dimensionality reduction step that can prevent the data dimension from increasing too fast as traversing deeper into the hierarchy. Experimental results show that our method consistently outperforms the current state-of-the-art by a significant margin. Moreover, we found that a multi-layer {DASH-N} has an edge over the single-layer DASH-N
    • …
    corecore