
    Principal Boundary on Riemannian Manifolds

    We consider the classification problem and focus on nonlinear methods for classification on manifolds. For multivariate datasets lying on an embedded nonlinear Riemannian manifold within the higher-dimensional ambient space, we aim to acquire a classification boundary for the classes with labels, using the intrinsic metric on the manifolds. Motivated by finding an optimal boundary between the two classes, we propose a novel approach, the principal boundary. From the perspective of classification, the principal boundary is defined as an optimal curve that moves in between the principal flows traced out from two classes of data, and at any point on the boundary, it maximizes the margin between the two classes. We estimate the boundary in quality with its direction, supervised by the two principal flows. We show that the principal boundary yields the usual decision boundary found by the support vector machine in the sense that locally, the two boundaries coincide. Some optimality and convergence properties of the random principal boundary and its population counterpart are also shown. We illustrate how to find, use and interpret the principal boundary with an application to real data. Comment: 31 pages, 10 figures

    Towards On-line Domain-Independent Big Data Learning: Novel Theories and Applications

    Feature extraction is an extremely important pre-processing step for pattern recognition and machine learning problems. This thesis highlights how one can best extract features from the data in an exhaustively online and purely adaptive manner. The solution to this problem is given for both labeled and unlabeled datasets, by presenting a number of novel on-line learning approaches. Specifically, the differential equation method for solving the generalized eigenvalue problem is used to derive a number of novel machine learning and feature extraction algorithms. The incremental eigen-solution method is used to derive a novel incremental extension of linear discriminant analysis (LDA). Further, the proposed incremental version is combined with the extreme learning machine (ELM), in which the ELM is used as a preprocessor before learning. In this first key contribution, the dynamic random expansion characteristic of ELM is combined with the proposed incremental LDA technique, and shown to offer a significant improvement in maximizing the discrimination between points in two different classes, while minimizing the distance within each class, in comparison with other standard state-of-the-art incremental and batch techniques. In the second contribution, the differential equation method for solving the generalized eigenvalue problem is used to derive a novel state-of-the-art purely incremental version of the slow feature analysis (SFA) algorithm, termed the generalized eigenvalue based slow feature analysis (GENEIGSFA) technique. Further, the time series expansion of the echo state network (ESN) and radial basis functions (RBF) are used as a pre-processor before learning. In addition, the higher order derivatives are used as a smoothing constraint in the output signal.
Finally, an online extension of the generalized eigenvalue problem, derived from James Stone’s criterion, is tested, evaluated and compared with the standard batch version of the slow feature analysis technique, to demonstrate its comparative effectiveness. In the third contribution, light-weight extensions of the statistical technique known as canonical correlation analysis (CCA), for both twinned and multiple data streams, are derived by using the same existing method of solving the generalized eigenvalue problem. Further, the proposed method is enhanced by maximizing the covariance between data streams while simultaneously maximizing the rate of change of variances within each data stream. The recurrent set of connections used by the ESN serves as a pre-processor between the inputs and the canonical projections in order to capture shared temporal information in two or more data streams. A solution to the problem of identifying a low dimensional manifold on a high dimensional dataspace is then presented in an incremental and adaptive manner. Finally, an online locally optimized extension of Laplacian Eigenmaps is derived, termed the generalized incremental Laplacian eigenmaps technique (GENILE). Apart from exploiting the benefit of the incremental nature of the proposed manifold based dimensionality reduction technique, the projections produced by this method are shown, in most cases, to yield better classification accuracy than standard batch versions of these techniques, on both artificial and real datasets.
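
    The abstract above builds its algorithms on the generalized eigenvalue problem. As a rough illustration only, the following sketch solves the standard batch form for LDA, S_b w = λ S_w w, using NumPy. It is not the thesis's incremental or differential-equation method; the function name `lda_direction`, the small ridge term, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def lda_direction(X, y):
    """Leading discriminant direction from S_b w = lambda * S_w w (batch form)."""
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    S_w = np.zeros((d, d))  # within-class scatter
    S_b = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        S_w += (Xc - mu).T @ (Xc - mu)
        diff = (mu - mean_all)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
    # Reduce to a standard eigenproblem; the ridge (an assumption) keeps S_w invertible.
    M = np.linalg.solve(S_w + 1e-8 * np.eye(d), S_b)
    vals, vecs = np.linalg.eig(M)
    w = np.real(vecs[:, np.argmax(np.real(vals))])
    return w / np.linalg.norm(w)

# Hypothetical two-class data: clusters separated along the first axis.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.5, (50, 2)),
               rng.normal([3, 0], 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
w = lda_direction(X, y)
proj = X @ w  # one-dimensional projection that separates the classes
```

    An incremental variant would update the scatter matrices and the eigenvector estimate one sample at a time instead of recomputing them from the full dataset.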

    Balancing between the Local and Global Structures (LGS) in Graph Embedding

    We present a method for balancing between the Local and Global Structures (LGS) in graph embedding, via a tunable parameter. Some embedding methods aim to capture global structures, while others attempt to preserve local neighborhoods. Few methods attempt to do both, and it is not always possible to capture both local and global information well in two dimensions, which is where most graph drawings live. The choice of using a local or a global embedding for visualization depends not only on the task but also on the structure of the underlying data, which may not be known in advance. For a given graph, LGS aims to find a good balance between the local and global structure to preserve. We evaluate the performance of LGS with synthetic and real-world datasets and our results indicate that it is competitive with the state-of-the-art methods, using established quality metrics such as stress and neighborhood preservation. We introduce a novel quality metric, cluster distance preservation, to assess intermediate structure capture. All source-code, datasets, experiments and analysis are available online. Comment: Appears in the Proceedings of the 31st International Symposium on Graph Drawing and Network Visualization (GD 2023)
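
    The stress metric mentioned in the abstract above can be illustrated with a short sketch. It uses one common normalized form, the sum over node pairs of ((layout distance - graph distance) / graph distance)^2; the exact variant used in the paper is not stated here, so this form, the helper names, and the toy path graph are assumptions. The graph is assumed to be connected and unweighted.

```python
import numpy as np

def shortest_paths(n, edges):
    """All-pairs hop distances of a connected, unweighted graph (Floyd-Warshall)."""
    D = np.full((n, n), np.inf)
    np.fill_diagonal(D, 0.0)
    for u, v in edges:
        D[u, v] = D[v, u] = 1.0
    for k in range(n):
        # Relax every pair (i, j) through intermediate node k.
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D

def stress(pos, D):
    """Normalized stress of a layout `pos` against graph distances `D`."""
    i, j = np.triu_indices(len(pos), k=1)
    emb = np.linalg.norm(pos[i] - pos[j], axis=1)
    return float(np.sum(((emb - D[i, j]) / D[i, j]) ** 2))

# Hypothetical example: a 4-node path graph drawn two different ways.
edges = [(0, 1), (1, 2), (2, 3)]
D = shortest_paths(4, edges)
line = np.array([[0, 0], [1, 0], [2, 0], [3, 0]], dtype=float)
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
stress_line = stress(line, D)      # layout distances match graph distances
stress_square = stress(square, D)  # end nodes are drawn too close together
```

    Lower stress means the layout preserves global graph distances more faithfully; neighborhood preservation, by contrast, only looks at each node's nearest neighbors.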

    On Motion Parameterizations in Image Sequences from Fixed Viewpoints

    This dissertation addresses the problem of parameterizing object motion within a set of images taken with a stationary camera. We develop data-driven methods across all image scales: characterizing motion observed at the scale of individual pixels, along extended structures such as roads, and whole image deformations such as lungs deforming over time. The primary contributions include: (a) fundamental studies of the relationship between spatio-temporal image derivatives accumulated at a pixel and the object motions at that pixel, (b) data-driven approaches to parameterize breath motion and reconstruct lung CT data volumes, and (c) defining and offering initial results for a new class of Partially Unsupervised Manifold Learning (PUML) problems, which often arise in medical imagery. Specifically, we create energy functions for measuring how consistent a given velocity vector is with observed spatio-temporal image derivatives. These energy functions are used to fit parametric snake models to roads using velocity constraints. We create an automatic data-driven technique for finding the breath phase of lung CT scans which is able to replace external belt measurements currently in use clinically. This approach is extended to automatically create a full deformation model of a CT lung volume during breathing or heart MRI during breathing and heartbeat. Additionally, motivated by real use cases, we address a scenario in which a dataset is collected along with meta-data which describes some, but not all, aspects of the dataset. We create an embedding which displays the remaining variability in a dataset after accounting for variability related to the meta-data.

    Doctor of Philosophy

    With the ever-increasing amount of available computing resources and sensing devices, a wide variety of high-dimensional datasets are being produced in numerous fields. The complexity and increasing popularity of these data have led to new challenges and opportunities in visualization. Since most display devices are limited to communication through two-dimensional (2D) images, many visualization methods rely on 2D projections to express high-dimensional information. Such a reduction of dimension leads to an explosion in the number of 2D representations required to visualize high-dimensional spaces, each giving a glimpse of the high-dimensional information. As a result, one of the most important challenges in visualizing high-dimensional datasets is the automatic filtration and summarization of the large exploration space consisting of all 2D projections. In this dissertation, a new type of algorithm is introduced to reduce the exploration space by identifying a small set of projections that capture the intrinsic structure of high-dimensional data. In addition, a general framework for summarizing the structure of quality measures in the space of all linear 2D projections is presented. However, identifying the representative or informative projections is only part of the challenge. Due to the high-dimensional nature of these datasets, insights and conclusions based solely on 2D representations are limited and prone to error. How to interpret the inaccuracies and resolve the ambiguity in the 2D projections is the other half of the puzzle. This dissertation introduces projection distortion error measures and interactive manipulation schemes that allow the understanding of high-dimensional structures via data manipulation in 2D projections.

    Machine Intelligence for Advanced Medical Data Analysis: Manifold Learning Approach

    In the current work, linear and non-linear manifold learning techniques, specifically Principal Component Analysis (PCA) and Laplacian Eigenmaps, are studied in detail. Their applications in medical image and shape analysis are investigated. In the first contribution, a manifold learning-based multi-modal image registration technique is developed, which results in a unified intensity system through intensity transformation between the reference and sensed images. The transformation eliminates intensity variations in multi-modal medical scans and hence facilitates employing well-studied mono-modal registration techniques. The method can be used for registering multi-modal images with full and partial data. Next, a manifold learning-based scale invariant global shape descriptor is introduced. The proposed descriptor benefits from the capability of Laplacian Eigenmaps in dealing with high dimensional data by introducing an exponential weighting scheme. It eliminates the limitations tied to the well-known cotangent weighting scheme, namely dependency on triangular mesh representation and high intra-class quality of 3D models. In the end, a novel descriptive model for diagnostic classification of pulmonary nodules is presented. The descriptive model benefits from structural differences between benign and malignant nodules for automatic and accurate prediction of a candidate nodule. It extracts concise and discriminative features automatically from the 3D surface structure of a nodule using spectral features studied in the previous work combined with a point cloud-based deep learning network. Extensive experiments have been conducted and have shown that the proposed algorithms based on manifold learning outperform several state-of-the-art methods.
Advanced computational techniques with a combination of manifold learning and deep networks can play a vital role in effective healthcare delivery by providing a framework for several fundamental tasks in image and shape processing, namely, registration, classification, and detection of features of interest.
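
    For readers unfamiliar with Laplacian Eigenmaps, the sketch below shows the textbook batch algorithm with exponential (heat-kernel) weights on a point cloud. Whether this matches the exponential weighting scheme of the work above is an assumption, as are the function name, the bandwidth `t`, the fully connected graph, and the synthetic data; it is not the author's shape descriptor.

```python
import numpy as np

def laplacian_eigenmaps(X, n_components=2, t=1.0):
    """Embed points X using exponential weights W_ij = exp(-||x_i - x_j||^2 / t)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / t)
    np.fill_diagonal(W, 0.0)          # no self-loops
    deg = W.sum(axis=1)
    L = np.diag(deg) - W              # graph Laplacian
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalization turns L v = lambda * D v into a standard problem.
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L_sym)
    # Map back to generalized eigenvectors and skip the trivial constant one.
    Y = d_inv_sqrt[:, None] * vecs[:, 1:n_components + 1]
    return Y, vals

# Hypothetical data: two clusters of 20 points each in 3D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (20, 3)),
               rng.normal([3.0, 0.0, 0.0], 0.3, (20, 3))])
Y, vals = laplacian_eigenmaps(X, n_components=2, t=2.0)
```

    Because the weights depend only on pairwise distances between points, this construction needs no triangular mesh, unlike cotangent weights, which are defined on mesh triangles.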

    Node Embedding from Neural Hamiltonian Orbits in Graph Neural Networks

    In the graph node embedding problem, embedding spaces can vary significantly for different data types, leading to the need for different GNN model types. In this paper, we model the embedding update of a node feature as a Hamiltonian orbit over time. Since the Hamiltonian orbits generalize the exponential maps, this approach allows us to learn the underlying manifold of the graph in training, in contrast to most of the existing literature that assumes a fixed graph embedding manifold with a closed exponential map solution. Our proposed node embedding strategy can automatically learn, without extensive tuning, the underlying geometry of any given graph dataset even if it has diverse geometries. We test Hamiltonian functions of different forms and verify the performance of our approach on two graph node embedding downstream tasks: node classification and link prediction. Numerical experiments demonstrate that our approach adapts better to different types of graph datasets than popular state-of-the-art graph node embedding GNNs. The code is available at https://github.com/zknus/Hamiltonian-GNN