62 research outputs found

    Symmetric Subspace Learning for Image Analysis

    Graph Embedded Extreme Learning Machine

    Linear dimensionality reduction: Survey, insights, and generalizations

    Linear dimensionality reduction methods are a cornerstone of analyzing high-dimensional data, due to their simple geometric interpretations and typically attractive computational properties. These methods capture many data features of interest, such as covariance, dynamical structure, correlation between data sets, input-output relationships, and margin between data classes. Methods have been developed with a variety of names and motivations in many fields, and perhaps as a result the connections between all these methods have not been highlighted. Here we survey methods from this disparate literature as optimization programs over matrix manifolds. We discuss principal component analysis, factor analysis, linear multidimensional scaling, Fisher's linear discriminant analysis, canonical correlations analysis, maximum autocorrelation factors, slow feature analysis, sufficient dimensionality reduction, undercomplete independent component analysis, linear regression, distance metric learning, and more. This optimization framework gives insight into some rarely discussed shortcomings of well-known methods, such as the suboptimality of certain eigenvector solutions. Modern techniques for optimization over matrix manifolds enable a generic linear dimensionality reduction solver, which accepts as input data and an objective to be optimized, and returns, as output, an optimal low-dimensional projection of the data. This simple optimization framework further allows straightforward generalizations and novel variants of classical methods, which we demonstrate here by creating an orthogonal-projection canonical correlations analysis. More broadly, this survey and generic solver suggest that linear dimensionality reduction can move toward becoming a blackbox, objective-agnostic numerical technology.
    JPC and ZG received funding from the UK Engineering and Physical Sciences Research Council (EPSRC EP/H019472/1). JPC received funding from a Sloan Research Fellowship, the Simons Foundation (SCGB#325171 and SCGB#325233), the Grossman Center at Columbia University, and the Gatsby Charitable Trust. This is the author accepted manuscript. The final version is available from MIT Press via http://jmlr.org/papers/v16/cunningham15a.htm
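As a rough illustration of the "optimization over matrix manifolds" view described in this abstract, the sketch below treats PCA as maximizing trace(MᵀCM) over matrices M with orthonormal columns. It uses projected gradient ascent with a QR retraction. This is not the paper's solver: the function name `pca_on_stiefel`, the step-size rule, and the iteration count are all assumptions chosen for the example. PCA is used because its eigenvector solution is known to be optimal, which gives a reference value to compare against.

```python
import numpy as np

def sym(A):
    """Symmetric part of a square matrix."""
    return 0.5 * (A + A.T)

def pca_on_stiefel(X, r, steps=500, seed=0):
    """Maximize trace(M^T C M) over d x r matrices M with orthonormal columns.

    Hypothetical helper for illustration; C is the sample covariance of X.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                  # center the data
    C = Xc.T @ Xc / Xc.shape[0]              # sample covariance (d x d)
    lr = 0.1 / np.linalg.norm(C, 2)          # assumed step size, scaled by the spectral norm
    M, _ = np.linalg.qr(rng.standard_normal((X.shape[1], r)))  # random orthonormal start
    for _ in range(steps):
        G = 2.0 * C @ M                      # Euclidean gradient of the objective
        G_tan = G - M @ sym(M.T @ G)         # project onto the tangent space at M
        M, _ = np.linalg.qr(M + lr * G_tan)  # QR retraction back to orthonormal columns
    return M, C

# Usage: compare the captured variance with the sum of the top-r eigenvalues.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])
M, C = pca_on_stiefel(X, r=2)
print(np.trace(M.T @ C @ M), np.sort(np.linalg.eigvalsh(C))[-2:].sum())
```

Swapping in a different objective and gradient while keeping the projection and retraction steps is the sense in which such a solver can be objective-agnostic.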

    Computational Methods for Support Vector Machine Classification and Large-Scale Kalman Filtering

    The first half of this dissertation focuses on computational methods for solving the constrained quadratic program (QP) within the support vector machine (SVM) classifier. One of the SVM formulations requires the solution of bound and equality constrained QPs. We begin by describing an augmented Lagrangian approach which incorporates the equality constraint into the objective function, resulting in a bound constrained QP. Furthermore, all constraints may be incorporated into the objective function to yield an unconstrained quadratic program, allowing us to apply the conjugate gradient (CG) method. Lastly, we adapt the scaled gradient projection method of [10] to the SVM QP and compare the performance of these methods with the state-of-the-art sequential minimal optimization algorithm and MATLAB's built-in constrained QP solver, quadprog. The augmented Lagrangian method outperforms other state-of-the-art methods on three image test cases. The second half of this dissertation focuses on computational methods for large-scale Kalman filtering applications. The Kalman filter (KF) is a method for solving a dynamic, coupled system of equations. While these methods require only linear algebra, standard KF is often infeasible in large-scale implementations due to the storage requirements and inverse calculations of large, dense covariance matrices. We introduce the use of the CG and Lanczos methods into various forms of the Kalman filter for low-rank approximations of the covariance matrices, with low-storage requirements. We also use CG for efficient Gaussian sampling within the ensemble Kalman filter method. The CG-based KF methods perform similarly in root-mean-square error when compared to the standard KF methods, when the standard implementations are feasible, and outperform the limited-memory Broyden-Fletcher-Goldfarb-Shanno approximation method.
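Both halves of this abstract rely on the conjugate gradient method, which minimizes an unconstrained quadratic ½xᵀAx − bᵀx (equivalently, solves Ax = b) for a symmetric positive definite A using only matrix-vector products. The following is a minimal textbook sketch, not the dissertation's implementation. The function name, tolerance, and iteration cap are assumptions.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive definite A (textbook CG sketch)."""
    n = len(b)
    max_iter = max_iter if max_iter is not None else 10 * n  # assumed iteration cap
    x = np.zeros(n)
    r = np.array(b, dtype=float)       # residual b - A x, with x = 0 initially
    p = r.copy()                       # first search direction
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:          # stop once the residual norm is small
            break
        Ap = A @ p                     # the only place A is used
        alpha = rs / (p @ Ap)          # exact line search along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p      # next direction, A-conjugate to the previous ones
        rs = rs_new
    return x

# Usage on a small, randomly generated SPD system.
rng = np.random.default_rng(0)
B = rng.standard_normal((8, 8))
A = B.T @ B + np.eye(8)                # SPD by construction
b = rng.standard_normal(8)
x = conjugate_gradient(A, b)
print(np.linalg.norm(A @ x - b))       # residual norm, expected to be small
```

Because the loop touches A only through the product `A @ p`, the same iteration works when A is available only as a matrix-free operator, which is the property that makes CG attractive when a large dense covariance matrix cannot be stored or inverted.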

    A mathematical theory of making hard decisions: model selection and robustness of matrix factorization with binary constraints

    One of the first and most fundamental tasks in machine learning is to group observations within a dataset. Given a notion of similarity, finding those instances which are outstandingly similar to each other has manifold applications. Recommender systems and topic analysis in text data are the most intuitive examples. The interpretation of the groups, called clusters, is facilitated if the assignment of samples is definite. Especially in high-dimensional data, denoting a degree to which an observation belongs to a specified cluster requires subsequent processing of the model to filter the most important information. We argue that a good summary of the data provides hard decisions on the following question: how many groups are there, and which observations belong to which clusters? In this work, we contribute to the theoretical and practical background of clustering tasks, addressing one or both aspects of this question. Our overview of state-of-the-art clustering approaches details the challenges of our ambition to provide hard decisions. Based on this overview, we develop new methodologies for two branches of clustering: one concerns the derivation of nonconvex clusters, known as spectral clustering; the other addresses the identification of biclusters, a set of samples together with similarity-defining features, via Boolean matrix factorization. One of the main challenges in both settings is robustness to noise. Assuming that the issue of robustness is controllable by means of theoretical insights, we take a closer look at those aspects of established clustering methods which lack a theoretical foundation. In the scope of Boolean matrix factorization, we propose a versatile framework for the optimization of matrix factorizations subject to binary constraints. Boolean factorizations in particular have so far been computed by intuitive methods, implementing greedy heuristics which lack quality guarantees for the solutions obtained. In contrast, we propose to build upon recent advances in nonconvex optimization theory. This enables us to provide convergence guarantees to local optima of a relaxed objective, requiring only approximately binary factor matrices. By means of this new optimization scheme, PAL-Tiling, we propose two approaches to automatically determine the number of clusters. One is based on information theory, employing the minimum description length principle, and the other is a novel statistical approach, controlling the false discovery rate. The flexibility of our framework PAL-Tiling enables the optimization of novel factorization schemes. In a different context, where every data point belongs to a pre-defined class, a characterization of the classes may be obtained by Boolean factorizations. However, there are cases where this traditional factorization scheme is not sufficient. Therefore, we propose the integration of another factor matrix, reflecting class-specific differences within a cluster. Our theoretical considerations are complemented by empirical evaluations, showing how our methods combine theoretical soundness with practical advantages.
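To make the Boolean matrix factorization setting in this abstract concrete, the sketch below shows the Boolean product of two binary factor matrices, the resulting reconstruction error, and a simple thresholding of "approximately binary" factors. It is not PAL-Tiling itself. The function names, the 0.5 threshold, and the toy matrices are assumptions made for illustration.

```python
import numpy as np

def boolean_product(Y, X):
    """Boolean product: entry (i, j) is 1 if sample i and feature j share a cluster.

    Y is samples x clusters, X is features x clusters, both with 0/1 entries.
    """
    return (Y.astype(int) @ X.astype(int).T > 0).astype(int)

def reconstruction_error(D, Y, X):
    """Number of cells where the Boolean product differs from the data D."""
    return int(np.sum(D != boolean_product(Y, X)))

def round_factors(F, threshold=0.5):
    """Round approximately binary factors to 0/1 (the threshold is an assumption)."""
    return (F >= threshold).astype(int)

# Toy data: two overlapping biclusters (tiles) over 3 samples and 4 features.
Y = np.array([[1, 0], [1, 1], [0, 1]])          # sample-to-cluster assignment
X = np.array([[1, 0], [1, 1], [0, 1], [0, 1]])  # feature-to-cluster assignment
D = boolean_product(Y, X)

# Relaxed factors close to binary, as a relaxed optimization might return them.
Y_relaxed = 0.9 * Y + 0.05
X_relaxed = 0.9 * X + 0.05
print(reconstruction_error(D, round_factors(Y_relaxed), round_factors(X_relaxed)))
```

In the Boolean product an overlap of two clusters still yields 1 rather than 2, which is what distinguishes it from an ordinary matrix product and lets clusters share samples and features.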