
    Four algorithms to solve symmetric multi-type non-negative matrix tri-factorization problem

    In this paper, we consider the symmetric multi-type non-negative matrix tri-factorization problem (SNMTF), which attempts to factorize several symmetric non-negative matrices simultaneously. This can be considered as a generalization of the classical non-negative matrix tri-factorization problem; it has a non-convex objective function, a multivariate polynomial of degree six, and a convex feasible set. It is of special importance in data science, since it serves as a mathematical model for the fusion of different data sources in data clustering. We develop four methods to solve the SNMTF, based on four theoretical approaches known from the literature: the fixed-point method (FPM), block-coordinate descent with projected gradient (BCD), the gradient method with exact line search (GM-ELS) and the adaptive moment estimation method (ADAM). For each of these methods we offer a software implementation: for the first two we use Matlab and for the latter two Python with the TensorFlow library. We test these methods on three data-sets: one is a synthetic data-set that we generated, while the other two represent real-life similarities between different objects. Extensive numerical results show that with sufficient computing time all four methods perform satisfactorily, and ADAM most often yields the best mean square error (MSE). However, if the computation time is limited, FPM gives the best MSE because it shows the fastest convergence at the beginning. All data-sets and codes are publicly available on our GitLab profile.
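To make the optimization concrete, here is a minimal single-matrix sketch of the kind of problem the paper solves: factorizing a symmetric non-negative R as G S G^T under non-negativity constraints using projected gradient descent, the idea behind the BCD method above. The function name, fixed step size, and iteration count are our own illustrative choices, not the authors'; their four implementations are on their GitLab profile, and GM-ELS replaces the fixed step with an exact line search.

```python
# Illustrative sketch only: single-matrix symmetric NMTF by projected
# gradient descent; the paper's SNMTF couples several symmetric matrices.
import numpy as np

def symmetric_nmtf(R, rank, lr=1e-4, iters=5000, seed=0):
    """Minimize ||R - G S G^T||_F^2 subject to G, S >= 0."""
    rng = np.random.default_rng(seed)
    n = R.shape[0]
    G = rng.random((n, rank))
    S = rng.random((rank, rank))
    S = (S + S.T) / 2                          # keep the middle factor symmetric
    for _ in range(iters):
        E = R - G @ S @ G.T                    # residual
        grad_G = -2 * (E @ G @ S.T + E.T @ G @ S)
        grad_S = -2 * (G.T @ E @ G)
        G = np.maximum(G - lr * grad_G, 0.0)   # gradient step, then project onto G >= 0
        S = np.maximum(S - lr * grad_S, 0.0)
        S = (S + S.T) / 2
    return G, S, np.mean((R - G @ S @ G.T) ** 2)

# usage on a small synthetic symmetric non-negative matrix
A = np.random.default_rng(1).random((30, 4))
G, S, mse = symmetric_nmtf(A @ A.T, rank=4)
print(f"MSE: {mse:.6f}")
```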

    Single-channel source separation using non-negative matrix factorization


    A mathematical theory of making hard decisions: model selection and robustness of matrix factorization with binary constraints

    One of the first and most fundamental tasks in machine learning is to group observations within a dataset. Given a notion of similarity, finding those instances which are outstandingly similar to each other has manifold applications. Recommender systems and topic analysis in text data are examples which are most intuitive to grasp. The interpretation of the groups, called clusters, is facilitated if the assignment of samples is definite. Especially for high-dimensional data, expressing the degree to which an observation belongs to a specified cluster requires subsequent processing of the model to filter the most important information. We argue that a good summary of the data provides hard decisions on the following questions: how many groups are there, and which observations belong to which clusters? In this work, we contribute to the theoretical and practical background of clustering tasks, addressing one or both aspects of these questions. Our overview of state-of-the-art clustering approaches details the challenges of our ambition to provide hard decisions. Based on this overview, we develop new methodologies for two branches of clustering: one concerns the derivation of nonconvex clusters, known as spectral clustering; the other addresses the identification of biclusters, a set of samples together with similarity-defining features, via Boolean matrix factorization. One of the main challenges in both settings is robustness to noise. Assuming that the issue of robustness is controllable by means of theoretical insights, we take a closer look at those aspects of established clustering methods which lack a theoretical foundation. In the scope of Boolean matrix factorization, we propose a versatile framework for the optimization of matrix factorizations subject to binary constraints. Boolean factorizations in particular have so far been computed by intuitive methods, implementing greedy heuristics which lack quality guarantees for the obtained solutions. In contrast, we propose to build upon recent advances in nonconvex optimization theory. This enables us to provide convergence guarantees to local optima of a relaxed objective, requiring only approximately binary factor matrices. By means of this new optimization scheme, PAL-Tiling, we propose two approaches to automatically determine the number of clusters: one is based on information theory, employing the minimum description length principle, and the other is a novel statistical approach, controlling the false discovery rate. The flexibility of our framework PAL-Tiling enables the optimization of novel factorization schemes. In a different context, where every data point belongs to a pre-defined class, a characterization of the classes may be obtained by Boolean factorizations. However, there are cases where this traditional factorization scheme is not sufficient. Therefore, we propose the integration of another factor matrix, reflecting class-specific differences within a cluster. Our theoretical considerations are complemented by empirical evaluations, showing how our methods combine theoretical soundness with practical advantages.
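As a rough illustration of the relaxed-optimization idea, the sketch below alternates projected gradient steps on real-valued factors kept in [0, 1] and then rounds them to obtain hard Boolean decisions. It is a simplified stand-in, not the thesis's PAL-Tiling, which additionally uses a proximal binary-penalty term and selects the rank via MDL or FDR control; the function name and step-size rule are our assumptions.

```python
# Simplified relaxed Boolean matrix factorization: not PAL-Tiling itself.
import numpy as np

def relaxed_bmf(X, rank, iters=500, seed=0):
    """Alternating projected gradient on ||X - U V^T||_F^2 with factors
    in [0,1], then rounding to Boolean matrices (hard decisions)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, rank))
    V = rng.random((m, rank))
    for _ in range(iters):
        # gradient step with a Lipschitz-based step size, then clip to the box
        L = np.linalg.norm(V.T @ V, 2) + 1e-9
        U = np.clip(U - ((U @ V.T - X) @ V) / L, 0.0, 1.0)
        L = np.linalg.norm(U.T @ U, 2) + 1e-9
        V = np.clip(V - ((U @ V.T - X).T @ U) / L, 0.0, 1.0)
    # hard decisions: round, then combine with the Boolean (OR-of-ANDs) product
    Ub, Vb = U > 0.5, V > 0.5
    Xb = (Ub.astype(int) @ Vb.astype(int).T) > 0
    return Ub, Vb, Xb

# usage on a noise-free Boolean block matrix with two biclusters
X = np.zeros((40, 30), dtype=bool)
X[:20, :15] = True
X[20:, 15:] = True
Ub, Vb, Xb = relaxed_bmf(X.astype(float), rank=2)
print("bit error rate:", np.mean(Xb != X))
```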

    Tensor Analysis and Fusion of Multimodal Brain Images

    Current high-throughput data acquisition technologies probe dynamical systems with different imaging modalities, generating massive data sets at different spatial and temporal resolutions and posing challenging problems in multimodal data fusion. A case in point is the attempt to parse out the brain structures and networks that underpin human cognitive processes by analysis of different neuroimaging modalities (functional MRI, EEG, NIRS, etc.). We emphasize that the multimodal, multi-scale nature of neuroimaging data is well reflected by a multi-way (tensor) structure in which the underlying processes can be summarized by a relatively small number of components or "atoms". We introduce Markov-Penrose diagrams, an integration of Bayesian DAG and tensor network notations, to analyze these models. These diagrams not only clarify matrix and tensor EEG and fMRI time/frequency analysis and inverse problems, but also help understand multimodal fusion via Multiway Partial Least Squares and Coupled Matrix-Tensor Factorization. We show here, for the first time, that Granger causal analysis of brain networks is a tensor regression problem, thus allowing the atomic decomposition of brain networks. Analysis of EEG and fMRI recordings shows the potential of the methods and suggests their use in other scientific domains.
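For readers unfamiliar with the "atoms" mentioned above, the following textbook-style sketch computes a CP (canonical polyadic) decomposition of a 3-way tensor by alternating least squares, the basic building block behind the coupled factorizations the article discusses. This follows standard conventions (cf. Kolda and Bader's survey) rather than the authors' own code.

```python
# Minimal CP decomposition by alternating least squares (ALS).
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: (I*J) x R from (I x R) and (J x R)."""
    I, R = A.shape
    J, _ = B.shape
    return np.einsum('ir,jr->ijr', A, B).reshape(I * J, R)

def cp_als(X, rank, iters=100, seed=0):
    """Fit X (3-way) ~ sum_r a_r o b_r o c_r; each update is a least-squares solve."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    for _ in range(iters):
        # mode-n unfoldings (C-order) paired with matching Khatri-Rao products
        A = X.reshape(I, J * K) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.moveaxis(X, 1, 0).reshape(J, I * K) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.moveaxis(X, 2, 0).reshape(K, I * J) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C, np.einsum('ir,jr,kr->ijk', A, B, C)

# usage: recover a synthetic rank-3 tensor
rng = np.random.default_rng(1)
A0, B0, C0 = rng.random((10, 3)), rng.random((12, 3)), rng.random((14, 3))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
_, _, _, Xhat = cp_als(X, rank=3)
print("relative error:", np.linalg.norm(X - Xhat) / np.linalg.norm(X))
```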

    Non-negative Matrix Factorization: Theory and Methods


    Robust Learning and Classification of Low-Dimensional Spaces: Sparse and Low-Rank Representations

    Ph.D. dissertation, Department of Electrical and Computer Engineering, Graduate School, Seoul National University, February 2017. Advisor: Songhwai Oh. Learning a subspace structure based on sparse or low-rank representation has gained much attention and has been widely used over the past decade in the machine learning, signal processing, computer vision, and robotics literature to model a wide range of natural phenomena. Sparse representation is a powerful tool for high-dimensional data such as images, where the goal is to represent or compress the cumbersome data using a few representative samples. Low-rank representation is a generalization of sparse representation to 2D space. Behind these successful outcomes, many efforts have been made to learn sparse or low-rank representations efficiently. However, existing approaches are still inefficient for complex data structures and lack robustness in the presence of various noises, including outliers and missing data, because many existing algorithms relax the ideal optimization problem to a tractable one without considering computational and memory complexities. Thus, it is important to use a representation algorithm that is efficiently solvable and robust against unwanted corruptions. In this dissertation, our main goal is to develop algorithms that are both robust and efficient in noisy environments. As for sparse representation, most optimization problems are relaxed to convex ones based on surrogate measures, such as the l1-norm, to resolve the computational intractability and high noise sensitivity of the original sparse representation problem based on the l0-norm. However, if the system of interest, other than the sparsity measure, is inherently nonconvex, then using a convex sparsity measure may not be the best choice. From this perspective, we propose desirable criteria for a good nonconvex sparsity measure and suggest a corresponding family of measures. The proposed family admits a simple measure which enables efficient computation and embraces the benefits of both the l0- and l1-norms; most importantly, its gradient vanishes slowly, unlike that of the l0-norm, which is suitable from an optimization perspective. For low-rank representation, we first present an efficient l1-norm-based low-rank matrix approximation algorithm using the proposed alternating rectified gradient methods, since conventional algorithms are very slow at solving the l1-norm-based alternating minimization problem. The proposed methods try to find an optimal direction under a constraint which limits the search domain, to avoid the difficulty that arises from the ambiguity in representing the two optimization variables. We extend this to an algorithm with an explicit smoothness regularizer and an orthogonality constraint for better efficiency, and solve it under the augmented Lagrangian framework. To give a more stable solution with flexible rank estimation in the presence of heavy corruptions, we present a new solution based on the elastic-net regularization of singular values, which allows a faster algorithm than existing rank minimization methods without any heavy operations and is more stable than state-of-the-art low-rank approximation algorithms due to its strong convexity. As a result, the proposed method leads to a holistic approach which enables both rank minimization and bilinear factorization.
Moreover, as an extension of the previous methods, which operate on an unstructured matrix, we apply recent advances in rank minimization to a structured matrix for robust kernel subspace estimation under noisy scenarios. Last but not least, we extend the low-rank approximation problem, which assumes a single subspace, to a problem whose data lie in a union of multiple subspaces, which is closely related to subspace clustering. While many recent studies are based on sparse or low-rank representation, the grouping effect among similar samples has rarely been considered together with the sparse or low-rank representation. Thus, we propose robust group subspace clustering algorithms based on sparse and low-rank representation with explicit subspace grouping. To resolve the fundamental issue of the computational complexity of existing subspace clustering algorithms, we suggest a fully scalable low-rank subspace clustering approach, which achieves linear complexity in the number of samples. Extensive experimental results on various applications, including computer vision and robotics, using benchmark and real-world data sets verify that our suggested solutions to the existing issues of sparse and low-rank representations are considerably robust, effective, and practically applicable.
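One ingredient from this abstract that can be shown compactly is the elastic-net regularization of singular values: the proximal operator of lam1*||X||_* + (lam2/2)*||X||_F^2 soft-thresholds the singular values and then scales them, and the added quadratic term is what makes the subproblem strongly convex and its solution stable. The sketch below is a generic version of this building block, not the dissertation's full solver; the parameter values are illustrative.

```python
# Elastic-net singular value shrinkage as a proximal operator (generic sketch).
import numpy as np

def elastic_net_sv_prox(Y, lam1, lam2):
    """argmin_X 0.5*||X - Y||_F^2 + lam1*||X||_* + 0.5*lam2*||X||_F^2."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - lam1, 0.0) / (1.0 + lam2)  # threshold, then scale
    return (U * s_shrunk) @ Vt

# usage: denoise a noisy low-rank matrix
rng = np.random.default_rng(0)
L = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 40))  # rank-5 signal
Y = L + 0.3 * rng.standard_normal((50, 40))                      # noisy observation
X = elastic_net_sv_prox(Y, lam1=5.0, lam2=0.1)
print("recovered rank:", np.linalg.matrix_rank(X),
      "relative error:", np.linalg.norm(X - L) / np.linalg.norm(L))
```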

    Riemannian Optimization for Convex and Non-Convex Signal Processing and Machine Learning Applications

    The performance of most algorithms for signal processing and machine learning applications highly depends on the underlying optimization algorithms. Multiple techniques have been proposed for solving convex and non-convex problems, such as interior-point methods and semidefinite programming. However, it is well known that these algorithms are not ideally suited for large-scale optimization with a high number of variables and/or constraints. This thesis exploits a novel optimization method, known as Riemannian optimization, for efficiently solving convex and non-convex problems in signal processing and machine learning applications. Unlike most optimization techniques, whose complexities increase with the number of constraints, Riemannian methods exploit the structure of the search space, i.e., the set of feasible solutions, to reduce the embedding dimension and solve optimization problems in a reasonable time. However, such efficiency comes at the expense of universality, as the geometry of each manifold needs to be investigated individually. This thesis explains the steps of designing first- and second-order Riemannian optimization methods for smooth matrix manifolds through the study and design of optimization algorithms for various applications. In particular, it considers contemporary applications in signal processing and machine learning, such as community detection, graph-based clustering, phase retrieval, and indoor and outdoor location determination. Simulation results are provided to attest to the efficiency of the proposed methods against popular generic and specialized solvers for each of the above applications.
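As a toy illustration of the two manifold-specific ingredients such first-order methods need, projecting the Euclidean gradient onto the tangent space and retracting the step back onto the manifold, here is a minimal sketch of Riemannian gradient descent on the unit sphere for the Rayleigh-quotient problem. It is our simplified example, not one of the thesis's application-specific solvers.

```python
# Riemannian gradient descent on the unit sphere: minimize f(x) = -x^T A x,
# whose minimizer over the sphere is a leading eigenvector of A.
import numpy as np

def riemannian_gd_sphere(A, iters=500, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)                  # start on the manifold
    lr = 0.5 / np.linalg.norm(A, 2)         # step from the gradient Lipschitz bound
    for _ in range(iters):
        egrad = -2.0 * A @ x                # Euclidean gradient of -x^T A x
        rgrad = egrad - (x @ egrad) * x     # project onto the tangent space at x
        x = x - lr * rgrad                  # gradient step in the tangent space
        x /= np.linalg.norm(x)              # retraction back onto the sphere
    return x

# usage: compare with numpy's eigendecomposition
M = np.random.default_rng(1).standard_normal((20, 20))
A = M @ M.T                                 # symmetric PSD test matrix
x = riemannian_gd_sphere(A)
top = np.linalg.eigh(A)[1][:, -1]           # eigenvector of the largest eigenvalue
print("alignment with top eigenvector:", abs(x @ top))
```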