From-Below Boolean Matrix Factorization Algorithm Based on MDL
During the past few years Boolean matrix factorization (BMF) has become an
important direction in data analysis. The minimum description length principle
(MDL) was successfully adapted to BMF for model order selection.
Nevertheless, a BMF algorithm that performs well with respect to the
standard measures in BMF is still missing. In this paper, we propose a novel
from-below Boolean matrix factorization algorithm based on formal concept
analysis. The algorithm utilizes the MDL principle as a criterion for the
factor selection. In various experiments, we show that the proposed algorithm
outperforms, from different standpoints, existing state-of-the-art BMF
algorithms.
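To make the term "from-below" concrete, here is a minimal sketch in Python with NumPy. It is not the algorithm proposed in the paper: it only illustrates the Boolean matrix product and the from-below property, meaning that the product of the factor matrices never places a 1 where the input matrix has a 0. The function names and the toy matrices are assumptions chosen for illustration.

```python
import numpy as np

def boolean_product(A, B):
    """Boolean matrix product: (A o B)[i, j] = OR_k (A[i, k] AND B[k, j])."""
    return (A.astype(int) @ B.astype(int)) > 0

def is_from_below(I, A, B):
    """True if A o B never puts a 1 where the input matrix I has a 0."""
    return not np.any(boolean_product(A, B) & ~I.astype(bool))

def uncovered_ones(I, A, B):
    """Number of 1s in I that the factorization A o B does not cover."""
    return int(np.sum(I.astype(bool) & ~boolean_product(A, B)))

# Hypothetical toy data: 3 objects x 3 attributes, explained by 2 factors.
I = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=bool)
A = np.array([[1, 0],
              [1, 1],
              [0, 1]], dtype=bool)   # object-factor matrix
B = np.array([[1, 1, 0],
              [0, 1, 1]], dtype=bool)  # factor-attribute matrix

print(is_from_below(I, A, B))                 # both factors together
print(uncovered_ones(I, A, B))
print(uncovered_ones(I, A[:, :1], B[:1, :]))  # first factor only
```

In a from-below method, adding a factor can only reduce the number of uncovered 1s, so a selection criterion such as MDL has to trade that gain against the cost of describing the extra factor.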
PANTHER: Pathway Augmented Nonnegative Tensor factorization for HighER-order feature learning
Genetic pathways usually encode molecular mechanisms that can inform targeted
interventions. It is often challenging for existing machine learning approaches
to jointly model genetic pathways (higher-order features) and variants (atomic
features), and to present interpretable models to clinicians. In order to build
more accurate and more interpretable machine learning models for genetic
medicine, we introduce Pathway Augmented Nonnegative Tensor factorization for
HighER-order feature learning (PANTHER). PANTHER selects informative genetic
pathways that directly encode molecular mechanisms. We apply genetically
motivated constrained tensor factorization to group pathways in a way that
reflects molecular mechanism interactions. We then train a softmax classifier
for disease types using the identified pathway groups. We evaluated PANTHER
against multiple state-of-the-art constrained tensor/matrix factorization
models, as well as group guided and Bayesian hierarchical models. PANTHER
outperforms all state-of-the-art comparison models significantly (p<0.05). Our
experiments on large scale Next Generation Sequencing (NGS) and whole-genome
genotyping datasets also demonstrated wide applicability of PANTHER. We
performed feature analysis in predicting disease types, which suggested
insights and benefits of the identified pathway groups. Comment: Accepted by the 35th AAAI Conference on Artificial Intelligence (AAAI 2021)
A mathematical theory of making hard decisions: model selection and robustness of matrix factorization with binary constraints
One of the first and most fundamental tasks in machine learning is to group observations within a dataset. Given a notion of similarity, finding those instances which are outstandingly similar to each other has manifold applications. Recommender systems and topic analysis in text data are examples which are most intuitive to grasp. The interpretation of the groups, called clusters, is facilitated if the assignment of samples is definite. Especially in high-dimensional data, denoting a degree to which an observation belongs to a specified cluster requires a subsequent processing of the model to filter the most important information. We argue that a good summary of the data provides hard decisions on the following question: how many groups are there, and which observations belong to which clusters? In this work, we contribute to the theoretical and practical background of clustering tasks, addressing one or both aspects of this question. Our overview of state-of-the-art clustering approaches details the challenges of our ambition to provide hard decisions. Based on this overview, we develop new methodologies for two branches of clustering: the one concerns the derivation of nonconvex clusters, known as spectral clustering; the other addresses the identification of biclusters, a set of samples together with similarity defining features, via Boolean matrix factorization. One of the main challenges in both considered settings is the robustness to noise. Assuming that the issue of robustness is controllable by means of theoretical insights, we have a closer look at those aspects of established clustering methods which lack a theoretical foundation. In the scope of Boolean matrix factorization, we propose a versatile framework for the optimization of matrix factorizations subject to binary constraints. Especially Boolean factorizations have been computed by intuitive methods so far, implementing greedy heuristics which lack quality guarantees of obtained solutions. 
In contrast, we propose to build upon recent advances in nonconvex optimization theory. This enables us to provide convergence guarantees to local optima of a relaxed objective, requiring only approximately binary factor matrices. By means of this new optimization scheme, PAL-Tiling, we propose two approaches to automatically determine the number of clusters. One is based on information theory, employing the minimum description length principle, and the other is a novel statistical approach, controlling the false discovery rate. The flexibility of our framework PAL-Tiling enables the optimization of novel factorization schemes. In a different context, where every data point belongs to a pre-defined class, a characterization of the classes may be obtained by Boolean factorizations. However, there are cases where this traditional factorization scheme is not sufficient. Therefore, we propose the integration of another factor matrix, reflecting class-specific differences within a cluster. Our theoretical considerations are complemented by empirical evaluations, showing how our methods combine theoretical soundness with practical advantages.
A Study of Boolean Matrix Factorization Under Supervised Settings
Boolean matrix factorization is a generally accepted approach used in data analysis to explain data. It is commonly used in unsupervised settings or for data preprocessing in supervised settings. In this paper we study factors under supervised settings. We provide experimental evidence that factors are able to explain not only the data as a whole but also the classes in the data
Department of Applied Mathematics Academic Program Review, Self Study / June 2010
The Department of Applied Mathematics has a multi-faceted mission to provide an exceptional mathematical education focused on the unique needs of NPS students, to conduct relevant research, and to provide service to the broader community. A strong and vibrant Department of Applied Mathematics is essential to the university's goal of becoming a premier research university. Because research in mathematics often impacts science and engineering in surprising ways, the department encourages mathematical explorations in a broad range of areas in applied mathematics with specific thrust areas that support the mission of the school
The Singular Value Decomposition over Completed Idempotent Semifields
In this paper, we provide a basic technique for Lattice Computing: an analogue of the Singular Value Decomposition for rectangular matrices over complete idempotent semifields (i-SVD). These algebras are already complete lattices and many of their instances (the complete schedule algebra or completed max-plus semifield, the tropical algebra, and the max-times algebra) are useful in a range of applications, e.g., morphological processing. We further the task of eliciting the relation between i-SVD and the extension of Formal Concept Analysis to complete idempotent semifields (K-FCA) started in a prior work. We find that for a matrix with entries in a complete idempotent semifield, the Galois connection at the heart of K-FCA provides two bases of left- and right-singular vectors to choose from for reconstructing the matrix. These are join-dense or meet-dense sets of object or attribute concepts of the concept lattice created by the connection, and they are almost surely not pairwise orthogonal. We conclude with an attempted analogue of the fundamental theorem of linear algebra that gathers all results, and discuss it in the wider setting of matrix factorization. This research was funded by the Spanish Government-MinECo project TEC2017-84395-P and the Dept. of Research and Innovation of Madrid Regional Authority project EMPATIA-CM (Y2018/TCS-5046)
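As a small illustration of arithmetic in one of the algebras named above, the following Python sketch computes a matrix product in the max-plus semifield, where addition is replaced by max and multiplication by +. It does not implement the i-SVD or K-FCA. The function name is hypothetical, and the sketch assumes entries are finite or -inf only; the completed semifield also contains +inf, whose combination with -inf needs a convention that is not handled here.

```python
import numpy as np

def maxplus_matmul(A, B):
    """Max-plus product: (A (x) B)[i, j] = max_k (A[i, k] + B[k, j]).

    Assumes entries are finite or -inf (the max-plus zero element).
    """
    # Broadcast to shape (rows of A, inner dimension, columns of B),
    # then take the max over the inner dimension.
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

# Hypothetical toy matrices; -inf plays the role of 0 in ordinary algebra.
A = np.array([[0.0, -np.inf],
              [1.0, 2.0]])
B = np.array([[3.0],
              [4.0]])

C = maxplus_matmul(A, B)
print(C)
```

Replacing + with ordinary multiplication on nonnegative entries gives the max-times product in the same way.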