32 research outputs found

Development of effective gene selection algorithms for microarray data analysis

Author: Balasubramanian Krishnakumar
Kim Jingu
Kim Wooyoung
Park Haesun
Publication venue: Georgia Institute of Technology
Publication date: 03/05/2011

Issued as final report. National Science Foundation (U.S.)

Scholarly Materials And Research @ Georgia Tech

Fast Newton-type Methods for the Least Squares Nonnegative Matrix Approximation Problem

Publication venue: Society for Industrial & Applied Mathematics (SIAM)

Crossref

Fast Nonnegative Matrix Factorization Algorithms Using Projected Gradient Approaches for Large-Scale Problems

Author: Cichocki Andrzej
Zdunek Rafal
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2008

Recently, a considerable growth of interest in projected gradient (PG) methods has been observed due to their high efficiency in solving large-scale convex minimization problems subject to linear constraints. Since the minimization problems underlying nonnegative matrix factorization (NMF) of large matrices match this class of minimization problems well, we investigate and test some recent PG methods in the context of their applicability to NMF. In particular, the paper focuses on the following modified methods: projected Landweber, Barzilai-Borwein gradient projection, projected sequential subspace optimization (PSESOP), interior-point Newton (IPN), and sequential coordinate-wise. The proposed and implemented NMF PG algorithms are compared with respect to their performance in terms of signal-to-interference ratio (SIR) and elapsed time, using a simple benchmark of mixed partially dependent nonnegative signals.
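As a rough illustration of the projected gradient idea described in this abstract, the sketch below alternates a gradient step on each factor with a projection onto the nonnegative orthant. It is a minimal sketch, not one of the paper's algorithms: the function name `nmf_projected_gradient`, the fixed step size 1/L (with L the spectral norm of the Gram matrix), and the random initialization are all assumptions made for this example.

```python
import numpy as np

def nmf_projected_gradient(Y, rank, n_iter=200, seed=0):
    """Approximate Y ≈ A @ X with A, X >= 0 by alternating projected gradient steps.

    Hypothetical helper for illustration only; the step size 1/L uses the
    Lipschitz constant of each least-squares subproblem.
    """
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    A = rng.random((m, rank))
    X = rng.random((rank, n))
    errors = []
    for _ in range(n_iter):
        # Step on X with A fixed: gradient of 0.5*||Y - A X||_F^2 is A^T A X - A^T Y.
        AtA = A.T @ A
        step = 1.0 / max(np.linalg.norm(AtA, 2), 1e-12)
        X = np.maximum(X - step * (AtA @ X - A.T @ Y), 0.0)  # project onto X >= 0
        # Step on A with X fixed: gradient is A X X^T - Y X^T.
        XXt = X @ X.T
        step = 1.0 / max(np.linalg.norm(XXt, 2), 1e-12)
        A = np.maximum(A - step * (A @ XXt - Y @ X.T), 0.0)  # project onto A >= 0
        errors.append(np.linalg.norm(Y - A @ X, "fro"))
    return A, X, errors

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Y = rng.random((20, 3)) @ rng.random((3, 15))  # synthetic nonnegative data
    A, X, errors = nmf_projected_gradient(Y, rank=3)
    print(f"first error {errors[0]:.4f}, last error {errors[-1]:.4f}")
```

The fixed 1/L step is chosen here only because it is simple to state; the methods listed in the abstract use different step-size rules and search directions.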

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Nonnegative matrix factorization for clustering

Author: Kuang Da
Publication venue: Georgia Institute of Technology
Publication date: 27/08/2014

This dissertation shows that nonnegative matrix factorization (NMF) can be extended to a general and efficient clustering method. Clustering is one of the fundamental tasks in machine learning. It is useful for unsupervised knowledge discovery in a variety of applications such as text mining and genomic analysis. NMF is a dimension reduction method that approximates a nonnegative matrix by the product of two lower rank nonnegative matrices, and has shown great promise as a clustering method when a data set is represented as a nonnegative data matrix. However, challenges in the widespread use of NMF as a clustering method lie in its correctness and efficiency: First, we need to know why and when NMF can detect the true clusters and is guaranteed to deliver good clustering quality; second, existing algorithms for computing NMF are expensive and often take longer than other clustering methods. We show that the original NMF can be improved in both respects in the context of clustering. Our new NMF-based clustering methods can achieve better clustering quality and run orders of magnitude faster than the original NMF and other clustering methods. Like other clustering methods, NMF places an implicit assumption on the cluster structure. Thus, the success of NMF as a clustering method depends on whether the representation of data in a vector space satisfies that assumption. Our approach to extending the original NMF to a general clustering method is to switch from the vector space representation of data points to a graph representation. The new formulation, called Symmetric NMF, takes a pairwise similarity matrix as an input and can be viewed as a graph clustering method. We evaluate this method on document clustering and image segmentation problems and find that it achieves better clustering accuracy. In addition, for the original NMF, it is difficult but important to choose the right number of clusters.
We show that the widely-used consensus NMF in genomic analysis for choosing the number of clusters has critical flaws and can produce misleading results. We propose a variation of the prediction strength measure arising from statistical inference to evaluate the stability of clusters and select the right number of clusters. Our measure shows promising performance in artificial simulation experiments. Large-scale applications bring substantial efficiency challenges to existing algorithms for computing NMF. An important example is topic modeling where users want to uncover the major themes in a large text collection. Our strategy of accelerating NMF-based clustering is to design algorithms that better suit the computer architecture as well as exploit the computing power of parallel platforms such as graphics processing units (GPUs). A key observation is that applying rank-2 NMF that partitions a data set into two clusters in a recursive manner is much faster than applying the original NMF to obtain a flat clustering. We take advantage of a special property of rank-2 NMF and design an algorithm that runs faster than existing algorithms due to continuous memory access. Combined with a criterion to stop the recursion, our hierarchical clustering algorithm runs significantly faster and achieves even better clustering quality than existing methods. Another bottleneck of NMF algorithms, which is also a common bottleneck in many other machine learning applications, is to multiply a large sparse data matrix with a tall-and-skinny dense matrix. We use the GPUs to accelerate this routine for sparse matrices with an irregular sparsity structure. Overall, our algorithm shows significant improvement over popular topic modeling methods such as latent Dirichlet allocation, and runs more than 100 times faster on data sets with millions of documents. Ph.D.
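The recursive rank-2 idea mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the rank-2 factorization uses standard multiplicative updates as a stand-in (not the dissertation's specialized rank-2 solver), the stopping rule is a fixed number of leaves (not the dissertation's criterion), and the names `rank2_nmf` and `recursive_bisection` are hypothetical.

```python
import numpy as np

def rank2_nmf(X, n_iter=200, seed=0, eps=1e-9):
    """Rank-2 NMF X ≈ W @ H via multiplicative updates (stand-in solver)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, 2)) + eps
    H = rng.random((2, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def recursive_bisection(X, n_leaves, min_size=2):
    """Partition the columns of X by repeatedly bisecting the largest cluster.

    Each split assigns a column to the rank-2 component with the larger
    coefficient. Stopping at n_leaves is an assumption for this sketch.
    """
    clusters = [np.arange(X.shape[1])]
    while len(clusters) < n_leaves:
        clusters.sort(key=len)
        idx = clusters.pop()  # largest remaining cluster
        if len(idx) < 2 * min_size:
            clusters.append(idx)
            break
        _, H = rank2_nmf(X[:, idx])
        left = H[0] >= H[1]
        if left.all() or not left.any():
            # Degenerate split: keep the cluster whole and stop.
            clusters.append(idx)
            break
        clusters.extend([idx[left], idx[~left]])
    return clusters

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.zeros((10, 12))
    X[:5, :6] = rng.random((5, 6)) + 1.0   # first group of columns
    X[5:, 6:] = rng.random((5, 6)) + 1.0   # second group of columns
    for c in recursive_bisection(X, n_leaves=2):
        print(sorted(c.tolist()))
```

Each split only requires a two-column factor, which is why a recursive scheme of this kind can be cheaper than a single flat factorization with many components.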

Scholarly Materials And Research @ Georgia Tech

Single-channel source separation using non-negative matrix factorization

Author: Schmidt Mikkel Nørgaard
Publication venue: Technical University of Denmark, DTU Informatics, Building 321
Publication date: 01/01/2009

Online Research Database In Technology

Learning without labels and nonnegative tensor factorization

Author: Balasubramanian Krishnakumar
Publication venue: Georgia Institute of Technology
Publication date: 08/04/2010

Supervised learning tasks, such as building a classifier or estimating the error rate of predictors, are typically performed with labeled data. In most cases, obtaining labeled data is costly as it requires manual labeling. On the other hand, unlabeled data is available in abundance. In this thesis, we discuss methods to perform supervised learning tasks with no labeled data. We prove consistency of the proposed methods and demonstrate their applicability with synthetic and real world experiments. In some cases, small quantities of labeled data may be easily available and supplemented with large quantities of unlabeled data (semi-supervised learning). We derive the asymptotic efficiency of generative models for semi-supervised learning and quantify the effect of labeled and unlabeled data on the quality of the estimate. Another independent track of the thesis is efficient computational methods for nonnegative tensor factorization (NTF). NTF provides the user with rich modeling capabilities but it comes with an added computational cost. We provide a fast algorithm for performing NTF using a modified active set method called the block principal pivoting method and demonstrate its applicability to social network analysis and text mining. M.S. Committee Chair: Lebanon, Guy; Committee Co-Chair: Park, Haesun; Committee Member: Gray, Alexande

Scholarly Materials And Research @ Georgia Tech

Sparse feature learning for image analysis in segmentation, classification, and disease diagnosis.

Author: Hosseini-Asl Ehsan
Publication venue: ThinkIR: The University of Louisville's Institutional Repository
Publication date: 01/05/2016

The success of machine learning algorithms generally depends on an intermediate data representation, called features, that disentangles the hidden factors of variation in data. Moreover, machine learning models are required to generalize, in order to reduce the specificity or bias toward the training dataset. Unsupervised feature learning is useful in taking advantage of large amounts of unlabeled data, which are available to capture these variations. However, learned features are required to capture variational patterns in data space. In this dissertation, unsupervised feature learning with sparsity is investigated for sparse and local feature extraction with application to lung segmentation, interpretable deep models, and Alzheimer's disease classification. Nonnegative Matrix Factorization, Autoencoder and 3D Convolutional Autoencoder are used as architectures or models for unsupervised feature learning. They are investigated along with nonnegativity, sparsity and part-based representation constraints for generalized and transferable feature extraction.

University of Louisville