
    Evolutionary nonnegative matrix factorization for data compression

    This paper aims at improving nonnegative matrix factorization (NMF) to facilitate data compression. An evolutionary updating strategy is proposed to solve the NMF problem iteratively based on three sets of updating rules: multiplicative, firefly and survival-of-the-fittest rules. For the data compression application, the quality of the factorized matrices can be evaluated by measurements such as sparsity, orthogonality and factorization error, which assess compression quality in terms of storage space consumption, redundancy in the data matrix and data approximation accuracy. Thus, the fitness score function that drives the evolving procedure is designed as a composite score taking all of these measurements into account. A hybrid initialization scheme is performed to improve the rate of convergence, allowing multiple initial candidates generated by different types of NMF initialization approaches. The effectiveness of the proposed method is demonstrated using the Yale and ORL image datasets.
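    To make the setup concrete, here is a minimal NumPy sketch of one of the three candidate update rules (the standard Lee–Seung multiplicative step) together with a hypothetical composite fitness score combining reconstruction error and sparsity; the paper's actual firefly and survival-of-the-fittest rules and score weights are not reproduced here.

    ```python
    import numpy as np

    def nmf_multiplicative(V, r, n_iter=200, seed=0):
        """One candidate update rule: Lee-Seung multiplicative steps
        minimizing ||V - WH||_F^2 with nonnegative W, H."""
        rng = np.random.default_rng(seed)
        m, n = V.shape
        W = rng.random((m, r)) + 1e-4
        H = rng.random((r, n)) + 1e-4
        eps = 1e-10  # avoids division by zero
        for _ in range(n_iter):
            H *= (W.T @ V) / (W.T @ W @ H + eps)
            W *= (V @ H.T) / (W @ H @ H.T + eps)
        return W, H

    def fitness(V, W, H, w_err=1.0, w_sparse=0.5):
        """Hypothetical composite score: relative reconstruction error
        plus a simple sparsity term on H (weights are illustrative,
        not taken from the paper)."""
        err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
        sparsity = 1.0 - np.count_nonzero(H > 1e-3) / H.size
        return -w_err * err + w_sparse * sparsity

    # toy nonnegative data matrix
    V = np.abs(np.random.default_rng(1).random((20, 30)))
    W, H = nmf_multiplicative(V, r=5)
    score = fitness(V, W, H)
    ```

    In the evolutionary scheme described above, a score of this kind would be evaluated for each candidate factorization and used to select which update rule's result survives to the next iteration.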

    Evolutionary nonnegative matrix factorization with adaptive control of cluster quality

    Nonnegative matrix factorization (NMF) approximates a given data matrix using linear combinations of a small number of nonnegative basis vectors, weighted by nonnegative encoding coefficients. This enables the exploration of the cluster structure of the data through examination of the values of the encoding coefficients, and NMF is therefore often used as a tool for clustering analysis. However, its encoding coefficients do not always reveal a satisfactory cluster structure. To improve its effectiveness, a novel evolutionary strategy is proposed here to drive the iterative updating scheme of NMF and generate encoding coefficients of higher quality, capable of offering more accurate and sharper cluster structures. The proposed hybridization procedure, which relies on multiple initializations, reinforces the robustness of the solution. Additionally, three evolving rules are designed to simultaneously boost the cluster quality and reduce the reconstruction error during the iterative updates. Any clustering performance measure, whether an internal one relying on the data itself or an external one based on the availability of ground-truth information, can be employed to drive the evolving procedure. The effectiveness of the proposed method is demonstrated via careful experimental designs and thorough comparative analyses using multiple benchmark datasets.
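    As a small illustration of reading a cluster structure out of the encoding coefficients, the sketch below uses the common argmax convention (assign each sample to the basis vector with the largest coefficient) and an external purity score of the kind that could drive the evolving procedure; the paper's specific evolving rules are not shown.

    ```python
    import numpy as np

    def clusters_from_encoding(H):
        """Assign each sample (a column of H) to the basis vector with
        the largest encoding coefficient -- a common way to read a
        cluster structure out of an NMF factorization."""
        return np.argmax(H, axis=0)

    def purity(labels, truth):
        """External cluster-quality measure: fraction of samples in the
        majority ground-truth class of their assigned cluster."""
        return sum(np.bincount(truth[labels == c]).max()
                   for c in np.unique(labels)) / len(truth)

    # toy encoding matrix: 3 basis vectors (rows), 5 samples (columns)
    H = np.array([[0.9, 0.1, 0.0, 0.2, 0.8],
                  [0.1, 0.8, 0.1, 0.7, 0.1],
                  [0.0, 0.1, 0.9, 0.1, 0.1]])
    truth = np.array([0, 1, 2, 1, 0])
    labels = clusters_from_encoding(H)  # -> [0 1 2 1 0]
    ```

    An internal measure (e.g., a silhouette-style score computed from the data alone) could be substituted for purity when no ground truth is available, as the abstract notes.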

    libNMF -- A Library for Nonnegative Matrix Factorization

    We present libNMF -- a computationally efficient, high-performance library for computing nonnegative matrix factorizations (NMF), written in C. Various algorithms and algorithmic variants for computing NMF are supported. libNMF is based on external routines from BLAS (Basic Linear Algebra Subprograms), LAPACK (Linear Algebra PACKage) and ARPACK, which provide efficient building blocks for performing central vector and matrix operations. Since modern BLAS implementations support multi-threading, libNMF can exploit the potential of multi-core architectures. In this paper, the basic NMF algorithms contained in libNMF and existing implementations found in the literature are briefly reviewed. Then, libNMF is evaluated in terms of computational efficiency and numerical accuracy and compared with the best existing codes available. libNMF is publicly available at http://rlcta.univie.ac.at/software

    Nonnegative matrix analysis for data clustering and compression

    Nonnegative matrix factorization (NMF) has become an increasingly popular data processing tool in recent years, widely used by various communities including computer vision, text mining and bioinformatics. It approximates each data sample in a data collection by a linear combination of a set of nonnegative basis vectors weighted by nonnegative weights. This often enables meaningful interpretation of the data, motivates useful insights and facilitates tasks such as data compression, clustering and classification. These subsequently lead to various active roles of NMF in data analysis, e.g., as a dimensionality reduction tool [11, 75], clustering tool [94, 82, 13, 39], feature engine [40], source separation tool [38], etc. Several methods based on NMF are proposed in this thesis. A modification of k-means clustering is chosen as one of the initialisation methods for NMF; experimental results demonstrate the strength of this method, with improved compression performance. Independent principal component analysis (IPCA), which combines the advantages of both principal component analysis (PCA) and independent component analysis (ICA), is chosen as a significant initialisation method for NMF, with improved clustering accuracy. We propose a new evolutionary optimization strategy for NMF driven by three proposed update schemes in the solution space, namely the NMF rule (or original movement), the firefly rule (or beta movement) and the survival-of-the-fittest rule (or best movement). This update strategy addresses both the clustering and compression problems by using different system objective functions that make use of clustering and compression quality measurements. A hybrid initialisation approach is used, including state-of-the-art NMF initialization methods as seed knowledge to increase the rate of convergence.
    There is no limitation on the number or type of initialization methods used in the proposed optimisation approach. Numerous computer experiments using benchmark datasets verify the theoretical results and compare the techniques in terms of clustering/compression accuracy. Experimental results demonstrate the strength of these methods, with improved clustering/compression performance. In the application to an EEG dataset, we employed several standard algorithms to cluster the preprocessed EEG data. We also explored ensemble clustering to obtain some tight clusters. Based on the results obtained, we can make the following statements: firstly, normalization is necessary for this EEG brain dataset to obtain reasonable clustering; secondly, k-means, k-medoids and HC-Ward provide relatively better clustering results; thirdly, ensemble clustering enables us to tune the tightness of the clusters so that the research can be focused.
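    The normalize-then-cluster pipeline described for the EEG data can be sketched as follows; this uses per-feature z-scoring and a plain Lloyd's k-means as a stand-in for the standard algorithms compared in the thesis (the EEG preprocessing itself and the ensemble step are not reproduced).

    ```python
    import numpy as np

    def zscore(X):
        """Per-feature normalization, which the thesis found necessary
        for the EEG data before clustering gave reasonable results."""
        return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)

    def kmeans(X, k, n_iter=50):
        """Plain Lloyd's k-means on the rows of X, with a simple
        deterministic initialization (evenly spaced rows)."""
        centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
        for _ in range(n_iter):
            d = np.linalg.norm(X[:, None] - centers[None], axis=2)
            labels = d.argmin(axis=1)
            for j in range(k):
                if np.any(labels == j):
                    centers[j] = X[labels == j].mean(axis=0)
        return labels

    # two well-separated toy "recordings" standing in for EEG features
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 0.1, (20, 4)),
                   rng.normal(3, 0.1, (20, 4))])
    labels = kmeans(zscore(X), k=2)
    ```

    k-medoids and Ward-linkage hierarchical clustering (HC-Ward) slot into the same pipeline by swapping out the `kmeans` call, which is how the comparison in the thesis is structured.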