Document Clustering Based On Max-Correntropy Non-Negative Matrix Factorization
Nonnegative matrix factorization (NMF) has been successfully applied to
classification and clustering in many areas. Commonly used NMF algorithms
mainly aim to minimize the distance or the Kullback-Leibler (KL) divergence,
which may not be suitable for nonlinear cases. In this paper, we propose a new
decomposition method for document clustering that maximizes the correntropy
between the original matrix and the product of two low-rank matrices. This
method also allows us to learn new basis vectors of the semantic feature space
from the data. To our knowledge, no prior work has maximized correntropy in NMF
to cluster high-dimensional document data. Our experimental results show the
superiority of the proposed method over other NMF variants on the Reuters21578
and TDT2 datasets.
Comment: International Conference on Machine Learning and Cybernetics (ICMLC)
201
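A minimal sketch of NMF that maximizes correntropy, assuming a Gaussian kernel and the half-quadratic reformulation commonly used for correntropy objectives: each iteration turns the current residuals into weights and then applies weighted multiplicative updates. The function names `correntropy` and `max_correntropy_nmf`, the kernel-width heuristic, and the iteration count are illustrative assumptions, not the paper's published algorithm. X is taken to be a nonnegative term-by-document matrix whose columns are documents.

```python
import numpy as np


def correntropy(X, Y, sigma):
    """Empirical correntropy of X and Y with a Gaussian kernel (normalizing constant dropped)."""
    return float(np.mean(np.exp(-((X - Y) ** 2) / (2.0 * sigma ** 2))))


def max_correntropy_nmf(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Approximate X (features x documents, nonnegative) by W @ H, maximizing correntropy.

    Half-quadratic scheme (assumed): Gaussian weights on the current residuals,
    then weighted multiplicative updates that keep W and H nonnegative.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        E = X - W @ H
        # Kernel width heuristic (assumption): match sigma^2 to the mean squared residual.
        sigma2 = max(float(np.mean(E ** 2)), eps)
        # Entries with large residuals (likely outliers) get weights close to 0.
        Q = np.exp(-(E ** 2) / (2.0 * sigma2))
        WH = W @ H
        W *= ((Q * X) @ H.T) / ((Q * WH) @ H.T + eps)
        WH = W @ H
        H *= (W.T @ (Q * X)) / (W.T @ (Q * WH) + eps)
    return W, H
```

A document's cluster can then be read off as the largest entry in its column of H, e.g. `H.argmax(axis=0)`. Because entries with large residuals receive weights near zero, they pull less on W and H than in plain least-squares NMF.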
Graph Regularized Non-negative Matrix Factorization By Maximizing Correntropy
Non-negative matrix factorization (NMF) has proved effective in many
clustering and classification tasks. The classic measures of the error between
the original and the reconstructed matrix are the distance and the
Kullback-Leibler (KL) divergence. However, these error measures do not
properly handle nonlinear cases. As a consequence, alternative measures based
on nonlinear kernels, such as correntropy, have been proposed. Current
correntropy-based NMF, however, considers only low-level features, without
taking into account the intrinsic geometrical distribution of the data. In
this paper, we propose a new NMF algorithm that preserves local invariance by
adding graph regularization to max-correntropy-based matrix factorization. In
addition, each feature learns its corresponding kernel from the data.
Experimental results on Caltech101 and Caltech256 show the benefits of this
combination over other NMF algorithms for unsupervised image clustering.
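Graph regularization can be sketched by adding a Laplacian term tr(H L H^T), with L = D - S, to a correntropy-weighted NMF objective, using the multiplicative update style of graph-regularized NMF. The 0/1 k-nearest-neighbour graph, the parameter names `lam` and `k`, the function names, and the reading of "each feature learns its kernel" as one kernel width per row of X are all assumptions made for illustration, not the paper's exact algorithm.

```python
import numpy as np


def knn_graph(X, k=5):
    """Symmetric 0/1 k-nearest-neighbour affinity over the columns (samples) of X."""
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * X.T @ X  # pairwise squared distances
    np.fill_diagonal(dist, np.inf)                    # never pick a sample as its own neighbour
    S = np.zeros((n, n))
    nearest = np.argsort(dist, axis=1)[:, :k]
    S[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(S, S.T)


def graph_corr_nmf(X, rank, lam=1.0, k=5, n_iter=300, eps=1e-10, seed=0):
    """Hypothetical graph-regularized, correntropy-weighted NMF: X ~ W @ H."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    S = knn_graph(X, k)
    D = np.diag(S.sum(axis=1))
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        E = X - W @ H
        # One kernel width per feature (row), estimated from that row's residuals (assumption).
        sigma2 = np.maximum(np.mean(E ** 2, axis=1, keepdims=True), eps)
        Q = np.exp(-(E ** 2) / (2.0 * sigma2))
        WH = W @ H
        W *= ((Q * X) @ H.T) / ((Q * WH) @ H.T + eps)
        WH = W @ H
        # Gradient of lam * tr(H (D - S) H^T) split into its negative (S) and positive (D) parts.
        H *= (W.T @ (Q * X) + lam * H @ S) / (W.T @ (Q * WH) + lam * H @ D + eps)
    return W, H
```

The graph term pulls the coefficient columns of neighbouring samples toward each other, which is one way to realize the "local invariance" the abstract describes.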
Sparse feature learning for image analysis in segmentation, classification, and disease diagnosis.
The success of machine learning algorithms generally depends on intermediate data representations, called features, that disentangle the hidden factors of variation in data. Moreover, machine learning models must generalize, in order to reduce their specificity or bias toward the training dataset. Unsupervised feature learning is useful for exploiting the large amounts of available unlabeled data to capture these variations. However, the learned features must capture the variational patterns in the data space. In this dissertation, unsupervised feature learning with sparsity is investigated for sparse and local feature extraction, with applications to lung segmentation, interpretable deep models, and Alzheimer's disease classification. Nonnegative Matrix Factorization, autoencoders, and 3D convolutional autoencoders are used as architectures or models for unsupervised feature learning. They are investigated together with nonnegativity, sparsity, and part-based representation constraints for generalized and transferable feature extraction.
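One common way to add sparsity to NMF, shown here as a hedged sketch rather than the dissertation's formulation, is an L1 penalty on the coefficient matrix, 0.5 * ||X - WH||_F^2 + alpha * sum(H), which adds alpha to the denominator of the multiplicative update for H. The function name `sparse_nmf`, the penalty weight `alpha`, and the column normalization of W are assumptions.

```python
import numpy as np


def sparse_nmf(X, rank, alpha=0.1, n_iter=300, eps=1e-10, seed=0):
    """Multiplicative updates for 0.5*||X - W H||_F^2 + alpha*sum(H), with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # The L1 penalty appears as the constant alpha in the denominator.
        H *= (W.T @ X) / (W.T @ W @ H + alpha + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        # Rescale so each column of W has unit norm (W @ H is unchanged);
        # otherwise the penalty could be dodged by shrinking H and inflating W.
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]
    return W, H
```

Larger values of `alpha` drive more entries of H toward zero, so each sample is explained by fewer basis columns, giving the sparse, part-based representation the abstract refers to.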
Structure Preserving Large Imagery Reconstruction
With the explosive growth of web-based cameras and mobile devices, billions
of photographs are uploaded to the internet. We can easily collect a huge
number of photo streams for various goals, such as image clustering, 3D scene
reconstruction, and other big data applications. However, such tasks are not
easy, because the retrieved photos can vary widely in view perspective,
resolution, lighting, noise, and distortion. Furthermore, occlusion by
unexpected objects such as people and vehicles makes it even more challenging
to find feature correspondences and reconstruct realistic scenes. In this
paper, we propose a structure-based image completion algorithm for object
removal that produces visually plausible content with consistent structure and
scene texture. We use an edge matching technique to infer the potential
structure of the unknown region. Driven by the estimated structure, texture
synthesis is performed automatically along the estimated curves. We evaluate
the proposed method on different types of images, from highly structured
indoor environments to natural scenes. Our experimental results demonstrate
satisfactory performance that could support subsequent big data processing,
such as image localization, object retrieval, and scene reconstruction, and
show that this approach achieves favorable results that outperform existing
state-of-the-art techniques.
Robust Manifold Nonnegative Tucker Factorization for Tensor Data Representation
Nonnegative Tucker Factorization (NTF) minimizes the Euclidean distance or
Kullback-Leibler divergence between the original data and its low-rank
approximation, which leaves it vulnerable to gross corruptions or outliers and
leads it to neglect the manifold structure of the data. In particular, NTF
suffers from rotational ambiguity: solutions with and without rotation
transformations are equally good in the sense of yielding the maximum
likelihood. In this paper, we propose three Robust Manifold NTF algorithms that
handle outliers by incorporating structural knowledge about them. They first
apply a half-quadratic optimization algorithm to transform the problem into a
general weighted NTF in which the weights are influenced by the outliers. Then,
we introduce the correntropy induced metric, the Huber function, and the Cauchy
function, respectively, to compute the weights and handle the outliers.
Finally, we introduce a manifold regularization to overcome the rotational
ambiguity of NTF. We have compared the proposed method with a number of
representative methods covering the major branches of NTF on a variety of
real-world image databases. Experimental results illustrate the effectiveness
of the proposed method under two evaluation metrics (accuracy and NMI).
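The half-quadratic step reduces to computing one weight per tensor entry from its residual. A minimal sketch of the three weight functions, assuming the standard forms used in iteratively reweighted least squares (a Gaussian kernel for the correntropy induced metric) and scale parameters `sigma` and `c` that the abstract does not specify; the function names are hypothetical.

```python
import numpy as np


def cim_weights(E, sigma=1.0):
    """Correntropy induced metric: Gaussian weights, decaying fast for large residuals."""
    return np.exp(-(E ** 2) / (2.0 * sigma ** 2))


def huber_weights(E, c=1.0):
    """Huber: weight 1 for |e| <= c, c/|e| beyond (np.maximum avoids division by 0)."""
    absE = np.abs(E)
    return np.where(absE <= c, 1.0, c / np.maximum(absE, c))


def cauchy_weights(E, c=1.0):
    """Cauchy: weights decaying like 1/e^2 for large residuals."""
    return 1.0 / (1.0 + (E / c) ** 2)
```

Given a residual tensor E = X - (reconstruction), a weighted NTF step would then minimize the sum of weights times squared residuals with the weights held fixed. The Tucker updates and the manifold term are not shown here.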
Supervised cross-modal factor analysis for multiple modal data classification
In this paper, we study the problem of learning from multimodal data for the
purpose of document classification. In this problem, each document is composed
of two different modalities of data, namely an image and a text. Cross-modal
factor analysis (CFA) has been proposed to project the two modalities into a
shared data space, so that an image or a text can be classified directly in
this space. A disadvantage of CFA is that it ignores the supervision
information. In this paper, we improve CFA by incorporating the supervision
information to represent and classify both the image and text modalities of
documents. We project both image and text data into a shared data space by
factor analysis, and then train a class label predictor in the shared space to
exploit the class label information. The factor analysis parameters and the
predictor parameters are learned jointly by optimizing a single objective
function. With this objective function, we minimize both the distance between
the projections of the image and text of the same document and the
classification error of the projections, measured by the hinge loss. The
objective function is optimized by an alternating optimization strategy in an
iterative algorithm. Experiments on two multimodal document datasets show the
advantage of the proposed algorithm over other CFA methods.
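The joint objective and its alternating optimization can be sketched as follows, assuming binary labels y in {-1, +1}, row-wise data matrices `Xi` (image features) and `Xt` (text features), linear projections `U` and `V`, one linear predictor `w` applied to both projections, a small Frobenius-norm regularizer, and plain subgradient steps in each alternating phase. All names and hyperparameters are hypothetical; the paper's exact constraints and solver may differ.

```python
import numpy as np


def objective(U, V, w, Xi, Xt, y, C=1.0, reg=1e-2):
    """Projection mismatch + C * hinge loss on both projections + regularizer."""
    Pi, Pt = Xi @ U, Xt @ V  # shared-space projections, shape (n, k)
    match = np.sum((Pi - Pt) ** 2)
    hinge = (np.maximum(0.0, 1.0 - y * (Pi @ w)).sum()
             + np.maximum(0.0, 1.0 - y * (Pt @ w)).sum())
    return match + C * hinge + reg * (np.sum(U ** 2) + np.sum(V ** 2) + w @ w)


def fit(Xi, Xt, y, k=2, C=1.0, reg=1e-2, lr=1e-3, n_outer=50, n_inner=10, seed=0):
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((Xi.shape[1], k))
    V = 0.1 * rng.standard_normal((Xt.shape[1], k))
    w = 0.1 * rng.standard_normal(k)
    for _ in range(n_outer):
        # Phase 1: update the projections U, V with the predictor w fixed.
        for _ in range(n_inner):
            Pi, Pt = Xi @ U, Xt @ V
            ai = (y * (Pi @ w) < 1.0).astype(float)  # samples with an active hinge term
            at = (y * (Pt @ w) < 1.0).astype(float)
            Dif = Pi - Pt
            gPi = 2.0 * Dif - C * (ai * y)[:, None] * w[None, :]
            gPt = -2.0 * Dif - C * (at * y)[:, None] * w[None, :]
            U -= lr * (Xi.T @ gPi + 2.0 * reg * U)
            V -= lr * (Xt.T @ gPt + 2.0 * reg * V)
        # Phase 2: update the predictor w with U, V fixed.
        for _ in range(n_inner):
            Pi, Pt = Xi @ U, Xt @ V
            ai = (y * (Pi @ w) < 1.0).astype(float)
            at = (y * (Pt @ w) < 1.0).astype(float)
            gw = -C * ((ai * y) @ Pi + (at * y) @ Pt) + 2.0 * reg * w
            w -= lr * gw
    return U, V, w
```

A new image or text can then be classified by the sign of `(x @ U) @ w` or `(t @ V) @ w`, since both modalities share the same space and predictor.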
Optimization algorithms for inference and classification of genetic profiles from undersampled measurements
In this thesis, we tackle three different problems, all related to optimization techniques for inference and classification of genetic profiles. First, we extend the deterministic Non-negative Matrix Factorization (NMF) framework to the probabilistic case (PNMF). We apply the PNMF algorithm to cluster and classify DNA microarray data. The proposed PNMF is shown to outperform the deterministic NMF and the sparse NMF algorithms in clustering stability and classification accuracy. Second, we propose SMURC: Small-sample MUltivariate Regression with Covariance estimation. Specifically, we consider a high-dimension, low-sample-size multivariate regression problem that accounts for the correlation of the response variables. We show that, in this case, the maximum likelihood approach is meaningless because the likelihood diverges. We propose a normalization of the likelihood function that guarantees convergence. Simulation results show that SMURC outperforms the regularized likelihood estimator with known covariance matrix and the state-of-the-art sparse Conditional Graphical Gaussian Model (sCGGM). Third, we derive a new greedy algorithm that provides an exact sparse solution of the combinatorial $\ell_0$-optimization problem in exponentially less computation time. Unlike other greedy approaches, which only approximate the exact sparse solution, the proposed greedy approach, called Kernel reconstruction, leads to the exact optimal solution.
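To make the combinatorial $\ell_0$ problem concrete, the sketch below solves min ||x||_0 subject to A x = b by exhaustive search over supports of increasing size. This is the brute-force baseline whose cost grows combinatorially with the number of columns, not the thesis's Kernel reconstruction algorithm; the function name `l0_exhaustive` and the tolerance `tol` are assumptions.

```python
import itertools

import numpy as np


def l0_exhaustive(A, b, tol=1e-9):
    """Sparsest x with ||A x - b|| <= tol, found by trying every support in order of size."""
    m, n = A.shape
    if np.linalg.norm(b) <= tol:
        return np.zeros(n)
    for s in range(1, n + 1):
        # C(n, s) candidate supports of size s: this loop is the combinatorial cost.
        for support in itertools.combinations(range(n), s):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
            if np.linalg.norm(A[:, cols] @ coef - b) <= tol:
                x = np.zeros(n)
                x[cols] = coef
                return x
    return None  # no support fits b within tol
```

For a signal with s nonzeros among n coefficients, this search may fit up to the sum of C(n, j) for j = 1..s least-squares problems, which is what makes the exact problem expensive.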
Non-negative Matrix Factorization: A Survey
CAUL read and publish agreement 2022. Publishe