55 research outputs found

    Discriminant Analysis via Joint Euler Transform and ℓ2,1-Norm

    Linear discriminant analysis (LDA) has been widely used for face recognition. However, when identifying faces in the wild, outliers that deviate significantly from the rest of the data can arbitrarily skew the desired solution. This usually degrades LDA’s performance dramatically, preventing its mass deployment in real-world applications. To handle this problem, we propose an effective LDA method based on distance metric learning, namely Euler LDA-L21 (e-LDA-L21). e-LDA-L21 is carried out in two stages: in the first stage, each image is mapped into a complex space by the Euler transform; in the second stage, the ℓ2,1-norm is adopted as the distance metric. This not only reveals nonlinear features but also exploits the geometric structure of the data. To solve e-LDA-L21 efficiently, we propose an iterative algorithm that has a closed-form solution at each iteration, with convergence guaranteed. Finally, we extend e-LDA-L21 to Euler 2DLDA-L21 (e-2DLDA-L21), which further exploits the spatial information embedded in image pixels. Experimental results on several face databases demonstrate its superiority over state-of-the-art algorithms.
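
    The two ingredients named in this abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so the specific form of the Euler transform used here, z = exp(i·α·π·x)/√2, the default α = 1.9, and the convention that the ℓ2,1-norm sums the Euclidean norms of the rows are all assumptions drawn from common usage, not from the paper itself. The function names are hypothetical.

```python
import numpy as np

def euler_transform(X, alpha=1.9):
    """Map real pixel intensities (assumed scaled to [0, 1]) into complex space.

    Assumed form: z = exp(i * alpha * pi * x) / sqrt(2). The value of alpha
    is an illustrative choice, not one taken from the abstract.
    """
    return np.exp(1j * alpha * np.pi * X) / np.sqrt(2.0)

def l21_norm(M):
    """l2,1-norm under the row convention: sum of the Euclidean norms of the rows."""
    return float(np.sum(np.linalg.norm(M, axis=1)))

# Toy "image" data: two samples with two pixels each.
X = np.array([[0.0, 0.5], [1.0, 0.25]])
Z = euler_transform(X)

# Toy residual matrix with rows of Euclidean norm 5, 0 and 10.
residual = np.array([[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]])
total = l21_norm(residual)
```

    Because each row contributes its norm rather than its squared norm, a single outlying row influences the total less than it would under a squared Frobenius norm, which is the usual motivation for this metric.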

    Low-Rank Clustering via LP1-PCA

    In recent years, subspace clustering has found many practical use cases, including image segmentation, motion segmentation, and facial clustering. The image and video data common to these applications often has high dimensionality. Rather than viewing high dimensionality as a drawback, we propose a novel algorithm for subspace clustering that takes advantage of the high-dimensional nature of such data. We call this algorithm LP1-PCA Spectral Clustering. Specifically, we introduce a concept that we call cluster-ID sparsity, and we propose an algorithm called LP1-PCA to attain this in low data dimensions. Our novel LP1-PCA algorithm is simple to implement and typically converges after only a few iterations. Conditions under which our algorithm performs well are discussed both theoretically and empirically, and we show that our method often attains superior clustering performance compared to other common clustering algorithms on synthetic and real-world datasets.

    Robust Adaptive Probability-Weighted Principal Component Analysis

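    Principal component analysis (PCA) is an important method for processing high-dimensional data. In recent years, PCA models based on various norms have been widely studied in order to improve the robustness of PCA to noise. However, on the one hand, these algorithms do not take into account the reconstruction error and the projected data ..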

    Flexible unsupervised feature extraction for image classification

    Dimensionality reduction is one of the fundamental topics in pattern recognition and machine learning. However, most existing dimensionality reduction methods seek a projection matrix W such that the projection W^T x is exactly equal to the true low-dimensional representation. In practice, this constraint is too rigid to capture the geometric structure of the data well. To tackle this problem, we relax this constraint and instead impose an elastic one on the projection, with the aim of revealing the geometric structure of the data. On this basis, we propose an unsupervised dimensionality reduction model named flexible unsupervised feature extraction (FUFE) for image classification. Moreover, we theoretically prove that PCA and LPP, two of the most representative unsupervised dimensionality reduction models, are special cases of FUFE, and we propose a non-iterative algorithm to solve it. Experiments on five real-world image databases show the effectiveness of the proposed model.
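
    To make the "rigid" projection W^T x concrete, here is a minimal sketch of plain PCA, which the abstract names as a special case of FUFE. It does not implement FUFE or its elastic constraint, since neither is specified in the abstract. The function name `pca_projection` and the toy data are hypothetical.

```python
import numpy as np

def pca_projection(X, d):
    """Plain PCA on X of shape (n_samples, n_features).

    Returns the projection matrix W (n_features, d) and the projected data
    Y = Xc @ W, i.e. y = W^T x applied to each centered sample x.
    """
    Xc = X - X.mean(axis=0)                    # center the data
    cov = Xc.T @ Xc / (X.shape[0] - 1)         # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    W = eigvecs[:, ::-1][:, :d]                # top-d eigenvectors as columns
    return W, Xc @ W

# Toy data: 50 samples with 5 features, reduced to 2 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
W, Y = pca_projection(X, 2)
```

    Here the low-dimensional representation is defined to be exactly W^T x, which is the equality the abstract proposes to relax.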

    A-SFS: Semi-supervised Feature Selection based on Multi-task Self-supervision

    Feature selection is an important process in machine learning. It builds an interpretable and robust model by selecting the features that contribute the most to the prediction target. However, most mature feature selection algorithms, both supervised and semi-supervised, fail to fully exploit the complex latent structure among features. We believe that these structures are very important for the feature selection process, especially when labels are scarce and data is noisy. To this end, we introduce a deep learning-based self-supervised mechanism into feature selection problems, namely batch-Attention-based Self-supervision Feature Selection (A-SFS). First, a multi-task self-supervised autoencoder is designed to uncover the hidden structure among features with the support of two pretext tasks. Guided by the integrated information from the multi-task self-supervised learning model, a batch-attention mechanism is designed to generate feature weights according to batch-based feature selection patterns, alleviating the impact of a handful of noisy data points. This method is compared to 14 strong benchmarks, including LightGBM and XGBoost. Experimental results show that A-SFS achieves the highest accuracy on most datasets. Furthermore, this design significantly reduces the reliance on labels, needing only 1/10 of the labeled data to achieve the same performance as the state-of-the-art baselines. Results show that A-SFS is also the most robust to noisy and missing data.
    Comment: 18 pages, 7 figures, accepted by Knowledge-Based Systems

    K-Nearest-Neighbors Induced Topological PCA for scRNA Sequence Data Analysis

    Single-cell RNA sequencing (scRNA-seq) is widely used to reveal heterogeneity in cells, which has given us insights into cell-cell communication, cell differentiation, and differential gene expression. However, analyzing scRNA-seq data is challenging due to sparsity and the large number of genes involved. Therefore, dimensionality reduction and feature selection are important for removing spurious signals and enhancing downstream analysis. Traditional PCA, a main workhorse in dimensionality reduction, lacks the ability to capture geometrical structure information embedded in the data, and previous graph Laplacian regularizations are limited to the analysis of a single scale. We propose a topological Principal Components Analysis (tPCA) method that combines the persistent Laplacian (PL) technique with L_{2,1} norm regularization to address multiscale and multiclass heterogeneity issues in data. We further introduce a k-Nearest-Neighbor (kNN) persistent Laplacian technique to improve the robustness of our persistent Laplacian method. The proposed kNN-PL is a new algebraic topology technique that addresses many limitations of traditional persistent homology. Rather than inducing filtration by varying a distance threshold, we introduce kNN-tPCA, where filtrations are achieved by varying the number of neighbors in a kNN network at each step, and find that this framework has significant implications for hyper-parameter tuning. We validate the efficacy of our proposed tPCA and kNN-tPCA methods on 11 diverse benchmark scRNA-seq datasets, and show that our methods outperform other unsupervised PCA enhancements from the literature, as well as popular Uniform Manifold Approximation (UMAP), t-Distributed Stochastic Neighbor Embedding (tSNE), and Projection Non-Negative Matrix Factorization (NMF) by significant margins.
    Comment: 28 pages, 11 figures
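
    The idea of a filtration indexed by the number of neighbors, rather than by a distance threshold, can be sketched as a nested family of kNN graphs. This is only an illustration under stated assumptions: it builds ordinary graph Laplacians L = D - A on symmetrized kNN graphs and does not compute the persistent Laplacian or the tPCA objective described in the abstract. The function name `knn_laplacians` and the random toy data are hypothetical.

```python
import numpy as np

def knn_laplacians(X, max_k):
    """Return graph Laplacians of symmetrized kNN graphs for k = 1..max_k.

    X has shape (n_samples, n_features); max_k must be smaller than n_samples.
    Increasing k only adds edges, so the graphs form a nested sequence.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances; exclude self-matches.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)
    order = np.argsort(d2, axis=1)             # neighbors sorted by distance

    laplacians = []
    for k in range(1, max_k + 1):
        A = np.zeros((n, n))
        rows = np.repeat(np.arange(n), k)
        A[rows, order[:, :k].ravel()] = 1.0    # connect each point to its k nearest
        A = np.maximum(A, A.T)                 # symmetrize the adjacency matrix
        laplacians.append(np.diag(A.sum(axis=1)) - A)
    return laplacians

# Toy data standing in for cells (rows) and reduced gene features (columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
Ls = knn_laplacians(X, 4)
```

    Each step of the sequence is controlled by an integer k instead of a continuous distance scale, which is the contrast the abstract draws with threshold-based filtrations.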