35,230 research outputs found

    Directional clustering through matrix factorization

    No full text
    This paper deals with a clustering problem where feature vectors are clustered depending on the angle between feature vectors, that is, feature vectors are grouped together if they point roughly in the same direction. This directional distance measure arises in several applications, including document classification and human brain imaging. Using ideas from the field of constrained low-rank matrix factorization and sparse approximation, a novel approach is presented that differs from classical clustering methods, such as seminonnegative matrix factorization, K-EVD, or k-means clustering, yet combines some aspects of all these. As in nonnegative matrix factorization and K-EVD, the matrix decomposition is iteratively refined to optimize a data fidelity term; however, no positivity constraint is enforced directly nor do we need to explicitly compute eigenvectors. As in k-means and K-EVD, each optimization step is followed by a hard cluster assignment. This leads to an efficient algorithm that is shown here to outperform common competitors in terms of clustering performance and/or computation speed. In addition to a detailed theoretical analysis of some of the algorithm's main properties, the approach is empirically evaluated on a range of toy problems, several standard text clustering data sets, and a high-dimensional problem in brain imaging, where functional magnetic resonance imaging data are used to partition the human cerebral cortex into distinct functional regions

    A Proximity-Aware Hierarchical Clustering of Faces

    Full text link
    In this paper, we propose an unsupervised face clustering algorithm called "Proximity-Aware Hierarchical Clustering" (PAHC) that exploits the local structure of deep representations. In the proposed method, a similarity measure between deep features is computed by evaluating linear SVM margins. SVMs are trained using nearest neighbors of sample data, and thus do not require any external training data. Clusters are then formed by thresholding the similarity scores. We evaluate the clustering performance using three challenging unconstrained face datasets, including Celebrity in Frontal-Profile (CFP), IARPA JANUS Benchmark A (IJB-A), and JANUS Challenge Set 3 (JANUS CS3) datasets. Experimental results demonstrate that the proposed approach can achieve significant improvements over state-of-the-art methods. Moreover, we also show that the proposed clustering algorithm can be applied to curate a set of large-scale and noisy training dataset while maintaining sufficient amount of images and their variations due to nuisance factors. The face verification performance on JANUS CS3 improves significantly by finetuning a DCNN model with the curated MS-Celeb-1M dataset which contains over three million face images
    corecore