Clustering of Data with Missing Entries
The analysis of large datasets is often complicated by the presence of
missing entries, mainly because most current machine learning algorithms
are designed to work with complete data. The main focus of this work is to
introduce a clustering algorithm that provides good clustering even in the
presence of missing data. The proposed technique solves a fusion-penalty-based
optimization problem to recover the clusters. We theoretically
analyze the conditions needed for the successful recovery of the clusters. We
also propose an algorithm to solve a relaxation of this problem using
saturating non-convex fusion penalties. The method is demonstrated on simulated
and real datasets, and is observed to perform well in the presence of large
fractions of missing entries.
Comment: arXiv admin note: substantial text overlap with arXiv:1709.0187
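
The abstract does not state the objective function, so the following is only a minimal sketch of what a fusion-penalty clustering problem with missing entries might look like, not the authors' formulation or algorithm. It assumes one center u_i per data point x_i, a data-fit term evaluated only on observed entries, and the saturating penalty 1 - exp(-d^2 / (2 sigma^2)) on pairwise center distances; plain gradient descent stands in for the paper's solver. All names (`objective`, `gradient`, `init_centers`, `fit_fusion`, `lam`, `sigma`, `step`) are hypothetical.

```python
import numpy as np


def saturating_penalty(d2, sigma):
    # Assumed saturating non-convex penalty of the squared distance d2.
    return 1.0 - np.exp(-d2 / (2.0 * sigma**2))


def objective(U, X, mask, lam, sigma):
    # Data fit is measured on observed entries only (mask == True).
    X0 = np.where(mask, X, 0.0)
    fit = np.sum(mask * (X0 - U) ** 2)
    # Fusion term: penalty on every pair of centers, each pair counted once.
    diff = U[:, None, :] - U[None, :, :]
    d2 = np.sum(diff**2, axis=2)
    iu = np.triu_indices(len(U), k=1)
    return fit + lam * np.sum(saturating_penalty(d2[iu], sigma))


def gradient(U, X, mask, lam, sigma):
    X0 = np.where(mask, X, 0.0)
    g_fit = 2.0 * mask * (U - X0)
    diff = U[:, None, :] - U[None, :, :]
    d2 = np.sum(diff**2, axis=2)
    # d/du_i of the penalty on pair (i, j) is w_ij * (u_i - u_j).
    w = np.exp(-d2 / (2.0 * sigma**2)) / sigma**2
    g_pen = lam * np.sum(w[:, :, None] * diff, axis=1)
    return g_fit + g_pen


def init_centers(X, mask):
    # Start from the data, filling missing entries with observed column means.
    X0 = np.where(mask, X, 0.0)
    col_mean = X0.sum(axis=0) / np.maximum(mask.sum(axis=0), 1)
    return np.where(mask, X0, col_mean)


def fit_fusion(X, mask, lam=1.0, sigma=1.0, step=0.05, iters=200):
    # Fixed-step gradient descent (an assumption, chosen for brevity).
    U = init_centers(X, mask)
    for _ in range(iters):
        U = U - step * gradient(U, X, mask, lam, sigma)
    return U
```

In this sketch, points whose centers end up close together would be read as belonging to the same cluster. The fixed step size must be small relative to `2 + lam * n / sigma**2` (with `n` the number of points) for the objective to decrease at every iteration.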