State-of-the-art subspace clustering methods are based on expressing each
data point as a linear combination of other data points while regularizing the
matrix of coefficients with ℓ1, ℓ2 or nuclear norms. ℓ1
regularization is guaranteed to give a subspace-preserving affinity (i.e.,
there are no connections between points from different subspaces) under broad
theoretical conditions, but the clusters may not be connected. ℓ2 and
nuclear norm regularization often improve connectivity, but give a
subspace-preserving affinity only for independent subspaces. Mixed ℓ1,
ℓ2 and nuclear norm regularizations offer a balance between the
subspace-preserving and connectedness properties, but this comes at the cost of
increased computational complexity. This paper studies the geometry of the
elastic net regularizer (a mixture of the ℓ1 and ℓ2 norms) and uses
it to derive a provably correct and scalable active set method for finding the
optimal coefficients. Our geometric analysis also provides a theoretical
justification and a geometric interpretation for the balance between the
connectedness (due to ℓ2 regularization) and subspace-preserving (due to
ℓ1 regularization) properties for elastic net subspace clustering. Our
experiments show that the proposed active set method not only achieves
state-of-the-art clustering performance, but also efficiently handles
large-scale datasets.

Comment: 15 pages, 6 figures, accepted to CVPR 2016 for oral presentation
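For concreteness, the per-point elastic net self-expression program described in the abstract can be sketched as follows; the symbols λ, γ and the data matrix X = [x_1, ..., x_N] are assumed notation for this sketch and are not taken verbatim from the abstract:

    % Elastic net self-expression for the j-th data point x_j (assumed notation):
    % lambda in [0,1] mixes the l1 and l2 penalties, gamma > 0 weights data fidelity.
    \min_{c_j \in \mathbb{R}^N} \;
        \lambda \|c_j\|_1
        + \frac{1-\lambda}{2} \|c_j\|_2^2
        + \frac{\gamma}{2} \|x_j - X c_j\|_2^2
    \quad \text{s.t.} \quad c_{jj} = 0

Here the ℓ1 term promotes the subspace-preserving property, the ℓ2 term promotes connectedness, and the coefficient matrix C = [c_1, ..., c_N] is typically symmetrized into an affinity such as |C| + |C|^T for spectral clustering.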