7,999 research outputs found

    Combinatorial persistency criteria for multicut and max-cut

    Full text link
    In combinatorial optimization, partial variable assignments are called persistent if they agree with some optimal solution. We propose persistency criteria for the multicut and max-cut problem as well as fast combinatorial routines to verify them. The criteria that we derive are based on mappings that improve feasible multicuts, respectively cuts. Our elementary criteria can be checked enumeratively. The more advanced ones rely on fast algorithms for upper and lower bounds for the respective cut problems and max-flow techniques for auxiliary min-cut problems. Our methods can be used as a preprocessing technique for reducing problem sizes or for computing partial optimality guarantees for solutions output by heuristic solvers. We show the efficacy of our methods on instances of both problems from computer vision, biomedical image analysis and statistical physics

    A Convex Formulation for Spectral Shrunk Clustering

    Full text link
    Spectral clustering is a fundamental technique in the field of data mining and information processing. Most existing spectral clustering algorithms integrate dimensionality reduction into the clustering process assisted by manifold learning in the original space. However, the manifold in reduced-dimensional subspace is likely to exhibit altered properties in contrast with the original space. Thus, applying manifold information obtained from the original space to the clustering process in a low-dimensional subspace is prone to inferior performance. Aiming to address this issue, we propose a novel convex algorithm that mines the manifold structure in the low-dimensional subspace. In addition, our unified learning process makes the manifold learning particularly tailored for the clustering. Compared with other related methods, the proposed algorithm results in more structured clustering result. To validate the efficacy of the proposed algorithm, we perform extensive experiments on several benchmark datasets in comparison with some state-of-the-art clustering approaches. The experimental results demonstrate that the proposed algorithm has quite promising clustering performance.Comment: AAAI201

    Low-Rank Matrices on Graphs: Generalized Recovery & Applications

    Get PDF
    Many real world datasets subsume a linear or non-linear low-rank structure in a very low-dimensional space. Unfortunately, one often has very little or no information about the geometry of the space, resulting in a highly under-determined recovery problem. Under certain circumstances, state-of-the-art algorithms provide an exact recovery for linear low-rank structures but at the expense of highly inscalable algorithms which use nuclear norm. However, the case of non-linear structures remains unresolved. We revisit the problem of low-rank recovery from a totally different perspective, involving graphs which encode pairwise similarity between the data samples and features. Surprisingly, our analysis confirms that it is possible to recover many approximate linear and non-linear low-rank structures with recovery guarantees with a set of highly scalable and efficient algorithms. We call such data matrices as \textit{Low-Rank matrices on graphs} and show that many real world datasets satisfy this assumption approximately due to underlying stationarity. Our detailed theoretical and experimental analysis unveils the power of the simple, yet very novel recovery framework \textit{Fast Robust PCA on Graphs

    A Short Survey on Data Clustering Algorithms

    Full text link
    With rapidly increasing data, clustering algorithms are important tools for data analytics in modern research. They have been successfully applied to a wide range of domains; for instance, bioinformatics, speech recognition, and financial analysis. Formally speaking, given a set of data instances, a clustering algorithm is expected to divide the set of data instances into the subsets which maximize the intra-subset similarity and inter-subset dissimilarity, where a similarity measure is defined beforehand. In this work, the state-of-the-arts clustering algorithms are reviewed from design concept to methodology; Different clustering paradigms are discussed. Advanced clustering algorithms are also discussed. After that, the existing clustering evaluation metrics are reviewed. A summary with future insights is provided at the end
    • …
    corecore