1,988 research outputs found

    An Exact Algorithm for Semi-supervised Minimum Sum-of-Squares Clustering

    The minimum sum-of-squares clustering (MSSC), or k-means type clustering, is traditionally considered an unsupervised learning task. In recent years, the use of background knowledge to improve cluster quality and promote interpretability of the clustering process has become a hot research topic at the intersection of mathematical optimization and machine learning. The problem of taking advantage of background information in data clustering is called semi-supervised or constrained clustering. In this paper, we present a branch-and-cut algorithm for semi-supervised MSSC, where background knowledge is incorporated as pairwise must-link and cannot-link constraints. For the lower bound procedure, we solve the semidefinite programming relaxation of the MSSC discrete optimization model and strengthen the bound with a cutting-plane procedure. For the upper bound, instead, we use integer programming tools to adapt the k-means algorithm to the constrained case. For the first time, the proposed global optimization algorithm efficiently solves real-world instances up to 800 data points with different combinations of must-link and cannot-link constraints and with a generic number of features. This problem size is about four times larger than that of the instances solved by state-of-the-art exact algorithms.
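    As context for the upper-bound step, here is a minimal sketch of a COP-k-means-style heuristic: a k-means variant whose assignment step skips clusters that would break a must-link or cannot-link constraint. It only illustrates the general idea behind constrained k-means heuristics; the function names, the greedy fallback, and the overall structure are our assumptions, not the authors' integer-programming-based adaptation.

```python
import numpy as np

def violates(point, cluster, labels, must_link, cannot_link):
    """True if assigning `point` to `cluster` breaks a pairwise constraint.
    `must_link` / `cannot_link` are lists of index pairs; -1 marks unassigned."""
    for a, b in must_link:
        other = b if a == point else a if b == point else None
        if other is not None and labels[other] != -1 and labels[other] != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == point else a if b == point else None
        if other is not None and labels[other] == cluster:
            return True
    return False

def constrained_kmeans(X, k, must_link, cannot_link, n_iter=100, seed=0):
    """COP-k-means-style heuristic for semi-supervised MSSC (illustrative only)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.full(len(X), -1)
        for i in range(len(X)):
            # try clusters from nearest to farthest, skipping infeasible ones
            order = np.argsort(((X[i] - centers) ** 2).sum(axis=1))
            for c in order:
                if not violates(i, c, labels, must_link, cannot_link):
                    labels[i] = c
                    break
            if labels[i] == -1:       # no feasible cluster found greedily
                labels[i] = order[0]  # heuristic fallback, may violate constraints
        new_centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                                else centers[c] for c in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```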

    Global Optimization for Cardinality-constrained Minimum Sum-of-Squares Clustering via Semidefinite Programming

    The minimum sum-of-squares clustering (MSSC), or k-means type clustering, has recently been extended to exploit prior knowledge on the cardinality of each cluster. Such knowledge is used to increase performance as well as solution quality. In this paper, we propose a global optimization approach based on the branch-and-cut technique to solve the cardinality-constrained MSSC. For the lower bound routine, we use the semidefinite programming (SDP) relaxation recently proposed by Rujeerapaiboon et al. [SIAM J. Optim. 29(2), 1211-1239, (2019)]. However, this relaxation can be used in a branch-and-cut method only for small-size instances. Therefore, we derive a new SDP relaxation that scales better with the instance size and the number of clusters. In both cases, we strengthen the bound by adding polyhedral cuts. Benefiting from a tailored branching strategy that enforces pairwise constraints, we reduce the complexity of the problems arising in the child nodes. For the upper bound, instead, we present a local search procedure that exploits the solution of the SDP relaxation solved at each node. Computational results show that the proposed algorithm globally solves, for the first time, real-world instances ten times larger than those solved by state-of-the-art exact methods.
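    For reference, lower-bound routines in this line of work build on the Peng-Wei SDP relaxation of (unconstrained) MSSC, sketched below; the relaxations used in the paper additionally encode the prescribed cluster cardinalities and are strengthened with polyhedral cuts, which this sketch omits.

```latex
\begin{aligned}
\min_{Z \in \mathbb{S}^n} \quad & \operatorname{tr}\bigl(W W^\top (I_n - Z)\bigr) \\
\text{s.t.} \quad & Z e = e, \qquad \operatorname{tr}(Z) = k, \\
& Z \ge 0, \qquad Z \succeq 0,
\end{aligned}
```

    Here $W \in \mathbb{R}^{n \times d}$ is the data matrix, $e$ the all-ones vector, and $Z$ relaxes the normalized clustering matrix $\sum_{j=1}^{k} |C_j|^{-1} \mathbb{1}_{C_j} \mathbb{1}_{C_j}^{\top}$, whose trace equals the number of clusters $k$.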

    An exact CP approach for the cardinality-constrained Euclidean minimum sum-of-squares clustering problem

    Clustering consists of finding hidden groups in unlabeled data that are as homogeneous and as well separated as possible. Some contexts impose constraints on the clustering solutions, such as restrictions on the size of each cluster, known as cardinality-constrained clustering. In this work we present an exact approach to solve the Cardinality-Constrained Euclidean Minimum Sum-of-Squares Clustering Problem. We take advantage of the structure of the problem to improve several aspects of previous constraint programming approaches: lower bounds, domain filtering, and branching. Computational experiments on benchmark instances taken from the literature confirm that our approach improves on the solving capability of previously proposed exact methods for this problem.
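    For reference, the underlying optimization problem can be stated over binary assignment variables and prescribed cluster sizes as below; the notation is ours, and this mixed-integer formulation is not the CP model used in the paper.

```latex
\begin{aligned}
\min_{x,\,\mu} \quad & \sum_{i=1}^{n} \sum_{j=1}^{k} x_{ij}\, \lVert p_i - \mu_j \rVert_2^2 \\
\text{s.t.} \quad & \sum_{j=1}^{k} x_{ij} = 1, && i = 1,\dots,n, \\
& \sum_{i=1}^{n} x_{ij} = c_j, && j = 1,\dots,k, \\
& x_{ij} \in \{0,1\}, \quad \mu_j \in \mathbb{R}^{d},
\end{aligned}
```

    where $p_1,\dots,p_n$ are the data points, $c_j$ is the prescribed cardinality of cluster $j$, and $\mu_j$ its centroid.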

    Mixed-integer programming techniques for the minimum sum-of-squares clustering problem

    The minimum sum-of-squares clustering problem is a very important problem in data mining and machine learning, with many applications in, e.g., medicine or the social sciences. However, it is known to be NP-hard in all relevant cases and notoriously hard to solve to global optimality in practice. In this paper, we develop and test different tailored mixed-integer programming techniques to improve the performance of state-of-the-art MINLP solvers when applied to the problem, among them cutting planes, propagation techniques, branching rules, and primal heuristics. Our extensive numerical study shows that our techniques significantly improve the performance of the open-source MINLP solver SCIP. Consequently, using our novel techniques, we can solve many instances that SCIP cannot solve without them, and we obtain much smaller gaps for the instances that still cannot be solved to global optimality.
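    A classical identity that such mixed-integer formulations typically exploit is that, once the assignment is fixed, the optimal centroid of each cluster is its mean, so the centroids can be eliminated and the objective written purely in terms of pairwise squared distances (a standard fact, not a contribution specific to this paper):

```latex
\sum_{i \in C_j} \bigl\lVert p_i - \mu_j^{*} \bigr\rVert_2^2
  \;=\; \frac{1}{2\,\lvert C_j \rvert} \sum_{i \in C_j} \sum_{i' \in C_j} \lVert p_i - p_{i'} \rVert_2^2 ,
\qquad
\mu_j^{*} \;=\; \frac{1}{\lvert C_j \rvert} \sum_{i \in C_j} p_i .
```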

    Exact solution of the minimum-variance data clustering problem under cardinality constraints by constraint programming

    Data clustering is a procedure designed to group a set of observations into subsets that are homogeneous and/or well separated. The idea behind such an endeavor is to simplify the extraction of useful information by studying the resulting groups instead of dealing directly with the observations themselves. However, many situations require the solution to satisfy a set of constraints, in particular a target number of elements that each group must contain; this is known as cardinality-constrained clustering. In this work we present an exact approach to solve the cardinality-constrained Euclidean minimum sum-of-squares clustering problem. Based on the constraint programming paradigm, we first present an adequate model of the problem in this framework. We then propose both an enhanced search strategy and two filtering algorithms. These tools take advantage of the particular structure of the problem to navigate the search space efficiently in search of a globally optimal solution. Computational experiments show that our approach provides a substantial advantage over existing exact methods on several instances of the problem.
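    To make concrete what "exact" means for this problem, the toy brute force below enumerates every assignment that respects the prescribed cluster sizes and keeps the one with the smallest within-cluster sum of squares. It is usable only on a handful of points and is not the constraint programming approach of this work; names and the example data are ours.

```python
import itertools
import numpy as np

def wcss(X, labels, k):
    """Within-cluster sum of squares of a given assignment."""
    total = 0.0
    for j in range(k):
        pts = X[np.array(labels) == j]
        if len(pts):
            total += ((pts - pts.mean(axis=0)) ** 2).sum()
    return total

def exact_cardinality_mssc(X, sizes):
    """Exhaustive search over assignments matching the prescribed cluster sizes."""
    n, k = len(X), len(sizes)
    assert sum(sizes) == n
    best_val, best_labels = np.inf, None
    for labels in itertools.product(range(k), repeat=n):
        if [labels.count(j) for j in range(k)] != list(sizes):
            continue  # skip assignments that violate the cardinalities
        val = wcss(X, labels, k)
        if val < best_val:
            best_val, best_labels = val, labels
    return best_val, best_labels

# Two well-separated groups of three points each
X = np.array([[0., 0.], [0., 1.], [1., 0.], [5., 5.], [5., 6.], [6., 5.]])
print(exact_cardinality_mssc(X, sizes=(3, 3)))
```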

    Exact algorithms for minimum sum-of-squares clustering

    NP-Hardness of Euclidean sum-of-squares clustering -- Computational complexity -- An incorrect reduction from the K-section problem -- A new proof by reduction from the densest cut problem -- Evaluating a branch-and-bound RLT-based algorithm for minimum sum-of-squares clustering -- Reformulation-Linearization technique for the MSSC -- Branch-and-bound for the MSSC -- An attempt at reproducing computational results -- Breaking symmetry and convex hull inequalities -- A branch-and-cut SDP-based algorithm for minimum sum-of-squares clustering -- Equivalence of MSSC to 0-1 SDP -- A branch-and-cut algorithm for the 0-1 SDP formulation -- Computational experiments -- An improved column generation algorithm for minimum sum-of-squares clustering -- Column generation algorithm revisited -- A geometric approach -- Generalization to the Euclidean space -- Computational results
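    For context on the column generation chapters, the usual starting point in the MSSC column generation literature is the set-partitioning master problem over all candidate clusters, sketched below; it is included only as background and is not a detail taken from this thesis.

```latex
\begin{aligned}
\min_{z} \quad & \sum_{C \in \mathcal{C}} c_C \, z_C \\
\text{s.t.} \quad & \sum_{C \ni i} z_C = 1, && i = 1,\dots,n, \\
& \sum_{C \in \mathcal{C}} z_C = k, \\
& z_C \in \{0,1\}, && C \in \mathcal{C},
\end{aligned}
```

    where $\mathcal{C}$ is the set of all nonempty subsets of points, $c_C$ is the within-cluster sum of squares of subset $C$, and new columns (clusters) with negative reduced cost are generated by a pricing subproblem using the current dual values.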

    Proceedings of the XIII Global Optimization Workshop: GOW'16

    [Excerpt] Preface: Past Global Optimization Workshops have been held in Sopron (1985 and 1990), Szeged (WGO, 1995), Florence (GO’99, 1999), Hanmer Springs (Let’s GO, 2001), Santorini (Frontiers in GO, 2003), San José (GO’05, 2005), Mykonos (AGO’07, 2007), Skukuza (SAGO’08, 2008), Toulouse (TOGO’10, 2010), Natal (NAGO’12, 2012) and Málaga (MAGO’14, 2014) with the aim of stimulating discussion between senior and junior researchers on the topic of Global Optimization. In 2016, the XIII Global Optimization Workshop (GOW’16) takes place in Braga and is organized by three researchers from the University of Minho. Two of them belong to the Systems Engineering and Operational Research Group of the Algoritmi Research Centre and the other to the Statistics, Applied Probability and Operational Research Group of the Centre of Mathematics. The event received more than 50 submissions from 15 countries in Europe, South America and North America. We want to express our gratitude to the invited speaker Panos Pardalos for accepting the invitation and sharing his expertise, helping us to meet the workshop objectives. GOW’16 would not have been possible without the valuable contributions of the authors and the International Scientific Committee members. We thank you all. This proceedings book intends to present an overview of the topics addressed in the workshop with the goal of contributing to interesting and fruitful discussions between the authors and participants. After the event, high-quality papers can be submitted to a special issue of the Journal of Global Optimization dedicated to the workshop. [...]

    International Conference on Continuous Optimization (ICCOPT) 2019 Conference Book

    The Sixth International Conference on Continuous Optimization took place on the campus of the Technical University of Berlin, August 3-8, 2019. The ICCOPT is a flagship conference of the Mathematical Optimization Society (MOS), organized every three years. ICCOPT 2019 was hosted by the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) Berlin. It included a Summer School and a Conference with a series of plenary and semi-plenary talks, organized and contributed sessions, and poster sessions. This book comprises the full conference program. It contains, in particular, the scientific program both in overview and in full detail, as well as information on the social program, the venue, special meetings, and more.