An Exact Algorithm for Semi-supervised Minimum Sum-of-Squares Clustering
The minimum sum-of-squares clustering (MSSC), or k-means type clustering, is
traditionally considered an unsupervised learning task. In recent years, the
use of background knowledge to improve the cluster quality and promote
interpretability of the clustering process has become a hot research topic at
the intersection of mathematical optimization and machine learning research.
The problem of taking advantage of background information in data clustering is
called semi-supervised or constrained clustering. In this paper, we present a
branch-and-cut algorithm for semi-supervised MSSC, where background knowledge
is incorporated as pairwise must-link and cannot-link constraints. For the
lower bound procedure, we solve the semidefinite programming relaxation of the
MSSC discrete optimization model, and we use a cutting-plane procedure for
strengthening the bound. For the upper bound, instead, we use integer programming tools to adapt the k-means algorithm to the constrained case. For the first time, the proposed global optimization
algorithm efficiently manages to solve real-world instances up to 800 data
points with different combinations of must-link and cannot-link constraints and
with a generic number of features. This problem size is about four times larger
than the one of the instances solved by state-of-the-art exact algorithms
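To make the role of must-link and cannot-link constraints concrete, here is a minimal sketch of a constrained assignment pass in the spirit of COP-k-means. The function names and the greedy one-pass strategy are our own illustration, not the branch-and-cut algorithm of the paper:

```python
def _d2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def _violates(i, c, labels, must_link, cannot_link):
    """Check the pairwise constraints of point i against already-assigned points."""
    for a, b in must_link:
        if i in (a, b):
            other = b if i == a else a
            if other in labels and labels[other] != c:
                return True
    for a, b in cannot_link:
        if i in (a, b):
            other = b if i == a else a
            if other in labels and labels[other] == c:
                return True
    return False

def cop_kmeans_assign(points, centroids, must_link, cannot_link):
    """One constrained assignment pass: each point takes the nearest centroid
    that violates no pairwise constraint against points assigned earlier in
    the pass; returns None if some point has no feasible centroid."""
    labels = {}
    for i, p in enumerate(points):
        for c in sorted(range(len(centroids)), key=lambda c: _d2(p, centroids[c])):
            if not _violates(i, c, labels, must_link, cannot_link):
                labels[i] = c
                break
        else:
            return None
    return [labels[i] for i in range(len(points))]
```

A greedy pass like this can fail to find a feasible assignment even when one exists, which is precisely why exact methods resort to branching over the constrained assignments.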
Global Optimization for Cardinality-constrained Minimum Sum-of-Squares Clustering via Semidefinite Programming
The minimum sum-of-squares clustering (MSSC), or k-means type clustering, has
been recently extended to exploit prior knowledge on the cardinality of each
cluster. Such knowledge is used to increase performance as well as solution
quality. In this paper, we propose a global optimization approach based on the
branch-and-cut technique to solve the cardinality-constrained MSSC. For the
lower bound routine, we use the semidefinite programming (SDP) relaxation
recently proposed by Rujeerapaiboon et al. [SIAM J. Optim. 29(2), 1211-1239,
(2019)]. However, this relaxation can be used in a branch-and-cut method only
for small-size instances. Therefore, we derive a new SDP relaxation that scales
better with the instance size and the number of clusters. In both cases, we
strengthen the bound by adding polyhedral cuts. Benefiting from a tailored
branching strategy which enforces pairwise constraints, we reduce the
complexity of the problems arising in the children nodes. For the upper bound,
instead, we present a local search procedure that exploits the solution of the
SDP relaxation solved at each node. Computational results show that the
proposed algorithm globally solves, for the first time, real-world instances of
size 10 times larger than those solved by state-of-the-art exact methods.
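A feasible solution respecting prescribed cluster sizes can be produced greedily, which illustrates the kind of upper bound an exact method starts from. The sketch below (function names and the distance-sorted greedy rule are our own assumptions, not the SDP-guided local search described above) assigns (point, centroid) pairs in order of increasing squared distance while capacity remains:

```python
def capacitated_assign(points, centroids, sizes):
    """Greedy upper-bound heuristic for cardinality-constrained assignment:
    visit (point, centroid) pairs by increasing squared distance and accept
    a pair while the centroid still has remaining capacity."""
    def d2(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))

    pairs = sorted((d2(p, c), i, j)
                   for i, p in enumerate(points)
                   for j, c in enumerate(centroids))
    remaining = list(sizes)   # residual capacity of each cluster
    labels = [None] * len(points)
    for _, i, j in pairs:
        if labels[i] is None and remaining[j] > 0:
            labels[i] = j
            remaining[j] -= 1
    return labels
```

Note how changing the target sizes changes the assignment even with identical points and centroids, which is exactly the effect the cardinality constraints are meant to have.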
An exact CP approach for the cardinality-constrained Euclidean minimum sum-of-squares clustering problem
Clustering consists of finding hidden groups in unlabeled data that are as homogeneous and well separated as possible. Some contexts impose constraints on the clustering solutions, such as restrictions on the size of each cluster, known as cardinality-constrained clustering. In this work we present an exact approach to solve the Cardinality-Constrained Euclidean Minimum Sum-of-Squares Clustering Problem. We take advantage of the structure of the problem to improve several aspects of previous constraint programming approaches: lower bounds, domain filtering, and branching. Computational experiments on benchmark instances taken from the literature confirm that our approach improves solving capability over previously proposed exact methods for this problem.
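To make "exact" concrete: for tiny instances, the global optimum of cardinality-constrained MSSC can be certified by enumerating every partition with the prescribed cluster sizes. This brute-force sketch (names and interface are our own, not the CP model of the paper) is useful only as a correctness oracle for testing faster methods:

```python
from itertools import combinations

def mssc_cost(points, clusters):
    """Total squared distance of every point to its cluster's centroid."""
    dim = len(points[0])
    total = 0.0
    for idx in clusters:
        centroid = [sum(points[i][d] for i in idx) / len(idx) for d in range(dim)]
        total += sum((points[i][d] - centroid[d]) ** 2
                     for i in idx for d in range(dim))
    return total

def exact_ccmssc(points, sizes):
    """Enumerate every partition with the prescribed cluster sizes and keep
    the cheapest: globally optimal by exhaustion, hence exponential and only
    suitable for tiny instances. (Equal-size clusters are re-enumerated in
    every label order, which is wasteful but harmless.)"""
    assert sum(sizes) == len(points)
    best_cost, best_clusters = float("inf"), None

    def rec(avail, k, chosen):
        nonlocal best_cost, best_clusters
        if k == len(sizes):
            cost = mssc_cost(points, chosen)
            if cost < best_cost:
                best_cost, best_clusters = cost, [list(g) for g in chosen]
            return
        for group in combinations(sorted(avail), sizes[k]):
            rec(avail - set(group), k + 1, chosen + [group])

    rec(set(range(len(points))), 0, [])
    return best_cost, best_clusters
```

The exact methods in the abstracts above exist precisely because this enumeration explodes combinatorially; bounding and filtering prune the same search space without visiting it exhaustively.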
Mixed-integer programming techniques for the minimum sum-of-squares clustering problem
The minimum sum-of-squares clustering problem is an important problem in data mining and machine learning with many applications in, e.g., medicine or the social sciences. However, it is known to be NP-hard in all relevant cases and notoriously hard to solve to global optimality in practice. In this paper, we develop and test different tailored mixed-integer programming techniques to improve the performance of state-of-the-art MINLP solvers when applied to the problem, among them cutting planes, propagation techniques, branching rules, and primal heuristics. Our extensive numerical study shows that our techniques significantly improve the performance of the open-source MINLP solver SCIP. Consequently, using our novel techniques, we can solve many instances that are not solvable with SCIP alone, and we obtain much smaller gaps for those instances that still cannot be solved to global optimality.
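A primal heuristic in this setting can be as simple as plain Lloyd iteration, which yields a feasible MSSC solution and hence an upper bound for the exact solver to improve on. The sketch below is generic k-means, not the specific SCIP heuristics developed in the paper:

```python
import random

def _d2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def lloyd_kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd iteration: alternate nearest-centroid assignment and
    centroid updates. Converges to a local (not global) optimum, so the
    returned objective is only an upper bound on the true MSSC value."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    dim = len(points[0])
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda c: _d2(p, centroids[c]))
                  for p in points]
        # Update step: mean of assigned points (keep old centroid if empty).
        for c in range(k):
            members = [points[i] for i, l in enumerate(labels) if l == c]
            if members:
                centroids[c] = [sum(m[d] for m in members) / len(members)
                                for d in range(dim)]
    return labels, centroids
```

Because Lloyd iteration only guarantees a local optimum, the gap between its objective value and the solver's lower bound is what branching and cutting planes then work to close.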
Exact resolution of the variance-minimizing data clustering problem under cardinality constraints via constraint programming
Data clustering is a procedure designed to group a set of observations into subsets that are homogeneous and/or well separated. The idea behind such an endeavor is to simplify the extraction of useful information by studying the resulting groups instead of dealing directly with the observations themselves. However, many situations mandate that the solution conform to a set of constraints, in particular one that prescribes the number of elements each group must possess; this is known as cardinality-constrained clustering. In this work we present an exact approach to solve the cardinality-constrained Euclidean minimum sum-of-squares clustering problem. Based on the constraint programming paradigm, we first present an adequate model of the problem in this framework. We then propose both an enhanced search strategy and two filtering algorithms. These tools take advantage of the particular structure of the problem to navigate the search space efficiently in pursuit of a globally optimal solution. Computational experiments show that our approach provides a substantial advantage over existing exact methods when solving several instances of the problem.
Exact algorithms for minimum sum-of-squares clustering
NP-Hardness of Euclidean sum-of-squares clustering -- Computational complexity -- An incorrect reduction from the K-section problem -- A new proof by reduction from the densest cut problem -- Evaluating a branch-and-bound RLT-based algorithm for minimum sum-of-squares clustering -- Reformulation-Linearization technique for the MSSC -- Branch-and-bound for the MSSC -- An attempt at reproducing computational results -- Breaking symmetry and convex hull inequalities -- A branch-and-cut SDP-based algorithm for minimum sum-of-squares clustering -- Equivalence of MSSC to 0-1 SDP -- A branch-and-cut algorithm for the 0-1 SDP formulation -- Computational experiments -- An improved column generation algorithm for minimum sum-of-squares clustering -- Column generation algorithm revisited -- A geometric approach -- Generalization to the Euclidean space -- Computational results
Proceedings of the XIII Global Optimization Workshop: GOW'16
[Excerpt] Preface: Past Global Optimization Workshops have been held in Sopron (1985 and 1990), Szeged (WGO, 1995), Florence (GO'99, 1999), Hanmer Springs (Let's GO, 2001), Santorini (Frontiers in GO, 2003), San José (GO'05, 2005), Mykonos (AGO'07, 2007), Skukuza (SAGO'08, 2008), Toulouse (TOGO'10, 2010), Natal (NAGO'12, 2012) and Málaga (MAGO'14, 2014) with the aim of stimulating discussion between senior and junior researchers on the topic of Global Optimization. In 2016, the XIII Global Optimization Workshop (GOW'16) takes place in Braga and is organized by three researchers from the University of Minho. Two of them belong to the Systems Engineering and Operational Research Group from the Algoritmi Research Centre and the other to the Statistics, Applied Probability and Operational Research Group from the Centre of Mathematics. The event received more than 50 submissions from 15 countries from Europe, South America and North America. We want to express our gratitude to the invited speaker Panos Pardalos for accepting the invitation and sharing his expertise, helping us to meet the workshop objectives. GOW'16 would not have been possible without the valuable contribution from the authors and the International Scientific Committee members. We thank you all. This proceedings book intends to present an overview of the topics that will be addressed in the workshop with the goal of contributing to interesting and fruitful discussions between the authors and participants. After the event, high quality papers can be submitted to a special issue of the Journal of Global Optimization dedicated to the workshop. [...]
International Conference on Continuous Optimization (ICCOPT) 2019 Conference Book
The Sixth International Conference on Continuous Optimization took place on the campus of the Technical University of Berlin, August 3-8, 2019. The ICCOPT is a flagship conference of the Mathematical Optimization Society (MOS), organized every three years. ICCOPT 2019 was hosted by the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) Berlin. It included a Summer School and a Conference with a series of plenary and semi-plenary talks, organized and contributed sessions, and poster sessions.
This book comprises the full conference program. It contains, in particular, the scientific program both in survey style and in full detail, together with information on the social program, the venue, special meetings, and more.