An interior point algorithm for minimum sum-of-squares clustering

Author: Du Merle O
Hansen P
Jaumard B
Mladenović N
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/1999

Copyright © 2000 SIAM Publications. An exact algorithm is proposed for minimum sum-of-squares nonhierarchical clustering, i.e., for partitioning a given set of points from a Euclidean m-space into a given number of clusters in order to minimize the sum of squared distances from all points to the centroid of the cluster to which they belong. This problem is expressed as a constrained hyperbolic program in 0-1 variables. The resolution method combines an interior point algorithm, i.e., a weighted analytic center column generation method, with branch-and-bound. The auxiliary problem of determining the entering column (i.e., the oracle) is an unconstrained hyperbolic program in 0-1 variables with a quadratic numerator and linear denominator. It is solved through a sequence of unconstrained quadratic programs in 0-1 variables. To accelerate resolution, variable neighborhood search heuristics are used both to get a good initial solution and to solve the auxiliary problem quickly as long as global optimality is not reached. Estimated bounds for the dual variables are deduced from the heuristic solution and used in the resolution process as a trust region. Proved minimum sum-of-squares partitions are determined for the first time for several fairly large data sets from the literature, including Fisher's 150 iris. This research was supported by the Fonds National de la Recherche Scientifique Suisse, NSERC-Canada, and FCAR-Quebec.
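The objective described in this abstract, the sum of squared distances from each point to the centroid of its cluster, can be illustrated with a minimal Python sketch. This only evaluates the criterion for a given partition; it is not the paper's exact algorithm, and the function name `sum_of_squares` and its arguments are hypothetical names chosen for illustration.

```python
def sum_of_squares(points, labels, k):
    """Sum of squared Euclidean distances from each point to its cluster centroid.

    points: list of equal-length coordinate tuples
    labels: labels[i] in range(k) is the cluster of points[i]
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k

    # Accumulate coordinate sums and sizes per cluster.
    for p, c in zip(points, labels):
        counts[c] += 1
        for j in range(dim):
            sums[c][j] += p[j]

    # Centroid = coordinate-wise mean; empty clusters get no centroid.
    centroids = [
        [s / counts[c] for s in sums[c]] if counts[c] else None
        for c in range(k)
    ]

    # Add up squared distances to the assigned centroid.
    total = 0.0
    for p, c in zip(points, labels):
        total += sum((p[j] - centroids[c][j]) ** 2 for j in range(dim))
    return total


if __name__ == "__main__":
    pts = [(0, 0), (2, 0), (10, 0), (12, 0)]
    print(sum_of_squares(pts, [0, 0, 1, 1], 2))  # compact clusters
    print(sum_of_squares(pts, [0, 1, 0, 1], 2))  # interleaved clusters
```

A partition that groups nearby points yields a smaller value than one that mixes distant points, which is the quantity an exact method would minimize over all partitions.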

CiteSeerX

PolyPublie

Brunel University Research Archive

Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

Over the past five decades, k-means has become the clustering algorithm of choice in many application domains primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly.
Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms (Springer, 2014). arXiv admin note: substantial text overlap with arXiv:1304.7465, arXiv:1209.196

arXiv.org e-Print Archive

Crossref

On the use of biased-randomized algorithms for solving non-smooth optimization problems

Author: Ferrer Biosca Albert
Gunes Corlu Canan
Juan Pérez Ángel Alejandro
Tordecilla Madera Rafael David
Torre Martínez Rocío de la
Publication venue: 'MDPI AG'
Publication date: 01/01/2020

Soft constraints are quite common in real-life applications. For example, in freight transportation, the fleet size can be enlarged by outsourcing part of the distribution service, and some deliveries to customers can be postponed as well; in inventory management, it is possible to consider stock-outs generated by unexpected demands; and in manufacturing processes and project management, some deadlines often cannot be met due to delays in critical steps of the supply chain. However, capacity-, size-, and time-related limitations are included in many optimization problems as hard constraints, while it would usually be more realistic to consider them as soft ones, i.e., they can be violated to some extent by incurring a penalty cost. Most of the time, this penalty cost will be nonlinear and even noncontinuous, which might transform the objective function into a non-smooth one. Despite their many practical applications, non-smooth optimization problems are quite challenging, especially when the underlying optimization problem is NP-hard in nature. In this paper, we propose the use of biased-randomized algorithms as an effective methodology to cope with NP-hard and non-smooth optimization problems in many practical applications. Biased-randomized algorithms extend constructive heuristics by introducing a nonuniform randomization pattern into them. Hence, they can be used to explore promising areas of the solution space without the limitations of gradient-based approaches, which assume the existence of smooth objective functions. Moreover, biased-randomized algorithms can be easily parallelized, thus requiring short computing times while exploring a large number of promising regions. This paper discusses these concepts in detail, reviews existing work in different application areas, and highlights current trends and open research lines.
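The idea of adding a nonuniform randomization pattern to a constructive heuristic can be sketched as follows. The abstract does not name a specific distribution, so this sketch assumes a geometric distribution with parameter `beta` over a candidate list already sorted from most to least promising; the function name `biased_pick` and the parameter names are hypothetical.

```python
import math
import random


def biased_pick(sorted_candidates, beta, rng):
    """Pick one candidate, favouring those near the front of the list.

    sorted_candidates: candidates ordered from most to least promising
    beta: assumed geometric parameter in (0, 1); larger means greedier
    rng: a random.Random instance
    (Hypothetical helper; the geometric distribution is an assumption.)
    """
    u = 1.0 - rng.random()  # uniform in (0, 1], so log(u) is defined
    # Inverse-transform sample of a geometric index, wrapped onto the list.
    idx = int(math.log(u) / math.log(1.0 - beta)) % len(sorted_candidates)
    return sorted_candidates[idx]


if __name__ == "__main__":
    rng = random.Random(42)
    candidates = ["best", "second", "third", "fourth", "fifth"]
    counts = {c: 0 for c in candidates}
    for _ in range(10_000):
        counts[biased_pick(candidates, 0.3, rng)] += 1
    print(counts)  # earlier (more promising) candidates are drawn more often
```

Replacing the greedy "always take the first candidate" step of a constructive heuristic with such a draw produces a different solution on each run while still favouring promising choices, and independent runs can be executed in parallel.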

The Oberta in open access

Solving Medium to Large Sized Euclidean Generalized Minimum Spanning Tree Problems

Author: Ghosh Diptesh

The generalized minimum spanning tree problem is a generalization of the minimum spanning tree problem. This network design problem finds several practical applications, especially when one considers the design of a large-capacity backbone network connecting several individual networks. In this paper we study the performance of six neighborhood search heuristics based on tabu search and variable neighborhood search on this problem domain. Our principal finding is that a tabu search heuristic almost always provides the best quality solution for small to medium sized instances within short execution times, while variable neighborhood decomposition search provides the best quality solutions for most large instances.

Research Papers in Economics

Global Optimization strategies for two-mode clustering

Author: Castilli W.
Groenen P.J.F.
Rosmalen J.M. van
Trejos J.

Two-mode clustering is a relatively new form of clustering that clusters both rows and columns of a data matrix. To do so, a criterion similar to k-means is optimized. However, it is still unclear which optimization method should be used to perform two-mode clustering, as various methods may lead to non-global optima. This paper reviews and compares several optimization methods for two-mode clustering. Several known algorithms are discussed and a new, fuzzy algorithm is introduced. The meta-heuristics Multistart, Simulated Annealing, and Tabu Search are used in combination with these algorithms. The new, fuzzy algorithm is based on the fuzzy c-means algorithm of Bezdek (1981) and the Fuzzy Steps approach to avoid local minima of Heiser and Groenen (1997) and Groenen and Jajuga (2001). The performance of all methods is compared in a large simulation study. It is found that using a Multistart meta-heuristic in combination with a two-mode k-means algorithm or the fuzzy algorithm often gives the best results. Finally, an empirical data set is used to give a practical example of two-mode clustering.
Keywords: algorithms; fuzzy clustering; multistart; simulated annealing; simulation; tabu search; two-mode clustering
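A k-means-like criterion for two-mode clustering can be sketched in Python. The abstract does not state the exact criterion, so this sketch assumes a common form: the sum of squared deviations of each matrix entry from the mean of its block, where a block is the intersection of a row cluster and a column cluster. The function name `two_mode_sse` is hypothetical.

```python
def two_mode_sse(matrix, row_labels, col_labels):
    """Sum of squared deviations of entries from their block means.

    matrix: list of rows (lists of numbers)
    row_labels[i], col_labels[j]: cluster of row i and of column j
    (Assumed criterion for illustration; not confirmed by the abstract.)
    """
    block_sum = {}
    block_n = {}

    # First pass: accumulate the sum and size of every (row, column) block.
    for i, row in enumerate(matrix):
        for j, x in enumerate(row):
            key = (row_labels[i], col_labels[j])
            block_sum[key] = block_sum.get(key, 0.0) + x
            block_n[key] = block_n.get(key, 0) + 1

    # Second pass: squared deviation of each entry from its block mean.
    total = 0.0
    for i, row in enumerate(matrix):
        for j, x in enumerate(row):
            key = (row_labels[i], col_labels[j])
            mean = block_sum[key] / block_n[key]
            total += (x - mean) ** 2
    return total


if __name__ == "__main__":
    data = [[1, 1, 5],
            [1, 1, 5],
            [9, 9, 3]]
    print(two_mode_sse(data, [0, 0, 1], [0, 0, 1]))  # blocks are constant
    print(two_mode_sse(data, [0, 1, 1], [0, 0, 1]))  # a worse row partition
```

Under this assumption, a multistart scheme would evaluate such a criterion for many random initial row and column partitions, improve each one locally, and keep the partition pair with the lowest value.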

Research Papers in Economics


Variable neighbourhood search based heuristic for K-harmonic means clustering

Author: Alguwaizani Abdulrahman
Publication venue: Brunel University, School of Information Systems, Computing and Mathematics
Publication date: 01/01/2011

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Although there has been a rapid development of technology and an increase in computation speeds, most real-world optimization problems still cannot be solved in a reasonable time. Sometimes it is impossible to solve them optimally, as there are many instances of real problems which cannot be addressed by computers at their present speed. In such cases, the heuristic approach can be used. Heuristics have been used by many researchers to meet this need, as they give a sufficient solution in reasonable time. The clustering problem, which arises in many applications, is one example of this. In this thesis, I suggest a Variable Neighbourhood Search (VNS) to improve a recent clustering local search called K-Harmonic Means (KHM). Many experiments are presented to show the strength of my code compared with some algorithms from the literature. Some counter-examples are introduced to show that KHM may degenerate entirely, in either one or more runs. Furthermore, it degenerates and then stops in some familiar datasets, which significantly affects the final solution. Hence, I present a degeneracy-removing code for KHM. I also apply VNS to improve the code of KHM after removing the evidence of degeneracy.

Brunel University Research Archive

A decision support methodology for process in the loop optimisation

Author: Chen Rui
Gladwin Dan
Stewart Jill
Stewart Paul
Winward Edward
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008

Experimental optimisation with hardware-in-the-loop is a common procedure in engineering, particularly in cases where accurate modelling is not possible. A common methodology to support experimental search is to use one of the many gradient descent methods. However, even sophisticated and proven methodologies such as Simulated Annealing (SA) can be significantly challenged in the presence of significant noise. This paper introduces a decision support methodology based upon Response Surfaces (RS), which supplements experimental management based on variable neighbourhood search, and is shown to be highly effective in directing experiments in the presence of a significant signal-to-noise (S-N) ratio and complex combinatorial functions. The methodology is developed on a 3-dimensional surface with multiple local minima, a large basin of attraction, and a high S-N ratio. Finally, the method is applied to a real-life automotive experimental application.

University of Lincoln Institutional Repository

Genetic algorithm based two-mode clustering of metabolomics data

Author: Berg R.A., van den
Hageman J.A.
Smilde A.K.
Werf M.J., van der
Westerhuis J.A.
Publication date: 01/01/2008

Metabolomics and other omics tools are generally characterized by large data sets with many variables obtained under different environmental conditions. Clustering methods, and more specifically two-mode clustering methods, are excellent tools for analyzing this type of data. Two-mode clustering methods allow for analysis of the behavior of subsets of metabolites under different experimental conditions. In addition, the results are easily visualized. In this paper we introduce a two-mode clustering method based on a genetic algorithm that uses a criterion that searches for homogeneous clusters. Furthermore, we introduce a cluster stability criterion to validate the clusters and we provide an extended knee plot to select the optimal number of clusters in both experimental and metabolite modes. The genetic algorithm-based two-mode clustering gave biologically relevant results when it was applied to two real-life metabolomics data sets. It was, for instance, able to identify a catabolic pathway for growth on several of the carbon sources.

Springer - Publisher Connector

Wageningen University & Research Publications

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Design of Homogeneous Territorial Units: A Methodological Proposal

Author: Jordi Surinach Caralt
Juan Carlos Duque
Raul Ramos Lobo

One of the main questions to solve when analysing geographically aggregated information is the design of territorial units adjusted to the objectives of the study. This is related to the reduction of the effects of the Modifiable Areal Unit Problem (MAUP). In this paper an optimisation model to solve regionalisation problems is proposed. This model seeks to reduce some disadvantages found in previous works on automated regionalisation tools.
Keywords: contiguity constraint, zone design, optimisation, modifiable areal unit problem

Research Papers in Economics