29 research outputs found

    Improving the Performance of K-Means for Color Quantization

    Color quantization is an important operation with many applications in graphics and image processing. Most quantization methods are essentially based on data clustering algorithms. However, despite its popularity as a general-purpose clustering algorithm, k-means has not received much respect in the color quantization literature because of its high computational requirements and sensitivity to initialization. In this paper, we investigate the performance of k-means as a color quantizer. We implement fast and exact variants of k-means with several initialization schemes and then compare the resulting quantizers to some of the most popular quantizers in the literature. Experiments on a diverse set of images demonstrate that an efficient implementation of k-means with an appropriate initialization strategy can in fact serve as a very effective color quantizer.
    Comment: 26 pages, 4 figures, 13 tables
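    As a rough illustration of what "k-means as a color quantizer" means, the sketch below runs plain batch (Lloyd) k-means on an array of RGB colors and returns a k-color palette. It is not the fast, exact variant the paper implements; the function name `kmeans_quantize`, the random-sample initialization, and the fixed iteration count are assumptions made only for this example.

```python
import numpy as np

def kmeans_quantize(pixels, k, iters=20, seed=0):
    """Plain batch k-means on an (n, 3) array of RGB colors.

    Returns (palette, labels): palette is (k, 3), labels is (n,).
    Assumption: centers start as k distinct pixels sampled at random;
    the paper compares several initialization schemes instead.
    """
    pixels = np.asarray(pixels, dtype=np.float64)
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every pixel to every center.
        d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = pixels[labels == j]
            if len(members):  # an empty cluster keeps its old center
                centers[j] = members.mean(axis=0)
    # Final assignment against the converged palette.
    d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)

# Usage: quantize four gray-ish colors down to a 2-color palette.
colors = [[0, 0, 0], [10, 10, 10], [250, 250, 250], [240, 240, 240]]
palette, labels = kmeans_quantize(colors, k=2)
quantized = palette[labels]  # each pixel replaced by its palette color
```

    For a real image, `pixels` would be the image reshaped to `(height * width, 3)`; the full pairwise distance array used here grows as n times k, so this form is only practical for small inputs.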

    Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

    Over the past five decades, k-means has become the clustering algorithm of choice in many application domains, primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly.
    Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms (Springer, 2014). arXiv admin note: substantial text overlap with arXiv:1304.7465, arXiv:1209.196
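    The "multiple runs, keep the best" practice that the abstract describes for random initializations can be sketched as follows. This is a minimal illustration, not code from the chapter: the helper names `lloyd` and `best_of_runs`, the random-sample initialization, and the use of the sum of squared errors (SSE) as the quality measure are assumptions for this example.

```python
import numpy as np

def lloyd(X, centers, iters=50):
    """Batch k-means from given initial centers; returns (centers, labels, sse)."""
    X = np.asarray(X, dtype=np.float64)
    centers = np.asarray(centers, dtype=np.float64)
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        new = np.array([
            X[labels == j].mean(0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new, centers):  # stop once the centers no longer move
            break
        centers = new
    d2 = ((X[:, None] - centers[None]) ** 2).sum(-1)
    return centers, d2.argmin(1), float(d2.min(1).sum())

def best_of_runs(X, k, runs=10, seed=0):
    """Run k-means from `runs` random initializations and keep the lowest SSE."""
    X = np.asarray(X, dtype=np.float64)
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(runs):
        init = X[rng.choice(len(X), size=k, replace=False)]
        result = lloyd(X, init)
        if best is None or result[2] < best[2]:
            best = result
    return best

# Usage: three tight pairs of points, clustered with k = 3.
X = [[0, 0], [0, 1], [10, 0], [10, 1], [20, 0], [20, 1]]
centers, labels, sse = best_of_runs(X, k=3, runs=10)
```

    The total cost is `runs` times that of a single k-means run, which is the overhead that a deterministic initialization needing only one run avoids.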

    A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm

    K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. Numerous initialization methods have been proposed to address this problem. In this paper, we first present an overview of these methods with an emphasis on their computational efficiency. We then compare eight commonly used linear time complexity initialization methods on a large and diverse collection of data sets using various performance criteria. Finally, we analyze the experimental results using non-parametric statistical tests and provide recommendations for practitioners. We demonstrate that popular initialization methods often perform poorly and that there are in fact strong alternatives to these methods.
    Comment: 17 pages, 1 figure, 7 tables

    Influence of Cold Plasma on Sesame Paste and the Nano Sesame Paste Based on Co-occurrence Matrix

    The aim of this research is to investigate the effect of cold plasma on bacteria grown on sesame paste at its normal particle size and at nanoparticle size. First, threshold-based image segmentation is used to remove the reflection of the glass slides on which the sesame samples are placed. A classification step then separates normal sesame paste texture from abnormal texture. The abnormal texture appears when bacteria grow on sesame paste that has been left in the air for two days. An unsupervised k-means classification is used to separate the infected region, the normal region, and the treated region. The bacteria are treated with cold plasma for an exposure time of two minutes. Textural features derived from the gray-level co-occurrence matrix are calculated for the normal, abnormal, and treated textures; the treated texture class has the best features of the three. The results show that plasma-treated sesame paste gives good results compared with plasma-treated nano sesame paste, because the plasma heats the sesame paste and causes the sesame nanoparticles to aggregate.
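    The abstract does not say which co-occurrence features were computed, so the following is only a sketch of the general technique: it builds a normalized gray-level co-occurrence matrix for one pixel offset and derives three commonly used features from it. The function names, the choice of offset (right-hand neighbor), and the three features are assumptions for illustration.

```python
import numpy as np

def glcm(img, levels, dr=0, dc=1):
    """Normalized co-occurrence matrix of integer gray levels in [0, levels).

    P[a, b] is the fraction of pixel pairs where a pixel of level a has a
    pixel of level b at offset (dr, dc). Default offset: right-hand neighbor.
    """
    img = np.asarray(img)
    rows, cols = img.shape
    P = np.zeros((levels, levels), dtype=np.float64)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                P[img[r, c], img[r2, c2]] += 1
    return P / P.sum()

def glcm_features(P):
    """Contrast, angular second moment (ASM), and homogeneity of a GLCM."""
    i, j = np.indices(P.shape)
    return {
        "contrast": float((P * (i - j) ** 2).sum()),
        "asm": float((P ** 2).sum()),
        "homogeneity": float((P / (1.0 + (i - j) ** 2)).sum()),
    }

# Usage on a tiny two-level image.
image = [[0, 0, 1],
         [1, 1, 0]]
features = glcm_features(glcm(image, levels=2))
```

    In a pipeline like the one described, such features would be computed separately for the pixels of each segmented region (normal, infected, treated) and then compared across regions.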

    An initialization scheme for supervized K-means

    In recent years, researchers have focused their attention on a new approach, supervised clustering, that combines the main characteristics of both traditional clustering and supervised classification tasks. Motivated by the importance of initialization in the traditional clustering context, this paper explores to what extent a supervised initialization step could help traditional clustering to obtain better performance on supervised clustering tasks. The paper reports experiments which show that the simple proposed approach yields a good solution together with a significant reduction of the computational cost.

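    Dual optimization of tone approximation of monochrome images by parallel evolutionary-genetic search (Дуальная оптимизация тоновой аппроксимации монохромных изображений параллельным эволюционно-генетическим поиском)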

    The paper considers the optimization of tone approximation for monochrome (for example, grayscale-palette) images. Tone approximation reduces the number of tones used to display the approximated image compared with the number of tones in the original image. Optimizing the procedure means minimizing the loss of visual quality, which is estimated as the total or mean deviation between corresponding pixels of the original and approximated images. The optimization tool is a hybrid algorithm developed and investigated by the authors. It combines a heuristic and a deterministic algorithm that search for the best structure of the approximating palette under the criterion of minimum deviation. The heuristic algorithm is based on the evolutionary-genetic paradigm. Its main goal is to narrow the search to the palette structures closest to the optimum; it was given this role because it is fast. The deterministic algorithm, a directed exhaustive search that is more costly, then finds the extremum nearest to the result of the heuristic stage, which is at least a local extremum and possibly the global one. Together the two algorithms provide what the paper calls dual optimization of tone approximation: the extremum of the approximation quality criterion is reached while the time needed to reach it is minimized. The current investigation is devoted to increasing the effectiveness of the hybrid algorithm at the heuristic stage, for which a modified evolutionary-genetic algorithm is used. The possibility of implementing a parallel model of the evolutionary-genetic algorithm with different settings is considered. The results of initial experiments are discussed and compared with a known tone approximation algorithm.
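    The quality criterion described above (mean deviation between corresponding pixels of the original and the approximated image) can be sketched for a given palette as follows. This shows only the objective that a search would minimize, not the authors' hybrid evolutionary-genetic algorithm; the function name `approximate_tones` and the nearest-tone mapping are assumptions for this example.

```python
import numpy as np

def approximate_tones(image, palette):
    """Map every gray value to its nearest palette tone.

    Returns (approximated_image, mean_absolute_deviation), where the
    deviation is averaged over all pixels of the image.
    """
    image = np.asarray(image, dtype=np.float64)
    palette = np.sort(np.asarray(palette, dtype=np.float64))
    # Index of the closest palette tone for every pixel.
    nearest = np.abs(image[..., None] - palette).argmin(axis=-1)
    approx = palette[nearest]
    return approx, float(np.abs(image - approx).mean())

# Usage: approximate a 4-tone image with a 2-tone palette.
original = [[0, 100],
            [200, 255]]
approx, deviation = approximate_tones(original, palette=[0, 255])
```

    A palette search, whether heuristic or exhaustive, would call a function like this for each candidate palette and keep the candidate with the smallest deviation.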

    Band depth based initialization of K-means for functional data clustering

    The k-Means algorithm is one of the most popular choices for clustering data but is well known to be sensitive to the initialization process. A substantial number of methods aim at finding optimal initial seeds for k-Means, though none of them is universally valid. This paper presents an extension to longitudinal data of one such method, the BRIk algorithm, which relies on clustering a set of centroids derived from bootstrap replicates of the data and on the use of the versatile Modified Band Depth. In our approach we improve the BRIk method by adding a step where we fit appropriate B-splines to our observations and a resampling process that allows computational feasibility and the handling of issues such as noise or missing data. We have derived two techniques for providing suitable initial seeds, which stress the multivariate and the functional nature of the data, respectively. Our results with simulated and real data sets indicate that our Functional Data Approach to the BRIk method (FABRIk) and our Functional Data Extension of the BRIk method (FDEBRIk) are more effective than previous proposals at providing seeds to initialize k-Means in terms of clustering recovery.
    Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work was partially supported by the Spanish Ministry of Education [collaboration grant in university departments, Archive ID 18C01/003730] and the Spanish Ministry of Science, Innovation and Universities [grant numbers PID2020-116567GB-C22 and PID2020-112796RB-C22].