21 research outputs found

    Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

    Over the past five decades, k-means has become the clustering algorithm of choice in many application domains, primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly. Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms (Springer, 2014). arXiv admin note: substantial text overlap with arXiv:1304.7465, arXiv:1209.196

    Big data clustering using grid computing and ant-based algorithm

    Big data has the power to dramatically change the way institutes and organizations use their data. Transforming massive amounts of data into knowledge can raise an organization's performance to its maximum. Scientific and business organizations would benefit from utilizing big data. However, dealing with big data poses many challenges, such as storage, transfer, management, and manipulation. Many techniques are required to explore the hidden patterns inside big data, and these techniques have limitations in terms of hardware and software implementation. This paper presents a framework for big data clustering that utilizes grid technology and an ant-based algorithm

    Identificación de documentos multilingües relacionados mediante algoritmos de clustering de hormigas

    ABSTRACT: This paper presents a document representation strategy and a bio-inspired algorithm to cluster multilingual collections of documents in the field of economics and business. The proposed approach allows the user to identify groups of related economics documents written in Spanish or English using techniques inspired by the clustering and sorting behaviours observed in some types of ants. To obtain a language-independent vector representation of each document, two multilingual resources are used: an economic glossary and a thesaurus. Each document is represented using four feature vectors: words, proper names, economic terms in the glossary, and thesaurus descriptors. Proper name identification, word extraction, and lemmatization are performed using specific tools. The tf-idf scheme is used to measure the importance of each feature in the document, and a convex linear combination of angular separations between feature vectors is used as the similarity measure between documents. The paper shows experimental results of applying the proposed algorithm to a Spanish-English corpus of research papers in economics and management. The results demonstrate the usefulness and effectiveness of the ant clustering algorithm and the proposed representation scheme. This work has been partially supported by SistIngAlfa project, ref: ALFA II-0321-FA of the European Union and Project Ref. TIN2006-13615 of the Spanish Ministry of Education and Science
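The similarity measure described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes "angular separation" means the cosine between tf-idf vectors, stores each feature vector as a sparse dictionary, and uses hypothetical weight values and feature-type names.

```python
import math


def cosine(u, v):
    """Cosine between two sparse vectors stored as {term: tf-idf weight}."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_u * norm_v)


def document_similarity(doc_a, doc_b, weights):
    """Convex linear combination of per-feature-type cosines.

    Each document is a dict mapping a feature type to a sparse vector.
    The weights must be non-negative and sum to 1 (convexity).
    """
    total = sum(weights.values())
    if any(w < 0 for w in weights.values()) or abs(total - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(
        w * cosine(doc_a.get(kind, {}), doc_b.get(kind, {}))
        for kind, w in weights.items()
    )


# Hypothetical weights for the four feature vectors named in the abstract.
WEIGHTS = {
    "words": 0.4,
    "proper_names": 0.2,
    "glossary_terms": 0.2,
    "thesaurus": 0.2,
}
```

Because the glossary and thesaurus vectors are language independent, a Spanish and an English document on the same topic can still score well on those two components even when their word vectors share no terms.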

    Modelo de un sistema manejador de comunidades autorganizativo para un sistema operativo web multiagente

    This article describes the simulation and functional evaluation of a Community Manager System (CMS) for a Multiagent Web Operating System (MWOS). Following a reference design and concepts associated with swarm intelligence, the evaluation focused on the self-organization and emergence that community management requires in this type of system. To this end, a clustering algorithm was used that is based on the emergent behavior of ants, which pick up and deposit corpses to form piles. Following a dynamic, self-organizing, and emergent community management scheme, services and resources were searched at both the intra and inter levels of the CMS, according to the requirements and objectives of the MWOS and its subsystems

    DIDS Using Cooperative Agents Based on Ant Colony Clustering

    Intrusion detection systems (IDSs) play an important role in information security. Two major problems in the development of IDSs are the computational aspect and the architectural aspect. The computational or algorithmic problems include a limited ability to detect novel attacks and computational overload caused by large data traffic. The architectural problems are related to the communication between detection components, including the difficulty of countering distributed and coordinated attacks, which requires large amounts of distributed information and synchronization between detection components. This paper proposes a multi-agent architecture for a distributed intrusion detection system (DIDS) based on ant-colony clustering (ACC), aimed at recognizing new and coordinated attacks, handling large data traffic, achieving synchronization and co-operation between components without centralized computation, and delivering good real-time detection performance with immediate alarm notification. Feature selection based on principal component analysis (PCA) is used for dimensional reduction of NSL-KDD. Initial features are transformed to new features in smaller dimensions, where probing attacks (Ra-Probe) have a characteristic sign in their average value that is different from that of normal activity. Selection is based on the characteristics of these factors, resulting in a two-dimensional subset with a 75% data reduction

    Desarrollo de una aplicación para la gestión, clasificación y agrupamiento de documentos económicos con algoritmos bio-inspirados

    This work describes the development of a Web application that uses bio-inspired techniques to classify and cluster multilingual collections of documents in the field of economics and business. The application identifies groups of related economics documents, written in Spanish and English, using clustering algorithms inspired by the behavior of ant colonies. To generate a language-independent vector representation of the documents, several linguistic resources and text-processing tools are used. Each document is represented using four language-independent feature vectors, and the similarity between documents is computed as a convex linear combination of the similarities of those feature vectors. The work presents experimental results obtained in the classification of a corpus of 250 scientific documents in various areas of economics and business administration

    Medoid-based clustering using ant colony optimization

    The application of ACO-based algorithms in data mining has been growing over the last few years, and several supervised and unsupervised learning algorithms have been developed using this bio-inspired approach. Most recent works about unsupervised learning have focused on clustering, showing the potential of ACO-based techniques. However, there are still clustering areas that are almost unexplored using these techniques, such as medoid-based clustering. Medoid-based clustering methods are helpful—compared to classical centroid-based techniques—when centroids cannot be easily defined. This paper proposes two medoid-based ACO clustering algorithms, where the only information needed is the distance between data: one algorithm that uses an ACO procedure to determine an optimal medoid set (METACOC algorithm) and another algorithm that uses an automatic selection of the number of clusters (METACOC-K algorithm). The proposed algorithms are compared against classical clustering approaches using synthetic and real-world datasets
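The point that medoid-based methods need only pairwise distances can be illustrated with a short sketch. This is not the METACOC algorithm: it only shows how a candidate medoid set might be scored from a distance matrix, using the usual k-medoids cost (sum of distances to the nearest medoid). Whether the paper uses this exact objective is an assumption, and the function name is hypothetical.

```python
def assign_to_medoids(dist, medoids):
    """Assign each item to its nearest medoid using only a distance matrix.

    dist    -- square matrix (list of lists), dist[i][j] = distance between i and j
    medoids -- indices of the items chosen as cluster representatives
    Returns (labels, cost): labels[i] is the medoid index of item i, and cost
    is the total distance from every item to its assigned medoid.
    """
    n = len(dist)
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    cost = sum(dist[i][labels[i]] for i in range(n))
    return labels, cost


# Four items whose pairwise distances come from positions 0, 1, 10, 11 on a
# line. No coordinates are passed to the function, only the distances.
positions = [0, 1, 10, 11]
dist = [[abs(a - b) for b in positions] for a in positions]
labels, cost = assign_to_medoids(dist, medoids=[0, 2])
```

A search procedure such as ACO would propose different medoid sets and keep the ones with lower cost. Since no centroid is ever computed, the same function works for strings, graphs, or any objects for which a distance is defined.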

    Simplifying and improving ant-based clustering

    Ant-based clustering (ABC) is a data clustering approach inspired by the cemetery formation activities observed in real ant colonies. Building upon the premise of collective intelligence, such an approach uses multiple ant-like agents and a mixture of heuristics to create systems that are capable of clustering real-world data. Many recently proposed ABC systems have shown competitive results, but these systems are geared towards adding new heuristics, resulting in increasingly complex systems that are harder to understand and improve. In contrast to this direction, we demonstrate that a state-of-the-art ABC system can be systematically evaluated and then simplified. The streamlined model, which we call SABC, differs fundamentally from traditional ABC systems as it does not use the ant colony and several key components. Yet, our empirical study shows that SABC performs more effectively and efficiently than the state-of-the-art ABC system
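For readers unfamiliar with the cemetery-formation idea, the sketch below shows the pick-up and drop rules of the classic basic model attributed to Deneubourg et al., on which many traditional ABC systems build. It is background only and is not SABC, which the abstract says removes the ant colony. The constants `k1` and `k2` are illustrative values chosen here, and `f` is assumed to be the fraction of similar items an ant perceives in its neighbourhood.

```python
def pick_probability(f, k1=0.1):
    """Probability that an unladen ant picks up an item.

    f is the perceived local density of similar items (0 = isolated item).
    Isolated items are almost always picked up; items in dense piles rarely.
    """
    return (k1 / (k1 + f)) ** 2


def drop_probability(f, k2=0.3):
    """Probability that a laden ant drops its item at the current location.

    Dropping becomes more likely as the local density of similar items grows,
    so piles (clusters) reinforce themselves over time.
    """
    return (f / (k2 + f)) ** 2
```

The two rules pull in opposite directions on the same quantity `f`, which is what makes clusters emerge without any central coordination.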

    A NOVEL HYBRID HARMONY SEARCH (HS) WITH WAR STRATEGY OPTIMIZATION (WSO) FOR SOLVING OPTIMIZATION PROBLEMS

    The use of nature-inspired meta-heuristic algorithms is increasing due to their simplicity and versatility. These algorithms are widely used in numerous domains, especially in scientific fields such as operations research, computer science, artificial intelligence, and mathematics. Based on the core principles of exploration and exploitation, they provide flexible problem-solving abilities. This study presents a novel method to improve the effectiveness of the War Strategy Optimization (WSO) algorithm on optimization problems. The suggested approach combines the WSO technique with the Harmony Search (HS) algorithm, resulting in a hybrid algorithm called H-WSO. The aim is to enhance the overall optimization performance by leveraging the capabilities of both algorithms through the integration of swarm intelligence approaches. To assess the effectiveness of the proposed H-WSO algorithm, a set of experiments was carried out on 50 benchmark test functions. These functions included both unimodal and multimodal functions and spanned different dimensions. The findings clearly showed a notable enhancement in the efficiency of the H-WSO algorithm compared to the original WSO algorithm. Various metrics were used to evaluate the effectiveness of the proposed algorithm, including the optimal fitness function value (Mean), Standard Deviation (St.d), and Median. The H-WSO algorithm consistently shows higher efficiency than the WSO algorithm, making it a promising and practical approach for addressing complicated optimization challenges