1,035 research outputs found

    Finding groups in data: Cluster analysis with ants

    We present in this paper a modification of Lumer and Faieta's algorithm for data clustering. This approach mimics the clustering behavior observed in real ant colonies. The algorithm automatically discovers clusters in numerical data without prior knowledge of the possible number of clusters. In this paper we focus on ant-based clustering algorithms, a particular kind of swarm intelligence system, and on how the final clustering is affected by the dissimilarity metric used during classification: Euclidean, Cosine, or Gower. Clustering with swarm-based algorithms is emerging as an alternative to more conventional clustering methods such as k-means. Among the many bio-inspired techniques, ant clustering algorithms have received special attention, especially because they still require much investigation to improve performance, stability and other key features that would make such algorithms mature tools for data mining. As a case study, this paper focuses on the behavior of clustering procedures in those new approaches. The proposed algorithm and its modifications are evaluated on a number of well-known benchmark datasets. Empirical results show that ant-based clustering algorithms perform well when compared to other techniques.
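The three dissimilarity measures named in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, the Gower measure is restricted to numeric attributes, and the `ranges` argument (the observed range of each attribute, assumed positive) is an assumption introduced here.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_dissimilarity(a, b):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def gower_numeric(a, b, ranges):
    """Gower dissimilarity for numeric attributes only (assumption):
    the mean of the absolute differences, each scaled by its attribute range."""
    return sum(abs(x - y) / r for x, y, r in zip(a, b, ranges)) / len(a)
```

Euclidean distance depends on attribute scale, cosine dissimilarity only on direction, and Gower normalises each attribute separately, which is one reason the choice of measure can change the resulting clusters.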

    Clustering stock exchange data by using evolutionary algorithms for portfolio management

    In the present paper, the imperialist competitive algorithm, the ant colony algorithm and the particle swarm optimization algorithm are used to cluster stocks of the Tehran stock exchange. The results of the three algorithms are also compared with three well-known clustering models: k-means, FCM and SOM. After clustering, a portfolio is built by choosing some stocks from each cluster using the NSGA-II algorithm. Results show the superiority of the ant colony, particle swarm optimization and imperialist competitive algorithms over the other three methods for clustering stocks. Because the portfolio is diversified, portfolio risk is reduced when stocks are chosen from the clusters. The more efficient the clustering, the lower the risk. Using clustering for portfolio management also reduces the time needed for portfolio selection.

    A Novel Ant based Clustering of Gene Expression Data using MapReduce Framework

    Genes which exhibit similar patterns are often functionally related. Microarray technology provides a unique tool to examine how a cell's gene expression pattern changes under various conditions. Analyzing and interpreting these gene expression data is a challenging task. Clustering is one of the most useful and popular methods for extracting patterns from these data. In this paper a multi-colony ant-based clustering approach is proposed. The processing procedure is divided into two parts. The first is the construction of a minimum spanning tree from the gene expression data using a MapReduce version of ant colony optimization techniques. The second is clustering, which is done by cutting the costliest edges from the minimum spanning tree, followed by a one-step k-means clustering procedure. Applied to gene expression data files of different sizes over different numbers of processors, the proposed approach exhibits good scalability and accuracy.
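The edge-cutting step described in this abstract can be illustrated with a small sketch. It is not the paper's method: Prim's algorithm stands in for the MapReduce ant colony construction of the tree, the follow-up k-means step is omitted, and the function names and the parameter `k` (the desired number of clusters) are hypothetical.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.
    Returns the tree as a list of (weight, i, j) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [(math.inf, -1)] * n  # (cheapest distance to the tree, tree endpoint)
    best[0] = (0.0, -1)
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i][0])
        in_tree[u] = True
        if best[u][1] >= 0:
            edges.append((best[u][0], best[u][1], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v][0]:
                    best[v] = (d, u)
    return edges

def mst_clusters(points, k):
    """Cut the k-1 costliest tree edges and label the remaining components."""
    kept = sorted(mst_edges(points))[: len(points) - k]
    parent = list(range(len(points)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, i, j in kept:
        parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]
```

Removing the k-1 heaviest edges of a spanning tree leaves exactly k connected components, so points joined only through long edges end up in different clusters.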

    Development of computations in bioscience and bioinformatics and its application: review of the Symposium of Computations in Bioinformatics and Bioscience (SCBB06)

    The first Symposium of Computations in Bioinformatics and Bioscience (SCBB06) was held in Hangzhou, China on June 21–22, 2006. Twenty-six peer-reviewed papers were selected for publication in this special issue of BMC Bioinformatics. These papers cover a broad range of topics including bioinformatics theories, algorithms, applications and tool development. The main technical topics include gene expression analysis, sequence analysis, genome analysis, phylogenetic analysis, gene function prediction, molecular interaction and systems biology, genetics and population study, immune strategy, protein structure prediction and proteomics.

    Indonesian pharmacy retailer segmentation using recency frequency monetary-location model and ant K-means algorithm

    We propose an approach to retailer segmentation that uses a hybrid swarm intelligence algorithm and a recency frequency monetary (RFM)-location model to develop a tailored marketing strategy for a pharmacy industry distribution company. We loaded sales data into MATLAB to run the ant clustering algorithm and K-means, then analyzed the results with the RFM-location model to calculate each cluster's customer lifetime value (CLV). The algorithm generated 13 clusters from the provided data, which covered a total of 1,138 retailers. Using RFM-location, some clusters were then combined because they had identical characteristics, leaving 8 final clusters with unique characteristics. The findings can inform the company's decision-making process, especially in prioritizing retailer segments and developing a tailored marketing strategy. We used a hybrid algorithm that leverages the advantages of swarm intelligence and the power of K-means to cluster the retailers, then added further value to the generated clusters by analyzing them with the RFM-location model and CLV. However, location as a variable may not be relevant in smaller or developed countries, where shipping cost may not be a problem.

    AI Solutions for MDS: Artificial Intelligence Techniques for Misuse Detection and Localisation in Telecommunication Environments

    This report considers the application of Artificial Intelligence (AI) techniques to the problem of misuse detection and misuse localisation within telecommunications environments. A broad survey of techniques is provided that covers, inter alia, rule based systems, model-based systems, case based reasoning, pattern matching, clustering and feature extraction, artificial neural networks, genetic algorithms, artificial immune systems, agent based systems, data mining and a variety of hybrid approaches. The report then considers the central issue of event correlation, which is at the heart of many misuse detection and localisation systems. The notion of being able to infer misuse by the correlation of individual temporally distributed events within a multiple data stream environment is explored, along with a range of techniques covering model based approaches, 'programmed' AI and machine learning paradigms. It is found that, in general, correlation is best achieved via rule based approaches, but that these suffer from a number of drawbacks, such as the difficulty of developing and maintaining an appropriate knowledge base, and the lack of ability to generalise from known misuses to new unseen misuses. Two distinct approaches are evident. One attempts to encode knowledge of known misuses, typically within rules, and use this to screen events. This approach cannot generally detect misuses for which it has not been programmed, i.e. it is prone to issuing false negatives. The other attempts to 'learn' the features of event patterns that constitute normal behaviour, and, by observing patterns that do not match expected behaviour, detect when a misuse has occurred. This approach is prone to issuing false positives, i.e. inferring misuse from innocent patterns of behaviour that the system was not trained to recognise.
    Contemporary approaches are seen to favour hybridisation, often combining detection or localisation mechanisms for both abnormal and normal behaviour, the former to capture known cases of misuse, the latter to capture unknown cases. In some systems, these mechanisms even work together to update each other to increase detection rates and lower false positive rates. It is concluded that hybridisation offers the most promising future direction, but that a rule or state based component is likely to remain, being the most natural approach to the correlation of complex events. The challenge, then, is to mitigate the weaknesses of canonical programmed systems such that learning, generalisation and adaptation are more readily facilitated.

    Stock market series analysis using self-organizing maps

    In this work a new clustering technique is implemented and tested. The proposed approach is based on the application of a SOM (self-organizing map) neural network and provides a means to cluster data aggregated on the distance matrix (U-MAT). It relies on a flooding algorithm operating on the U-MAT and uses the Calinski and Harabasz index to assess the depth of flooding, thereby determining an adequate number of clusters. The method is tuned for the analysis of stock market time series. The results obtained are promising, although limited in scope.
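For readers unfamiliar with the index mentioned in this abstract, the sketch below computes the Calinski and Harabasz score of a given labelling: the between-cluster dispersion divided by k - 1, over the within-cluster dispersion divided by n - k. It assumes at least two clusters and non-zero within-cluster dispersion. The function name is hypothetical, and the sketch says nothing about how the paper couples the index to the flooding depth.

```python
def calinski_harabasz(points, labels):
    """Ratio of between-cluster to within-cluster dispersion,
    each divided by its degrees of freedom. Higher is better."""
    n, dim = len(points), len(points[0])
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    k = len(groups)

    def mean(pts):
        return [sum(p[d] for p in pts) / len(pts) for d in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = mean(points)
    between = sum(len(g) * sq_dist(mean(g), overall) for g in groups.values())
    within = sum(sq_dist(p, mean(g)) for g in groups.values() for p in g)
    return (between / (k - 1)) / (within / (n - k))
```

A candidate number of clusters can then be chosen by scoring several partitions of the same data and keeping the one with the highest value.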
