14 research outputs found

    A practical application of simulated annealing to clustering

    We formalize clustering as a partitioning problem with a user-defined internal clustering criterion and present SINICC, an unbiased, empirical method for comparing internal clustering criteria. An application to multi-sensor fusion is described, where the data set is composed of inexact sensor "reports" pertaining to "objects" in an environment. Given these reports, the objective is to produce a representation of the environment in which each entity is the result of "fusing" sensor reports. Before fusion can be performed, however, the reports must be "associated" into homogeneous clusters. Simulated annealing is used to find a near-optimal partitioning with respect to each of several clustering criteria for a variety of simulated data sets. This method can then be used to determine the "best" clustering criterion for the multi-sensor fusion problem with a given fusion operator.
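
    The core loop this abstract describes can be illustrated with a short sketch: simulated annealing searches the space of partitions while a pluggable internal criterion scores each candidate partition. This is not the SINICC implementation; the within-cluster sum-of-squares criterion, the single-point move, and the annealing parameters below are illustrative assumptions.

        # A minimal sketch of simulated annealing over partitions, scored by a
        # user-supplied internal clustering criterion (here: within-cluster SS).
        import math
        import random

        def within_cluster_ss(points, labels, k):
            """Internal criterion to minimize: total squared distance to cluster means."""
            total = 0.0
            for c in range(k):
                members = [p for p, l in zip(points, labels) if l == c]
                if not members:
                    continue
                dim = len(members[0])
                mean = [sum(p[d] for p in members) / len(members) for d in range(dim)]
                total += sum(sum((p[d] - mean[d]) ** 2 for d in range(dim)) for p in members)
            return total

        def anneal_partition(points, k, criterion, t0=1.0, cooling=0.995, steps=5000, seed=0):
            rng = random.Random(seed)
            labels = [rng.randrange(k) for _ in points]
            best, cost = list(labels), criterion(points, labels, k)
            best_cost, t = cost, t0
            for _ in range(steps):
                i = rng.randrange(len(points))        # move one point to another cluster
                old = labels[i]
                labels[i] = rng.randrange(k)
                new_cost = criterion(points, labels, k)
                if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / t):
                    cost = new_cost                   # accept: always if better, sometimes if worse
                    if cost < best_cost:
                        best, best_cost = list(labels), cost
                else:
                    labels[i] = old                   # reject the move
                t *= cooling                          # cool down
            return best, best_cost

        if __name__ == "__main__":
            pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
            print(anneal_partition(pts, k=2, criterion=within_cluster_ss))

    Swapping within_cluster_ss for another criterion function is the sense in which different criteria can be compared empirically under the same search procedure.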

    Unsupervised Classification Using Immune Algorithm

    An unsupervised classification algorithm based on the clonal selection principle, named Unsupervised Clonal Selection Classification (UCSC), is proposed in this paper. The proposed algorithm is data-driven and self-adaptive: it adjusts its parameters to the data to make the classification operation as fast as possible. The performance of UCSC is evaluated by comparing it with the well-known K-means algorithm on several artificial and real-life data sets. The experiments show that the proposed UCSC algorithm is more reliable and achieves higher classification precision than traditional classification methods such as K-means.
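
    Since the abstract names only the clonal selection principle, the following is a minimal CLONALG-style sketch of how such an algorithm can drive clustering, not the UCSC algorithm itself: antibodies encode candidate centroid sets, affinity is the negated clustering error, and fitter antibodies are cloned with smaller mutations. All parameters are illustrative assumptions.

        # Illustrative clonal-selection clustering sketch.
        import random

        def sse(points, centroids):
            """Sum of squared distances from each point to its nearest centroid."""
            return sum(min(sum((p[d] - c[d]) ** 2 for d in range(len(p))) for c in centroids)
                       for p in points)

        def clonal_selection_clustering(points, k, pop=20, clones=5, gens=100, seed=0):
            rng = random.Random(seed)
            dim = len(points[0])
            lo = [min(p[d] for p in points) for d in range(dim)]
            hi = [max(p[d] for p in points) for d in range(dim)]
            random_antibody = lambda: [[rng.uniform(lo[d], hi[d]) for d in range(dim)]
                                       for _ in range(k)]
            population = [random_antibody() for _ in range(pop)]
            for _ in range(gens):
                ranked = sorted(population, key=lambda cs: sse(points, cs))
                offspring = []
                for rank, antibody in enumerate(ranked):
                    scale = 0.05 * (rank + 1)          # worse antibodies mutate more
                    for _ in range(clones):
                        clone = [[x + rng.gauss(0, scale * (hi[d] - lo[d]))
                                  for d, x in enumerate(c)] for c in antibody]
                        offspring.append(clone)
                # keep the best antibodies and inject two random ones for diversity
                population = sorted(population + offspring,
                                    key=lambda cs: sse(points, cs))[:pop - 2]
                population += [random_antibody(), random_antibody()]
            return min(population, key=lambda cs: sse(points, cs))

        if __name__ == "__main__":
            pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
            print(clonal_selection_clustering(pts, k=2))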

    A Functional Workbench for Anopheles gambiae Micro Array Analysis

    Insecticide resistance, an inherited trait involving alteration in one or more of an insect's genes, is now a major public health challenge undermining global malaria control strategies. Anopheles has developed heavy resistance, mediated through P450 pathways, to pyrethroids, the only class recommended by the World Health Organization (WHO) for Indoor Residual Spraying (IRS) and Long-Lasting Insecticide-Treated Nets (LLITNs). We used the biochemical network of Anopheles gambiae (henceforth Ag) to deduce its resistance mechanism(s) from two expression data sets (Ag treated with pyrethroid, and an untreated control). The computational techniques employed are accessible through a robust, multi-faceted and user-friendly automated graphical user interface (GUI), tagged 'workbench', built with JavaFX Scene Builder. In this work, we introduce a computational platform to determine and elucidate, for the first time, the resistance mechanism to a commonly used class of insecticide, pyrethroids. Significantly, our work is the first computational study to identify genes associated with or involved in the efflux system in Ag as a resistance mechanism in Anopheles.

    A cellular coevolutionary algorithm for image segmentation


    Automatic Tuning of GRASP with Path-Relinking in data clustering with F-Race and iterated F-Race

    In studies that use metaheuristics, although the input parameters directly influence the performance of the algorithm, their definition is mostly done manually, raising questions about the quality of the results. This paper applies F-Race and iterated F-Race (I/F-Race) to the self-parameterization of GRASP with Path-Relinking for data clustering, in order to obtain better results than manually tuned algorithms. Experiments performed on five data sets showed that the use of I/F-Race achieved better results than manual tuning.
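
    A simplified sketch of the racing idea follows. Real F-Race eliminates candidate configurations with a Friedman test as instances accumulate; the sketch below uses a crude mean-score gap instead, and the evaluate function is a stand-in for a GRASP with Path-Relinking run, so every name and threshold in it is an illustrative assumption.

        # Racing over candidate (alpha, iterations) configurations; lower score is better.
        import random

        def evaluate(config, instance_seed):
            """Stand-in for one run of the tuned metaheuristic on one problem instance."""
            rng = random.Random(instance_seed)
            alpha, iterations = config
            # pretend quality improves with more iterations and a moderate alpha
            return abs(alpha - 0.3) + 1.0 / iterations + rng.random() * 0.05

        def race(candidates, instances, min_survivors=1, gap=0.1):
            scores = {c: [] for c in candidates}
            alive = list(candidates)
            for seed in instances:
                for c in alive:
                    scores[c].append(evaluate(c, seed))
                means = {c: sum(scores[c]) / len(scores[c]) for c in alive}
                ordered = sorted(alive, key=lambda c: means[c])
                best = means[ordered[0]]
                # drop candidates whose running mean is clearly worse than the leader's
                alive = [c for c in ordered if means[c] <= best + gap]
                if len(alive) <= min_survivors:
                    break
            return alive[0]

        if __name__ == "__main__":
            grid = [(a, it) for a in (0.1, 0.3, 0.5, 0.9) for it in (50, 200)]
            print(race(grid, instances=range(10)))

    Evaluating every surviving candidate on the same instance before eliminating any of them is the blocking-by-instance design that racing methods rely on.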

    Reducing the number of membership functions in linguistic variables

    Dissertation presented at Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia, in fulfilment of the requirements for the Masters degree in Mathematics and Applications, specialization in Actuarial Sciences, Statistics and Operations Research. The purpose of this thesis was to develop algorithms to reduce the number of membership functions in a fuzzy linguistic variable. Groups of similar membership functions to be merged were found using clustering algorithms. By "summarizing" the information given by a group of similar membership functions into a new membership function, we obtain a smaller set of membership functions representing the same concept as the initial linguistic variable. The complexity of clustering problems makes it difficult for exact methods to solve them in practical time, so heuristic methods were used to find good-quality solutions. A Scatter Search clustering algorithm was implemented in Matlab and compared to a variation of the K-Means algorithm; computational results on two data sets are discussed. A case study is also presented, with linguistic variables belonging to a fuzzy inference system automatically constructed from data collected by sensors while drilling in different scenarios. With these systems already constructed, the task was to reduce the number of membership functions in their linguistic variables without losing performance. A hierarchical clustering algorithm relying on performance measures for the inference system was implemented in Matlab. It was possible not only to simplify the inference system by reducing the number of membership functions in each linguistic variable, but also to improve its performance.
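
    The merging step described above can be pictured with a small sketch, assuming triangular membership functions and a simple parameter-distance similarity; it is not the thesis's Scatter Search or performance-driven hierarchical algorithm, and the threshold is an illustrative assumption.

        # Group similar triangular membership functions (left, peak, right) and
        # collapse each group into one summary function.

        def mf_distance(mf1, mf2):
            """Average absolute difference of the (left, peak, right) parameters."""
            return sum(abs(a - b) for a, b in zip(mf1, mf2)) / 3.0

        def group_similar(mfs, threshold=0.15):
            """Greedy single-linkage grouping of triangular membership functions."""
            groups = []
            for mf in sorted(mfs, key=lambda m: m[1]):      # sort by peak position
                for g in groups:
                    if any(mf_distance(mf, other) <= threshold for other in g):
                        g.append(mf)
                        break
                else:
                    groups.append([mf])
            return groups

        def merge(group):
            """Summarize a group by the envelope of supports and the mean peak."""
            left = min(m[0] for m in group)
            peak = sum(m[1] for m in group) / len(group)
            right = max(m[2] for m in group)
            return (left, peak, right)

        if __name__ == "__main__":
            linguistic_variable = [(0.0, 0.1, 0.2), (0.05, 0.15, 0.25),
                                   (0.4, 0.5, 0.6), (0.45, 0.55, 0.65), (0.8, 0.9, 1.0)]
            print([merge(g) for g in group_similar(linguistic_variable)])
            # five membership functions reduced to three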

    Knowledge discovery techniques for transactional data model

    In this work we give solutions to two key knowledge discovery problems for the transactional data model: cluster analysis and itemset mining. By knowledge discovery in the context of these two problems, we specifically mean novel and useful ways of extracting clusters and itemsets from transactional data, a model widely used in a variety of applications. In cluster analysis the goal is to find clusters of similar transactions in the data, with the collective properties of each cluster being unique. We propose the first clustering algorithm for transactional data that uses the latest model definition; previously proposed algorithms did not use the important utility information in the data, a problem our technique effectively solves. We also propose two new cluster validation metrics based on the criterion of high-utility patterns. When compared with competing algorithms, our technique misses far fewer important high-utility patterns. Itemset mining is the problem of searching for repeating patterns of high importance in the data. We show that the current model for itemset mining leads to information loss because it ignores the presence of clusters in the data. We propose a new itemset mining model that incorporates the cluster structure information, which allows the model to make predictions for future itemsets. We show that our model makes accurate predictions, discovering 30-40% of future itemsets in most experiments on two benchmark datasets with negligible inaccuracies. As no other itemset prediction models currently exist, accurate prediction is itself a contribution. We provide further theoretical improvements to the model, making it capable of giving predictions for specific future windows using time series forecasting. We also perform a detailed analysis of various clustering algorithms and study the effect of the Big Data phenomenon on them, which inspired us to further refine our model around a classification problem design. This addition allows the mining of itemsets by maximizing a customizable objective function built from different prediction metrics. The final framework is the first of its kind to make itemset predictions using the cluster structure; it adapts the predictions to a specific future window and customizes the mining process to any specified prediction criterion. We implement the framework on a Web analytics data set and observe that it successfully makes optimal prediction configuration choices with a high accuracy of 0.895.
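
    A toy sketch of the two ingredients the abstract combines is given below: transactions carry per-item utilities, they are grouped by a utility-weighted overlap similarity, and high-utility patterns are then read off per cluster. The similarity measure, the threshold, and the greedy leader clustering are illustrative assumptions, not the paper's algorithm or its validation metrics.

        # Utility-aware grouping of transactions, each a dict of item -> utility.

        def utility_overlap(t1, t2):
            """Shared utility divided by the total utility of the two transactions."""
            shared = sum(min(t1[i], t2[i]) for i in t1.keys() & t2.keys())
            total = sum(t1.values()) + sum(t2.values()) - shared
            return shared / total if total else 0.0

        def cluster_transactions(transactions, threshold=0.3):
            """Greedy leader clustering: join the first cluster with a similar leader."""
            clusters = []
            for t in transactions:
                for c in clusters:
                    if utility_overlap(t, c[0]) >= threshold:
                        c.append(t)
                        break
                else:
                    clusters.append([t])
            return clusters

        def high_utility_items(cluster, min_utility):
            """Items whose summed utility within the cluster reaches min_utility."""
            totals = {}
            for t in cluster:
                for item, u in t.items():
                    totals[item] = totals.get(item, 0) + u
            return {item for item, u in totals.items() if u >= min_utility}

        if __name__ == "__main__":
            data = [{"milk": 4, "bread": 2}, {"milk": 5, "bread": 1, "eggs": 1},
                    {"gpu": 90, "ram": 30}, {"gpu": 80, "ssd": 40}]
            for c in cluster_transactions(data):
                print(sorted(high_utility_items(c, min_utility=8)))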

    A new approach of top-down induction of decision trees for knowledge discovery

    Top-down induction of decision trees is the most popular technique for classification in the field of data mining and knowledge discovery. Quinlan developed the basic induction algorithm for decision trees, ID3 (1984), and extended it to C4.5 (1993). There is a great deal of research on single-attribute decision-making nodes (so-called first-order decisions) in decision trees. Murphy and Pazzani (1991) addressed multiple-attribute conditions at decision-making nodes and showed that higher-order decision-making generates smaller decision trees with better accuracy. However, searching over combinations of multiple attributes for decision-making is NP-complete. We develop a new algorithm for second-order decision-tree induction (SODI) for nominal attributes. The induction rules of first-order decision trees are combined by 'AND' logic only, whereas those of SODI consist of 'AND', 'OR', and 'OTHERWISE' logics. It generates more accurate results and smaller decision trees than any first-order decision tree induction. Quinlan used information gain via the VC-dimension (Vapnik-Chervonenkis; Vapnik, 1995) for clustering the experimental values of each numerical attribute; however, many researchers have pointed out weaknesses in the use of VC-dimension analysis. Bennett (1997) applies support vector machines (SVMs) to decision tree induction in a sophisticated way. We suggest a heuristic algorithm (SVMM; SVM for Multi-category) that combines a TDIDT scheme with SVM, and this thesis also addresses how to solve multiclass classification problems. Our final goal for this thesis is IDSS (Induction of Decision Trees using SODI and SVMM): combining SODI and SVMM for the construction of top-down induction of decision trees in order to minimize the generalized penalty cost.
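
    The advantage of second-order tests can be seen in a small sketch that scores candidate splits by information gain on single attributes versus attribute pairs. The XOR-like toy data and the scoring code are illustrative only; the full SODI construction of 'AND', 'OR', and 'OTHERWISE' branch rules is not reproduced here.

        # Information gain of first-order (single attribute) vs second-order
        # (attribute pair) splits on nominal data.
        import math
        from collections import Counter
        from itertools import combinations

        def entropy(labels):
            n = len(labels)
            return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

        def gain(rows, labels, attrs):
            """Information gain of splitting on the joint values of the given attributes."""
            groups = {}
            for row, y in zip(rows, labels):
                groups.setdefault(tuple(row[a] for a in attrs), []).append(y)
            remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
            return entropy(labels) - remainder

        if __name__ == "__main__":
            # XOR-like data: neither attribute alone predicts the class, the pair does
            rows = [{"a": "t", "b": "t"}, {"a": "t", "b": "f"},
                    {"a": "f", "b": "t"}, {"a": "f", "b": "f"}] * 2
            labels = ["no", "yes", "yes", "no"] * 2
            for attrs in [("a",), ("b",)] + list(combinations("ab", 2)):
                print(attrs, round(gain(rows, labels, attrs), 3))
            # first-order gains are 0.0, the second-order split ("a", "b") gains 1.0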

    K-means based clustering and context quantization
