
    Discovering gene association networks by multi-objective evolutionary quantitative association rules

    Over the last decade, interest in microarray technology has grown exponentially because of its ability to monitor the expression of thousands of genes simultaneously. Reconstructing gene association networks from gene expression profiles is a relevant task, and several statistical techniques have been proposed for it. The challenge lies in discovering which genes are most relevant and identifying the direct regulatory relationships among them. We developed a multi-objective evolutionary algorithm for mining quantitative association rules to address this problem. We applied our methodology, named GarNet, to a well-known microarray dataset of the yeast cell cycle. The performance analysis of GarNet was organized in three steps, following the study by Gallo et al. GarNet outperformed the benchmark methods in most cases in terms of network quality metrics such as accuracy and precision, which were measured using the YeastNet database as the true network. Furthermore, the results were consistent with previous biological knowledge.
    Funding: Ministerio de Ciencia y Tecnología TIN2011-28956-C02-02; Junta de Andalucía P11-TIC-752
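    The abstract above scores networks by accuracy and precision against YeastNet used as the true network. A minimal sketch of how such metrics can be computed for an undirected gene association network follows, assuming every unordered gene pair counts as a candidate edge. The function names and gene labels are hypothetical; this is not GarNet's own evaluation code.

```python
from itertools import combinations


def edge_set(edges):
    """Normalize undirected edges so (a, b) and (b, a) compare equal."""
    return {frozenset(e) for e in edges}


def network_metrics(predicted, reference, genes):
    """Confusion counts, precision and accuracy of a predicted network.

    Assumes every unordered pair of `genes` is a possible edge and that
    all edge endpoints belong to `genes`.
    """
    pred, ref = edge_set(predicted), edge_set(reference)
    n_pairs = len(list(combinations(genes, 2)))
    tp = len(pred & ref)          # predicted and present in the reference
    fp = len(pred - ref)          # predicted but absent from the reference
    fn = len(ref - pred)          # in the reference but missed
    tn = n_pairs - tp - fp - fn   # correctly left unconnected
    precision = tp / (tp + fp) if pred else 0.0
    accuracy = (tp + tn) / n_pairs
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "accuracy": accuracy}
```

    For example, with genes A to D, a reference network {A-B, B-C} and a prediction {B-A, C-D}, one of two predicted edges is correct, giving a precision of 0.5.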

    The Rule Extraction of Numerical Association Rule Mining Using Hybrid Evolutionary Algorithm

    Particle Swarm Optimization (PSO) has recently gained popularity. Researchers have used it to solve problems related to job scheduling, stock market evaluation, and association rule mining optimization. However, PSO often gets trapped in local optima. Some researchers have proposed overcoming this problem by combining PSO with the Cauchy distribution, since this combination has been shown to reach optimal rules. In this paper, we adopt this combination to solve the association rule mining (ARM) optimization problem on numerical datasets. The aim of this research is therefore to extract rules for the numerical ARM optimization problem under multiple objective functions such as support, confidence, and amplitude. The method is called PARCD, which stands for PSO for the numerical association rule mining problem with the Cauchy distribution. PARCD performed better than other methods such as MOPAR, MODENAR, GAR, MOGAR, and RPSOA.
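    The PSO-plus-Cauchy idea can be sketched as below on a toy continuous function rather than an association rule encoding. The parameter values (w, c1, c2, mutation scale) and the choice to mutate only the global best are assumptions for illustration, not the PARCD configuration.

```python
import math
import random


def cauchy(rng, scale=1.0):
    """Draw a Cauchy(0, scale) sample via the inverse CDF tan(pi * (u - 0.5))."""
    return scale * math.tan(math.pi * (rng.random() - 0.5))


def pso_cauchy(f, dim, bounds, n_particles=20, iters=200,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over [lo, hi]^dim with PSO plus Cauchy mutation of the global best."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(x):
        return max(lo, min(hi, x))

    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Standard inertia-weight velocity update.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = clip(pos[i][d] + vel[i][d])
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        # Heavy-tailed Cauchy jumps occasionally land far from gbest,
        # which can help escape a local optimum; keep only improvements.
        mutant = [clip(x + cauchy(rng, 0.1 * (hi - lo))) for x in gbest]
        mval = f(mutant)
        if mval < gbest_val:
            gbest, gbest_val = mutant, mval
    return gbest, gbest_val


def sphere(x):
    """Toy objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)
```

    A call such as pso_cauchy(sphere, dim=2, bounds=(-5.0, 5.0)) returns the best position found and its objective value. Keeping the mutant only when it improves means the Cauchy step can widen exploration without ever degrading the global best.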

    Improving a multi-objective evolutionary algorithm to discover quantitative association rules

    This work aims to correct flaws in multi-objective evolutionary schemes for discovering quantitative association rules, specifically those based on the well-known non-dominated sorting genetic algorithm II (NSGA-II). In particular, a methodology is proposed to find the most suitable configurations based on the set of objectives to optimize and the distance measures used to rank the non-dominated solutions. First, several quality measures are analyzed to select the best set to be optimized. Furthermore, different strategies are applied to replace the crowding distance that NSGA-II uses to sort the solutions of each Pareto front, since this distance is not suitable for many-objective problems. The proposed enhancements have been integrated into a multi-objective algorithm called MOQAR. Several experiments have been carried out to assess the algorithm's performance under different configuration settings, and the best configurations have been compared with other existing algorithms. The results show a remarkable performance of MOQAR in terms of quality measures.
    Funding: Ministerio de Ciencia y Tecnología TIN2011-28956-C02; Ministerio de Ciencia y Tecnología TIN2014-55894-C2-R; Junta de Andalucía P12-TIC-1728; Universidad Pablo de Olavide APPB81309
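    For reference, the standard NSGA-II crowding distance that the abstract says MOQAR replaces can be sketched as follows. This shows the baseline only; the alternative ranking strategies evaluated in the paper are not reproduced here.

```python
import math


def crowding_distance(front):
    """NSGA-II crowding distance for one Pareto front.

    front: list of objective vectors (equal-length tuples).
    Returns one distance per solution; boundary solutions get infinity.
    """
    n = len(front)
    if n == 0:
        return []
    dist = [0.0] * n
    for k in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][k])
        f_min, f_max = front[order[0]][k], front[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = math.inf
        if f_max == f_min:
            continue  # objective k carries no spread information
        for j in range(1, n - 1):
            # Normalized side length of the cuboid around solution order[j].
            gap = front[order[j + 1]][k] - front[order[j - 1]][k]
            dist[order[j]] += gap / (f_max - f_min)
    return dist
```

    Within a front, NSGA-II prefers solutions with larger distance, so the extreme points on each objective are always kept.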

    Selecting the best measures to discover quantitative association rules

    Most existing techniques for mining association rules use support and confidence to evaluate the quality of the rules obtained. However, these two measures may not be sufficient to assess rule quality properly because of some inherent drawbacks. A review of the literature reveals many measures for evaluating rule quality, but optimizing all of them simultaneously is complex and might lead to poor results. In this work, principal component analysis is applied to a set of measures that evaluate the quality of quantitative association rules. From this analysis, a reduced subset of measures has been selected for inclusion in the fitness function in order to obtain better values for the whole set of quality measures, not only for those included in the fitness function. This is a general-purpose methodology and can therefore be applied to the fitness function of any algorithm. To validate whether better results are obtained with a fitness function composed of the proposed subset of measures, the existing QARGA algorithm has been applied to a wide variety of datasets. Finally, a comparative analysis against QARGA with its original fitness function is provided, showing a remarkable improvement when the new fitness function is used.
    Funding: Ministerio de Ciencia y Tecnología TIN2011-28956-C0
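    The PCA step can be sketched as follows, assuming a matrix with one row per rule and one column per quality measure. The selection heuristic (for each leading component, keep the not-yet-chosen measure with the largest absolute loading) is an assumption for illustration, since the abstract does not state the paper's exact criterion. The sketch requires NumPy.

```python
import numpy as np


def select_measures(X, names, n_components):
    """Pick one representative quality measure per leading principal component.

    X: array of shape (n_rules, n_measures); column j holds measure names[j].
    Returns the chosen names and the variance fraction of each kept component.
    """
    corr = np.corrcoef(X, rowvar=False)       # PCA on standardized measures
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh: corr is symmetric
    order = np.argsort(eigvals)[::-1]         # largest variance first
    explained = eigvals[order] / eigvals.sum()
    chosen = []
    for c in order[:n_components]:
        # Measure with the largest absolute loading not already chosen.
        for idx in np.argsort(np.abs(eigvecs[:, c]))[::-1]:
            if names[idx] not in chosen:
                chosen.append(names[idx])
                break
    return chosen, explained[:n_components]
```

    Working on the correlation matrix rather than the raw covariance keeps measures with large numeric ranges from dominating the components. Strongly correlated measures load on the same component, so only one of them is kept.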

    Improved optimization of numerical association rule mining using hybrid particle swarm optimization and cauchy distribution

    Particle Swarm Optimization (PSO) has been applied to optimization problems in various fields, such as Association Rule Mining (ARM) on numerical data. However, PSO often becomes trapped in local optima, so the results do not represent the globally optimal solutions. To address this limitation, this study combines PSO with the Cauchy distribution (PARCD), which is expected to expand the search space and improve the global optimum reached. Furthermore, this study uses multiple objective functions: support, confidence, comprehensibility, interestingness, and amplitude. The proposed method was evaluated on benchmark datasets such as Quake, Basketball, Body fat, Pollution, and Bolt, and the results were compared with those obtained in previous studies. The results indicate that the overall objective-function values obtained with the proposed PARCD approach are satisfactory.
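    The five objectives listed above can be computed for an interval-based rule roughly as below. The formulas for amplitude, comprehensibility, and interestingness are assumed formulations for illustration; PARCD's exact definitions may differ. The attribute names and data are toy values.

```python
import math


def covers(record, conditions):
    """True if every attribute of the record lies in its [lo, hi] interval."""
    return all(lo <= record[a] <= hi for a, (lo, hi) in conditions.items())


def rule_measures(data, antecedent, consequent):
    """Objectives for an interval rule antecedent => consequent.

    antecedent/consequent: dicts attribute -> (lo, hi) with disjoint attributes.
    """
    n = len(data)
    both = {**antecedent, **consequent}
    s_a = sum(covers(r, antecedent) for r in data) / n
    s_c = sum(covers(r, consequent) for r in data) / n
    s_ac = sum(covers(r, both) for r in data) / n
    confidence = s_ac / s_a if s_a else 0.0
    # Amplitude (assumed form): 1 - mean interval width relative to attribute range.
    widths = []
    for a, (lo, hi) in both.items():
        vmin = min(r[a] for r in data)
        vmax = max(r[a] for r in data)
        if vmax > vmin:
            widths.append((hi - lo) / (vmax - vmin))
    amplitude = 1 - sum(widths) / len(widths) if widths else 1.0
    # Comprehensibility (assumed form): higher with fewer antecedent attributes.
    comprehensibility = math.log(1 + len(consequent)) / math.log(1 + len(both))
    # Interestingness (assumed form): the (1 - s_ac) term penalizes very frequent rules.
    interestingness = ((s_ac / s_a) * (s_ac / s_c) * (1 - s_ac)
                       if s_a and s_c else 0.0)
    return {"support": s_ac, "confidence": confidence, "amplitude": amplitude,
            "comprehensibility": comprehensibility,
            "interestingness": interestingness}
```

    In a PSO setting, each particle would decode to interval bounds such as these, and the returned dictionary would supply the objective values.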

    A data analytics-based energy information system (EIS) tool to perform meter-level anomaly detection and diagnosis in buildings

    Recently, the spread of smart metering infrastructure has made building-related data easier to collect. Proper analysis of such data has been shown to bring significant benefits for characterizing building performance and spotting valuable saving opportunities. More and more researchers worldwide are focusing on developing more robust analysis frameworks capable of extracting useful information from meter-level data to enhance energy management in buildings, for instance by detecting inefficiencies or anomalous energy behavior during operation. This paper proposes an innovative anomaly detection and diagnosis (ADD) methodology that automatically detects anomalous energy consumption at the whole-building meter level and then diagnoses the sub-loads responsible for the anomalous patterns. The process consists of multiple steps combining data analytics techniques. A set of evolutionary classification trees is developed to discover frequent and infrequent aggregated energy patterns, suitably transformed through an adaptive symbolic aggregate approximation (aSAX) process. A post-mining analysis based on association rule mining (ARM) is then performed to discover the sub-loads that most affect the anomaly detected at the whole-building level. The methodology is developed and tested on monitored data from a medium voltage/low voltage (MV/LV) transformation cabin of a university campus.
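    The symbolic transformation underlying aSAX can be sketched with plain SAX: z-normalize the series, reduce it with piecewise aggregate approximation (PAA), and map each segment mean to a symbol using equiprobable breakpoints of the standard normal distribution. This is the non-adaptive version; as we understand it, aSAX instead adapts the breakpoints to the data, which is not reproduced here.

```python
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Standard SAX word for a numeric series (not the adaptive aSAX variant)."""
    if len(series) % n_segments:
        raise ValueError("this sketch assumes len(series) divisible by n_segments")
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # Piecewise Aggregate Approximation: mean of each equal-length segment.
    seg = len(z) // n_segments
    paa = [mean(z[i * seg:(i + 1) * seg]) for i in range(n_segments)]
    # Breakpoints split N(0, 1) into len(alphabet) equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    # Symbol index = number of breakpoints the segment mean exceeds.
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)
```

    For instance, a steadily rising load profile such as [1, 1, 2, 2, 3, 3, 4, 4] maps to the word "abcd" with four segments. Daily energy profiles encoded this way become short strings that a classifier or rule miner can group into frequent and infrequent patterns.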

    Fuzzy-Granular Based Data Mining for Effective Decision Support in Biomedical Applications

    Because of the complexity of biomedical problems, adaptive and intelligent knowledge discovery and data mining systems are greatly needed to help humans understand the underlying mechanisms of diseases. For biomedical classification problems, it is typically impossible to build a perfect classifier with 100% prediction accuracy. Hence, a more realistic target is to build an effective Decision Support System (DSS). In this dissertation, a novel adaptive Fuzzy Association Rules (FARs) mining algorithm, named FARM-DS, is proposed to build such a DSS for binary classification problems in the biomedical domain. Empirical studies show that FARM-DS is competitive with state-of-the-art classifiers in terms of prediction accuracy. More importantly, FARs can provide strong decision support for disease diagnosis because they are easy to interpret. This dissertation also proposes a fuzzy-granular method to select informative and discriminative genes from large microarray gene expression datasets. Fuzzy granulation reduces information loss during gene selection. As a result, more informative genes for cancer classification are selected, and more accurate classifiers can be built. Empirical studies show that the proposed method is more accurate than traditional algorithms for cancer classification. We therefore expect the selected genes to be more helpful for further biological studies.
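    A minimal sketch of fuzzy association rule support and confidence, assuming triangular membership functions and the minimum t-norm. The abstract does not describe FARM-DS's membership functions or rule-selection procedure, so everything here, including the attribute names, is illustrative.

```python
def triangular(a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    def mu(x):
        if x <= a or x >= c:
            return 1.0 if x == b else 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


def fuzzy_support(data, items):
    """Mean over records of the minimum membership across the items.

    items: list of (attribute, membership_function) pairs.
    """
    return sum(min(mu(r[attr]) for attr, mu in items) for r in data) / len(data)


def fuzzy_confidence(data, antecedent, consequent):
    """Fuzzy support of the whole rule divided by that of its antecedent."""
    s_a = fuzzy_support(data, antecedent)
    return fuzzy_support(data, antecedent + consequent) / s_a if s_a else 0.0
```

    As a toy usage, with a hypothetical "high expression" set triangular(0.5, 1.0, 1.5) on a gene attribute and a crisp class label expressed as a 0/1 membership, a sample with expression 0.75 contributes 0.5 rather than 0 or 1, which is how fuzzy granulation limits the information lost by hard thresholds.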

    Discovering market basket patterns using hierarchical association rules

    Association rules are a data mining method for discovering patterns of frequent itemsets, such as products in a store that customers frequently purchase together (market basket analysis). A number of interestingness measures for association rules have been developed to date, but research has shown that no single dominant measure exists. Authors have mostly used objective measures, whereas subjective measures have rarely been investigated. This paper combines objective measures such as support, confidence, and lift with a subjective approach based on human expert selection in order to extract interesting rules from a real dataset collected from a large Croatian retail chain. Hierarchical association rules were used to improve the efficiency of rule extraction. The results show that more interesting rules were extracted with the hierarchical method, and that a hybrid approach combining objective and subjective measures succeeds in extracting certain unexpected and actionable rules. The research can be useful for retail and marketing managers planning marketing strategies, as well as for researchers investigating this field.
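    The objective measures named above, and the effect of moving up a product hierarchy, can be sketched as follows. The taxonomy and baskets are toy data (not the retail dataset), and only two-item rules are enumerated for brevity.

```python
from collections import Counter
from itertools import combinations


def roll_up(basket, taxonomy):
    """Replace each product by its parent category (unknown items are kept)."""
    return frozenset(taxonomy.get(item, item) for item in basket)


def pair_rules(baskets, min_support):
    """All rules a -> c between two items with support >= min_support.

    Each basket must be a set, so an item is counted once per basket.
    Returns tuples (a, c, support, confidence, lift).
    """
    n = len(baskets)
    item_count = Counter(item for b in baskets for item in b)
    pair_count = Counter(p for b in baskets for p in combinations(sorted(b), 2))
    rules = []
    for (x, y), together in pair_count.items():
        support = together / n
        if support < min_support:
            continue
        for a, c in ((x, y), (y, x)):
            confidence = together / item_count[a]
            lift = confidence / (item_count[c] / n)
            rules.append((a, c, support, confidence, lift))
    return rules


taxonomy = {"whole milk": "dairy", "yogurt": "dairy", "white bread": "bakery",
            "rye bread": "bakery", "apples": "fruit"}
baskets = [{"whole milk", "white bread"}, {"yogurt", "rye bread"},
           {"whole milk", "apples"}, {"rye bread"}]
product_rules = pair_rules(baskets, 0.5)
category_rules = pair_rules([roll_up(b, taxonomy) for b in baskets], 0.5)
```

    With these toy baskets no product pair reaches 50% support, while the rolled-up bakery/dairy pair does. Its lift of 8/9, below 1, also shows why a rule with 67% confidence can still indicate a slightly negative association.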

    Obtaining optimal quality measures for quantitative association rules

    Several works in the literature have proposed fitness functions based on a combination of weighted measures for discovering association rules. Nevertheless, the values obtained for the measures that assess rule quality can differ depending on the weights assigned to the measures in the fitness function. The user's choice of weights for the measures involved in the optimization process is therefore very important. This paper presents a study of well-known quality measures with regard to the weights of the measures that appear in a fitness function. In particular, the fitness function of an existing evolutionary algorithm called QARGA is considered in order to suggest the values that should be assigned to the weights, depending on the set of measures to be optimized. As an initial step, several experiments have been carried out on 35 public datasets to show how the weights for the confidence, support, amplitude, and number-of-attributes measures included in the fitness function influence different quality measures under several minimum support thresholds. Second, statistical tests have been conducted to evaluate when the differences in the measures of the rules obtained by QARGA are significant, and thus to provide the best weights to use depending on the group of measures to be optimized. Finally, the results obtained with the recommended weights for two real-world applications related to ozone and earthquakes are reported.
    Funding: Ministerio de Ciencia y Tecnología TIN2011-28956-C02; Junta de Andalucía P12-TIC-1728; Universidad Pablo de Olavide APPB81309
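    A weighted fitness of the kind discussed can be sketched as a linear combination of rule measures. The measure names, weights, and sign convention (a negative weight acts as a penalty) are hypothetical and are not QARGA's actual fitness function; measures such as the number of attributes could be added in the same way. The example only shows how changing the weights changes which rule ranks best.

```python
def weighted_fitness(measures, weights):
    """Linear weighted fitness; a negative weight penalizes a measure."""
    missing = set(weights) - set(measures)
    if missing:
        raise KeyError(f"measures missing for weights: {sorted(missing)}")
    return sum(w * measures[name] for name, w in weights.items())


def best_rule(rules, weights):
    """Rule with the highest weighted fitness."""
    return max(rules, key=lambda r: weighted_fitness(r, weights))


# Two hypothetical rules and two hypothetical weight settings.
r1 = {"support": 0.6, "confidence": 0.7, "amplitude": 0.3}
r2 = {"support": 0.2, "confidence": 0.95, "amplitude": 0.1}
favor_support = {"support": 1.0, "confidence": 0.2, "amplitude": -0.5}
favor_confidence = {"support": 0.2, "confidence": 1.0, "amplitude": -0.5}
```

    Here best_rule([r1, r2], favor_support) selects r1, while best_rule([r1, r2], favor_confidence) selects r2, which is the kind of weight sensitivity the study sets out to quantify.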