5,977 research outputs found

    Mining Optimized Association Rules for Numeric Attributes

    Given a huge database, we address the problem of finding association rules for numeric attributes, such as (Balance ∈ I) ⇒ (CardLoan = yes), which implies that bank customers whose balances fall in a range I are likely to use card loan with a probability greater than p. The above rule is interesting only if the range I has some special feature with respect to the interrelation between Balance and CardLoan. It is required that the number of customers whose balances are contained in I (called the support of I) is sufficient and also that the probability p of the condition CardLoan = yes being met (called the confidence ratio) be much higher than the average probability of the condition over all the data. Our goal is to realize a system that finds such appropriate ranges automatically. We mainly focus on computing two optimized ranges: one that maximizes the support on the condition that the confidence ratio is at least a given threshold value, and another that maximizes the confidence ratio on the condition that the support is at least a given threshold number. Using techniques from computational geometry, we present novel algorithms that compute the optimized ranges in linear time if the data are sorted. Since sorting data with respect to each numeric attribute is expensive in the case of huge databases that occupy much more space than the main memory, we instead apply randomized bucketing as the preprocessing method and thus obtain an efficient rule-finding system. Tests show that our implementation is fast not only in theory but also in practice. The efficiency of our algorithm enables us to compute optimized rules for all combinations of hundreds of numeric and Boolean attributes in a reasonable time.
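    The two optimized ranges defined in this abstract can be illustrated with a small sketch. It assumes the data have already been bucketed by Balance, with each bucket holding a customer count and a CardLoan = yes count. It is a brute-force quadratic scan over contiguous bucket ranges, not the linear-time computational-geometry algorithm the paper describes; the function names and the sample bucket counts are hypothetical.

```python
from typing import Iterator, List, Optional, Tuple

# Each bucket is (customers in bucket, customers with CardLoan = yes),
# with buckets ordered by increasing Balance. Names here are illustrative.
Bucket = Tuple[int, int]
Range = Tuple[int, int, int, float]  # (first bucket, last bucket, support, confidence)


def _all_ranges(buckets: List[Bucket]) -> Iterator[Range]:
    """Yield every contiguous, non-empty bucket range with its support and confidence."""
    for start in range(len(buckets)):
        total = hits = 0
        for end in range(start, len(buckets)):
            total += buckets[end][0]
            hits += buckets[end][1]
            if total > 0:
                yield start, end, total, hits / total


def optimized_support_range(buckets: List[Bucket], min_conf: float) -> Optional[Range]:
    """Range with maximum support among those whose confidence is at least min_conf."""
    ok = [r for r in _all_ranges(buckets) if r[3] >= min_conf]
    return max(ok, key=lambda r: r[2]) if ok else None


def optimized_confidence_range(buckets: List[Bucket], min_support: int) -> Optional[Range]:
    """Range with maximum confidence among those whose support is at least min_support."""
    ok = [r for r in _all_ranges(buckets) if r[2] >= min_support]
    return max(ok, key=lambda r: r[3]) if ok else None


# Hypothetical bucket counts, for illustration only.
sample = [(100, 10), (50, 30), (40, 30), (60, 12)]
best_support = optimized_support_range(sample, min_conf=0.5)
best_confidence = optimized_confidence_range(sample, min_support=100)
```

    With these sample counts, the optimized-support range at a 0.5 confidence threshold covers buckets 1 to 2 (90 customers), while requiring at least 100 customers forces the optimized-confidence range to widen to buckets 1 to 3.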

    RESEARCH ISSUES CONCERNING ALGORITHMS USED FOR OPTIMIZING THE DATA MINING PROCESS

    In this paper, we depict some of the most widely used data mining algorithms that have an overwhelming utility and influence in the research community. A data mining algorithm can be regarded as a tool that creates a data mining model. After analyzing a set of data, an algorithm searches for specific trends and patterns, then defines the parameters of the mining model based on the results of this analysis. These parameters play a significant role in identifying and extracting actionable patterns and detailed statistics. The most important algorithms within this research refer to topics like clustering, classification, association analysis, statistical learning, and link mining. In the following, after a brief description of each algorithm, we analyze its application potential and research issues concerning the optimization of the data mining process. After the presentation of the data mining algorithms, we will depict the most important data mining algorithms included in Microsoft and Oracle software products, useful suggestions and criteria for choosing the most suitable algorithm for a given task, and the advantages offered by these software products. Keywords: data mining optimization, data mining algorithms, software solutions

    Selecting the best measures to discover quantitative association rules

    The majority of the existing techniques to mine association rules typically use the support and the confidence to evaluate the quality of the rules obtained. However, these two measures may not be sufficient to properly assess their quality due to some inherent drawbacks they present. A review of the literature reveals that there exist many measures to evaluate the quality of the rules, but that the simultaneous optimization of all measures is complex and might lead to poor results. In this work, a principal components analysis is applied to a set of measures that evaluate quantitative association rules' quality. From this analysis, a reduced subset of measures has been selected to be included in the fitness function in order to obtain better values for the whole set of quality measures, and not only for those included in the fitness function. This is a general-purpose methodology and can, therefore, be applied to the fitness function of any algorithm. To validate whether better results are obtained when using the fitness function composed of the subset of measures proposed here, the existing QARGA algorithm has been applied to a wide variety of datasets. Finally, a comparative analysis against the results obtained by applying QARGA with the original fitness function is provided, showing a remarkable improvement when the new one is used. Funding: Ministerio de Ciencia y Tecnología TIN2011-28956-C0

    A Survey on Data Mining Algorithm for Market Basket Analysis

    Association rule mining identifies notable associations or relationships among a large set of data items. With huge quantities of data constantly being obtained and stored in databases, several industries are becoming interested in mining association rules from their databases. For example, the detection of interesting association relationships in large quantities of business transaction data can assist in catalog design, cross-marketing, loss-leader analysis, and various business decision-making processes. A typical example of association rule mining is market basket analysis. This method examines customer buying patterns by identifying associations among the various items that customers place in their shopping baskets. The identification of such associations can help retailers expand marketing strategies by gaining insight into which items are frequently purchased together by customers. It helps in examining customer purchasing behavior and assists in increasing sales and conserving inventory by focusing on point-of-sale transaction data. This work offers a broad basis for researchers to develop better data mining algorithms. This paper presents a survey of the existing data mining algorithms for market basket analysis.
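    The market basket idea described in this abstract can be made concrete with the standard support and confidence measures for a rule A ⇒ B. The following is a minimal sketch under assumed inputs: the helper name rule_metrics and the sample baskets are hypothetical and are not taken from the surveyed paper.

```python
from typing import Iterable, List, Set, Tuple


def rule_metrics(
    transactions: List[Set[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent => consequent.

    support    = fraction of baskets containing every item of the rule
    confidence = among baskets containing the antecedent, the fraction
                 that also contain the consequent
    """
    a = set(antecedent)
    both = a | set(consequent)
    n = len(transactions)
    n_a = sum(1 for basket in transactions if a <= basket)
    n_both = sum(1 for basket in transactions if both <= basket)
    support = n_both / n if n else 0.0
    confidence = n_both / n_a if n_a else 0.0
    return support, confidence


# Hypothetical shopping baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support, confidence = rule_metrics(baskets, {"bread"}, {"milk"})
```

    In this toy data, two of the four baskets contain both bread and milk, and two of the three baskets containing bread also contain milk.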

    Obtaining optimal quality measures for quantitative association rules

    There exist several works in the literature in which fitness functions based on a combination of weighted measures for the discovery of association rules have been proposed. Nevertheless, some differences in the measures used to assess the quality of association rules could be obtained according to the values of the weights of the measures included in the fitness function. Therefore, the user's decision is very important in order to specify the weights of the measures involved in the optimization process. This paper presents a study of well-known quality measures with regard to the weights of the measures that appear in a fitness function. In particular, the fitness function of an existing evolutionary algorithm called QARGA has been considered with the purpose of suggesting the values that should be assigned to the weights, depending on the set of measures to be optimized. As an initial step, several experiments have been carried out on 35 public datasets in order to show how the weights for the confidence, support, amplitude and number of attributes measures included in the fitness function influence different quality measures under several minimum support thresholds. Second, statistical tests have been conducted to evaluate when the differences in measures of the rules obtained by QARGA are significant, and thus to provide the best weights to be considered depending on the group of measures to be optimized. Finally, the results obtained when using the recommended weights for two real-world applications related to ozone and earthquakes are reported. Funding: Ministerio de Ciencia y Tecnología TIN2011-28956-C02; Junta de Andalucía P12-TIC-1728; Universidad Pablo de Olavide APPB81309

    Evolving temporal fuzzy association rules from quantitative data with a multi-objective evolutionary algorithm

    A novel method for mining association rules that are both quantitative and temporal using a multi-objective evolutionary algorithm is presented. This method successfully identifies numerous temporal association rules that occur more frequently in areas of a dataset with specific quantitative values represented with fuzzy sets. The novelty of this research lies in exploring the composition of quantitative and temporal fuzzy association rules and the approach of using a hybridisation of a multi-objective evolutionary algorithm with fuzzy sets. Results show the ability of a multi-objective evolutionary algorithm (NSGA-II) to evolve multiple target itemsets that have been augmented into synthetic datasets

    A Sensitivity Analysis for Quality Measures of Quantitative Association Rules

    There exist several fitness function proposals based on a combination of weighted objectives to optimize the discovery of association rules. Nevertheless, some differences in the measures used to assess the quality of association rules could be obtained according to the values of such weights. Therefore, in such proposals the user's decision is very important in order to specify the weights or coefficients of the optimized objectives. Thus, this work presents an analysis of the sensitivity of several quality measures when the weights included in the fitness function of the existing QARGA algorithm are modified. Finally, a comparative analysis of the results obtained according to the weights setup is provided. Funding: MICYT TIN2011-28956-C02-00; Junta de Andalucía P11-TIC-752

    Analysis of Measures of Quantitative Association Rules

    This paper presents an analysis of the relationships among different interestingness measures of the quality of association rules, as a first step towards selecting the best objectives for developing a multi-objective algorithm. For this purpose, the discovery of association rules is based on evolutionary techniques. Specifically, a genetic algorithm has been used to mine quantitative association rules and determine the intervals on the attributes without discretizing the data beforehand. The algorithm has been applied to real-world climatological datasets based on ozone and earthquake data. Funding: Ministerio de Ciencia y Tecnología TIN2007-68084-C-00; Junta de Andalucía P07-TIC-0261

    Mining range associations for classification and characterization

    In this paper, we propose a method that is able to derive rules involving range associations from numerical attributes, and to use such rules to build comprehensible classification and characterization (data summary) models. Our approach follows the classification association rule mining paradigm, where rules are generated in a way similar to association rule mining, but search is guided by rule consequents. This allows many credible rules, not just some dominant rules, to be mined from the data to build models. In so doing, we propose several sub-range analysis and rule formation heuristics to deal with numerical attributes. Our experiments show that our method is able to derive range-based rules that offer both accurate classification and comprehensible characterization for numerical data