6,247 research outputs found

    Mining Optimized Association Rules for Numeric Attributes

    Given a huge database, we address the problem of finding association rules for numeric attributes, such as (Balance ∈ I) ⇒ (CardLoan = yes), which implies that bank customers whose balances fall in a range I are likely to use card loan with a probability greater than p. The above rule is interesting only if the range I has some special feature with respect to the interrelation between Balance and CardLoan. It is required that the number of customers whose balances are contained in I (called the support of I) is sufficient and also that the probability p of the condition CardLoan = yes being met (called the confidence ratio) be much higher than the average probability of the condition over all the data. Our goal is to realize a system that finds such appropriate ranges automatically. We mainly focus on computing two optimized ranges: one that maximizes the support on the condition that the confidence ratio is at least a given threshold value, and another that maximizes the confidence ratio on the condition that the support is at least a given threshold number. Using techniques from computational geometry, we present novel algorithms that compute the optimized ranges in linear time if the data are sorted. Since sorting data with respect to each numeric attribute is expensive in the case of huge databases that occupy much more space than the main memory, we instead apply randomized bucketing as the preprocessing method and thus obtain an efficient rule-finding system. Tests show that our implementation is fast not only in theory but also in practice. The efficiency of our algorithm enables us to compute optimized rules for all combinations of hundreds of numeric and Boolean attributes in a reasonable time.
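
    The "optimized support" range described in the abstract above can be illustrated with a small sketch. This is not the paper's linear-time algorithm: it is a brute-force quadratic search over consecutive buckets, shown only to make the definition concrete. The function name, the `(count, hits)` bucket layout, and the sample numbers are assumptions made for illustration.

```python
from itertools import accumulate


def optimized_support_range(buckets, min_confidence):
    """Brute-force search for the optimized-support range.

    buckets: list of (count, hits) pairs, one per consecutive bucket of the
             numeric attribute (hypothetical layout): `count` is the number
             of records in the bucket, `hits` the number of those records
             that satisfy the target condition.
    Returns (start, end, support, confidence) for the run of consecutive
    buckets with the largest support whose confidence is at least
    min_confidence, or None if no run qualifies.
    """
    # Prefix sums let us get the totals of any run of buckets in O(1).
    counts = [0] + list(accumulate(c for c, _ in buckets))
    hits = [0] + list(accumulate(h for _, h in buckets))

    best = None
    for i in range(len(buckets)):
        for j in range(i, len(buckets)):
            support = counts[j + 1] - counts[i]
            if support == 0:
                continue  # an empty range has no defined confidence
            confidence = (hits[j + 1] - hits[i]) / support
            if confidence >= min_confidence and (best is None or support > best[2]):
                best = (i, j, support, confidence)
    return best


# Illustrative data only: four balance buckets of 10 customers each.
example = [(10, 1), (10, 6), (10, 7), (10, 2)]
result = optimized_support_range(example, min_confidence=0.6)
```

    With these made-up buckets, the search selects the two middle buckets, since widening the range in either direction drops the confidence below the threshold. The optimized-confidence variant would swap the roles of the two quantities: maximize confidence subject to a minimum support.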

    Mining fuzzy association rules in large databases with quantitative attributes.

    by Kuok, Chan Man. Thesis (M.Phil.), Chinese University of Hong Kong, 1997. Includes bibliographical references (leaves 74-77).

    Abstract
    Acknowledgments
    Chapter 1: Introduction
        1.1 Data Mining
        1.2 Association Rule Mining
    Chapter 2: Background
        2.1 Framework of Association Rule Mining
            2.1.1 Large Itemsets
            2.1.2 Association Rules
        2.2 Association Rule Algorithms For Binary Attributes
            2.2.1 AIS
            2.2.2 SETM
            2.2.3 Apriori, AprioriTid and AprioriHybrid
            2.2.4 PARTITION
        2.3 Association Rule Algorithms For Numeric Attributes
            2.3.1 Quantitative Association Rules
            2.3.2 Optimized Association Rules
    Chapter 3: Problem Definition
        3.1 Handling Quantitative Attributes
            3.1.1 Discrete intervals
            3.1.2 Overlapped intervals
            3.1.3 Fuzzy sets
        3.2 Fuzzy association rule
        3.3 Significance factor
        3.4 Certainty factor
            3.4.1 Using significance
            3.4.2 Using correlation
            3.4.3 Significance vs. Correlation
    Chapter 4: Steps For Mining Fuzzy Association Rules
        4.1 Candidate itemsets generation
            4.1.1 Candidate 1-Itemsets
            4.1.2 Candidate k-Itemsets (k > 1)
        4.2 Large itemsets generation
        4.3 Fuzzy association rules generation
    Chapter 5: Experimental Results
        5.1 Experiment One
        5.2 Experiment Two
        5.3 Experiment Three
        5.4 Experiment Four
        5.5 Experiment Five
            5.5.1 Number of Itemsets
            5.5.2 Number of Rules
        5.6 Experiment Six
            5.6.1 Varying Significance Threshold
            5.6.2 Varying Membership Threshold
            5.6.3 Varying Confidence Threshold
    Chapter 6: Discussions
        6.1 User guidance
        6.2 Rule understanding
        6.3 Number of rules
    Chapter 7: Conclusions and Future Works
    Bibliography

    QCBA: Postoptimization of Quantitative Attributes in Classifiers based on Association Rules

    The need to prediscretize numeric attributes before they can be used in association rule learning is a source of inefficiencies in the resulting classifier. This paper describes several new rule tuning steps aiming to recover information lost in the discretization of numeric (quantitative) attributes, and a new rule pruning strategy, which further reduces the size of the classification models. We demonstrate the effectiveness of the proposed methods on postoptimization of models generated by three state-of-the-art association rule classification algorithms: Classification based on Associations (Liu, 1998), Interpretable Decision Sets (Lakkaraju et al., 2016), and Scalable Bayesian Rule Lists (Yang, 2017). Benchmarks on 22 datasets from the UCI repository show that the postoptimized models are consistently smaller, typically by about 50%, and have better classification performance on most datasets.

    Interpretations of Association Rules by Granular Computing

    We present interpretations for association rules. We first introduce Pawlak's method and the corresponding algorithm for finding decision rules (a kind of association rule). We then use extended random sets to present a new algorithm for finding interesting rules. We prove that the new algorithm is faster than Pawlak's algorithm. The extended random sets can easily incorporate more than one criterion for determining interesting rules. We also provide two measures for dealing with uncertainties in association rules.

    RESEARCH ISSUES CONCERNING ALGORITHMS USED FOR OPTIMIZING THE DATA MINING PROCESS

    In this paper, we describe some of the most widely used data mining algorithms, which have had considerable utility and influence in the research community. A data mining algorithm can be regarded as a tool that creates a data mining model. After analyzing a set of data, an algorithm searches for specific trends and patterns, then defines the parameters of the mining model based on the results of this analysis. These parameters play a significant role in identifying and extracting actionable patterns and detailed statistics. The most important algorithms within this research cover topics such as clustering, classification, association analysis, statistical learning, and link mining. After a brief description of each algorithm, we analyze its application potential and the research issues concerning the optimization of the data mining process. We then describe the most important data mining algorithms included in Microsoft and Oracle software products, offer suggestions and criteria for choosing the most suitable algorithm for a given task, and outline the advantages offered by these software products.
    Keywords: data mining optimization, data mining algorithms, software solutions

    Evolving temporal fuzzy association rules from quantitative data with a multi-objective evolutionary algorithm

    A novel method for mining association rules that are both quantitative and temporal using a multi-objective evolutionary algorithm is presented. This method successfully identifies numerous temporal association rules that occur more frequently in areas of a dataset with specific quantitative values represented with fuzzy sets. The novelty of this research lies in exploring the composition of quantitative and temporal fuzzy association rules and the approach of using a hybridisation of a multi-objective evolutionary algorithm with fuzzy sets. Results show the ability of a multi-objective evolutionary algorithm (NSGA-II) to evolve multiple target itemsets that have been augmented into synthetic datasets

    A Survey on Data Mining Algorithm for Market Basket Analysis

    Association rule mining identifies notable associations or relationships among a large set of data items. With huge quantities of data constantly being obtained and stored in databases, several industries are becoming interested in mining association rules from their databases. For example, the detection of interesting association relationships in large quantities of business transaction data can assist in catalog design, cross-marketing, loss-leader analysis, and various business decision-making processes. A typical example of association rule mining is market basket analysis. This method examines customer buying patterns by identifying associations among the various items that customers place in their shopping baskets. The identification of such associations can help retailers expand marketing strategies by gaining insight into which items are frequently purchased together by customers. Examining customer purchasing behavior in point-of-sale transaction data helps to increase sales and conserve inventory. This work offers a broad starting point for researchers who want to develop better data mining algorithms. This paper presents a survey of the existing data mining algorithms for market basket analysis.
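
    As a minimal sketch of the market basket analysis described above, the following computes support and confidence for rules between pairs of items. It is not taken from the surveyed paper; the function name, the thresholds, and the sample baskets are hypothetical, and only the standard definitions are used (support = fraction of baskets containing both items, confidence = that count divided by the count of baskets containing the left-hand item).

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for item-pair rules lhs -> rhs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Illustrative baskets only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
rules = pair_rules(baskets, min_support=0.5, min_confidence=0.8)
```

    On this toy data, the only rule that clears both thresholds is butter -> bread: every basket containing butter also contains bread, while the reverse direction holds in only two of three bread baskets.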

    A survey on utilization of data mining approaches for dermatological (skin) diseases prediction

    Due to recent advances in technology, large volumes of medical data are being collected. These data contain valuable information, so data mining techniques can be used to extract useful patterns. This paper introduces data mining and its various techniques and surveys the available literature on medical data mining. We focus mainly on the application of data mining to skin diseases. A categorization is provided based on the different data mining techniques, and the utility of the various methodologies is highlighted. Generally, association mining is suitable for extracting rules. It has been used especially in cancer diagnosis. Classification is a robust method in medical mining. In this paper, we summarize the different uses of classification in dermatology. It is one of the most important methods for the diagnosis of erythemato-squamous diseases. Methods used in this area include Neural Networks, Genetic Algorithms and fuzzy classification. Clustering is a useful method in medical image mining. The purpose of clustering techniques is to find structure in the given data by identifying similarities between data items according to their characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we have investigated some challenges that exist in mining skin data.