9 research outputs found

    Filtering the resulting set of association rules in terms of interestingness evaluation

    A method is proposed for filtering the set of association rules obtained by searching for logical dependencies. For given support and confidence thresholds, the number of discovered association rules can be quite large and needs to be reduced. The method makes it possible to work with so-called "interesting" rules, whose support and confidence levels differ significantly from the expected ones. The expected parameters are calculated under the assumption that the features in the left-hand side of the rule are independent. It is shown how the support and confidence levels of "interesting" association rules change when the features in the analyzed data are dependent.
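
    The abstract above does not give formulas, so the following is only a minimal sketch of the general idea, under stated assumptions: the expected support of a rule's left-hand side is taken to be the product of the supports of its individual items (the independence assumption), and a rule is flagged when the observed support deviates from that expectation by a chosen factor. The function names and the `min_ratio` threshold are hypothetical and are not taken from the paper.

```python
from math import prod


def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def expected_antecedent_support(transactions, antecedent):
    """Expected support of the left-hand side if its items were independent."""
    return prod(support(transactions, [item]) for item in antecedent)


def is_interesting(transactions, antecedent, min_ratio=1.5):
    """Flag a left-hand side whose observed support deviates from the
    independence-based expectation by at least `min_ratio` in either direction.
    `min_ratio` is an illustrative threshold, not a value from the source."""
    expected = expected_antecedent_support(transactions, antecedent)
    if expected == 0:
        return False
    ratio = support(transactions, antecedent) / expected
    return ratio >= min_ratio or ratio <= 1 / min_ratio


# Toy data: "a" and "b" always occur together, so they are clearly dependent.
dependent = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"c"},
             {"d"}, {"d"}, {"c", "d"}, {"d"}]
flag = is_interesting(dependent, ["a", "b"])
```

    A full filter would presumably apply a similar comparison to the rule's confidence as well; that step is omitted here because the abstract does not specify how the expected confidence is derived.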

    Evaluation and optimization of frequent association rule based classification

    Deriving useful and interesting rules from a data mining system is an essential and important task. Problems such as the discovery of random and coincidental patterns or patterns with no significant values, and the generation of a large volume of rules from a database, commonly occur. Work on sustaining the interestingness of rules generated by data mining algorithms is actively and constantly being examined and developed. In this paper, we present a systematic way to evaluate the association rules discovered by frequent itemset mining algorithms, combining common data mining and statistical interestingness measures, and outline an appropriate sequence for their use. The experiments are performed using a number of real-world datasets that represent diverse characteristics of data/items, and a detailed evaluation of the rule sets is provided. Empirical results show that, with a proper combination of data mining and statistical analysis, the framework is capable of eliminating a large number of non-significant, redundant and contradictive rules while preserving relatively valuable rules with high accuracy and coverage when used in the classification problem. Moreover, the results reveal the important characteristics of mining frequent itemsets and the impact of the confidence measure on the classification task.

    Irrelevant feature and rule removal for structural associative classification

    In the classification task, the presence of irrelevant features can significantly degrade the performance of classification algorithms, in terms of additional processing time, more complex models and the likelihood that the models have poor generalization power due to the overfitting problem. Practical applications of association rule mining often suffer from an overwhelming number of generated rules, many of which are not interesting or not useful for the application in question. Removing rules comprised of irrelevant features can significantly improve the overall performance. In this paper, we explore and compare the use of a feature selection measure to filter out unnecessary and irrelevant features/attributes prior to association rule generation. The experiments are performed using a number of real-world datasets that represent diverse characteristics of data items. Empirical results confirm that by utilizing feature subset selection prior to association rule generation, a large number of rules with irrelevant features can be eliminated. More importantly, the results reveal that removing rules that hold irrelevant features improves the accuracy rate and the capability to retain the rule coverage rate of structural associative classification.
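
    The abstract does not name the feature selection measure it uses. As an illustration only, the sketch below swaps in information gain, a common choice, to drop attributes that carry little information about the class label before any rules are generated. The function names, the dictionary-based row format and the `threshold` value are assumptions made for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Reduction in label entropy obtained by splitting on `feature`."""
    by_value = {}
    for row, label in zip(rows, labels):
        by_value.setdefault(row[feature], []).append(label)
    remainder = sum(len(subset) / len(labels) * entropy(subset)
                    for subset in by_value.values())
    return entropy(labels) - remainder


def select_features(rows, labels, threshold=0.1):
    """Keep only features whose information gain exceeds `threshold`
    (an illustrative cut-off, not a value from the source)."""
    return [f for f in rows[0] if information_gain(rows, labels, f) > threshold]


# Toy data: "color" determines the label, "noise" is unrelated to it.
rows = [
    {"color": "red", "noise": "x"},
    {"color": "red", "noise": "y"},
    {"color": "blue", "noise": "x"},
    {"color": "blue", "noise": "y"},
]
labels = ["yes", "yes", "no", "no"]
kept = select_features(rows, labels)
```

    Rule generation would then run only on the attributes in `kept`, so no rule can include a discarded feature.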

    Pattern Set Mining with Schema-based Constraint

    Pattern set mining entails discovering groups of frequent itemsets that represent potentially relevant knowledge. Global constraints are commonly enforced to focus the analysis on the most interesting pattern sets. However, these constraints evaluate and select each pattern set individually based on its itemset characteristics. This paper extends traditional global constraints by proposing a novel constraint, called the schema-based constraint, tailored to relational data. In relational data, itemsets consist of sets of items belonging to distinct data attributes, which constitute the itemset schema. The schema-based constraint allows us to effectively combine all the itemsets that are semantically correlated with each other into a unique pattern set, while filtering out those pattern sets covering a mixture of different data facets or giving a partial view of a single facet. Specifically, it selects all the pattern sets that are (i) composed only of frequent itemsets with the same schema and (ii) characterized by maximal size among those corresponding to that schema. Since existing approaches are unable to select one representative pattern set per schema in a single extraction, we propose a new Apriori-based algorithm to efficiently mine pattern sets satisfying the schema-based constraint. The experimental results achieved on both real and synthetic datasets demonstrate the efficiency and effectiveness of our approach.
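
    Conditions (i) and (ii) can be illustrated with a small post-processing sketch. This is not the Apriori-based algorithm proposed in the paper; it assumes the frequent itemsets have already been mined and that each item is an (attribute, value) pair. Grouping every frequent itemset under its schema yields, for each schema, the largest pattern set whose members all share that schema. The function name and data layout are hypothetical.

```python
from collections import defaultdict


def group_by_schema(frequent_itemsets):
    """Map each schema (the set of attributes an itemset covers) to the set of
    all frequent itemsets that have exactly that schema."""
    groups = defaultdict(set)
    for itemset in frequent_itemsets:
        schema = frozenset(attribute for attribute, _ in itemset)
        groups[schema].add(frozenset(itemset))
    return dict(groups)


# Toy input: items are (attribute, value) pairs; all itemsets are assumed frequent.
frequent = [
    {("city", "Rome"), ("age", "30")},
    {("city", "Paris"), ("age", "40")},
    {("city", "Rome")},
]
pattern_sets = group_by_schema(frequent)
```

    Here the two itemsets over `city` and `age` end up in one pattern set, while the itemset over `city` alone forms a separate one, so no pattern set mixes schemas.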

    Quality and interestingness of association rules derived from data mining of relational and semi-structured data

    Deriving useful and interesting rules from a data mining system is an essential and important task. Problems such as the discovery of random and coincidental patterns or patterns with no significant values, and the generation of a large volume of rules from a database, commonly occur. Work on sustaining the interestingness of rules generated by data mining algorithms is actively and constantly being examined and developed. As data mining techniques are data-driven, it is beneficial to affirm the rules using a statistical approach. It is important to establish the ways in which the existing statistical measures and constraint parameters can be effectively utilized, and the sequence of their usage. In this thesis, a systematic way is presented to evaluate the association rules discovered by frequent, closed and maximal itemset mining algorithms, and by a frequent subtree mining algorithm, including rules based on induced, embedded and disconnected subtrees. For frequent subtree mining, a new direction is also explored, based on the DSM approach, which is capable of preserving all information from a tree-structured database in a flat data format and consequently enables the direct application of a wider range of data mining analyses and techniques to tree-structured data. Implications of this approach were investigated, and it was found that basing rules on disconnected subtrees can be useful in terms of increasing the accuracy and the coverage rate of the rule set. A strategy is developed that combines data mining and statistical measurement techniques, such as sampling, redundancy and contradictive checks, and correlation and regression analysis, to evaluate the rules. This framework is then applied to real-world datasets that represent diverse characteristics of data/items.
Empirical results show that, with a proper combination of data mining and statistical analysis, the proposed framework is capable of eliminating a large number of non-significant, redundant and contradictive rules while preserving relatively valuable high-accuracy rules. Moreover, the results reveal the important characteristics of, and differences between, mining frequent, closed or maximal itemsets and mining frequent subtrees, including rules based on induced, embedded and disconnected subtrees, as well as the impact of the confidence measure on the prediction and classification task.

    Interestingness measures for association rules based on statistical validity

    Assessing rules with interestingness measures is the pillar of successful application of association rule discovery. However, the discovered association rules are normally large in number, and some of them are not considered interesting or significant for the application at hand. In this paper, we present a systematic approach to ascertain the discovered rules, and provide a precise statistical approach supporting this framework. The proposed strategy combines data mining and statistical measurement techniques, including redundancy analysis, sampling and multivariate statistical analysis, to discard the non-significant rules. Moreover, we consider real-world datasets characterized by uniform and non-uniform data/item distributions with a mixture of measurement levels throughout the data/items. The proposed unified framework is applied to these datasets to demonstrate its effectiveness in discarding many of the redundant or non-significant rules, while still preserving the high accuracy of the rule set as a whole.

    Improvement of model of decision-making by system of association rules

    The main goal of the doctoral dissertation research is to define a framework for a comprehensive research effort to improve the model of business decision-making and to discover regularities in data for numerous analyses: primarily the mining of association rules and prediction, as well as the use of the results to make sound management decisions. The goal is thus the analysis and application of a system of association rules in order to improve the business decision-making model of the top-level managers of a business system, so that effective and efficient decisions can be made. Modern scientific methods from the field of business intelligence were used in the research.
The main hypothesis, “It is possible to improve the model of business decision-making by a system of association rules”, was confirmed in the research. The importance of business intelligence for creating a model that can increase the effectiveness of managerial decision-making is highlighted. The application of association rules for research purposes has immense potential in business. A model of business decision-making based on a system of association rules has been developed and presented. This field of business intelligence is shown to be highly topical and to have great potential. Conclusions are drawn and guidelines for future research are given, as a challenge to make a significant scientific and professional contribution toward improving business decision-making.