9 research outputs found
Filtering the resulting set of association rules from the standpoint of interestingness evaluation
A method is proposed for filtering the set of association rules obtained from a search for logical dependencies. The number of association rules found at given support and confidence thresholds can be quite large and needs to be reduced. The method makes it possible to work with so-called "interesting" rules, whose support and confidence differ significantly from the expected values. The expected values are computed under the assumption that the features in the left-hand side of the rule are independent. It is shown how the support and confidence of "interesting" association rules change when the features in the analysed data are dependent.
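The independence-based expectation described in this abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption: the expected support of a left-hand side is taken as the product of the individual feature supports, and the function names and the `min_ratio` threshold are hypothetical.

```python
from math import prod

def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def expected_support(transactions, antecedent):
    """Support expected if the antecedent features were independent (assumed formula)."""
    return prod(support(transactions, [a]) for a in antecedent)

def is_interesting(transactions, antecedent, min_ratio=1.5):
    """Flag an antecedent whose observed support deviates strongly from the expectation.

    `min_ratio` is a hypothetical threshold, not a value taken from the paper.
    """
    observed = support(transactions, antecedent)
    expected = expected_support(transactions, antecedent)
    if expected == 0:
        return False
    ratio = observed / expected
    return ratio >= min_ratio or ratio <= 1 / min_ratio

# Toy data: "a" and "b" always occur together, "c" and "d" are nearly independent.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "b", "c"},
    {"c"},
    {"d"},
    {"d", "c"},
]
```

On this toy data, `is_interesting(transactions, ["a", "b"])` holds because the observed support of 0.5 is twice the expected 0.25, while `["c", "d"]` stays close to its expectation and is filtered out.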
Evaluation and optimization of frequent association rule based classification
Deriving useful and interesting rules from a data mining system is an essential and important task. Problems such as the discovery of random and coincidental patterns or patterns with no significant values, and the generation of a large volume of rules from a database, commonly occur. Work on sustaining the interestingness of rules generated by data mining algorithms is being actively pursued. This paper presents a systematic way to evaluate the association rules discovered by frequent itemset mining algorithms, combining common data mining and statistical interestingness measures, and outlines an appropriate sequence for their use. The experiments are performed using a number of real-world datasets that represent diverse characteristics of data/items, and a detailed evaluation of the rule sets is provided. Empirical results show that with a proper combination of data mining and statistical analysis, the framework is capable of eliminating a large number of non-significant, redundant and contradictive rules while preserving relatively valuable rules with high accuracy and coverage when used in the classification problem. Moreover, the results reveal important characteristics of mining frequent itemsets, and the impact of the confidence measure on the classification task.
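The removal of redundant and contradictive rules mentioned in this abstract could look roughly like the sketch below. The abstract does not define either check, so the definitions used here are assumptions: a rule is treated as redundant when a rule with a strictly smaller antecedent and the same consequent has at least the same confidence, and as contradictive when the same antecedent predicts more than one class. The function names are hypothetical.

```python
def remove_contradictory(rules):
    """Drop every rule whose antecedent also appears with a different consequent."""
    consequents = {}
    for antecedent, consequent, _ in rules:
        consequents.setdefault(antecedent, set()).add(consequent)
    return [r for r in rules if len(consequents[r[0]]) == 1]

def remove_redundant(rules):
    """Drop a rule when a more general rule predicts the same class at least as well."""
    kept = []
    for ant, cons, conf in rules:
        dominated = any(
            o_ant < ant and o_cons == cons and o_conf >= conf  # `<` is proper subset
            for o_ant, o_cons, o_conf in rules
        )
        if not dominated:
            kept.append((ant, cons, conf))
    return kept

# Rules as (antecedent, consequent, confidence); values are illustrative only.
rules = [
    (frozenset({"a"}), "yes", 0.90),
    (frozenset({"a", "b"}), "yes", 0.85),  # redundant: {"a"} already does better
    (frozenset({"c"}), "yes", 0.70),       # contradictory pair with the next rule
    (frozenset({"c"}), "no", 0.30),
]
filtered = remove_redundant(remove_contradictory(rules))
```

Only the rule with antecedent `{"a"}` survives both passes here. Whether a contradictory pair should be dropped entirely or resolved in favour of the stronger rule is a design choice the abstract leaves open.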
Irrelevant feature and rule removal for structural associative classification
In the classification task, the presence of irrelevant features can significantly degrade the performance of classification algorithms in terms of additional processing time, more complex models, and the likelihood that the models generalize poorly due to overfitting. Practical applications of association rule mining often suffer from an overwhelming number of generated rules, many of which are not interesting or not useful for the application in question. Removing rules comprised of irrelevant features can significantly improve overall performance. In this paper, we explore and compare the use of a feature selection measure to filter out unnecessary and irrelevant features/attributes prior to association rule generation. The experiments are performed using a number of real-world datasets that represent diverse characteristics of data items. Empirical results confirm that by utilizing feature subset selection prior to association rule generation, a large number of rules with irrelevant features can be eliminated. More importantly, the results reveal that removing rules that hold irrelevant features improves the accuracy rate and the capability to retain the rule coverage rate of structural associative classification.
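Filtering attributes before rule generation, as this abstract describes, can be sketched as follows. The abstract does not name the feature selection measure, so information gain is used here purely as an assumed stand-in, and the `min_gain` threshold and function names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """Reduction in class entropy obtained by splitting the rows on `attr`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

def select_features(rows, target, min_gain=0.1):
    """Keep only attributes whose gain reaches `min_gain` (hypothetical threshold)."""
    attrs = [a for a in rows[0] if a != target]
    return [a for a in attrs if information_gain(rows, a, target) >= min_gain]

# Toy data: "outlook" determines the class, "id_parity" carries no information.
rows = [
    {"outlook": "sunny", "id_parity": "odd", "play": "no"},
    {"outlook": "sunny", "id_parity": "even", "play": "no"},
    {"outlook": "rainy", "id_parity": "odd", "play": "yes"},
    {"outlook": "rainy", "id_parity": "even", "play": "yes"},
]
relevant = select_features(rows, "play")
```

Rules would then be generated only over the attributes in `relevant`, so no rule can contain the uninformative `id_parity` attribute.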
Pattern Set Mining with Schema-based Constraint
Pattern set mining entails discovering groups of frequent itemsets that represent potentially relevant knowledge. Global constraints are commonly enforced to focus the analysis on the most interesting pattern sets. However, these constraints evaluate and select each pattern set individually based on its itemset characteristics.
This paper extends traditional global constraints by proposing a novel constraint, called the schema-based constraint, tailored to relational data. In relational data, itemsets consist of sets of items belonging to distinct data attributes, which constitute the itemset schema. The schema-based constraint allows us to effectively combine all the itemsets that are semantically correlated with each other into a unique pattern set, while filtering out those pattern sets covering a mixture of different data facets or giving a partial view of a single facet. Specifically, it selects all the pattern sets that are (i) composed only of frequent itemsets with the same schema and (ii) characterized by maximal size among those corresponding to that schema. Since existing approaches are unable to select one representative pattern set per schema in a single extraction, we propose a new Apriori-based algorithm to efficiently mine pattern sets satisfying the schema-based constraint. The experimental results achieved on both real and synthetic datasets demonstrate the efficiency and effectiveness of our approach.
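Conditions (i) and (ii) can be illustrated with a brute-force sketch that groups frequent itemsets by schema and keeps, for each schema, the set of all frequent itemsets sharing it. This is not the Apriori-based algorithm the paper proposes; it enumerates every attribute combination and is only practical for tiny inputs. The function name and the record layout (one dict per record, all with the same attributes and no missing values) are assumptions.

```python
from collections import Counter
from itertools import combinations

def pattern_sets_by_schema(records, min_support):
    """Map each schema (tuple of attributes) to all its frequent itemsets.

    Each itemset is stored as a tuple of values aligned with the schema,
    mapped to its relative support.
    """
    attributes = sorted(records[0])
    n = len(records)
    result = {}
    for k in range(1, len(attributes) + 1):
        for schema in combinations(attributes, k):
            counts = Counter(tuple(r[a] for a in schema) for r in records)
            frequent = {
                values: c / n for values, c in counts.items() if c / n >= min_support
            }
            if frequent:  # schemas without any frequent itemset yield no pattern set
                result[schema] = frequent
    return result

# Illustrative relational data, not taken from the paper.
records = [
    {"city": "Rome", "job": "nurse"},
    {"city": "Rome", "job": "nurse"},
    {"city": "Rome", "job": "clerk"},
    {"city": "Turin", "job": "clerk"},
]
sets = pattern_sets_by_schema(records, min_support=0.5)
```

Collecting every frequent itemset of a schema into one group makes that group the maximal pattern set for the schema, which is how condition (ii) is read here.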
Quality and interestingness of association rules derived from data mining of relational and semi-structured data
Deriving useful and interesting rules from a data mining system is an essential and important task. Problems such as the discovery of random and coincidental patterns or patterns with no significant values, and the generation of a large volume of rules from a database, commonly occur. Work on sustaining the interestingness of rules generated by data mining algorithms is being actively pursued. As data mining techniques are data-driven, it is beneficial to affirm the rules using a statistical approach. It is important to establish the ways in which the existing statistical measures and constraint parameters can be effectively utilized, and the sequence of their usage. This thesis presents a systematic way to evaluate the association rules discovered by frequent, closed and maximal itemset mining algorithms, and by frequent subtree mining algorithms, including rules based on induced, embedded and disconnected subtrees. For frequent subtree mining, a new direction is also explored based on the DSM approach, which is capable of preserving all information from a tree-structured database in a flat data format and consequently enables the direct application of a wider range of data mining analyses and techniques to tree-structured data. The implications of this approach were investigated, and it was found that basing rules on disconnected subtrees can be useful in increasing the accuracy and the coverage rate of the rule set. A strategy is developed that combines data mining and statistical measurement techniques, such as sampling, redundancy and contradictive checks, and correlation and regression analysis, to evaluate the rules. This framework is then applied to real-world datasets that represent diverse characteristics of data/items.
Empirical results show that with a proper combination of data mining and statistical analysis, the proposed framework is capable of eliminating a large number of non-significant, redundant and contradictive rules while preserving relatively valuable high-accuracy rules. Moreover, the results reveal the important characteristics of, and differences between, mining frequent, closed or maximal itemsets and mining frequent subtrees, including rules based on induced, embedded and disconnected subtrees, as well as the impact of the confidence measure on the prediction and classification task.
Interestingness measures for association rules based on statistical validity
Assessing rules with interestingness measures is the pillar of successful application of association rule discovery. However, the association rules discovered are normally large in number, and some of them are not considered interesting or significant for the application at hand. In this paper, we present a systematic approach to ascertain the discovered rules, and provide a precise statistical approach supporting this framework. The proposed strategy combines data mining and statistical measurement techniques, including redundancy analysis, sampling and multivariate statistical analysis, to discard the non-significant rules. Moreover, we consider real-world datasets which are characterized by uniform and non-uniform data/items distribution with a mixture of measurement levels throughout the data/items. The proposed unified framework is applied to these datasets to demonstrate its effectiveness in discarding many of the redundant or non-significant rules, while still preserving the high accuracy of the rule set as a whole.
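One simple way to discard non-significant rules is a chi-square test of independence between a rule's antecedent and consequent. The abstract does not say which statistical tests the framework uses, so this is an assumed example rather than the authors' procedure. The function names are hypothetical, and 3.841 is the standard critical value for one degree of freedom at the 0.05 level.

```python
def chi_square(n, n_a, n_c, n_ac):
    """Chi-square statistic of the 2x2 table for a rule A -> C.

    n: total records, n_a: records matching A, n_c: records matching C,
    n_ac: records matching both A and C.
    """
    a = n_ac                   # A and C
    b = n_a - n_ac             # A and not C
    c = n_c - n_ac             # not A and C
    d = n - n_a - n_c + n_ac   # neither A nor C
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence of dependence
    return n * (a * d - b * c) ** 2 / denom

CRITICAL_95 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05

def is_significant(n, n_a, n_c, n_ac, critical=CRITICAL_95):
    """True when antecedent and consequent are unlikely to be independent."""
    return chi_square(n, n_a, n_c, n_ac) > critical
```

For example, a rule with `n=100, n_a=40, n_c=50, n_ac=30` would be kept, while one with `n_ac=20` matches the count expected under independence exactly and would be discarded.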
Improvement of model of decision-making by system of association rules
The main goal of this doctoral dissertation is to define a framework for a comprehensive research effort to improve the business decision-making model and to discover regularities in data for a range of analyses, primarily association rule mining and prediction, and to use the results to make sound management decisions. The aim is therefore to analyse and apply a system of association rules in order to improve the business decision-making model of top-level managers of a business system, so that they can make effective and efficient decisions.
Modern scientific methods from the field of business intelligence were used in the research. The main hypothesis, "It is possible to improve the model of business decision-making by a system of association rules", was confirmed. The importance of business intelligence for creating a model that can increase the effectiveness of managerial decision-making is highlighted. The application of association rules for research purposes has considerable potential in business.
A model of business decision-making based on a system of association rules was developed and presented. It is shown that this area of business intelligence is highly topical and has great potential. Conclusions are drawn and guidelines for future research are given, as a challenge to make a significant scientific and professional contribution toward improving business decision-making.