A Classification Rules Mining Method based on Dynamic Rules' Frequency
Rule-based classification, or rule induction (RI), is a data mining approach that normally generates classifiers containing simple yet effective rules. Most RI algorithms suffer from a few drawbacks, mainly related to rule pruning and to rules sharing training data instances. In response to these two issues, a new dynamic rule induction (DRI) method is proposed that utilises two thresholds to minimise the item search space. Whenever a rule is generated, the DRI algorithm ensures that all candidate items' frequencies are updated to reflect the deletion of the rule’s training data instances. Therefore, the remaining candidate items waiting to be added to other rules have dynamic rather than static frequencies. This enables DRI to generate not only rules with 100% accuracy but also rules with high accuracy. Experimental tests on a number of UCI data sets have been conducted using a number of RI algorithms. The results clearly show that DRI is competitive with other RI algorithms with regard to classification accuracy and classifier size.
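The covering loop described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name its two thresholds, so `min_freq` (minimum item frequency) and `min_acc` (minimum rule accuracy) are assumptions, as are the function names and the toy data.

```python
from collections import Counter

def covers(conditions, row):
    """True if the row satisfies every attribute=value condition of a rule."""
    return all(row.get(attr) == value for attr, value in conditions.items())

def grow_rule(rows, label_key, min_freq, min_acc):
    """Greedily add the most accurate item until accuracy stops improving."""
    conditions, covered = {}, rows
    best_label, best_acc = None, 0.0
    while True:
        item_counts, item_label_counts = Counter(), Counter()
        for row in covered:
            for attr, value in row.items():
                if attr == label_key or attr in conditions:
                    continue
                item_counts[(attr, value)] += 1
                item_label_counts[(attr, value, row[label_key])] += 1
        # Assumed threshold 1: an item must co-occur with a label min_freq times.
        candidates = [
            (n / item_counts[(attr, value)], n, attr, value, label)
            for (attr, value, label), n in item_label_counts.items()
            if n >= min_freq
        ]
        if not candidates:
            break
        acc, _, attr, value, label = max(candidates, key=lambda c: c[:2])
        if acc <= best_acc:
            break
        conditions[attr] = value
        covered = [r for r in covered if r.get(attr) == value]
        best_label, best_acc = label, acc
        if acc == 1.0:
            break
    # Assumed threshold 2: keep the rule only if it is accurate enough.
    if best_label is not None and best_acc >= min_acc:
        return conditions, best_label
    return None

def induce_rules(rows, label_key, min_freq=2, min_acc=0.8):
    """Covering loop: item frequencies are recounted on the rows still uncovered."""
    remaining, rules = list(rows), []
    while remaining:
        rule = grow_rule(remaining, label_key, min_freq, min_acc)
        if rule is None:
            break
        rules.append(rule)
        # Deleting the covered rows is what makes later frequencies dynamic.
        remaining = [r for r in remaining if not covers(rule[0], r)]
    return rules

# Toy data, invented for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
]
rules = induce_rules(rows, label_key="play")
print(rules)
```

In this toy run, `windy=yes` predicts `no` for only two of its three rows in the full data, but once the rows covered by the first rule are deleted its recounted accuracy rises to 100%, which is the effect the abstract attributes to dynamic frequencies.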
The Ideal Candidate. Analysis of Professional Competences through Text Mining of Job Offers
The aim of this paper is to propose analytical tools for identifying peculiar aspects of the job market for graduates. We propose a strategy for dealing with data that differ in source and nature.
Data mining: a tool for detecting cyclical disturbances in supply networks.
Disturbances in supply chains may be either exogenous or endogenous. The ability to automatically detect, diagnose, and distinguish between the causes of disturbances is of prime importance to decision makers in order to avoid uncertainty. The spectral principal component analysis (SPCA) technique has been utilized to distinguish between real and rogue disturbances in a steel supply network. The data set used was collected from four different business units in the network and consists of 43 variables, each described by 72 data points. The present paper utilizes the same data set to test an alternative approach to SPCA in detecting the disturbances. The new approach employs statistical data pre-processing, clustering, and classification learning techniques to analyse the supply network data. In particular, the incremental k-means clustering and the RULES-6 classification rule-learning algorithms, developed by the present authors’ team, have been applied to identify important patterns in the data set. Results show that the proposed approach is able to automatically detect and characterize network-wide cyclical disturbances and generate hypotheses about their root cause.
A review of associative classification mining
Associative classification mining is a promising approach in data mining that utilizes the
association rule discovery techniques to construct classification systems, also known as
associative classifiers. In the last few years, a number of associative classification algorithms
have been proposed, e.g. CPAR, CMAR, MCAR, MMAC and others. These algorithms
employ several different rule discovery, rule ranking, rule pruning, rule prediction and rule
evaluation methods. This paper focuses on surveying and comparing the state-of-the-art associative
classification techniques with regard to the above criteria. Finally, future directions in associative
classification, such as incremental learning and mining low-quality data sets, are also
highlighted in this paper.
Controlling False Positives in Association Rule Mining
Association rule mining is an important problem in the data mining area. It
enumerates and tests a large number of rules on a dataset and outputs rules
that satisfy user-specified constraints. Due to the large number of rules being
tested, rules that do not represent real systematic effect in the data can
satisfy the given constraints purely by random chance. Hence association rule
mining often suffers from a high risk of false positive errors. There is a lack
of comprehensive study on controlling false positives in association rule
mining. In this paper, we adopt three multiple testing correction
approaches---the direct adjustment approach, the permutation-based approach and
the holdout approach---to control false positives in association rule mining,
and conduct extensive experiments to study their performance. Our results show
that (1) numerous spurious rules are generated if no correction is made, and (2)
the three approaches can control false positives effectively. Among the three
approaches, the permutation-based approach has the highest power of detecting
real association rules, but it is very computationally expensive. We employ
several techniques to reduce its cost effectively. Comment: VLDB201
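As a rough illustration of what a direct adjustment can look like, the sketch below tests each rule X -> Y with a one-sided Fisher exact test and keeps only rules whose p-value passes a Bonferroni-corrected threshold. Both choices are assumptions made for this example: the abstract does not say which test or which adjustment the authors use, and the function names and counts are hypothetical.

```python
from math import comb

def fisher_right_tail(n, n_x, n_y, n_xy):
    """P(overlap >= n_xy) if X and Y were independent (hypergeometric upper tail).

    n    : number of transactions
    n_x  : transactions containing the rule body X
    n_y  : transactions containing the rule head Y
    n_xy : transactions containing both
    """
    upper = min(n_x, n_y)
    favourable = sum(
        comb(n_y, k) * comb(n - n_y, n_x - k) for k in range(n_xy, upper + 1)
    )
    return favourable / comb(n, n_x)

def bonferroni_filter(rules, n, alpha=0.05):
    """Keep rules whose p-value is at most alpha divided by the number of rules tested."""
    threshold = alpha / len(rules)
    kept = []
    for name, n_x, n_y, n_xy in rules:
        p_value = fisher_right_tail(n, n_x, n_y, n_xy)
        if p_value <= threshold:
            kept.append((name, p_value))
    return kept

# Hypothetical counts over n = 20 transactions: (rule name, n_x, n_y, n_xy).
candidate_rules = [
    ("A", 10, 10, 10),  # body and head always co-occur
    ("B", 4, 10, 4),    # passes an uncorrected 0.05 test, fails after correction
    ("C", 10, 10, 5),   # overlap exactly as expected under independence
]
significant = bonferroni_filter(candidate_rules, n=20)
print([name for name, _ in significant])
```

Rule B has an uncorrected p-value of 210/4845 (about 0.043), so it would be reported without correction, yet it does not pass the corrected threshold of 0.05/3. In a real mining run the divisor would be the full number of rules tested, which is typically far larger than three.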
Text Classification Using Association Rules, Dependency Pruning and Hyperonymization
We present new methods for pruning and enhancing itemsets for text
classification via association rule mining. Pruning methods are based on
dependency syntax and enhancing methods are based on replacing words by their
hyperonyms of various orders. We discuss the impact of these methods, compared
to pruning based on the tf-idf rank of words. Comment: 16 pages, 2 figures, presented at DMNLP 201
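For orientation, the tf-idf baseline mentioned in this abstract can be sketched as below: each document is reduced to its highest-ranked words before itemsets are mined. The weighting formula (relative term frequency times the natural log of inverse document frequency), the `keep` cut-off, and the toy documents are all assumptions for this example; the abstract does not specify them.

```python
import math
from collections import Counter

def tfidf_prune(docs, keep=2):
    """Reduce each tokenized document to its `keep` highest tf-idf words."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    pruned = []
    for doc in docs:
        term_freq = Counter(doc)
        scores = {
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in term_freq.items()
        }
        # Rank by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        pruned.append(set(ranked[:keep]))
    return pruned

# Toy corpus, invented for illustration only.
docs = [
    ["the", "cat", "sat", "the", "mat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]
itemsets = tfidf_prune(docs, keep=2)
print(itemsets)
```

A word that occurs in every document, such as "the" here, scores zero and is pruned first, which is why the ranking is a reasonable reference point for the syntax-based pruning the abstract proposes.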