Random Relational Rules
In the field of machine learning, methods for learning from single-table data have received much more attention than those for learning from multi-table, or relational, data, which is generally more computationally complex. However, a significant amount of the world's data is relational. This indicates a need for algorithms that can operate efficiently on relational data and exploit the larger body of work produced in the area of single-table techniques.
This thesis presents algorithms for learning from relational data that mitigate, to some extent, the complexity normally associated with such learning. All algorithms in this thesis are based on the generation of random relational rules. The assumption is that random rules enable efficient and effective relational learning, and this thesis presents evidence that this is indeed the case. To this end, a system for generating random relational rules is described, and algorithms using these rules are evaluated. These algorithms include direct classification, classification by propositionalisation, clustering, semi-supervised learning and generating random forests.
The experimental results show that these algorithms perform competitively with previously published results for the datasets used, while often exhibiting lower runtime than other tested systems. This demonstrates that sufficient information for classification and clustering is retained in the rule generation process and that learning with random rules is efficient.
Further applications of random rules are investigated. Propositionalisation allows single-table algorithms for classification and clustering to be applied to the resulting data, reducing the amount of relational processing required. Further results show that techniques for utilising additional unlabeled training data improve classification accuracy in the semi-supervised setting. The thesis also develops a novel algorithm for building random forests by making efficient use of random rules to generate trees and leaves in parallel.
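As a rough illustration of propositionalisation with random rules, the sketch below treats each relational example as a parent object (a hypothetical "molecule") linked to a list of child records (its "atoms"). It draws random existential rules of the form "some child satisfies all of these attribute thresholds" and turns each rule into one 0/1 column, so that any single-table learner can be applied. The schema, the rule language and every name here are assumptions made for illustration; this is not the thesis's rule generator.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical schema: a relational example is a parent object (e.g. a molecule)
# represented by the list of its linked child records (e.g. its atoms).
Example = List[Dict[str, float]]


@dataclass(frozen=True)
class Condition:
    attr: str
    op: str          # "<=" or ">"
    threshold: float

    def holds(self, child: Dict[str, float]) -> bool:
        value = child[self.attr]
        return value <= self.threshold if self.op == "<=" else value > self.threshold


@dataclass(frozen=True)
class RandomRule:
    # Existential rule: covers an example if SOME child satisfies ALL conditions.
    conditions: Tuple[Condition, ...]

    def covers(self, example: Example) -> bool:
        return any(all(c.holds(child) for c in self.conditions) for child in example)


def generate_random_rules(data: List[Example], n_rules: int, max_len: int,
                          rng: random.Random) -> List[RandomRule]:
    """Draw rules whose thresholds are attribute values taken from random children."""
    children = [child for example in data for child in example]
    attrs = sorted(children[0])
    rules = []
    for _ in range(n_rules):
        length = rng.randint(1, min(max_len, len(attrs)))
        conditions = tuple(
            Condition(attr, rng.choice(["<=", ">"]), rng.choice(children)[attr])
            for attr in rng.sample(attrs, k=length)
        )
        rules.append(RandomRule(conditions))
    return rules


def propositionalise(data: List[Example], rules: List[RandomRule]) -> List[List[int]]:
    """One row per example and one 0/1 column per rule: an ordinary single table."""
    return [[int(rule.covers(example)) for rule in rules] for example in data]


molecules = [
    [{"charge": 0.1, "mass": 12.0}, {"charge": -0.4, "mass": 16.0}],
    [{"charge": 0.3, "mass": 1.0}],
]
rules = generate_random_rules(molecules, n_rules=5, max_len=2, rng=random.Random(0))
table = propositionalise(molecules, rules)
```

Drawing thresholds from observed values keeps every condition satisfiable by at least one child, which is one simple way to avoid generating rules that cover nothing.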
Association Rule Based Classification
In this thesis, we focused on the construction of classification models based on association rules. Although association rules have predominantly been used for data exploration and description, interest in using them for prediction has increased rapidly in the data mining community. In order to mine only rules that can be used for classification, we modified the well-known association rule mining algorithm Apriori to handle user-defined input constraints. We considered constraints that require the presence or absence of particular items, or that limit the number of items, in the antecedents and/or consequents of the rules. We developed a characterization of the itemsets that can potentially form rules satisfying the given constraints. During itemset construction, this characterization allows us to prune any itemset for which neither it nor any of its supersets can form a valid rule. This improves the time performance of itemset construction. Using this characterization, we implemented a classification system based on association rules and compared the performance of several model construction methods, including CBA, and several model deployment modes for making predictions. Although the data mining community has dealt only with the classification of single-valued attributes, there are several domains in which the classification target is set-valued. Hence, we extended our classification system with a novel approach to predicting set-valued class attributes. Since the traditional classification accuracy measure is inappropriate in this context, we developed an evaluation method for set-valued classification based on the E-Measure. Furthermore, we enhanced our algorithm so that it does not rely on the typical support/confidence framework. Instead, it mines for the best possible rules above a user-defined minimum confidence and within a desired range for the number of rules.
This avoids long mining times that might produce large collections of rules with low predictive power. For this purpose, we developed a heuristic function to determine an initial minimum support and then adjusted it using a binary search strategy until a number of rules within the given range was obtained. We implemented all of the techniques described above in WEKA, an open-source suite of machine learning algorithms, and used several datasets from the UCI Machine Learning Repository to test and evaluate them.
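The minimum-support adjustment can be sketched as a binary search over absolute support counts. The search works because the number of rules that pass the thresholds can only stay the same or fall as minimum support rises. In this sketch, `count_class_rules` is a hypothetical brute-force counter of class association rules "itemset -> class" with antecedents of at most two items; it is not the thesis's constrained Apriori. The thesis's heuristic for the initial support is not described in the abstract, so the sketch simply searches the whole range from 1 to the number of transactions.

```python
from itertools import combinations
from typing import FrozenSet, List, Optional, Tuple

Transaction = Tuple[FrozenSet[str], str]   # (items, class label)


def count_class_rules(data: List[Transaction], min_count: int, min_conf: float,
                      max_len: int = 2) -> int:
    """Count rules X -> c with support count >= min_count and confidence >= min_conf."""
    items = sorted({i for row_items, _ in data for i in row_items})
    classes = sorted({c for _, c in data})
    n_rules = 0
    for k in range(1, max_len + 1):
        for antecedent in combinations(items, k):
            ante = frozenset(antecedent)
            covered = [c for row_items, c in data if ante <= row_items]
            if len(covered) < min_count:
                continue   # support(X -> c) <= support(X), so no rule from X qualifies
            for c in classes:
                hits = covered.count(c)
                if hits >= min_count and hits / len(covered) >= min_conf:
                    n_rules += 1
    return n_rules


def search_min_support(data: List[Transaction], min_conf: float,
                       lo_rules: int, hi_rules: int) -> Optional[Tuple[int, int]]:
    """Binary-search a support count yielding between lo_rules and hi_rules rules."""
    low, high = 1, len(data)
    while low <= high:
        mid = (low + high) // 2
        n = count_class_rules(data, mid, min_conf)
        if n > hi_rules:
            low = mid + 1      # too many rules: raise the support threshold
        elif n < lo_rules:
            high = mid - 1     # too few rules: lower the support threshold
        else:
            return mid, n      # (support count, number of rules)
    return None                # no threshold gives a rule count inside the range


data = [
    (frozenset({"a", "b"}), "yes"), (frozenset({"a"}), "yes"), (frozenset({"a", "c"}), "yes"),
    (frozenset({"b", "c"}), "no"), (frozenset({"c"}), "no"), (frozenset({"b"}), "no"),
]
result = search_min_support(data, min_conf=0.6, lo_rules=2, hi_rules=4)
```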
Random rules from data streams
Existing work suggests that random inputs and random features produce good results in classification. In this paper, we study the problem of generating random rule sets from data streams. One of the most interpretable and flexible models for prediction tasks in data stream mining is the Very Fast Decision Rules learner (VFDR). In this work, we extend the VFDR algorithm with random rules from data streams. The proposed algorithm generates several sets of rules, each associated with a set of N_att attributes. It maintains all the properties required when learning from stationary data streams: online and any-time classification, processing each example only once. Copyright 2013 ACM
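The idea of tying each rule set to its own random subset of N_att attributes while reading the stream only once can be sketched as a random-subspace ensemble. Reproducing VFDR's rule growth is beyond a short example. Each member here is therefore a simple incremental naive-Bayes-style counter over its attribute subset: a plainly swapped-in stand-in, not VFDR. The class names, the `n_att` parameter and the binary-attribute assumption are all hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, Optional, Sequence


class SubsetCounter:
    """Stand-in for one rule set (NOT VFDR): an incremental naive-Bayes-style
    counter restricted to its own attribute subset. Binary nominal attributes assumed."""

    def __init__(self, attrs: Sequence[int]):
        self.attrs = list(attrs)
        self.class_counts: Dict[Hashable, int] = defaultdict(int)
        self.value_counts: Dict[tuple, int] = defaultdict(int)  # (attr, value, class) -> count

    def learn_one(self, x: Sequence[Hashable], y: Hashable) -> None:
        self.class_counts[y] += 1
        for a in self.attrs:
            self.value_counts[(a, x[a], y)] += 1

    def predict_one(self, x: Sequence[Hashable]) -> Optional[Hashable]:
        if not self.class_counts:
            return None
        total = sum(self.class_counts.values())

        def score(c: Hashable) -> float:
            n_c = self.class_counts[c]
            s = n_c / total
            for a in self.attrs:
                # Laplace smoothing; the "+ 2" assumes binary-valued attributes.
                s *= (self.value_counts[(a, x[a], c)] + 1) / (n_c + 2)
            return s

        return max(self.class_counts, key=score)


class RandomSubspaceStreamEnsemble:
    """Each member sees a random subset of n_att attributes; one pass over the stream."""

    def __init__(self, n_features: int, n_members: int, n_att: int, seed: int = 0):
        rng = random.Random(seed)
        self.members = [SubsetCounter(rng.sample(range(n_features), n_att))
                        for _ in range(n_members)]

    def learn_one(self, x: Sequence[Hashable], y: Hashable) -> None:
        for member in self.members:   # each example is used once, then discarded
            member.learn_one(x, y)

    def predict_one(self, x: Sequence[Hashable]) -> Optional[Hashable]:
        votes: Dict[Hashable, int] = defaultdict(int)
        for member in self.members:
            label = member.predict_one(x)
            if label is not None:
                votes[label] += 1
        # Any-time: a (possibly poor) prediction is available at every point.
        return max(votes, key=votes.get) if votes else None


# Prequential (test-then-train) run on a toy stream: label = x[0] = x[1]; x[2] is noise.
model = RandomSubspaceStreamEnsemble(n_features=3, n_members=5, n_att=2, seed=1)
correct = 0
for i in range(100):
    v = i % 2
    x, y = (v, v, (i // 2) % 2), v
    correct += model.predict_one(x) == y
    model.learn_one(x, y)
```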
HYEI: A New Hybrid Evolutionary Imperialist Competitive Algorithm for Fuzzy Knowledge Discovery
In recent years, the imperialist competitive algorithm (ICA), genetic algorithms (GA), and hybrid fuzzy classification systems have been employed successfully and effectively for classification tasks in data mining. To address the ineffectiveness of current algorithms in analysing high-dimensional independent datasets, this paper presents a new hybrid approach, named HYEI, for discovering generic rule-based systems. The proposed approach consists of three stages and combines an evolutionary fuzzy system with two ICA procedures to generate high-quality fuzzy classification rules. First, the best feature subset is selected using embedded ICA feature selection, and these features are then used to generate basic fuzzy classification rules. Finally, all rules are optimized with an ICA algorithm that shortens them or eliminates some of them. The performance of HYEI has been evaluated on several benchmark datasets from the UCI machine learning repository. Compared with the best previously published results, the proposed algorithm attains the highest classification accuracy on 6 out of the 7 dataset problems and comparable accuracy on the 5 other test problems.
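For readers unfamiliar with fuzzy classification rules, the sketch below shows one common way such rules are applied; it is an assumption, since the abstract does not say how HYEI evaluates its rules. Features are normalised to [0, 1] and covered by three triangular fuzzy sets. A rule's firing strength is its certainty weight times the product of its membership grades. The single rule with the highest strength decides the class. The partition and the rules are hypothetical, and the ICA feature selection and rule optimisation stages are not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

# Hypothetical partition: three triangular fuzzy sets (left, peak, right) on [0, 1].
FUZZY_SETS = {"low": (-0.5, 0.0, 0.5), "mid": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.5)}


def triangular(x: float, left: float, peak: float, right: float) -> float:
    if x <= left or x >= right:
        return 0.0
    return (x - left) / (peak - left) if x <= peak else (right - x) / (right - peak)


@dataclass
class FuzzyRule:
    antecedent: Dict[int, str]   # feature index -> linguistic term; absent = "don't care"
    label: str
    weight: float = 1.0          # rule certainty grade

    def strength(self, x: Sequence[float]) -> float:
        s = self.weight
        for i, term in self.antecedent.items():
            s *= triangular(x[i], *FUZZY_SETS[term])   # product t-norm
        return s


def classify(rules: List[FuzzyRule], x: Sequence[float]) -> Optional[str]:
    """Single-winner reasoning: the most strongly firing rule assigns the class."""
    best = max(rules, key=lambda r: r.strength(x))
    return best.label if best.strength(x) > 0 else None


rules = [FuzzyRule({0: "low"}, "A"), FuzzyRule({0: "high", 1: "high"}, "B", weight=0.9)]
```

Dropping a feature from a rule's antecedent ("don't care") is what makes rules shorter. That is the kind of reduction the abstract attributes to HYEI's final optimisation stage.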
A hierarchically combined classifier for license plate recognition
High accuracy and fast recognition speed are two requirements for real-time, automatic license plate recognition systems. In this paper, we propose a hierarchically combined classifier based on an inductive-learning-based method and SVM-based classification. The approach uses the inductive learning method to divide all classes roughly into smaller groups; the SVM method is then used for character classification within each group. Both start from a collection of character samples from license plates. After training on known samples in advance, the inductive learning rules are extracted for rough classification and the parameters for SVM-based classification are obtained. A classification tree is then constructed to speed up subsequent training and testing of the SVM-based classification. The experimental results show that the hierarchically combined classifier outperforms both the inductive-learning-based classification and the SVM-based classification in error rate and processing speed. © 2008 IEEE
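The two-stage structure can be sketched as follows. A coarse routing rule sends a sample to a group of classes, and a small per-group classifier then chooses only among that group's classes. In this sketch the routing rule is hand-written rather than learned by inductive learning. A nearest-centroid classifier stands in for the per-group SVM; it is a plainly swapped-in technique, since no SVM is implemented here. The features, groups and character labels are hypothetical.

```python
from math import dist
from statistics import fmean
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]


class NearestCentroid:
    """Stand-in for the per-group SVM: predicts the class with the closest mean vector."""

    def fit(self, samples: List[Sample]) -> "NearestCentroid":
        by_label: Dict[str, List[Sequence[float]]] = {}
        for x, y in samples:
            by_label.setdefault(y, []).append(x)
        self.centroids = {y: tuple(fmean(col) for col in zip(*xs))
                          for y, xs in by_label.items()}
        return self

    def predict(self, x: Sequence[float]) -> str:
        return min(self.centroids, key=lambda y: dist(tuple(x), self.centroids[y]))


class HierarchicalClassifier:
    def __init__(self, route: Callable[[Sequence[float]], Hashable]):
        self.route = route                      # coarse stage: feature vector -> group id
        self.experts: Dict[Hashable, NearestCentroid] = {}

    def fit(self, samples: List[Sample]) -> "HierarchicalClassifier":
        groups: Dict[Hashable, List[Sample]] = {}
        for x, y in samples:
            groups.setdefault(self.route(x), []).append((x, y))
        # Fine stage: one small classifier per group, trained only on that group.
        self.experts = {g: NearestCentroid().fit(s) for g, s in groups.items()}
        return self

    def predict(self, x: Sequence[float]) -> str:
        # Raises KeyError if routing yields a group that was never seen in training.
        return self.experts[self.route(x)].predict(x)


def route(x: Sequence[float]) -> str:
    # Hypothetical coarse rule on the first feature (width/height ratio).
    return "narrow" if x[0] < 0.5 else "wide"


# Hypothetical features: (width/height ratio, ink density) of a segmented character.
training = [((0.3, 0.2), "1"), ((0.3, 0.5), "I"), ((0.8, 0.4), "0"), ((0.8, 0.7), "8")]
clf = HierarchicalClassifier(route).fit(training)
```

The speed benefit comes from each fine classifier comparing only against the classes in its own group rather than against every character class.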
Learning Interestingness of Streaming Classification Rules
Inducing classification rules in domains where information is gathered at regular periods generally produces so many rules that selecting the interesting ones among all discovered rules becomes an important task. At each period, new classification rules are induced from the newly gathered information. These rules therefore stream through time and are called streaming classification rules. In this paper, an interactive rule interestingness-learning algorithm (IRIL) is developed to automatically label classification rules as either "interesting" or "uninteresting" with limited user interaction. In our study, VFP (Voting Feature Projections), a feature-projection-based incremental classification learning algorithm, is also developed within the IRIL framework. The concept description learned by the VFP algorithm constitutes a novel approach to interestingness analysis of streaming classification rules. © Springer-Verlag 2004
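A feature-projection voting classifier can be sketched in the spirit of VFP. The exact VFP details are not given in the abstract, so this is an assumption-laden approximation. Each feature keeps per-class counts of its projected (binned) values and is updated one example at a time. At prediction time each feature casts a normalised vote over the classes, and the votes are summed. Describing rules by (support, confidence) and using equal-width bins on [0, 1] are hypothetical choices.

```python
from collections import defaultdict
from typing import Dict, Hashable, Optional, Sequence


class VotingFeatureProjections:
    """Incremental classifier: each feature votes using only its own projection."""

    def __init__(self, n_features: int, n_bins: int = 10):
        self.n_features = n_features
        self.n_bins = n_bins
        # counts[f][bin][class] = how often class was seen with feature f in that bin
        self.counts = [defaultdict(lambda: defaultdict(int)) for _ in range(n_features)]
        self.class_totals: Dict[Hashable, int] = defaultdict(int)

    def _bin(self, v: float) -> int:
        # Values assumed to lie in [0, 1]; 1.0 falls into the last bin.
        return min(int(v * self.n_bins), self.n_bins - 1)

    def learn_one(self, x: Sequence[float], y: Hashable) -> None:
        self.class_totals[y] += 1
        for f in range(self.n_features):
            self.counts[f][self._bin(x[f])][y] += 1

    def votes(self, x: Sequence[float]) -> Dict[Hashable, float]:
        total: Dict[Hashable, float] = defaultdict(float)
        for f in range(self.n_features):
            cell = self.counts[f][self._bin(x[f])]
            # Class-frequency-normalised counts, then scaled so each feature's vote sums to 1.
            raw = {c: cell[c] / n for c, n in self.class_totals.items()}
            s = sum(raw.values())
            if s == 0:
                continue   # this feature has never seen a value in this bin: abstain
            for c, v in raw.items():
                total[c] += v / s
        return dict(total)

    def predict_one(self, x: Sequence[float]) -> Optional[Hashable]:
        v = self.votes(x)
        return max(v, key=v.get) if v else None


# Hypothetical rule descriptors: (support, confidence) of each streamed rule.
vfp = VotingFeatureProjections(n_features=2)
for x, y in [((0.2, 0.95), "interesting"), ((0.3, 0.9), "interesting"),
             ((0.25, 0.4), "uninteresting"), ((0.6, 0.3), "uninteresting")]:
    vfp.learn_one(x, y)
```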
Modified ANFIS architecture with less computational complexities for classification problems
The Adaptive Neuro-Fuzzy Inference System (ANFIS) is a soft computing technique that has solved problems effectively in a wide variety of real-world applications. Although it is widely used, the ANFIS architecture still suffers from computational complexity. The number of rules and tunable parameters grows exponentially with the number of inputs, which causes the curse of dimensionality. Moreover, the standard architecture uses grid partitioning and a combination of gradient descent (GD) and least-squares estimation (LSE), which is prone to becoming trapped in local minima. Grid partitioning is useful for achieving good ANFIS accuracy because it generates the maximum number of rules by considering all possible combinations, but it also increases computational complexity. Since ANFIS uses fuzzy logic, model accuracy depends heavily on selecting an appropriate type of membership function. Furthermore, researchers have mainly used metaheuristic algorithms to avoid the local-minima problem of the standard learning method. In this study, experiments were conducted to find the most suitable membership function for the ANFIS model. Additionally, the ANFIS architecture is modified to lessen its computational complexity by reducing the fourth layer and reducing the number of trainable parameters. The proposed ANFIS model is trained with a metaheuristic approach instead of the standard two-pass learning algorithm. The performance of the proposed modified ANFIS architecture is validated against the standard ANFIS architecture on classification problems. The results show that the proposed modified ANFIS architecture, using a Gaussian membership function and the Artificial Bee Colony (ABC) optimization algorithm, achieved an average classification accuracy of 99.5% with 83% less computational complexity.
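To make the complexity argument concrete, the sketch below is the standard five-layer forward pass of a first-order Sugeno ANFIS with Gaussian membership functions and grid partitioning. With m membership functions per input and n inputs, grid partitioning produces m^n rules, each with n + 1 linear consequent parameters. The abstract does not specify how the modified architecture reduces the fourth layer, so that change is not shown; ABC training is also omitted. All parameter values are hypothetical.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple


def gaussian(x: float, centre: float, sigma: float) -> float:
    return math.exp(-((x - centre) ** 2) / (2 * sigma ** 2))


def anfis_forward(x: Sequence[float],
                  mfs: List[List[Tuple[float, float]]],
                  consequents: Dict[Tuple[int, ...], Sequence[float]]) -> float:
    """Standard first-order Sugeno ANFIS forward pass with grid partitioning.

    mfs[i] lists the (centre, sigma) premise parameters of input i's MFs;
    consequents[rule] = (p_1, ..., p_n, r) for that rule's linear output.
    """
    # Layer 1: Gaussian membership grade of every input in each of its MFs.
    grades = [[gaussian(xi, c, s) for c, s in mfs_i] for xi, mfs_i in zip(x, mfs)]
    # Grid partitioning: one rule per combination of MFs, i.e. prod(len(mfs[i])) rules.
    rules = list(itertools.product(*(range(len(m)) for m in mfs)))
    # Layer 2: firing strength of each rule (product of its membership grades).
    w = [math.prod(grades[i][j] for i, j in enumerate(rule)) for rule in rules]
    total = sum(w)
    # Layer 3: normalisation; layer 4: weighted linear consequents; layer 5: sum.
    out = 0.0
    for rule, wi in zip(rules, w):
        *p, r = consequents[rule]
        out += (wi / total) * (sum(pj * xj for pj, xj in zip(p, x)) + r)
    return out


def parameter_counts(n_inputs: int, m: int) -> Tuple[int, int, int]:
    """(rules, premise parameters, consequent parameters) under grid partitioning."""
    rules = m ** n_inputs
    return rules, 2 * m * n_inputs, rules * (n_inputs + 1)


mfs = [[(0.0, 0.5), (1.0, 0.5)], [(0.0, 0.5), (1.0, 0.5)]]
identity = {rule: (1.0, 0.0, 0.0) for rule in itertools.product(range(2), range(2))}
y = anfis_forward([0.2, 0.7], mfs, identity)   # every rule outputs x_1, so y equals 0.2
```

For example, `parameter_counts(4, 3)` gives 81 rules and 405 consequent parameters. The consequent parameters are exactly the layer-4 quantities that grow with the rule count.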
Modeling interestingness of streaming classification rules as a classification problem
Inducing classification rules in domains where information is gathered at regular periods generally produces so many rules that selecting the interesting ones among all discovered rules becomes an important task. At each period, new classification rules are induced from the newly gathered information. These rules therefore stream through time and are called streaming classification rules. In this paper, an interactive classification rules' interestingness learning algorithm (ICRIL) is developed to automatically label classification rules as either "interesting" or "uninteresting" with limited user interaction. In our study, VFFP (Voting Fuzzified Feature Projections), a feature-projection-based incremental classification algorithm, is also developed within the ICRIL framework. The concept description learned by VFFP is the interestingness concept of streaming classification rules. © Springer-Verlag Berlin Heidelberg 2006
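"Limited user interaction" can be sketched as confidence-gated labelling. For each rule arriving in a period, an incremental classifier proposes a label. The user is asked only when the classifier's vote margin is low, and each answer is added to the labelled pool. The k-nearest-neighbour voter, the `min_margin` threshold and the `ask_user` callback are hypothetical stand-ins; this is not ICRIL or VFFP. Confidently auto-labelled rules are deliberately not added to the pool, to avoid reinforcing the classifier's own mistakes.

```python
from math import dist
from typing import Callable, Dict, List, Sequence, Tuple

Labelled = List[Tuple[Sequence[float], str]]


def knn_votes(labelled: Labelled, x: Sequence[float], k: int) -> Dict[str, int]:
    nearest = sorted(labelled, key=lambda item: dist(item[0], x))[:k]
    votes: Dict[str, int] = {}
    for _, y in nearest:
        votes[y] = votes.get(y, 0) + 1
    return votes


def label_period(new_rules: List[Sequence[float]], labelled: Labelled,
                 ask_user: Callable[[Sequence[float]], str],
                 k: int = 3, min_margin: float = 0.5) -> Tuple[List[str], int]:
    """Label one period's rules; return (labels, number of user queries)."""
    labels, queries = [], 0
    for x in new_rules:
        votes = knn_votes(labelled, x, k) if labelled else {}
        ranked = sorted(votes.values(), reverse=True) + [0, 0]
        margin = (ranked[0] - ranked[1]) / k
        if len(labelled) >= k and margin >= min_margin:
            y = max(votes, key=votes.get)       # confident: label automatically
        else:
            y = ask_user(x)                     # uncertain: defer to the user
            queries += 1
            labelled.append((x, y))             # learn incrementally from the answer
        labels.append(y)
    return labels, queries


# Hypothetical rule descriptors (support, confidence) and a simulated user.
def oracle(x: Sequence[float]) -> str:
    return "interesting" if x[1] >= 0.8 else "uninteresting"


pool: Labelled = []
period1 = [(0.2, 0.9), (0.3, 0.95), (0.25, 0.3), (0.5, 0.2), (0.22, 0.92)]
period2 = [(0.21, 0.91), (0.55, 0.25)]
labels1, queries1 = label_period(period1, pool, oracle)
labels2, queries2 = label_period(period2, pool, oracle)
```

In this toy run every rule in the first period goes to the user, while in the second period one of the two rules is labelled without a query. Raising `min_margin` trades more queries for fewer automatic mistakes.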
Propositional Satisfiability Method in Rough Classification Modeling for Data Mining
The fundamental problem in data mining is whether all of the available information is always necessary to represent the information system (IS). The goal of data mining is to find rules that model the world sufficiently well. These rules consist of conditions over attribute-value pairs, called the description, and a classification of the decision attribute. However, the set of all decision rules generated from all conditional attributes can be too large and can contain many chaotic rules that are not appropriate for classifying unseen objects. A search for the best rules must therefore be performed, because it is not possible to determine the quality of all rules generated from the information system. In the rough set approach to data mining, the set of interesting rules is determined using the notion of a reduct. Rules are generated from reducts by binding the condition attribute values of the object class from which the reduct originates to the corresponding attributes. It is important for reducts to be minimal in size, because minimal reducts decrease the number of conditional attributes used to generate rules. Shorter rules are expected to classify new cases more reliably because of their larger support in the data, and in some sense the most stable and frequently appearing reducts give the best decision rules.
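Generating rules from a reduct by binding attribute values can be sketched on a toy decision table. Each object contributes the rule "reduct attributes = this object's values, therefore this object's decision", and duplicate rules are merged with a support count. New objects are then classified by a support-weighted vote among matching rules. The table, its attribute names and the voting scheme are hypothetical. For this particular table, {outlook, humidity} was checked by hand to be a reduct.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence

Row = Dict[str, str]


def rules_from_reduct(table: List[Row], reduct: Sequence[str], decision: str) -> Counter:
    """Bind each object's values on the reduct attributes; count each rule's support."""
    return Counter((tuple((a, row[a]) for a in reduct), row[decision]) for row in table)


def classify_with_rules(rules: Counter, obj: Row) -> Optional[str]:
    """Support-weighted vote among the rules whose conditions the object matches."""
    votes: Counter = Counter()
    for (conditions, label), support in rules.items():
        if all(obj.get(a) == v for a, v in conditions):
            votes[label] += support
    return votes.most_common(1)[0][0] if votes else None   # None: no rule fires


# Hypothetical decision table with condition attributes outlook, humidity, wind.
table = [
    {"outlook": "sunny", "humidity": "high", "wind": "weak", "play": "no"},
    {"outlook": "sunny", "humidity": "normal", "wind": "strong", "play": "yes"},
    {"outlook": "rain", "humidity": "high", "wind": "strong", "play": "no"},
    {"outlook": "overcast", "humidity": "high", "wind": "weak", "play": "yes"},
    {"outlook": "rain", "humidity": "normal", "wind": "weak", "play": "yes"},
]
rules = rules_from_reduct(table, ["outlook", "humidity"], "play")
```

Because the rules mention only reduct attributes, a smaller reduct directly yields shorter rules, which is why the abstract stresses minimal reducts.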
The main work of the thesis is the generation of a classification model that contains fewer rules, shorter rules and good accuracy. A propositional satisfiability method for rough classification modeling is proposed in this thesis. Two models, Standard Integer Programming (SIP) and Decision Related Integer Programming (DRIP), were proposed to represent the minimal reduct computation problem. The models involve a theoretical formalism that maps the discernibility relation of a decision system (DS) into an Integer Programming (IP) model. The proposed models were embedded within the default rule generation framework, yielding a new rough classification method. An improved branch and bound strategy, which prunes a certain amount of the search, is proposed to solve the SIP and DRIP models. The strategy uses a conflict analysis procedure to remove unnecessary attribute assignments and determines the branch level to which the search backtracks in a non-chronological manner.
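The abstract does not give the exact SIP/DRIP constraints, so this sketch uses a common 0-1 formulation of minimal reduct computation, stated here as an assumption. There is one binary variable x_a per condition attribute. The objective is to minimise the sum of the x_a, subject to the constraint that every non-empty discernibility entry c_ij (the attributes distinguishing two objects with different decisions) contains at least one chosen attribute. The solver below is a plain depth-first branch and bound with chronological backtracking. It branches on the smallest uncovered entry and does not implement the thesis's conflict-analysis, non-chronological strategy. The table is hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Set

Row = Dict[str, str]


def discernibility_entries(table: List[Row], conditions: Sequence[str],
                           decision: str) -> List[FrozenSet[str]]:
    """Attributes distinguishing each pair of objects that have different decisions."""
    entries = set()
    for r1, r2 in combinations(table, 2):
        if r1[decision] != r2[decision]:
            diff = frozenset(a for a in conditions if r1[a] != r2[a])
            if diff:   # an empty entry means an inconsistent pair, which cannot be covered
                entries.add(diff)
    return list(entries)


def minimal_reduct(table: List[Row], conditions: Sequence[str], decision: str) -> Set[str]:
    """Minimum-cardinality set of attributes hitting every discernibility entry."""
    entries = discernibility_entries(table, conditions, decision)
    best = [frozenset(conditions)]   # incumbent: all attributes always covers every entry

    def branch(chosen: FrozenSet[str], remaining: List[FrozenSet[str]]) -> None:
        if len(chosen) >= len(best[0]):
            return                   # bound: this branch cannot beat the incumbent
        uncovered = [e for e in remaining if not (e & chosen)]
        if not uncovered:
            best[0] = chosen         # feasible and smaller than the incumbent
            return
        # Any cover must contain one attribute of this entry, so branch on each of them.
        target = min(uncovered, key=len)
        for a in sorted(target):
            branch(chosen | {a}, uncovered)

    branch(frozenset(), entries)
    return set(best[0])


table = [
    {"outlook": "sunny", "humidity": "high", "wind": "weak", "play": "no"},
    {"outlook": "sunny", "humidity": "normal", "wind": "strong", "play": "yes"},
    {"outlook": "rain", "humidity": "high", "wind": "strong", "play": "no"},
    {"outlook": "overcast", "humidity": "high", "wind": "weak", "play": "yes"},
    {"outlook": "rain", "humidity": "normal", "wind": "weak", "play": "yes"},
]
reduct = minimal_reduct(table, ["outlook", "humidity", "wind"], "play")
```

Here {outlook, wind} is an equally small reduct. The search returns {outlook, humidity} only because attributes are tried in sorted order. A conflict-driven strategy like the one described would aim to cut such redundant branches earlier.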
Experiments were carried out on five datasets from the UCI machine learning repository and domain theories. The total number of rules generated for the best classification model was recorded, with 30% of the data used for training and 70% kept as test data. The classification accuracy, the number of rules and the maximum rule length obtained with the SIP/DRIP method were compared with those of other rough set methods: Genetic Algorithm (GA), Johnson, Holte 1R, Dynamic and Exhaustive. Four of the datasets were then chosen for further experiments. The improved search strategy implemented non-chronological backtracking search, which can prune a large portion of the search space. The experimental results showed that the proposed SIP/DRIP method is successful for rough classification modeling. The outstanding feature of this method is the reduced number of rules in all classification models. SIP/DRIP generated shorter rules than the other methods on most datasets. The proposed search strategy indicated that the best performance can be achieved at a lower level, or along a shorter path, of the search tree. The SIP/DRIP method also showed promising results compared with other commonly used classifiers such as neural networks and statistical methods. This model is expected to represent the knowledge of the system efficiently.
- …