23,741 research outputs found
Stochastic Attribute-Value Grammars
Probabilistic analogues of regular and context-free grammars are well-known
in computational linguistics, and currently the subject of intensive research.
To date, however, no satisfactory probabilistic analogue of attribute-value
grammars has been proposed: previous attempts have failed to define a correct
parameter-estimation algorithm.
In the present paper, I define stochastic attribute-value grammars and give a
correct algorithm for estimating their parameters. The estimation algorithm is
adapted from Della Pietra, Della Pietra, and Lafferty (1995). To estimate model
parameters, it is necessary to compute the expectations of certain functions
under random fields. In the application discussed by Della Pietra, Della
Pietra, and Lafferty (representing English orthographic constraints), Gibbs
sampling can be used to estimate the needed expectations. The fact that
attribute-value grammars generate constrained languages makes Gibbs sampling
inapplicable, but I show how a variant of Gibbs sampling, the
Metropolis-Hastings algorithm, can be used instead.Comment: 23 pages, 21 Postscript figures, uses rotate.st
A survey of cost-sensitive decision tree induction algorithms
The past decade has seen a significant interest on the problem of inducing decision trees that take account of costs of misclassification and costs of acquiring the features used for decision making. This survey identifies over 50 algorithms including approaches that are direct adaptations of accuracy based methods, use genetic algorithms, use anytime methods and utilize boosting and bagging. The survey brings together these different studies and novel approaches to cost-sensitive decision tree learning, provides a useful taxonomy, a historical timeline of how the field has developed and should provide a useful reference point for future research in this field
A System for Induction of Oblique Decision Trees
This article describes a new system for induction of oblique decision trees.
This system, OC1, combines deterministic hill-climbing with two forms of
randomization to find a good oblique split (in the form of a hyperplane) at
each node of a decision tree. Oblique decision tree methods are tuned
especially for domains in which the attributes are numeric, although they can
be adapted to symbolic or mixed symbolic/numeric attributes. We present
extensive empirical studies, using both real and artificial data, that analyze
OC1's ability to construct oblique trees that are smaller and more accurate
than their axis-parallel counterparts. We also examine the benefits of
randomization for the construction of oblique decision trees.Comment: See http://www.jair.org/ for an online appendix and other files
accompanying this articl
Recommended from our members
Computationally efficient induction of classification rules with the PMCRI and J-PMCRI frameworks
In order to gain knowledge from large databases, scalable data mining technologies are needed. Data are captured on a large scale and thus databases are increasing at a fast pace. This leads to the utilisation of parallel computing technologies in order to cope with large amounts of data. In the area of classiïŹcation rule induction, parallelisation of classiïŹcation rules has focused on the divide and conquer approach, also known as the Top Down Induction of Decision Trees (TDIDT). An alternative approach to classiïŹcation rule induction is separate and conquer which has only recently been in the focus of parallelisation. This work introduces and evaluates empirically a framework for the parallel induction of classiïŹcation rules, generated by members of the Prism family of algorithms. All members of the Prism family of algorithms follow the separate and conquer approach.are increasing at a fast pace. This leads to the utilisation of parallel computing technologies in order to cope with large amounts of data. In the area of classiïŹcation rule induction, parallelisation of classiïŹcation rules has focused on the divide and conquer approach, also known as the Top Down Induction of Decision Trees (TDIDT). An alternative approach to classiïŹcation rule induction is separate and conquer which has only recently been in the focus of parallelisation. This work introduces and evaluates empirically a framework for the parallel induction of classiïŹcation rules, generated by members of the Prism family of algorithms. All members of the Prism family of algorithms follow the separate and conquer approach
Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art
Information Extraction, data Integration, and uncertain data management are different areas of research that got vast focus in the last two decades. Many researches tackled those areas of research individually. However, information extraction systems should have integrated with data integration methods to make use of the extracted information. Handling uncertainty in extraction and integration process is an important issue to enhance the quality of the data in such integrated systems. This article presents the state of the art of the mentioned areas of research and shows the common grounds and how to integrate information extraction and data integration under uncertainty management cover
Models of incremental concept formation
Given a set of observations, humans acquire concepts that organize those observations and use them in classifying future experiences. This type of concept formation can occur in the absence of a tutor and it can take place despite irrelevant and incomplete information. A reasonable model of such human concept learning should be both incremental and capable of handling this type of complex experiences that people encounter in the real world. In this paper, we review three previous models of incremental concept formation and then present CLASSIT, a model that extends these earlier systems. All of the models integrate the process of recognition and learning, and all can be viewed as carrying out search through the space of possible concept hierarchies. In an attempt to show that CLASSIT is a robust concept formation system, we also present some empirical studies of its behavior under a variety of conditions
Recommended from our members
Knowledge aquisition for expert systems: inducing modular rules from examples
Knowledge acquisition for expert systems is notoriously difficult, often demanding an enormous effort on the part of the domain expert, who is essentially expected to spell out everything he knows about the domain. The task is non-trivial and can be time-consuming and tedious. Machine learning research, particularly into automatic rule induction from examples, may provide a way of easing this burden.
Arguably, the most popular and successful rule induction algorithm in general use today is Quinlan's ID3. ID3 induces rules in the form of decision trees. However, the research reported in this thesis identifies some major limitations of a decision tree representation. Decision trees can be incomprehensible, but more importantly, there are rules which cannot be represented by trees. Ideally, induced rules should be modular and should capture the essence of causality, avoiding irrelevance and redundancy.
The information theoretic approach employed in ID3 is examined in detail and some of its weaknesses identified. A new algorithm is developed which, by avoiding these weaknesses, induces rules which are modular rather than decision trees. This algorithm forms the basis of a new rule induction program, PRISM.
Given an ideal training set, PRISM induces a complete and correct set of maximally general rules. The program and its results are described using training sets from two domains, contact lens fitting and a chess endgame. Induction from incomplete training sets is discussed and the performance of PRISM is compared with that of ID3 with particular reference to predictive power.
A series of experiments is described, in which PRISM and ID3 were applied to training sets of different sizes and predictive power calculated. The results show that PRISM generally performs better than ID3 in these two domains, inducing fewer, more general rules, which classify a similar number of instances correctly and significantly fewer incorrectly
Dynamic Rule Covering Classification in Data Mining with Cyber Security Phishing Application
Data mining is the process of discovering useful patterns from datasets using intelligent techniques to help users make certain decisions. A typical data mining task is classification, which involves predicting a target variable known as the class in previously unseen data based on models learnt from an input dataset. Covering is a well-known classification approach that derives models with If-Then rules. Covering methods, such as PRISM, have a competitive predictive performance to other classical classification techniques such as greedy, decision tree and associative classification. Therefore, Covering models are appropriate decision-making tools and users favour them carrying out decisions.
Despite the use of Covering approach in data processing for different classification applications, it is also acknowledged that this approach suffers from the noticeable drawback of inducing massive numbers of rules making the resulting model large and unmanageable by users. This issue is attributed to the way Covering techniques induce the rules as they keep adding items to the ruleâs body, despite the limited data coverage (number of training instances that the rule classifies), until the rule becomes with zero error. This excessive learning overfits the training dataset and also limits the applicability of Covering models in decision making, because managers normally prefer a summarised set of knowledge that they are able to control and comprehend rather a high maintenance models. In practice, there should be a trade-off between the number of rules offered by a classification model and its predictive performance. Another issue associated with the Covering models is the overlapping of training data among the rules, which happens when a ruleâs classified data are discarded during the rule discovery phase. Unfortunately, the impact of a ruleâs removed data on other potential rules is not considered by this approach. However, When removing training data linked with a rule, both frequency and rank of other rulesâ items which have appeared in the removed data are updated. The impacted rules should maintain their true rank and frequency in a dynamic manner during the rule discovery phase rather just keeping the initial computed frequency from the original input dataset.
In response to the aforementioned issues, a new dynamic learning technique based on Covering and rule induction, that we call Enhanced Dynamic Rule Induction (eDRI), is developed. eDRI has been implemented in Java and it has been embedded in WEKA machine learning tool. The developed algorithm incrementally discovers the rules using primarily frequency and rule strength thresholds. These thresholds in practice limit the search space for both items as well as potential rules by discarding any with insufficient data representation as early as possible resulting in an efficient training phase. More importantly, eDRI substantially cuts down the number of training examples scans by continuously updating potential rulesâ frequency and strength parameters in a dynamic manner whenever a rule gets inserted into the classifier. In particular, and for each derived rule, eDRI adjusts on the fly the remaining potential rulesâ items frequencies as well as ranks specifically for those that appeared within the deleted training instances of the derived rule. This gives a more realistic model with minimal rules redundancy, and makes the process of rule induction efficient and dynamic and not static. Moreover, the proposed technique minimises the classifierâs number of rules at preliminary stages by stopping learning when any rule does not meet the ruleâs strength threshold therefore minimising overfitting and ensuring a manageable classifier. Lastly, eDRI prediction procedure not only priorities using the best ranked rule for class forecasting of test data but also restricts the use of the default class rule thus reduces the number of misclassifications.
The aforementioned improvements guarantee classification models with smaller size that do not overfit the training dataset, while maintaining their predictive performance. The eDRI derived models particularly benefit greatly users taking key business decisions since they can provide a rich knowledge base to support their decision making. This is because these modelsâ predictive accuracies are high, easy to understand, and controllable as well as robust, i.e. flexible to be amended without drastic change. eDRI applicability has been evaluated on the hard problem of phishing detection. Phishing normally involves creating a fake well-designed website that has identical similarity to an existing business trustful website aiming to trick users and illegally obtain their credentials such as login information in order to access their financial assets. The experimental results against large phishing datasets revealed that eDRI is highly useful as an anti-phishing tool since it derived manageable size models when compared with other traditional techniques without hindering the classification performance. Further evaluation results using other several classification datasets from different domains obtained from University of California Data Repository have corroborated eDRIâs competitive performance with respect to accuracy, number of knowledge representation, training time and items space reduction. This makes the proposed technique not only efficient in inducing rules but also effective
- âŠ