83,892 research outputs found
Rule-based Machine Learning Methods for Functional Prediction
We describe a machine learning method for predicting the value of a
real-valued function, given the values of multiple input variables. The method
induces solutions from samples in the form of ordered disjunctive normal form
(DNF) decision rules. A central objective of the method and representation is
the induction of compact, easily interpretable solutions. This rule-based
decision model can be extended to search efficiently for similar cases prior to
approximating function values. Experimental results on real-world data
demonstrate that the new techniques are competitive with existing machine
learning and statistical methods and can sometimes yield superior regression
performance.Comment: See http://www.jair.org/ for any accompanying file
Unsupervised Discovery of Phonological Categories through Supervised Learning of Morphological Rules
We describe a case study in the application of {\em symbolic machine
learning} techniques for the discovery of linguistic rules and categories. A
supervised rule induction algorithm is used to learn to predict the correct
diminutive suffix given the phonological representation of Dutch nouns. The
system produces rules which are comparable to rules proposed by linguists.
Furthermore, in the process of learning this morphological task, the phonemes
used are grouped into phonologically relevant categories. We discuss the
relevance of our method for linguistics and language technology
A Comparative Study of the Application of Different Learning Techniques to Natural Language Interfaces
In this paper we present first results from a comparative study. Its aim is
to test the feasibility of different inductive learning techniques to perform
the automatic acquisition of linguistic knowledge within a natural language
database interface. In our interface architecture the machine learning module
replaces an elaborate semantic analysis component. The learning module learns
the correct mapping of a user's input to the corresponding database command
based on a collection of past input data. We use an existing interface to a
production planning and control system as evaluation and compare the results
achieved by different instance-based and model-based learning algorithms.Comment: 10 pages, to appear CoNLL9
Multi-test Decision Tree and its Application to Microarray Data Classification
Objective:
The desirable property of tools used to investigate biological data is
easy to understand models and predictive decisions.
Decision trees are particularly promising in this regard due to their comprehensible nature that resembles the hierarchical process of human decision making. However, existing algorithms for learning decision trees have tendency to underfit gene expression data. The main aim of this work is to improve the performance and stability of decision trees with only a small increase in their complexity.
Methods:
We propose a multi-test decision tree (MTDT); our main contribution is the application of several univariate tests in each non-terminal node of the decision tree. We also search for alternative, lower-ranked features in order to obtain more stable and reliable predictions.
Results:
Experimental validation was performed on several real-life gene expression datasets. Comparison results with eight classifiers show that MTDT has a statistically significantly higher accuracy than popular decision tree classifiers, and it was highly competitive with ensemble learning algorithms. The proposed solution managed to outperform its baseline algorithm on datasets by an average percent. A study performed on one of the datasets showed that the discovered genes used in the MTDT classification model
are supported by biological evidence in the literature.
Conclusion:
This paper introduces a new type of decision tree which is more suitable for solving biological problems.
MTDTs are relatively easy to analyze and much more powerful in modeling high dimensional microarray data than their popular counterparts
Fitting Prediction Rule Ensembles with R Package pre
Prediction rule ensembles (PREs) are sparse collections of rules, offering
highly interpretable regression and classification models. This paper presents
the R package pre, which derives PREs through the methodology of Friedman and
Popescu (2008). The implementation and functionality of package pre is
described and illustrated through application on a dataset on the prediction of
depression. Furthermore, accuracy and sparsity of PREs is compared with that of
single trees, random forest and lasso regression in four benchmark datasets.
Results indicate that pre derives ensembles with predictive accuracy comparable
to that of random forests, while using a smaller number of variables for
prediction
TSE-IDS: A Two-Stage Classifier Ensemble for Intelligent Anomaly-based Intrusion Detection System
Intrusion detection systems (IDS) play a pivotal role in computer security by discovering and repealing malicious activities in computer networks. Anomaly-based IDS, in particular, rely on classification models trained using historical data to discover such malicious activities. In this paper, an improved IDS based on hybrid feature selection and two-level classifier ensembles is proposed. An hybrid feature selection technique comprising three methods, i.e. particle swarm optimization, ant colony algorithm, and genetic algorithm, is utilized to reduce the feature size of the training datasets (NSL-KDD and UNSW-NB15 are considered in this paper). Features are selected based on the classification performance of a reduced error pruning tree (REPT) classifier. Then, a two-level classifier ensembles based on two meta learners, i.e., rotation forest and bagging, is proposed. On the NSL-KDD dataset, the proposed classifier shows 85.8% accuracy, 86.8% sensitivity, and 88.0% detection rate, which remarkably outperform other classification techniques recently proposed in the literature. Results regarding the UNSW-NB15 dataset also improve the ones achieved by several state of the art techniques. Finally, to verify the results, a two-step statistical significance test is conducted. This is not usually considered by IDS research thus far and, therefore, adds value to the experimental results achieved by the proposed classifier
A Comparative Study on the Use of Classification Algorithms in Financial Forecasting
Financial forecasting is a vital area in computational finance, where several studies have taken place over the years. One way of viewing financial forecasting is as a classification problem, where the goal is to find a model that represents the predictive relationships between predictor attribute values and class attribute values. In this paper we present a comparative study between two bio-inspired classification algorithms, a genetic programming algorithm especially designed for financial forecasting, and an ant colony optimization one, which is designed for classification problems. In addition, we compare the above algorithms with two other state-of-the-art classification algorithms, namely C4.5 and RIPPER. Results show that the ant colony optimization classification algorithm is very successful, significantly outperforming all other algorithms in the given classification problems, which provides insights for improving the design of specific financial forecasting algorithms
- …