38,111 research outputs found
Unsupervised Discovery of Phonological Categories through Supervised Learning of Morphological Rules
We describe a case study in the application of symbolic machine
learning techniques for the discovery of linguistic rules and categories. A
supervised rule induction algorithm is used to learn to predict the correct
diminutive suffix given the phonological representation of Dutch nouns. The
system produces rules which are comparable to rules proposed by linguists.
Furthermore, in the process of learning this morphological task, the phonemes
used are grouped into phonologically relevant categories. We discuss the
relevance of our method for linguistics and language technology.
Min-Max Predictive Control of a Five-Phase Induction Machine
In this paper, a fuzzy-logic based operator is used instead of a traditional cost function for
the predictive stator current control of a five-phase induction machine (IM). The min-max operator
is explored for the first time as an alternative to the traditional loss function. With this proposal,
the selection of voltage vectors does not need weighting factors that are normally used within
the loss function and require a cumbersome procedure to tune. In order to cope with conflicting
criteria, the proposal uses a decision function that compares predicted errors in the torque producing
subspace and in the x-y subspace. Simulations and experimental results are provided, showing how
the proposal compares with the traditional method of fixed tuning for predictive stator current control.
Funding: Ministerio de Economía y Competitividad DPI 2016-76493-C3-1-R and 2014/425; Unión Europea DPI 2016-76493-C3-1-R and 2014/425; Universidad de Sevilla DPI 2016-76493-C3-1-R and 2014/42
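As a rough illustration of the min-max idea in the abstract above, the sketch below picks, among candidate voltage vectors, the one whose worst predicted error is smallest, so no weighting factors are needed. The function name `min_max_select`, the tuple layout, and the assumption that both errors are already normalized to comparable scales are hypothetical; the abstract does not give the exact form of the decision function.

```python
def min_max_select(candidates):
    """Return the label of the candidate with the smallest worst-case error.

    candidates: list of (label, err_torque, err_xy) tuples, where the two
    errors are non-negative predicted errors in the torque-producing
    subspace and the x-y subspace (assumed normalized; hypothetical layout).
    """
    # max() plays the role of the fuzzy "or" over the two error criteria;
    # min() then selects the candidate that keeps that worst case lowest.
    best = min(candidates, key=lambda c: max(c[1], c[2]))
    return best[0]


# Hypothetical predicted errors for three candidate voltage vectors.
candidates = [
    ("v1", 0.2, 0.9),
    ("v2", 0.5, 0.4),
    ("v3", 0.1, 0.7),
]
selected = min_max_select(candidates)
print(selected)
```

Here `v1` and `v3` each do well on one criterion but poorly on the other, so the min-max rule prefers the more balanced `v2`.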
A very simple safe-Bayesian random forest
Random forests work by averaging several predictions of de-correlated trees. We show a conceptually radical approach to generating a random forest: random sampling of many trees from a prior distribution, followed by a weighted ensemble of their predictive probabilities. Our approach uses priors that allow sampling of decision trees even before looking at the data, and a power likelihood that explores the space spanned by combinations of decision trees. While each tree performs Bayesian inference to compute its predictions, our aggregation procedure uses the power likelihood rather than the likelihood and is therefore, strictly speaking, not Bayesian. Nonetheless, we refer to it as a Bayesian random forest, but with built-in safety: it retains good predictive performance even if the underlying probabilistic model is wrong. We demonstrate empirically that our Safe-Bayesian random forest outperforms MCMC- or SMC-based Bayesian decision trees in terms of speed and accuracy, and achieves performance competitive with entropy- or Gini-optimised random forests, yet is very simple to construct.
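The aggregation step described above can be sketched as follows, assuming each sampled tree comes with a log-likelihood on the training data and a vector of class probabilities for a test point. The function names, the tempering exponent `eta`, and its default value are assumptions for illustration; the abstract does not specify how the power is chosen or how the trees are sampled.

```python
import math


def power_likelihood_weights(log_likelihoods, eta=0.5):
    """Weight each tree proportionally to likelihood ** eta (hypothetical eta)."""
    scaled = [eta * ll for ll in log_likelihoods]
    m = max(scaled)  # subtract the maximum for numerical stability
    unnormalized = [math.exp(s - m) for s in scaled]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def ensemble_predict(tree_probs, weights):
    """Weighted average of per-tree class-probability vectors."""
    n_classes = len(tree_probs[0])
    return [
        sum(w * probs[k] for w, probs in zip(weights, tree_probs))
        for k in range(n_classes)
    ]


# Hypothetical values for three sampled trees on a two-class problem.
log_liks = [-10.0, -12.0, -20.0]
tree_probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]

weights = power_likelihood_weights(log_liks, eta=0.5)
prediction = ensemble_predict(tree_probs, weights)
print(weights, prediction)
```

With `eta=1` the weights reduce to ordinary likelihood weighting, and with `eta=0` every tree is weighted equally; values in between temper the influence of the best-fitting trees.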
See5 Algorithm versus Discriminant Analysis. An Application to the Prediction of Insolvency in Spanish Non-life Insurance Companies
Prediction of insurance company insolvency has arisen as an important problem in the field of financial research, due to the necessity of protecting the general public whilst minimizing the costs associated with this problem. Most methods applied in the past to tackle this question are traditional statistical techniques which use financial ratios as explanatory variables. However, these variables do not usually satisfy statistical assumptions, which complicates the application of these methods. In this paper, a comparative study of the performance of a well-known parametric statistical technique (Linear Discriminant Analysis) and a non-parametric machine learning technique (See5) is carried out. We have applied the two methods to the problem of predicting the insolvency of Spanish non-life insurance companies on the basis of a set of financial ratios. Results indicate a higher performance of the machine learning technique, which shows that this method can be a useful tool for evaluating the insolvency of insurance firms.
Keywords: Insolvency, Insurance Companies, Discriminant Analysis, See5.
Ensembles of wrappers for automated feature selection in fish age classification
In feature selection, the most important features must be chosen so as to decrease their number while retaining their discriminatory information. Within this context, a novel feature selection method based on an ensemble of wrappers is proposed and applied to automatically select features in fish age classification. The effectiveness of this procedure has been tested on an Atlantic cod database for different powerful statistical learning classifiers. The subsets based on the few features selected, e.g. otolith weight and fish weight, are particularly noteworthy given current biological findings and practices in fishery research, and the classification results obtained with them outperform those of previous studies in which manual feature selection was performed.
Peer reviewed. Postprint (author's final draft)
A model-based multithreshold method for subgroup identification
The thresholding variable plays a crucial role in subgroup identification for personalized medicine. Most existing partitioning methods split the sample based on one predictor variable. In this paper, we consider setting the splitting rule from a combination of multivariate predictors, such as latent factors, principal components, and weighted sums of predictors. Such a subgrouping method may lead to more meaningful partitioning of the population than using a single variable. In addition, our method is based on a change point regression model and thus yields straightforward model-based prediction results. After choosing a particular form of thresholding variable, we apply a two-stage multiple change point detection method to determine the subgroups and estimate the regression parameters. We show that our approach can produce two or more subgroups from the multiple change points and identify the true grouping with high probability. In addition, our estimation results enjoy oracle properties. We design a simulation study to compare the performance of our proposed method and existing methods, and apply them to analyze data sets from a Scleroderma trial and a breast cancer study.
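To make the role of the thresholding variable concrete, the sketch below assigns each subject to a subgroup by comparing a weighted sum of its predictors against a set of change points. It covers only the assignment step: the weights and thresholds are taken as given (hypothetical values), whereas the abstract's two-stage method estimates the change points from data. The function name `assign_subgroups` is likewise an assumption.

```python
from bisect import bisect_right


def assign_subgroups(rows, weights, thresholds):
    """Map each row of predictors to a subgroup index 0..len(thresholds).

    The thresholding variable is the weighted sum of the predictors; a row
    falls in subgroup k when its score has passed exactly k change points.
    """
    cuts = sorted(thresholds)
    groups = []
    for row in rows:
        score = sum(w * x for w, x in zip(weights, row))
        groups.append(bisect_right(cuts, score))
    return groups


# Hypothetical data: three subjects, two predictors, two change points.
rows = [[0.0, 0.0], [1.0, 1.0], [3.0, 1.0]]
weights = [1.0, 2.0]
thresholds = [2.0, 4.0]

groups = assign_subgroups(rows, weights, thresholds)
print(groups)
```

Two change points yield three subgroups here; a separate regression model would then be fitted within each subgroup.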
Structured Prediction of Sequences and Trees using Infinite Contexts
Linguistic structures exhibit a rich array of global phenomena; however,
commonly used Markov models are unable to adequately describe these phenomena
due to their strong locality assumptions. We propose a novel hierarchical model
for structured prediction over sequences and trees which exploits global
context by conditioning each generation decision on an unbounded context of
prior decisions. This builds on the success of Markov models but without
imposing a fixed bound in order to better represent global phenomena. To
facilitate learning of this large and unbounded model, we use a hierarchical
Pitman-Yor process prior which provides a recursive form of smoothing. We
propose prediction algorithms based on A* and Markov Chain Monte Carlo
sampling. Empirical results demonstrate the potential of our model compared to
baseline finite-context Markov models on part-of-speech tagging and syntactic
parsing.