A Diversity-Accuracy Measure for Homogenous Ensemble Selection
Several selection methods in the literature are essentially based on an evaluation function that determines whether a model M contributes positively to the performance of the whole ensemble. In this paper, we propose a method called DIversity and ACcuracy for Ensemble Selection (DIACES), which uses an evaluation function based on both diversity and accuracy. The method is applied to homogenous ensembles composed of C4.5 decision trees and relies on a hill-climbing strategy. This allows selecting ensembles with the best compromise between maximum diversity and minimum error rate. Comparative studies show that in most cases the proposed method generates reduced-size ensembles with better performance than usual ensemble simplification methods.
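The hill-climbing selection described above can be sketched as follows. The α-weighted objective (diversity minus error), pairwise-disagreement diversity measure, and greedy forward search are illustrative assumptions, not the paper's exact evaluation function:

```python
import numpy as np

def ensemble_error(preds, y):
    """Error rate of the majority-vote ensemble over member predictions."""
    votes = np.apply_along_axis(
        lambda col: np.bincount(col, minlength=2).argmax(), 0, preds)
    return float(np.mean(votes != y))

def pairwise_diversity(preds):
    """Mean pairwise disagreement between member predictions."""
    n = preds.shape[0]
    if n < 2:
        return 0.0
    disagreements = [np.mean(preds[i] != preds[j])
                     for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(disagreements))

def hill_climb_select(preds, y, alpha=0.5):
    """Greedy forward selection maximizing alpha*diversity - (1-alpha)*error."""
    remaining = list(range(preds.shape[0]))
    # seed with the single most accurate member
    best = min(remaining, key=lambda i: np.mean(preds[i] != y))
    selected, current = [best], None
    remaining.remove(best)

    def score(idx):
        sub = preds[idx]
        return alpha * pairwise_diversity(sub) - (1 - alpha) * ensemble_error(sub, y)

    current = score(selected)
    while remaining:
        cand = max(remaining, key=lambda i: score(selected + [i]))
        if score(selected + [cand]) <= current:
            break  # no candidate improves the compromise; stop climbing
        selected.append(cand)
        remaining.remove(cand)
        current = score(selected)
    return selected
```

In this sketch the trade-off between diversity and accuracy is a single scalar `alpha`; the paper's DIACES evaluation function may combine the two terms differently.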
Learning Sentence-internal Temporal Relations
In this paper we propose a data intensive approach for inferring
sentence-internal temporal relations. Temporal inference is relevant for
practical NLP applications which either extract or synthesize temporal
information (e.g., summarisation, question answering). Our method bypasses the
need for manual coding by exploiting the presence of markers like "after", which
overtly signal a temporal relation. We first show that models trained on main
and subordinate clauses connected with a temporal marker achieve good
performance on a pseudo-disambiguation task simulating temporal inference
(during testing the temporal marker is treated as unseen and the models must
select the right marker from a set of possible candidates). Secondly, we assess
whether the proposed approach holds promise for the semi-automatic creation of
temporal annotations. Specifically, we use a model trained on noisy and
approximate data (i.e., main and subordinate clauses) to predict
intra-sentential relations present in TimeBank, a corpus annotated with rich
temporal information. Our experiments compare and contrast several
probabilistic models differing in their feature space, linguistic assumptions
and data requirements. We evaluate performance against gold standard corpora
and also against human subjects.
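The pseudo-disambiguation task described above (the temporal marker is hidden and the model must recover it from the connected clauses) can be sketched with a toy smoothed probabilistic model. The corpus triples, feature space (main and subordinate verbs), and add-one smoothing here are illustrative assumptions, not the paper's actual models or data:

```python
from collections import defaultdict
import math

# Hypothetical toy "corpus": (main_verb, marker, sub_verb) triples harvested
# from sentences with overt temporal connectives.
CORPUS = [
    ("left", "after", "finished"),
    ("slept", "after", "ate"),
    ("called", "before", "left"),
    ("checked", "before", "signed"),
    ("smiled", "while", "talking"),
]
MARKERS = ["after", "before", "while"]

counts = defaultdict(lambda: defaultdict(int))
marker_counts = defaultdict(int)
for main, marker, sub in CORPUS:
    marker_counts[marker] += 1
    counts[marker][("main", main)] += 1
    counts[marker][("sub", sub)] += 1

def score(marker, main, sub, vocab=50, alpha=1.0):
    """Add-one-smoothed log P(marker) + log P(main|marker) + log P(sub|marker)."""
    total = sum(marker_counts.values())
    logp = math.log((marker_counts[marker] + alpha) / (total + alpha * len(MARKERS)))
    for feat in (("main", main), ("sub", sub)):
        logp += math.log((counts[marker][feat] + alpha) /
                         (marker_counts[marker] + alpha * vocab))
    return logp

def predict_marker(main, sub):
    """Pseudo-disambiguation: pick the most probable hidden marker."""
    return max(MARKERS, key=lambda m: score(m, main, sub))
```

Evaluation then reduces to the fraction of held-out clause pairs for which the hidden marker is ranked first among the candidates.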
Fusing Vantage Point Trees and Linear Discriminants for Fast Feature Classification
This paper describes a classification strategy that can be regarded as a more general form of nearest-neighbor classification. It fuses the concepts of nearest neighbor, linear discriminant and Vantage-Point trees, yielding an efficient indexing data structure and classification algorithm. In the learning phase, we define a set of disjoint subspaces of reduced complexity that can be separated by linear discriminants, ending up with an ensemble of simple (weak) classifiers that work locally. In classification, the closest centroids to the query determine the set of classifiers considered, whose responses are weighted. The algorithm was experimentally validated in datasets widely used in the field, attaining error rates that are favorably comparable to the state-of-the-art classification techniques. Lastly, the proposed solution has a set of interesting properties for a broad range of applications: 1) it is deterministic; 2) it classifies in time approximately logarithmic with respect to the size of the learning set, being far more efficient than nearest neighbor classification in terms of computational cost; and 3) it keeps the generalization ability of simple models.
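The learn-then-route scheme above (partition the space, fit a simple local discriminant per partition, weight the classifiers attached to the closest centroids at query time) can be sketched as follows. Randomly chosen centroids stand in for the paper's vantage-point-tree construction, and a nearest-class-mean rule stands in for its linear discriminants; both are simplifying assumptions:

```python
import numpy as np

class LocalLinearEnsemble:
    """Sketch: partition the space around centroids, train a nearest-class-mean
    discriminant per partition, and weight the k closest local classifiers
    by inverse distance at query time."""

    def __init__(self, n_centroids=4, k=2, seed=0):
        self.n_centroids = n_centroids
        self.k = k
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        # pick random training points as centroids (stand-in for a VP-tree split)
        idx = self.rng.choice(len(X), self.n_centroids, replace=False)
        self.centroids = X[idx]
        assign = np.argmin(
            np.linalg.norm(X[:, None] - self.centroids[None], axis=2), axis=1)
        self.classes = np.unique(y)
        # local "linear discriminant": per-partition class means
        self.local_means = []
        for c in range(self.n_centroids):
            Xc, yc = X[assign == c], y[assign == c]
            means = {cls: (Xc[yc == cls].mean(axis=0) if np.any(yc == cls) else None)
                     for cls in self.classes}
            self.local_means.append(means)
        return self

    def predict_one(self, x):
        d = np.linalg.norm(self.centroids - x, axis=1)
        nearest = np.argsort(d)[: self.k]          # k closest partitions
        weights = 1.0 / (d[nearest] + 1e-9)        # inverse-distance weighting
        scores = {cls: 0.0 for cls in self.classes}
        for w, c in zip(weights, nearest):
            valid = {cls: m for cls, m in self.local_means[c].items()
                     if m is not None}
            if not valid:
                continue
            pred = min(valid, key=lambda cls: np.linalg.norm(valid[cls] - x))
            scores[pred] += w
        return max(scores, key=scores.get)
```

The logarithmic query time claimed by the paper comes from descending the actual vantage-point tree; the flat centroid scan here is only for illustration.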
Preterm Birth Prediction: Deriving Stable and Interpretable Rules from High Dimensional Data
Preterm births occur at an alarming rate of 10-15%. Preemies have a higher
risk of infant mortality, developmental retardation and long-term disabilities.
Predicting preterm birth is difficult, even for the most experienced
clinicians. The most well-designed clinical study thus far reaches a modest
sensitivity of 18.2-24.2% at specificity of 28.6-33.3%. We take a different
approach by exploiting databases of normal hospital operations. Our aims are
twofold: (i) to derive an easy-to-use, interpretable prediction rule with
quantified uncertainties, and (ii) to construct accurate classifiers for
preterm birth prediction. Our approach is to automatically generate and select
from hundreds (if not thousands) of possible predictors using stability-aware
techniques. Derived from a large database of 15,814 women, our simplified
prediction rule with only 10 items has sensitivity of 62.3% at specificity of
81.5%.
Comment: Presented at 2016 Machine Learning and Healthcare Conference (MLHC 2016), Los Angeles, CA
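The stability-aware selection described above can be sketched with a simple bootstrap scheme: repeatedly resample the data, keep the top-ranked predictors on each resample, and retain only those selected in most resamples. The correlation-based ranking and the 60% stability threshold are illustrative assumptions, not the paper's actual procedure:

```python
import numpy as np

def stability_select(X, y, n_boot=50, top_k=5, threshold=0.6, seed=0):
    """Bootstrap stability selection: features kept in >= `threshold`
    of resamples are considered stable predictors."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    hits = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.choice(n, n, replace=True)
        Xb, yb = X[idx], y[idx]
        # rank features by absolute Pearson correlation with the outcome
        corr = np.abs([np.corrcoef(Xb[:, j], yb)[0, 1] for j in range(p)])
        hits[np.argsort(corr)[-top_k:]] += 1
    freq = hits / n_boot
    return np.where(freq >= threshold)[0], freq
```

Restricting the final rule to features that survive resampling is what makes the derived 10-item rule stable rather than an artifact of one particular cohort.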