79,029 research outputs found
On the Combination of Logistic Regression and Local Probability Estimates
In this paper we give a survey of the combination of classifiers. We briefly describe basic principles of machine learning and the problem of classifier construction, review several approaches to generating different classifiers as well as established methods for combining them, and then introduce our novel approach, which assesses the appropriateness of the different classifiers for each test point individually, based on their characteristics.
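As a rough illustration of per-point classifier assessment (a minimal sketch, not the paper's actual method): one common instantiation of the idea is dynamic classifier selection, where each test point is routed to the classifier that is most accurate on that point's nearest validation neighbours. All names below are illustrative.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def local_selection_predict(classifiers, X_val, y_val, X_test, k=25):
        # correct[i, j] = True if classifier j is right on validation point i
        correct = np.stack(
            [clf.predict(X_val) == y_val for clf in classifiers], axis=1)
        nn = NearestNeighbors(n_neighbors=k).fit(X_val)
        idx = nn.kneighbors(X_test, return_distance=False)   # (n_test, k)
        local_acc = correct[idx].mean(axis=1)                # (n_test, n_classifiers)
        preds = np.stack(
            [clf.predict(X_test) for clf in classifiers], axis=1)
        best = local_acc.argmax(axis=1)                      # best classifier per test point
        return preds[np.arange(len(X_test)), best]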
Localized Regression
The main problem with localized discriminant techniques is the curse of dimensionality, which seems to restrict their use to the case of few variables. This restriction does not hold if localization is combined with a reduction of dimension. In particular, it is shown that localization yields powerful classifiers even in higher dimensions if it is combined with locally adaptive selection of predictors. A robust localized logistic regression (LLR) method is developed for which all tuning parameters are chosen data-adaptively. In an extended simulation study we evaluate the potential of the proposed procedure for various types of data and compare it to other classification procedures. In addition, we demonstrate that automatic choice of localization, predictor selection and penalty parameters based on cross-validation works well. Finally, the method is applied to real data sets and its real-world performance is compared to alternative procedures.
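A minimal sketch of the core idea of localized logistic regression, assuming Gaussian kernel weights with a fixed bandwidth h (the paper chooses all such tuning parameters data-adaptively; the names here are illustrative, not the paper's implementation):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def llr_predict(X_train, y_train, X_test, h=1.0):
        preds = []
        for x0 in X_test:
            # kernel weights concentrating the fit around the test point x0
            d2 = ((X_train - x0) ** 2).sum(axis=1)
            w = np.exp(-d2 / (2 * h ** 2))
            # local weighted logistic fit; may fail if the effective
            # neighbourhood contains only one class, so h matters
            clf = LogisticRegression()
            clf.fit(X_train, y_train, sample_weight=w)
            preds.append(clf.predict(x0.reshape(1, -1))[0])
        return np.array(preds)

In practice h, the local predictor subset and the penalty would all be chosen by cross-validation, as the abstract describes.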
Feature Augmentation via Nonparametrics and Selection (FANS) in High Dimensional Classification
We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing.
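A hedged sketch of the FANS pipeline as described above, assuming kernel density estimates for the marginal class-conditional densities and an L1-penalized logistic fit (bandwidth and function names are illustrative assumptions):

    import numpy as np
    from sklearn.neighbors import KernelDensity
    from sklearn.linear_model import LogisticRegression

    def fans_features(X_train, y_train, X, bandwidth=0.5):
        # replace each feature by its estimated log marginal density ratio
        Z = np.empty(X.shape)
        for j in range(X.shape[1]):
            kde1 = KernelDensity(bandwidth=bandwidth).fit(X_train[y_train == 1, j:j + 1])
            kde0 = KernelDensity(bandwidth=bandwidth).fit(X_train[y_train == 0, j:j + 1])
            # score_samples returns log densities, so the difference is a log ratio
            Z[:, j] = (kde1.score_samples(X[:, j:j + 1])
                       - kde0.score_samples(X[:, j:j + 1]))
        return Z

    # penalized logistic regression on the augmented features:
    # Z_train = fans_features(X_train, y_train, X_train)
    # clf = LogisticRegression(penalty="l1", solver="liblinear").fit(Z_train, y_train)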
Detection of Uniform and Non-Uniform Differential Item Functioning by Item Focussed Trees
Detection of differential item functioning (DIF) by use of the logistic modelling approach has a long tradition. One big advantage of the approach is that it can be used to investigate non-uniform DIF as well as uniform DIF. The classical approach allows one to detect DIF by distinguishing between multiple groups. We propose an alternative method that combines recursive partitioning methods (or trees) with logistic regression methodology to detect uniform and non-uniform DIF in a nonparametric way. The output of the method is a set of trees that visualize, in a simple way, the structure of DIF in an item, showing which variables interact, and in which way, when generating DIF. In addition, we consider a logistic regression method in which DIF can be induced by a vector of covariates, which may include categorical as well as continuous covariates. The methods are investigated in simulation studies and illustrated by two applications.
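For reference, the classical logistic DIF model that this approach builds on can be written as (notation mine, a standard formulation rather than a quotation from the paper):

\[
\operatorname{logit}\, P(Y_{pi} = 1 \mid S_p, g_p) \;=\; \beta_{0i} + \beta_{1i} S_p + \beta_{2i} g_p + \beta_{3i}\, S_p g_p ,
\]

where \(S_p\) is the test score (ability proxy) of person \(p\) and \(g_p\) the group membership; \(\beta_{2i} \neq 0\) indicates uniform DIF and \(\beta_{3i} \neq 0\) non-uniform DIF for item \(i\). The proposed trees, in effect, replace the single group variable \(g_p\) by recursively found partitions over possibly many covariates.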
Comparison between Suitable Priors for Additive Bayesian Networks
Additive Bayesian networks (ABNs) are a class of graphical models that extend the usual Bayesian generalized linear model to multiple dependent variables through the factorisation of the joint probability distribution of the underlying variables. When fitting an ABN model, the choice of the prior on the parameters is of crucial importance. If an inadequate prior, such as one that is too weakly informative, is used, data separation and data sparsity lead to issues in the model selection process. In this work a simulation study comparing two weakly informative priors and a strongly informative prior is presented. As a weakly informative prior we use a zero-mean Gaussian prior with a large variance, as currently implemented in the R package abn. The second prior is a Student's t-distribution specifically designed for logistic regressions, and, finally, the strongly informative prior is again Gaussian, with mean equal to the true parameter value and a small variance. We compare the impact of these priors on the accuracy of the learned additive Bayesian network as a function of different parameters. We create a simulation study to illustrate Lindley's paradox based on the prior choice. We then conclude by highlighting the good performance of the informative Student's t-prior and the limited impact of Lindley's paradox. Finally, suggestions for further developments are provided.
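In symbols, the three priors on a coefficient \(\beta\) being compared can be summarised as follows (notation and parametrisation are my assumption, not taken from the paper):

\[
\beta \sim N\!\left(0, \sigma^2_{\text{large}}\right) \;\;\text{(weakly informative Gaussian)}, \qquad
\beta \sim t_{\nu}\!\left(0, s\right) \;\;\text{(Student's t)}, \qquad
\beta \sim N\!\left(\beta_{\text{true}}, \sigma^2_{\text{small}}\right) \;\;\text{(strongly informative)}.
\]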
Convex and non-convex regularization methods for spatial point processes intensity estimation
This paper deals with feature selection procedures for spatial point process intensity estimation. We consider regularized versions of estimating equations based on the Campbell theorem, derived from two classical functions: the Poisson likelihood and the logistic regression likelihood. We provide general conditions on the spatial point processes and on the penalty functions which ensure consistency, sparsity and asymptotic normality. We discuss the numerical implementation and assess finite sample properties in a simulation study. Finally, an application to tropical forestry datasets illustrates the use of the proposed methods.
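Concretely, for an intensity function \(\lambda(u; \beta)\) on an observation window \(W\), the regularized Poisson-likelihood estimating equation takes the familiar form (a standard formulation, not a quotation from the paper):

\[
\hat\beta \;=\; \arg\max_{\beta} \; \sum_{u \in X} \log \lambda(u; \beta) \;-\; \int_{W} \lambda(u; \beta)\, \mathrm{d}u \;-\; \sum_{j} p_{\lambda}\!\left(|\beta_j|\right),
\]

where \(p_{\lambda}\) is a convex (e.g. lasso) or non-convex (e.g. SCAD) penalty; the logistic-regression variant replaces the Poisson likelihood by a logistic composite likelihood built from dummy points.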
Post-Selection Inference for Generalized Linear Models with Many Controls
This paper considers generalized linear models in the presence of many controls. We lay out a general methodology for estimating an effect of interest based on the construction of an instrument that immunizes against model selection mistakes, and apply it to the case of the logistic binary choice model. More specifically, we propose new methods for estimating and constructing confidence regions for a regression parameter of primary interest, namely the coefficient on the regressor of interest, such as a treatment or policy variable. These methods allow one to estimate this parameter at the root-n rate, using sparsity assumptions, even when the total number of other regressors, called controls, potentially exceeds the sample size. The sparsity assumption means that there is a subset of controls which suffices to accurately approximate the nuisance part of the regression function. Importantly, the estimators and the resulting confidence regions are valid uniformly over sparse models satisfying appropriate sparsity and other technical conditions. These procedures do not rely on traditional consistent model selection arguments for their validity. In fact, they are robust with respect to moderate model selection mistakes in variable selection. Under suitable conditions, the estimators are semi-parametrically efficient in the sense of attaining the semi-parametric efficiency bounds for the class of models considered in this paper.
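A minimal sketch of a closely related procedure, post-double-selection for a logistic outcome model; this is a simplified heuristic in the spirit of the paper's immunized estimator, not its exact construction, and all names and tuning values are illustrative:

    import numpy as np
    from sklearn.linear_model import Lasso, LogisticRegression

    def post_double_selection_logit(y, d, X, C=1.0, alpha=0.1):
        # step 1: lasso logistic regression of the outcome y on (d, controls)
        step1 = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        step1.fit(np.column_stack([d, X]), y)
        keep_y = np.abs(step1.coef_[0, 1:]) > 1e-8   # controls predicting y
        # step 2: lasso of the regressor of interest d on the controls
        # (a linear selection step, even if d is binary)
        step2 = Lasso(alpha=alpha).fit(X, d)
        keep_d = np.abs(step2.coef_) > 1e-8          # controls predicting d
        # step 3: unpenalized logistic refit of y on d plus the union of
        # selected controls (penalty=None requires scikit-learn >= 1.2)
        keep = keep_y | keep_d
        final = LogisticRegression(penalty=None)
        final.fit(np.column_stack([d, X[:, keep]]), y)
        return final.coef_[0, 0]                     # estimated coefficient on d

Selecting controls from both equations is what makes the refit robust to moderate selection mistakes in either one.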