Inferring meta-covariates in classification
This paper develops an alternative method for gene selection that combines model-based clustering and binary classification. By averaging the covariates within the clusters obtained from model-based clustering, we define “meta-covariates” and use them to build a probit regression model, thereby selecting clusters of similarly behaving genes and aiding interpretation. This simultaneous learning task is accomplished by an EM algorithm that optimises a single likelihood function rewarding good performance at both classification and clustering. We explore the performance of our methodology on a well-known leukaemia dataset and use the Gene Ontology to interpret our results.
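To make the meta-covariate construction concrete, the Python sketch below averages each sample's gene values within clusters and feeds the resulting meta-covariates into a probit link. It assumes hard cluster labels are already given and uses hypothetical weights; it does not implement the EM algorithm that learns the clusters and the classifier jointly.

```python
import math
from typing import List, Sequence


def meta_covariates(X: Sequence[Sequence[float]], labels: Sequence[int],
                    n_clusters: int) -> List[List[float]]:
    """Average each sample's covariates (genes) within each cluster.

    X is samples x genes; labels[g] is the (assumed hard) cluster index of gene g.
    Returns a samples x n_clusters matrix of meta-covariates.
    """
    sizes = [0] * n_clusters
    for k in labels:
        sizes[k] += 1
    if any(s == 0 for s in sizes):
        raise ValueError("every cluster needs at least one gene")
    meta = []
    for row in X:
        sums = [0.0] * n_clusters
        for value, k in zip(row, labels):
            sums[k] += value
        meta.append([s / size for s, size in zip(sums, sizes)])
    return meta


def probit_probability(meta_row: Sequence[float], intercept: float,
                       weights: Sequence[float]) -> float:
    """P(class = 1) = Phi(intercept + weights . meta_row), Phi = standard normal CDF."""
    eta = intercept + sum(w * m for w, m in zip(weights, meta_row))
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


# Usage: 2 samples, 4 genes; genes {0, 1} form cluster 0 and genes {2, 3} cluster 1.
X = [[1.0, 3.0, 10.0, 20.0],
     [0.0, 2.0, 4.0, 6.0]]
M = meta_covariates(X, labels=[0, 0, 1, 1], n_clusters=2)
print(M)  # [[2.0, 15.0], [1.0, 5.0]]
print(probit_probability(M[0], intercept=0.0, weights=[1.0, -0.1]))  # about 0.69
```

Because the classifier sees one averaged value per cluster rather than one value per gene, each fitted weight refers to a whole group of similarly behaving genes, which is what makes the selected clusters easier to interpret.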
Regression Discontinuity Designs Using Covariates
We study regression discontinuity designs when covariates are included in the estimation. We examine local polynomial estimators that include discrete or continuous covariates in an additively separable way, but without imposing any parametric restrictions on the underlying population regression functions. We recommend a covariate-adjustment approach that retains consistency under intuitive conditions, and we characterize the potential for improvements in estimation and inference. We also present new covariate-adjusted mean squared error expansions and robust bias-corrected inference procedures, with heteroskedasticity-consistent and cluster-robust standard errors. An empirical illustration and an extensive simulation study are presented. All methods are implemented in R and Stata software packages.
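As a rough illustration of a covariate-adjusted local linear estimator, the Python sketch below regresses the outcome on an intercept, a treatment indicator D = 1{x ≥ c}, the centred running variable, its interaction with D, and one covariate z that enters additively with a single coefficient shared by both sides of the cutoff. The triangular kernel, the fixed bandwidth h and the function names are assumptions for illustration; bandwidth selection, bias correction and standard errors are not reproduced, and this is not the interface of the R or Stata packages.

```python
from typing import List, Sequence


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A beta = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular design; check bandwidth and covariates")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        beta[r] = (M[r][n] - sum(M[r][c] * beta[c] for c in range(r + 1, n))) / M[r][r]
    return beta


def rd_jump_with_covariate(x: Sequence[float], y: Sequence[float], z: Sequence[float],
                           cutoff: float, h: float) -> float:
    """Local linear RD estimate of the jump at `cutoff` with one additive covariate z.

    Weighted least squares of y on [1, D, x-c, D*(x-c), z] with triangular
    kernel weights max(0, 1 - |x-c|/h); the coefficient on D is the jump.
    """
    rows, weights, targets = [], [], []
    for xi, yi, zi in zip(x, y, z):
        w = 1.0 - abs(xi - cutoff) / h
        if w <= 0.0:
            continue  # outside the bandwidth window
        d = 1.0 if xi >= cutoff else 0.0
        xc = xi - cutoff
        rows.append([1.0, d, xc, d * xc, zi])
        weights.append(w)
        targets.append(yi)
    p = len(rows[0])
    xtwx = [[sum(w * r[a] * r[b] for r, w in zip(rows, weights)) for b in range(p)]
            for a in range(p)]
    xtwy = [sum(w * r[a] * t for r, w, t in zip(rows, weights, targets)) for a in range(p)]
    return solve_linear(xtwx, xtwy)[1]


# Usage on noiseless synthetic data with a true jump of 2.0 at x = 0.
xs = [-1.0 + 2.0 * i / 40 for i in range(41)]
zs = [((7 * i) % 11) / 10 for i in range(41)]
ys = [1.0 + 2.0 * (xi >= 0) + 0.5 * xi + 0.3 * (xi >= 0) * xi + 1.5 * zi
      for xi, zi in zip(xs, zs)]
print(rd_jump_with_covariate(xs, ys, zs, cutoff=0.0, h=2.0))  # close to 2.0
```

Separate slopes on each side keep the regression functions flexible near the cutoff, while the shared coefficient on z is what makes the adjustment "additively separable" in this sketch.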
Accounting for Individual Differences in Bradley-Terry Models by Means of Recursive Partitioning
The preference scaling of a group of subjects may not be homogeneous: different groups of subjects with certain characteristics may show different preference scalings, each of which can be derived from paired comparisons by means of the Bradley-Terry model. Usually, either different models are fit in predefined subsets of the sample, or the effects of subject covariates are explicitly specified in a parametric model. In both cases, categorical covariates can be employed directly to distinguish between the different groups, while numeric covariates are typically discretized prior to modeling.
Here, a semi-parametric approach for recursive partitioning of Bradley-Terry models is introduced as a means of identifying groups of subjects with homogeneous preference scalings in a data-driven way. In this approach, the covariates that distinguish between groups of subjects with different preference orderings, whether through main effects or interactions, are detected automatically from the set of candidate covariates. One main advantage of this approach is that sensible partitions in numeric covariates are also detected automatically.
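A toy version of the partitioning idea can be sketched in Python: fit a Bradley-Terry model in each candidate subgroup with the standard minorization-maximization (MM) fixed-point iteration, and split a numeric covariate at the threshold that maximizes the summed log-likelihood. This greedy single-split search is a simplification chosen for the sketch, not the paper's procedure; the function names and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Comparison = Tuple[str, str]  # (winner, loser)


def fit_bradley_terry(comps: Sequence[Comparison], n_iter: int = 5000,
                      tol: float = 1e-12) -> Dict[str, float]:
    """Worth parameters via the MM update p_i <- W_i / sum_j n_ij / (p_i + p_j),
    normalised to sum to 1. Assumes the comparison graph is strongly connected."""
    items = sorted({i for pair in comps for i in pair})
    wins = {i: 0 for i in items}
    counts: Dict[Tuple[str, str], int] = {}
    for w, l in comps:
        wins[w] += 1
        key = (min(w, l), max(w, l))
        counts[key] = counts.get(key, 0) + 1
    p = {i: 1.0 / len(items) for i in items}
    for _ in range(n_iter):
        denom = {i: 0.0 for i in items}
        for (a, b), n_ab in counts.items():
            share = n_ab / (p[a] + p[b])
            denom[a] += share
            denom[b] += share
        new = {i: wins[i] / denom[i] for i in items}
        total = sum(new.values())
        new = {i: v / total for i, v in new.items()}
        converged = max(abs(new[i] - p[i]) for i in items) < tol
        p = new
        if converged:
            break
    return p


def log_likelihood(comps: Sequence[Comparison], p: Dict[str, float]) -> float:
    return sum(math.log(p[w] / (p[w] + p[l])) for w, l in comps)


def best_split(subjects: Sequence[Tuple[float, List[Comparison]]]) -> Tuple[float, float]:
    """Try every midpoint between sorted covariate values; return (threshold, log-lik)
    for the split maximising the summed log-likelihood of separate fits."""
    values = sorted({v for v, _ in subjects})
    if len(values) < 2:
        raise ValueError("need at least two distinct covariate values")
    best: Optional[Tuple[float, float]] = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [c for v, cs in subjects if v < t for c in cs]
        right = [c for v, cs in subjects if v >= t for c in cs]
        ll = (log_likelihood(left, fit_bradley_terry(left))
              + log_likelihood(right, fit_bradley_terry(right)))
        if best is None or ll > best[1]:
            best = (t, ll)
    return best


# Hypothetical subjects (age, comparisons): younger ones prefer A > B > C, older C > B > A.
young = [("A", "B"), ("B", "C"), ("A", "C")]
old = [("C", "B"), ("B", "A"), ("C", "A")]
subjects = [(20, young + [("C", "A")]), (25, young), (30, young), (35, young),
            (50, old), (55, old), (60, old), (65, old + [("A", "C")])]
print(best_split(subjects)[0])  # 42.5 on this toy data
```

The threshold is found without discretizing age in advance, which is the advantage the abstract highlights for numeric covariates.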
A Quantile Regression Model for Failure-Time Data with Time-Dependent Covariates
Since survival data occur over time, important covariates that we wish to consider often also change over time. Such covariates are referred to as time-dependent covariates. Quantile regression offers flexible modeling of survival data by allowing the covariate effects to vary across quantiles. This paper provides a novel quantile regression model accommodating time-dependent covariates, for analyzing survival data subject to right censoring. Our simple estimation technique assumes the existence of instrumental variables. In addition, we present a doubly robust estimator in the sense of Robins and Rotnitzky (1992). The asymptotic properties of the estimators are rigorously studied, and finite-sample properties are examined in a simulation study. The utility of the proposed methodology is demonstrated using the Stanford heart transplant dataset.
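For readers less familiar with quantile regression, the Python sketch below fits an ordinary linear quantile regression with one covariate by minimizing the check loss ρ_τ(u) = u(τ − 1{u < 0}), showing how the fitted slope can differ across quantiles. It is a brute-force toy that relies on the fact that some optimal line passes through two observations; it handles neither censoring, time-dependent covariates nor instrumental variables, so it is not the model proposed in the paper.

```python
from itertools import combinations
from typing import Sequence, Tuple


def check_loss(u: float, tau: float) -> float:
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def linear_quantile_fit(x: Sequence[float], y: Sequence[float],
                        tau: float) -> Tuple[float, float]:
    """Return (intercept, slope) minimising sum rho_tau(y - a - b x).

    Brute force: some optimal line interpolates two observations with
    distinct x, so every such pair is tried (O(n^3); toy sizes only)."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(check_loss(yk - intercept - slope * xk, tau) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


# Spread grows with x: at each x there are points at 0.5x, x and 1.5x.
xs = [float(k) for k in range(1, 11) for _ in range(3)]
ys = [m * k for k in range(1, 11) for m in (0.5, 1.0, 1.5)]
print(linear_quantile_fit(xs, ys, 0.1))  # slope 0.5: lower quantile line
print(linear_quantile_fit(xs, ys, 0.9))  # slope 1.5: upper quantile line
```

A single mean regression would report one slope (1.0 here); the quantile fits show that the covariate's effect is stronger in the upper tail, which is the kind of flexibility the abstract refers to.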
Model-based approaches for predicting gait changes over time
Interest in automated biometrics continues to increase, but little consideration has been given to time, which is especially important in surveillance and scan control. This paper deals with the problem of recognition by gait when time-dependent covariates are added, i.e. when months have passed between the recording of the gallery and the probe sets. Moreover, in some cases extra covariates are present as well. We have shown previously that recognition rates fall significantly when there are lengthy time intervals between data captures. Under the assumption that some subjects from the probe set are available for training and that similar subjects undergo similar changes in gait over time, we propose predictive models of changes in gait due both to time and, now, to time-invariant covariates. Our extended time-dependent predictive model achieves high recognition rates when time-dependent or subject-dependent covariates are added. However, it cannot cope with time-invariant covariates, so a new time-invariant predictive model is proposed to accommodate the extra covariates. The two models are combined into a predictive model that takes all types of covariates into consideration. A considerable improvement in recognition capability is demonstrated, showing that changes in gait can be modelled successfully by the new approach.
A Significance Test for Covariates in Nonparametric Regression
We consider testing the significance of a subset of covariates in a nonparametric regression. These covariates can be continuous and/or discrete. We propose a new kernel-based test that smooths only over the covariates appearing under the null hypothesis, so that the curse of dimensionality is mitigated. The test statistic is asymptotically pivotal, and the rate at which the test detects local alternatives depends only on the dimension of the covariates under the null hypothesis. We show the validity of the wild bootstrap for the test. In small samples, our test is competitive with existing procedures.
Comment: 42 pages, 6 figures
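The general mechanics of a residual-based kernel test with a wild bootstrap can be sketched in Python. The null model is fitted with a Nadaraya-Watson smoother in x only, a U-statistic of the residuals is computed, and Rademacher multipliers on the residuals generate bootstrap samples under the null. The particular statistic, the Gaussian kernels, the bandwidth h and the fixed-scale weight in w are hypothetical choices for illustration, not the test proposed in the paper.

```python
import math
import random
from typing import List, Sequence, Tuple


def gauss(u: float) -> float:
    return math.exp(-0.5 * u * u)


def nw_smoother(x: Sequence[float], h: float) -> List[List[float]]:
    """Nadaraya-Watson weight matrix: row i holds the weights for fitting at x[i]."""
    S = []
    for xi in x:
        k = [gauss((xi - xj) / h) for xj in x]
        total = sum(k)
        S.append([kj / total for kj in k])
    return S


def smooth(S: List[List[float]], v: Sequence[float]) -> List[float]:
    return [sum(s * vj for s, vj in zip(row, v)) for row in S]


def u_stat(e: Sequence[float], W: List[List[float]]) -> float:
    """(1/n) * sum over i != j of e_i e_j W_ij."""
    n = len(e)
    return sum(e[i] * e[j] * W[i][j] for i in range(n) for j in range(n) if i != j) / n


def wild_bootstrap_test(x: Sequence[float], w: Sequence[float], y: Sequence[float],
                        h: float = 0.1, n_boot: int = 199,
                        seed: int = 0) -> Tuple[float, float]:
    """Test H0: E[y | x, w] = m(x). Returns (statistic, bootstrap p-value)."""
    rng = random.Random(seed)
    n = len(y)
    S = nw_smoother(x, h)
    # Hypothetical pair weights: kernel in x (bandwidth h) times a fixed-scale
    # weight in w, so only x is smoothed with a bandwidth.
    W = [[gauss((x[i] - x[j]) / h) * gauss(w[i] - w[j]) for j in range(n)]
         for i in range(n)]
    fitted = smooth(S, y)                      # null-model fit m_hat(x)
    e = [yi - fi for yi, fi in zip(y, fitted)]
    t_obs = u_stat(e, W)
    exceed = 0
    for _ in range(n_boot):
        v = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # Rademacher multipliers
        y_star = [fi + ei * vi for fi, ei, vi in zip(fitted, e, v)]  # data under H0
        e_star = [ys - fs for ys, fs in zip(y_star, smooth(S, y_star))]
        if u_stat(e_star, W) >= t_obs:
            exceed += 1
    return t_obs, (1 + exceed) / (1 + n_boot)


# Usage: y depends strongly on w, so H0 should be rejected.
xs = [i / 40 for i in range(40)]
ws = [1.0 if i % 2 == 0 else -1.0 for i in range(40)]
ys = [math.sin(2 * math.pi * xi) + 3.0 * wi for xi, wi in zip(xs, ws)]
t, p = wild_bootstrap_test(xs, ws, ys)
print(p)  # small here
```

Multiplying each residual by an independent random sign keeps its magnitude (and hence any heteroskedasticity) while destroying any systematic link with w, which is why the bootstrap draws mimic the null distribution.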