    Inferring meta-covariates in classification

    This paper develops an alternative method for gene selection that combines model-based clustering and binary classification. By averaging the covariates within the clusters obtained from model-based clustering, we define “meta-covariates” and use them to build a probit regression model, thereby selecting clusters of similarly behaving genes and aiding interpretation. This simultaneous learning task is accomplished by an EM algorithm that optimises a single likelihood function which rewards good performance at both classification and clustering. We explore the performance of our methodology on a well-known leukaemia dataset and use the Gene Ontology to interpret our results.
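
The averaging step that defines a meta-covariate can be illustrated with a minimal sketch. It assumes the cluster labels are already fixed, whereas the abstract describes learning the clusters and the classifier jointly by EM; the function name `meta_covariates` and the toy data are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def meta_covariates(X, labels):
    """Average the covariates (columns of X) that share a cluster label.

    X      : list of samples, each a list of covariate values.
    labels : one cluster label per covariate (assumed given here).
    Returns a dict mapping cluster label -> list of per-sample averages.
    """
    columns_by_cluster = defaultdict(list)
    for col, label in enumerate(labels):
        columns_by_cluster[label].append(col)
    return {
        label: [sum(row[c] for c in cols) / len(cols) for row in X]
        for label, cols in columns_by_cluster.items()
    }


# Toy data (hypothetical): two samples, three covariates.
# Covariates 0 and 1 belong to cluster 0, covariate 2 to cluster 1.
X = [[1.0, 3.0, 10.0], [2.0, 4.0, 20.0]]
labels = [0, 0, 1]
meta = meta_covariates(X, labels)
```

Each entry of `meta` holds one averaged value per sample and could serve as a single input to a classifier such as a probit regression.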

    Regression Discontinuity Designs Using Covariates

    We study regression discontinuity designs when covariates are included in the estimation. We examine local polynomial estimators that include discrete or continuous covariates in an additive separable way, but without imposing any parametric restrictions on the underlying population regression functions. We recommend a covariate-adjustment approach that retains consistency under intuitive conditions, and characterize the potential for estimation and inference improvements. We also present new covariate-adjusted mean squared error expansions and robust bias-corrected inference procedures, with heteroskedasticity-consistent and cluster-robust standard errors. An empirical illustration and an extensive simulation study are presented. All methods are implemented in R and Stata software packages.

    Accounting for Individual Differences in Bradley-Terry Models by Means of Recursive Partitioning

    The preference scaling of a group of subjects may not be homogeneous: different groups of subjects with certain characteristics may show different preference scalings, each of which can be derived from paired comparisons by means of the Bradley-Terry model. Usually, either different models are fit in predefined subsets of the sample, or the effects of subject covariates are explicitly specified in a parametric model. In both cases, categorical covariates can be employed directly to distinguish between the different groups, while numeric covariates are typically discretized prior to modeling. Here, a semi-parametric approach for recursive partitioning of Bradley-Terry models is introduced as a means for identifying groups of subjects with homogeneous preference scalings in a data-driven way. In this approach, the covariates that distinguish between groups of subjects with different preference orderings, whether in main effects or interactions, are detected automatically from the set of candidate covariates. One main advantage of this approach is that sensible partitions in numeric covariates are also detected automatically.
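
For orientation, the standard Bradley-Terry model assigns each object a positive worth parameter and sets the probability that object i is preferred over object j to worth_i / (worth_i + worth_j). The sketch below shows only this basic choice probability, not the recursive partitioning procedure of the abstract; the function name and the worth values are hypothetical.

```python
def bt_preference_probability(worth_i, worth_j):
    """Bradley-Terry probability that object i is preferred over object j.

    Both worth parameters must be strictly positive.
    """
    if worth_i <= 0 or worth_j <= 0:
        raise ValueError("worth parameters must be positive")
    return worth_i / (worth_i + worth_j)


# Hypothetical worths: object A is twice as "worthy" as object B.
p_a_over_b = bt_preference_probability(2.0, 1.0)
p_b_over_a = bt_preference_probability(1.0, 2.0)
```

The two probabilities for a pair sum to one, so a group of subjects with a homogeneous preference scaling is fully described by one worth parameter per object.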

    A Quantile Regression Model for Failure-Time Data with Time-Dependent Covariates

    Since survival data occur over time, important covariates that we wish to consider often change over time as well. Such covariates are referred to as time-dependent covariates. Quantile regression offers flexible modeling of survival data by allowing the covariate effects to vary with quantiles. This paper provides a novel quantile regression model accommodating time-dependent covariates, for analyzing survival data subject to right censoring. Our simple estimation technique assumes the existence of instrumental variables. In addition, we present a doubly-robust estimator in the sense of Robins and Rotnitzky (1992). The asymptotic properties of the estimators are rigorously studied. Finite-sample properties are demonstrated by a simulation study. The utility of the proposed methodology is demonstrated using the Stanford heart transplant dataset.

    Model-based approaches for predicting gait changes over time

    Interest in automated biometrics continues to increase, but little consideration has been given to time, which is especially important in surveillance and scan control. This paper deals with the problem of recognition by gait when time-dependent covariates are added, i.e. when 6 or 12 months have passed between recording of the gallery and the probe sets. Moreover, in some cases some extra covariates are present as well. We have shown previously how recognition rates fall significantly when data are captured at lengthy time intervals. Under the assumption that it is possible to have some subjects from the probe for training and that similar subjects have similar changes in gait over time, we suggest predictive models of changes in gait due both to time and now to time-invariant covariates. Our extended time-dependent predictive model achieves high recognition rates when time-dependent or subject-dependent covariates are added. However, it is not able to cope with time-invariant covariates, so a new time-invariant predictive model is suggested to accommodate extra covariates. These are combined to achieve a predictive model which takes into consideration all types of covariates. A considerable improvement in recognition capability is demonstrated, showing that changes can be modelled successfully by the new approach.

    A Significance Test for Covariates in Nonparametric Regression

    We consider testing the significance of a subset of covariates in a nonparametric regression. These covariates can be continuous and/or discrete. We propose a new kernel-based test that smooths only over the covariates appearing under the null hypothesis, so that the curse of dimensionality is mitigated. The test statistic is asymptotically pivotal and the rate at which the test detects local alternatives depends only on the dimension of the covariates under the null hypothesis. We show the validity of wild bootstrap for the test. In small samples, our test is competitive compared to existing procedures. Comment: 42 pages, 6 figures.