
    Entropy-based gene ranking without selection bias for the predictive classification of microarray data

    BACKGROUND: We describe the E-RFE method for gene ranking, which is useful for identifying markers in the predictive classification of array data. The method supports a practical modeling scheme designed to avoid building classification rules on too small gene subsets (an effect known as selection bias, in which the estimated predictive errors are over-optimistic because samples used in feature selection are also used for testing). RESULTS: With E-RFE, we speed up recursive feature elimination (RFE) with SVM classifiers by eliminating chunks of uninteresting genes using an entropy measure of the distribution of SVM weights. An optimal subset of genes is selected according to a two-strata model evaluation procedure: modeling is replicated by an external stratified-partition resampling scheme, and, within each run, an internal K-fold cross-validation is used for E-RFE ranking. The optimal number of genes can also be estimated from the saturation of Zipf's-law profiles. CONCLUSIONS: Without any decrease in classification accuracy, E-RFE achieves a speed-up factor of 100 over standard RFE while improving on alternative parametric RFE reduction strategies. A practical process for gene selection and error estimation is thus obtained, ensuring control of selection bias and providing additional diagnostic indicators of gene importance.
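The entropy-driven chunk elimination described in the abstract can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' exact algorithm: the bin count, entropy threshold, and chunk-sizing rule are all assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC

def entropy_rfe(X, y, n_keep=10, n_bins=20, h_thresh=0.5):
    """Sketch of entropy-accelerated RFE (E-RFE).

    Repeatedly fits a linear SVM, then uses the normalized Shannon
    entropy of the histogram of |w| to decide how large a chunk of
    low-weight genes to discard: a peaked (low-entropy) weight
    distribution suggests many genes carry negligible weight and can
    be dropped at once, instead of one per iteration as in plain RFE.
    """
    active = np.arange(X.shape[1])
    eliminated = []                                   # worst genes, first out
    while active.size > n_keep:
        w = np.abs(LinearSVC(dual=False).fit(X[:, active], y).coef_).ravel()
        p, _ = np.histogram(w, bins=n_bins)
        p = p[p > 0] / p.sum()
        h = -(p * np.log2(p)).sum() / np.log2(n_bins)  # normalized entropy in [0, 1]
        # low entropy -> eliminate a large chunk; otherwise fall back to one gene
        chunk = int(active.size * (1.0 - h) / 2) if h < h_thresh else 1
        chunk = min(max(1, chunk), active.size - n_keep)
        order = np.argsort(w)                          # smallest |w| first
        eliminated.extend(active[order[:chunk]])
        active = active[order[chunk:]]
    return active, np.array(eliminated)
```

In the paper's scheme this ranking step would sit inside the internal K-fold loop, with the external stratified resampling wrapped around it to keep error estimates free of selection bias.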

    Testing for selectivity bias in panel data models

    Keywords: Estimation; Panel Data

    Learning to Teach Reinforcement Learning Agents

    In this article we study the transfer-learning model of action advice under a budget. We focus on reinforcement learning teachers providing action advice to heterogeneous students playing the game of Pac-Man under a limited advice budget. First, we examine several critical factors affecting advice quality in this setting, such as the average performance of the teacher, its variance, and the importance of reward discounting in advising. The experiments show the non-trivial importance of the coefficient of variation (CV), which relates the variance to the corresponding mean, as a statistic for choosing policies that generate advice. Second, the article studies policy learning for distributing advice under a budget. Whereas most methods in the relevant literature rely on heuristics for advice distribution, we formulate the problem as a learning one and propose a novel RL algorithm capable of learning when to advise, adapting to the student and the task at hand. Furthermore, we argue that learning to advise under a budget is an instance of a more generic learning problem: Constrained Exploitation Reinforcement Learning.
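As a minimal illustration of the CV statistic for comparing candidate teacher policies (the function names, data layout, and the "prefer low CV" selection rule are our assumptions, not the article's algorithm):

```python
import numpy as np

def coefficient_of_variation(returns):
    """CV = standard deviation / mean of episode returns.
    Assumes a strictly positive mean return."""
    r = np.asarray(returns, dtype=float)
    return r.std() / r.mean()

def pick_teacher(returns_by_policy):
    """Pick the candidate teacher with the lowest CV, i.e. the most
    consistent returns relative to its average performance -- one
    plausible use of the statistic highlighted in the abstract."""
    return min(returns_by_policy,
               key=lambda p: coefficient_of_variation(returns_by_policy[p]))
```

The point of the CV over raw variance is scale-invariance: a teacher with double the returns and double the spread scores the same, so consistency is judged relative to performance.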

    Technical Problems in Social Experimentation: Cost versus Ease of Analysis

    The goal of the paper is to set forth general guidelines that we believe would enhance the usefulness of future social experiments and to suggest ways of correcting for their inherent limitations. Although the major motivation for an experiment is to overcome the inherent limitations of structural econometric models, in many instances the experimental designs have subverted this motivation, and the primary advantages of randomized controlled experiments were often lost. The major complication for the analysis of the experiments was induced by an endogenous sample-selection and treatment-assignment procedure that selected the experimental participants and assigned them to control versus treatment groups partly on the basis of the variable whose response the experiments were intended to measure. To overcome these difficulties, we propose that the goal of an experimental design should be, as nearly as possible, to allow analysis based on a simple analysis-of-variance model. Although complexities attendant to endogenous stratification can be avoided, there are inherent limitations of the experiments that cannot; two major ones are self-determination of participation and self-selection out through attrition. But these problems, we believe, can be corrected for with relative ease if endogenous stratification is eliminated. Finally, we propose that as a guiding principle, the experiments should have as a first priority the precise estimation of a single or a small number of treatment effects.

    Basic research planning in mathematical pattern recognition and image analysis

    Fundamental problems encountered while attempting to develop automated techniques for applications of remote sensing are discussed under the following categories: (1) geometric and radiometric preprocessing; (2) spatial, spectral, temporal, syntactic, and ancillary digital image representation; (3) image partitioning, proportion estimation, and error models in object scene inference; (4) parallel processing and image data structures; and (5) continuing studies in polarization, computer architectures and parallel processing, and the applicability of "expert systems" to interactive analysis.

    Instrumental Variable Estimators for Binary Outcomes

    Instrumental variables (IVs) can be used to construct estimators of exposure effects on the outcomes of studies affected by non-ignorable selection of the exposure. Estimators which fail to adjust for the effects of non-ignorable selection will be biased and inconsistent. Such situations commonly arise in observational studies, but even randomised controlled trials can be affected by non-ignorable participant non-compliance. In this paper, we review IV estimators for studies in which the outcome is binary. Recent work on identification is interpreted using an integrated structural modelling and potential outcomes framework, within which we consider the links between different approaches developed in statistics and econometrics. The implicit assumptions required for bounding causal effects and point-identification by each estimator are highlighted and compared within our framework. Finally, the implications for practice are discussed.

    Keywords: bounds; causal inference; generalized method of moments; local average treatment effects; marginal structural models; non-compliance; parameter identification; potential outcomes; structural mean models; structural models
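For the simplest setting such reviews cover, a binary instrument, binary exposure, and binary outcome, the classic Wald (ratio) IV estimator can be sketched as below. The function name and data layout are our own; the estimate carries a causal (local average treatment effect) interpretation only under the usual IV assumptions of relevance, exclusion restriction, and monotonicity.

```python
import numpy as np

def wald_iv(y, x, z):
    """Wald IV estimator on the risk-difference scale for a binary
    instrument z, binary exposure x, and binary outcome y:
    (E[Y|Z=1] - E[Y|Z=0]) / (E[X|Z=1] - E[X|Z=0]).
    """
    y, x, z = map(np.asarray, (y, x, z))
    num = y[z == 1].mean() - y[z == 0].mean()   # intention-to-treat effect
    den = x[z == 1].mean() - x[z == 0].mean()   # first stage (instrument relevance)
    return num / den
```

In a non-compliance trial, z is randomised assignment and x is treatment actually received; the denominator then measures the compliance rate, and the ratio rescales the intention-to-treat effect accordingly.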