39 research outputs found

    Proxy Methods for Domain Adaptation

    We study the problem of domain adaptation under distribution shift, where the shift is due to a change in the distribution of an unobserved, latent variable that confounds both the covariates and the labels. In this setting, neither the covariate shift nor the label shift assumptions apply. Our approach to adaptation employs proximal causal learning, a technique for estimating causal effects in settings where proxies of unobserved confounders are available. We demonstrate that proxy variables allow for adaptation to distribution shift without explicitly recovering or modeling latent variables. We consider two settings: (i) Concept Bottleneck, in which an additional "concept" variable is observed that mediates the relationship between the covariates and labels; and (ii) Multi-domain, in which training data from multiple source domains is available and each source domain exhibits a different distribution over the latent confounder. We develop a two-stage kernel estimation approach to adapt to complex distribution shifts in both settings. In our experiments, we show that our approach outperforms other methods, notably those which explicitly recover the latent confounder.

    Adapting to Latent Subgroup Shifts via Concepts and Proxies

    We address the problem of unsupervised domain adaptation when the source domain differs from the target domain because of a shift in the distribution of a latent subgroup. When this subgroup confounds all observed data, neither covariate shift nor label shift assumptions apply. We show that the optimal target predictor can be non-parametrically identified with the help of concept and proxy variables available only in the source domain, and unlabeled data from the target. The identification results are constructive, immediately suggesting an algorithm for estimating the optimal predictor in the target. For continuous observations, where this algorithm becomes impractical, we propose a latent variable model specific to the data generation process at hand. We show how the approach degrades as the size of the shift changes, and verify that it outperforms both covariate and label shift adjustment.

    Unraveling the Complexity of Amyotrophic Lateral Sclerosis Survival Prediction

    Objective: The heterogeneity of amyotrophic lateral sclerosis (ALS) survival duration, which varies from <1 year to >10 years, challenges clinical decisions and trials. Utilizing data from 801 deceased ALS patients, we: (1) assess the underlying complex relationships among common clinical ALS metrics; (2) identify which clinical ALS metrics are the “best” survival predictors and how their predictive ability changes as a function of disease progression. Methods: Analyses included examination of relationships within the raw data as well as the construction of interactive survival regression and classification models (generalized linear model and random forests model). Dimensionality reduction and feature clustering enabled decomposition of clinical variable contributions. Thirty-eight metrics were utilized, including Medical Research Council (MRC) muscle scores; respiratory function, including forced vital capacity (FVC) and FVC % predicted, oxygen saturation, negative inspiratory force (NIF); the Revised ALS Functional Rating Scale (ALSFRS-R) and its activities of daily living (ADL) and respiratory sub-scores; body weight; onset type, onset age, gender, and height. Prognostic random forest models confirm the dominance of patient age-related parameters decline in classifying survival at thresholds of 30, 60, 90, and 180 days and 1, 2, 3, 4, and 5 years. Results: Collective prognostic insight derived from the overall investigation includes: multi-dimensionality of ALSFRS-R scores suggests cautious usage for survival forecasting; upper and lower extremities independently degenerate and are autonomous from respiratory decline, with the latter associating with nearer-to-death classifications; height and weight-based metrics are auxiliary predictors for farther-from-death classifications; sex and onset site (limb, bulbar) are not independent survival predictors due to age co-correlation. Conclusion: The dimensionality and fluctuating predictors of ALS survival must be considered when developing predictive models for clinical trial development or in-clinic usage. Additional independent metrics and possible revisions to current metrics, like the ALSFRS-R, are needed to capture the underlying complexity needed for population and personalized forecasting of survival.

    A comparison of approaches to improve worst-case predictive model performance over patient subpopulations

    Predictive models for clinical outcomes that are accurate on average in a patient population may underperform drastically for some subpopulations, potentially introducing or reinforcing inequities in care access and quality. Model training approaches that aim to maximize worst-case model performance across subpopulations, such as distributionally robust optimization (DRO), attempt to address this problem without introducing additional harms. We conduct a large-scale empirical study of DRO and several variations of standard learning procedures to identify approaches for model development and selection that consistently improve disaggregated and worst-case performance over subpopulations compared to standard approaches for learning predictive models from electronic health records data. In the course of our evaluation, we introduce an extension to DRO approaches that allows for specification of the metric used to assess worst-case performance. We conduct the analysis for models that predict in-hospital mortality, prolonged length of stay, and 30-day readmission for inpatient admissions, and predict in-hospital mortality using intensive care data. We find that, with relatively few exceptions, no approach performs better, for each patient subpopulation examined, than standard learning procedures using the entire training dataset. These results imply that when it is of interest to improve model performance for patient subpopulations beyond what can be achieved with standard practices, it may be necessary to do so via data collection techniques that increase the effective sample size or reduce the level of noise in the prediction problem.
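    As a rough illustration of what "disaggregated and worst-case performance over subpopulations" can mean in practice, the sketch below computes accuracy separately for each subgroup and reports the lowest value. It is not the paper's evaluation code: the function name `worst_group_accuracy`, the choice of accuracy as the metric, and the toy labels and group names are all assumptions made for this example.

```python
from collections import defaultdict


def worst_group_accuracy(y_true, y_pred, groups):
    """Hypothetical helper: per-group accuracy plus the worst-performing group.

    Accuracy is an assumed metric; any per-group metric could be substituted.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for label, prediction, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(label == prediction)

    # Disaggregated performance: one accuracy value per subgroup.
    per_group = {g: correct[g] / total[g] for g in total}

    # Worst-case performance: the subgroup with the lowest accuracy.
    worst = min(per_group, key=per_group.get)
    return per_group, worst, per_group[worst]


# Toy data (invented for illustration only).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 0]
groups = ["a", "a", "a", "b", "b", "b"]

per_group, worst, worst_acc = worst_group_accuracy(y_true, y_pred, groups)
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

    In this toy data the overall accuracy is 4/6, while group "b" reaches only 1/3. That gap between average and worst-case performance is the kind of discrepancy the abstract describes.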

    Trypanosome motion represents an adaptation to the crowded environment of the vertebrate bloodstream

    Blood is a remarkable habitat: it is highly viscous, contains a dense packing of cells and perpetually flows at velocities varying over three orders of magnitude. Only a few pathogens endure the harsh physical conditions within the vertebrate bloodstream and prosper despite being constantly attacked by host antibodies. African trypanosomes are strictly extracellular blood parasites, which evade the immune response through a system of antigenic variation and incessant motility. How the flagellates actually swim in blood remains to be elucidated. Here, we show that the mode and dynamics of trypanosome locomotion are a trait of life within a crowded environment. Using high-speed fluorescence microscopy and ordered micro-pillar arrays, we show that the parasites' mode of motility is adapted to the density of cells in blood. Trypanosomes are pulled forward by the planar beat of the single flagellum. Hydrodynamic flow across the asymmetrically shaped cell body translates into its rotational movement. Importantly, the presence of particles with the shape, size and spacing of blood cells is required and sufficient for trypanosomes to reach maximum forward velocity. If the density of obstacles, however, is further increased to resemble collagen networks or tissue spaces, the parasites reverse their flagellar beat and consequently swim backwards, in this way avoiding getting trapped. In the absence of obstacles, this flagellar beat reversal occurs randomly, resulting in irregular waveforms and apparent cell tumbling. Thus, the swimming behavior of trypanosomes is a surprising example of micro-adaptation to life at low Reynolds numbers. For a precise physical interpretation, we compare our high-resolution microscopic data to results from a simulation technique that combines the method of multi-particle collision dynamics with a triangulated surface model. The simulation produces a rotating cell body and a helical swimming path, providing a functioning simulation method for microorganisms with a complex swimming strategy.