26,011 research outputs found

    CausaLM: Causal Model Explanation Through Counterfactual Language Models

    Full text link
    Understanding predictions made by deep neural networks is notoriously difficult, but also crucial to their dissemination. As all ML-based methods, they are as good as their training data, and can also capture unwanted biases. While there are tools that can help understand whether such biases exist, they do not distinguish between correlation and causation, and might be ill-suited for text-based models and for reasoning about high level language concepts. A key problem of estimating the causal effect of a concept of interest on a given model is that this estimation requires the generation of counterfactual examples, which is challenging with existing generation technology. To bridge that gap, we propose CausaLM, a framework for producing causal model explanations using counterfactual language representation models. Our approach is based on fine-tuning of deep contextualized embedding models with auxiliary adversarial tasks derived from the causal graph of the problem. Concretely, we show that by carefully choosing auxiliary adversarial pre-training tasks, language representation models such as BERT can effectively learn a counterfactual representation for a given concept of interest, and be used to estimate its true causal effect on model performance. A byproduct of our method is a language representation model that is unaffected by the tested concept, which can be useful in mitigating unwanted bias ingrained in the data.Comment: Our code and data are available at: https://amirfeder.github.io/CausaLM/ Under review for the Computational Linguistics journa

    Learning Counterfactual Representations for Estimating Individual Dose-Response Curves

    Full text link
    Estimating what would be an individual's potential response to varying levels of exposure to a treatment is of high practical relevance for several important fields, such as healthcare, economics and public policy. However, existing methods for learning to estimate counterfactual outcomes from observational data are either focused on estimating average dose-response curves, or limited to settings with only two treatments that do not have an associated dosage parameter. Here, we present a novel machine-learning approach towards learning counterfactual representations for estimating individual dose-response curves for any number of treatments with continuous dosage parameters with neural networks. Building on the established potential outcomes framework, we introduce performance metrics, model selection criteria, model architectures, and open benchmarks for estimating individual dose-response curves. Our experiments show that the methods developed in this work set a new state-of-the-art in estimating individual dose-response

    Empirical Health Law Scholarship: The State of the Field

    Get PDF
    The last three decades have seen the blossoming of the fields of health law and empirical legal studies and their intersection--empirical scholarship in health law and policy. Researchers in legal academia and other settings have conducted hundreds of studies using data to estimate the effects of health law on accident rates, health outcomes, health care utilization, and costs, as well as other outcome variables. Yet the emerging field of empirical health law faces significant challenges--practical, methodological, and political. The purpose of this Article is to survey the current state of the field by describing commonly used methods, analyzing enabling and inhibiting factors in the production and uptake of this type of research by policymakers, and suggesting ways to increase the production and impact of empirical health law studies. In some areas of inquiry, high-quality research has been conducted, and the findings have been successfully imported into policy debates and used to inform evidence-based lawmaking. In other areas, the level of rigor has been uneven, and the best evidence has not translated effectively into sound policy. Despite challenges and historical shortcomings, empirical health law studies can and should have a substantial impact on regulations designed to improve public safety, increase both access to and quality of health care, and foster technological innovation

    Big data for monitoring educational systems

    Get PDF
    This report considers “how advances in big data are likely to transform the context and methodology of monitoring educational systems within a long-term perspective (10-30 years) and impact the evidence based policy development in the sector”, big data are “large amounts of different types of data produced with high velocity from a high number of various types of sources.” Five independent experts were commissioned by Ecorys, responding to themes of: students' privacy, educational equity and efficiency, student tracking, assessment and skills. The experts were asked to consider the “macro perspective on governance on educational systems at all levels from primary, secondary education and tertiary – the latter covering all aspects of tertiary from further, to higher, and to VET”, prioritising primary and secondary levels of education

    A Dynamic Embedding Model of the Media Landscape

    Full text link
    Information about world events is disseminated through a wide variety of news channels, each with specific considerations in the choice of their reporting. Although the multiplicity of these outlets should ensure a variety of viewpoints, recent reports suggest that the rising concentration of media ownership may void this assumption. This observation motivates the study of the impact of ownership on the global media landscape and its influence on the coverage the actual viewer receives. To this end, the selection of reported events has been shown to be informative about the high-level structure of the news ecosystem. However, existing methods only provide a static view into an inherently dynamic system, providing underperforming statistical models and hindering our understanding of the media landscape as a whole. In this work, we present a dynamic embedding method that learns to capture the decision process of individual news sources in their selection of reported events while also enabling the systematic detection of large-scale transformations in the media landscape over prolonged periods of time. In an experiment covering over 580M real-world event mentions, we show our approach to outperform static embedding methods in predictive terms. We demonstrate the potential of the method for news monitoring applications and investigative journalism by shedding light on important changes in programming induced by mergers and acquisitions, policy changes, or network-wide content diffusion. These findings offer evidence of strong content convergence trends inside large broadcasting groups, influencing the news ecosystem in a time of increasing media ownership concentration
    • …
    corecore