    Reply to determining structural identifiability of parameter learning machines

    The paper by Ran and Hu (2014, Neurocomputing) examines identifiability and parameter redundancy in classes of models used in machine learning. This note discusses the results on global identifiability and also clarifies that the paper's results on parameter redundancy already exist in Cole et al. (2010, Mathematical Biosciences).

    A Primer on Causality in Data Science

    Many questions in Data Science are fundamentally causal in that our objective is to learn the effect of some exposure, randomized or not, on an outcome of interest. Even studies that are seemingly non-causal, such as those with the goal of prediction or prevalence estimation, have causal elements, including differential censoring or measurement. As a result, we, as Data Scientists, need to consider the underlying causal mechanisms that gave rise to the data, rather than simply the pattern or association observed in those data. In this work, we review the 'Causal Roadmap' of Petersen and van der Laan (2014) to provide an introduction to some key concepts in causal inference. Similar to other causal frameworks, the steps of the Roadmap include clearly stating the scientific question, defining the causal model, translating the scientific question into a causal parameter, assessing the assumptions needed to express the causal parameter as a statistical estimand, implementing statistical estimators including parametric and semi-parametric methods, and interpreting our findings. We believe that using such a framework in Data Science will help to ensure that our statistical analyses are guided by the scientific question driving our research, while avoiding over-interpreting our results. We focus on the effect of an exposure occurring at a single time point and highlight the use of targeted maximum likelihood estimation (TMLE) with Super Learner. Comment: 26 pages (with references); 4 figures.

    A new approach to hierarchical data analysis: Targeted maximum likelihood estimation for the causal effect of a cluster-level exposure

    We often seek to estimate the impact of an exposure naturally occurring or randomly assigned at the cluster level. For example, the literature on neighborhood determinants of health continues to grow. Likewise, community randomized trials are applied to learn about real-world implementation, sustainability, and population effects of interventions with proven individual-level efficacy. In these settings, individual-level outcomes are correlated due to shared cluster-level factors, including the exposure, as well as social or biological interactions between individuals. To flexibly and efficiently estimate the effect of a cluster-level exposure, we present two targeted maximum likelihood estimators (TMLEs). The first TMLE is developed under a non-parametric causal model, which allows for arbitrary interactions between individuals within a cluster. These interactions include direct transmission of the outcome (i.e. contagion) and influence of one individual's covariates on another's outcome (i.e. covariate interference). The second TMLE is developed under a causal sub-model assuming the cluster-level and individual-specific covariates are sufficient to control for confounding. Simulations compare the alternative estimators and illustrate the potential gains from pairing individual-level risk factors and outcomes during estimation, while avoiding unwarranted assumptions. Our results suggest that estimation under the sub-model can result in bias and misleading inference in an observational setting. Incorporating working assumptions during estimation is more robust than assuming they hold in the underlying causal model. We illustrate our approach with an application to HIV prevention and treatment.

    Basic research planning in mathematical pattern recognition and image analysis

    Fundamental problems encountered while attempting to develop automated techniques for applications of remote sensing are discussed under the following categories: (1) geometric and radiometric preprocessing; (2) spatial, spectral, temporal, syntactic, and ancillary digital image representation; (3) image partitioning, proportion estimation, and error models in object scene interference; (4) parallel processing and image data structures; and (5) continuing studies in polarization, computer architectures and parallel processing, and the applicability of "expert systems" to interactive analysis.

    Practical identifiability analysis of environmental models

    Identifiability of a system model can be considered as the extent to which one can capture its parameter values from observational data and other prior knowledge of the system. Identifiability must be considered in context, so the objectives of the modelling must also be taken into account in its interpretation. A model may be identifiable for certain objective functions but not others; its identifiability may depend not just on the model structure but also on the level and type of noise, and a model may even be non-identifiable when there is no noise on the observational data. Context also means that non-identifiability might not matter in some settings, such as when representing pluralistic values among stakeholders, and may be very important in others, such as where it leads to intolerable uncertainties in model predictions. Uncertainty quantification of environmental systems is receiving increasing attention, especially through the development of sophisticated methods, often statistically based. This is partly driven by the desire of society and its decision makers to make more informed judgments as to how systems are better managed and associated resources efficiently allocated. Less attention seems to be given by modellers to understanding the imperfections in their models and their implications. Practical methods of identifiability analysis can assist greatly here to assess whether there is an identifiability problem, so that one can proceed to decide whether it matters and, if so, how to go about modifying the model (transforming parameters, selecting specific data periods, changing model structure, using a more sophisticated objective function). A suite of relevant methods is available and the major useful ones are discussed here, including sensitivity analysis, response surface methods, model emulation and the quantification of uncertainty. The paper also addresses various perspectives and concepts that warrant further development and use.
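
    As a rough illustration of the sensitivity-analysis idea mentioned in this abstract, the sketch below checks whether the output sensitivities of two parameters are collinear, which is one common symptom of non-identifiability. The toy model `y = a * b * exp(-k * x)`, the parameter values, and all function names are assumptions made for this example; they are not taken from the paper.

```python
import math

def model(params, xs):
    """Hypothetical toy model: y = a * b * exp(-k * x).
    Only the product a*b can be recovered from y, never a and b separately."""
    a, b, k = params
    return [a * b * math.exp(-k * x) for x in xs]

def sensitivities(params, xs, rel_step=1e-6):
    """Central finite-difference sensitivity of every output to each parameter.
    Returns one list (a column of the sensitivity matrix) per parameter."""
    columns = []
    for i, p in enumerate(params):
        h = rel_step * max(abs(p), 1.0)
        up = list(params)
        dn = list(params)
        up[i] = p + h
        dn[i] = p - h
        y_up, y_dn = model(up, xs), model(dn, xs)
        columns.append([(u - d) / (2 * h) for u, d in zip(y_up, y_dn)])
    return columns

def cosine(u, v):
    """Cosine of the angle between two sensitivity columns."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

xs = [0.0, 0.5, 1.0, 2.0, 4.0]   # assumed observation points
params = [2.0, 3.0, 0.7]         # assumed nominal values of a, b, k
S = sensitivities(params, xs)

cos_ab = cosine(S[0], S[1])  # a vs b: columns are parallel
cos_ak = cosine(S[0], S[2])  # a vs k: columns point in different directions
print(f"cos(a, b) = {cos_ab:.6f}, cos(a, k) = {cos_ak:.3f}")
```

    A cosine with magnitude close to 1 means the two parameters change the output in the same direction, so the data cannot separate them; reparameterizing in terms of the product `a * b` would be one of the remedies the abstract lists (transforming parameters).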

    A Manifesto for the Equifinality Thesis.

    This essay discusses some of the issues involved in the identification and predictions of hydrological models given some calibration data. The reasons for the incompleteness of traditional calibration methods are discussed. The argument is made that the potential for multiple acceptable models as representations of hydrological and other environmental systems (the equifinality thesis) should be given more serious consideration than hitherto. It proposes some techniques for an extended GLUE methodology to make it more rigorous and outlines some of the research issues still to be resolved.

    Global Identifiability Analysis of Statistical Models using an Information-Theoretic Estimator in a Bayesian Framework

    An information-theoretic estimator is proposed to assess the global identifiability of statistical models with practical consideration. The framework is formulated in a Bayesian statistical setting which is the foundation for parameter estimation under aleatoric and epistemic uncertainty. No assumptions are made about the structure of the statistical model or the prior distribution while constructing the estimator. The estimator has the following notable advantages: first, no controlled experiment or data is required to conduct the practical identifiability analysis; second, different forms of uncertainties, such as model form, parameter, or measurement can be taken into account; third, the identifiability analysis is global, rather than being dependent on a realization of parameters. If an individual parameter has low identifiability, it can belong to an identifiable subset such that parameters within the subset have a functional relationship and thus have a combined effect on the statistical model. The practical identifiability framework is extended to highlight the dependencies between parameter pairs that emerge a posteriori to find identifiable parameter subsets. Examining the practical identifiability of an individual parameter along with its dependencies with other parameters is informative for an estimation-centric parameterization and model selection. The applicability of the proposed approach is demonstrated using a linear Gaussian model and a non-linear methane-air reduced kinetics model.
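
    For the linear Gaussian case mentioned in this abstract, the information that data carry about a parameter has a closed form, which makes the notion of an "identifiable subset" easy to see. The sketch below is not the paper's estimator: it uses the standard Gaussian identity I(θ; Y) = ½ ln(prior variance / posterior variance), and the two-parameter model `y = theta1 * x + theta2 * z + noise`, the standard normal priors, the design points and all names are assumptions chosen for illustration.

```python
import math

def posterior_cov(xs, zs, noise_var, prior_var=1.0):
    """Posterior covariance of (theta1, theta2) for y = theta1*x + theta2*z + noise,
    with independent N(0, prior_var) priors and N(0, noise_var) noise.
    Posterior precision = prior precision + A^T A / noise_var (2x2, inverted by hand)."""
    p11 = 1.0 / prior_var + sum(x * x for x in xs) / noise_var
    p22 = 1.0 / prior_var + sum(z * z for z in zs) / noise_var
    p12 = sum(x * z for x, z in zip(xs, zs)) / noise_var
    det = p11 * p22 - p12 * p12
    return [[p22 / det, -p12 / det], [-p12 / det, p11 / det]]

def info_gain(prior_var, post_var):
    """Mutual information (in nats) between a Gaussian quantity and the data."""
    return 0.5 * math.log(prior_var / post_var)

xs = [1.0, 2.0, 3.0, 4.0]
noise_var = 0.1

# Case 1: distinct regressors, each parameter is informed by the data.
cov_distinct = posterior_cov(xs, [1.0, -1.0, 1.0, -1.0], noise_var)
gain_distinct = info_gain(1.0, cov_distinct[0][0])

# Case 2: identical regressors (z = x), only theta1 + theta2 is informed.
cov_collinear = posterior_cov(xs, xs, noise_var)
gain_collinear = info_gain(1.0, cov_collinear[0][0])

# Variance of the sum theta1 + theta2: prior variance 2, posterior c11 + c22 + 2*c12.
post_var_sum = cov_collinear[0][0] + cov_collinear[1][1] + 2 * cov_collinear[0][1]
gain_sum = info_gain(2.0, post_var_sum)

print(f"theta1, distinct regressors : {gain_distinct:.3f} nats")
print(f"theta1, collinear regressors: {gain_collinear:.3f} nats")
print(f"theta1 + theta2, collinear  : {gain_sum:.3f} nats")
```

    In the collinear case the gain for `theta1` alone stays below ½ ln 2 however much data is collected, while the gain for the sum keeps growing, which mirrors the abstract's point that a weakly identifiable parameter can still belong to an identifiable subset.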