
    Phase transition in PCA with missing data: Reduced signal-to-noise ratio, not sample size!

    How does missing data affect our ability to learn signal structures? It has been shown that learning signal structure in terms of principal components depends on the ratio of sample size to dimensionality and that a critical number of observations is needed before learning starts (Biehl and Mietzner, 1993). Here we generalize this analysis to include missing data. Probabilistic principal component analysis is regularly used for estimating signal structures in datasets with missing data. Our analytic result suggests that the effect of missing data is to reduce the signal-to-noise ratio rather than, as generally believed, to reduce the sample size. The theory predicts a phase transition in the learning curves, and this is indeed found both in simulated data and in real datasets. Comment: Accepted to ICML 2019. This version is the submitted paper.

    Statistical modelling of conidial discharge of entomophthoralean fungi using a newly discovered Pandora species

    Entomophthoralean fungi are insect-pathogenic fungi characterized by their active discharge of conidia that infect insects. Our aim was to study the effects of temperature on the discharge and to characterize the variation in the associated temporal pattern of a newly discovered Pandora species, with a focus on the peak location and shape of the discharge. Mycelia were incubated at various temperatures in darkness, and conidial discharge was measured over time. We used a novel modification of a statistical model (pavpop) that simultaneously estimates phase and amplitude effects, extending it to the setting of generalized linear models. This model is used to test hypotheses about peak location and discharge of conidia. The statistical analysis showed that high temperature leads to an early and fast-decreasing peak, whereas there were no significant differences in the total number of discharged conidia. Using the proposed model, we also quantified the biological variation in the timing of the peak location at a fixed temperature. Comment: 23 pages including supplementary material.

    not-MIWAE: Deep Generative Modelling with Missing not at Random Data

    When a missing process depends on the missing values themselves, it needs to be explicitly modelled and taken into account while doing likelihood-based inference. We present an approach for building and fitting deep latent variable models (DLVMs) in cases where the missing process is dependent on the missing data. Specifically, a deep neural network enables us to flexibly model the conditional distribution of the missingness pattern given the data. This allows for incorporating prior information about the type of missingness (e.g. self-censoring) into the model. Our inference technique, based on importance-weighted variational inference, involves maximising a lower bound of the joint likelihood. Stochastic gradients of the bound are obtained by using the reparameterisation trick both in latent space and data space. We show on various kinds of data sets and missingness patterns that explicitly modelling the missing process can be invaluable. Comment: Camera-ready version for ICLR 202

    Enterococcus faecalis bacteremia: please do the echo

    Infective endocarditis (IE) caused by Enterococcus faecalis (E. faecalis) is a disease of the elderly with an increasing incidence, often health-care associated and with in-hospital mortality rates around 10-20%. E. faecalis IE is notoriously challenging to diagnose due to nonspecific symptoms, often presenting with a complex clinical picture with low-grade fever and only moderately elevated infectious parameters. In a newly published prospective multicenter study using echocardiography to screen E. faecalis bacteremia patients, we found an IE prevalence as high as 26%. The 344 included patients with E. faecalis bacteremia had a mean age of 74 (±12) years, confirming that it is indeed a disease of the elderly. The key feature of the study was that echocardiography was performed in all patients, including transesophageal echocardiography (TEE) in 74%. Transthoracic echocardiography (TTE) missed vegetations in half of the cases where TEE demonstrated vegetations, underlining the importance of TEE