87 research outputs found

    Puuttuvien arvojen korvaaminen aliavaruusmenetelmillä (Missing value imputation using subspace methods)

    In survey practice, as in many other data analysis tasks, missing values are common. This thesis studies missing value imputation using three subspace methods: principal component analysis (PCA), the Self-Organizing Map (SOM), and the Generative Topographic Mapping (GTM). The application area of interest is survey imputation, where imputation is conventionally conducted using, e.g., hot deck methods or multiple imputation by chained equations (MICE). Similarities and differences between imputation in survey practice and in recommendation systems are also discussed. The formalism behind missing value imputation is described together with the general mechanisms that give rise to missing data. A detailed review of these subspace methods in the presence of missing data is given to motivate the new methods and implementations contributed. The contributions of this thesis include (i) a novel way of treating missing data in the SOM algorithm, which is shown to improve the properties of the model, (ii) a fine-tuned GTM, where the number of radial basis functions is increased during learning and the initialization is made using the SOM, and (iii) a novel regularization for the GTM for binary data. Existing and proposed methods are compared experimentally using the wine data set and Likert-scale data from two wellbeing-related surveys. The variational Bayesian PCA is shown to be superior in the single imputation task. It also enables automatic relevance determination, i.e., automatic selection of the number of principal components needed, so no separate validation is required to choose the model complexity. Finally, multiple imputation (MI) using the subspace methods and MICE is demonstrated. It is shown that for survey data with less than 2% missing values, all MI methods provide very similar population-level results.

    The Super-Donor Phenomenon in Fecal Microbiota Transplantation

    Fecal microbiota transplantation (FMT) has become a highly effective bacteriotherapy for recurrent Clostridium difficile infection. Meanwhile, the efficacy of FMT for treating chronic diseases associated with microbial dysbiosis has so far been modest, with much higher variability in patient response. Notably, a number of studies suggest that FMT success depends on the microbial diversity and composition of the stool donor, leading to the proposition that FMT super-donors exist. The identification and subsequent characterization of super-donor gut microbiomes will inevitably advance our understanding of the microbial component of chronic diseases and allow for more targeted bacteriotherapy approaches in the future. Here, we review the evidence for super-donors in FMT and explore the concept of keystone species as predictors of FMT success. Possible effects of host genetics and diet on FMT engraftment and maintenance are also considered. Finally, we discuss the potential long-term applicability of FMT for chronic disease and highlight how super-donors could provide the basis for dysbiosis-matched FMTs.

    Fixed background EM algorithm for semi-supervised anomaly detection

    An additive Gaussian process regression model for interpretable non-parametric analysis of longitudinal data

    Biomedical research typically involves longitudinal study designs where samples from individuals are measured repeatedly over time and the goal is to identify risk factors (covariates) that are associated with an outcome value. General linear mixed effect models are the standard workhorse for statistical analysis of longitudinal data. However, analysis of longitudinal data can be complicated by difficulties in modelling correlated outcome values, functional (time-varying) covariates, nonlinear and non-stationary effects, and model inference. We present LonGP, an additive Gaussian process regression model that is specifically designed for statistical analysis of longitudinal data and addresses these commonly faced challenges. LonGP can model time-varying random effects and non-stationary signals, incorporate multiple kernel learning, and provide interpretable results for the effects of individual covariates and their interactions. We demonstrate LonGP’s performance and accuracy by analysing various simulated and real longitudinal -omics datasets.

    The fecal microbiotas of women of Pacific and New Zealand European ethnicities are characterized by distinctive enterotypes that reflect dietary intakes and fecal water content

    Obesity is a complex, multifactorial condition that is an important risk factor for noncommunicable diseases including cardiovascular disease and type 2 diabetes. While prevention and management require a healthy and energy-balanced diet and adequate physical activity, the taxonomic composition and functional attributes of the colonic microbiota may have a supplementary role in the development of obesity. The taxonomic composition and metabolic capacity of the fecal microbiota of 286 women, resident in Auckland, New Zealand, were determined by metagenomic analysis. Associations with BMI (obese, nonobese), body fat composition, and ethnicity (Pacific, n = 125; NZ European women [NZE], n = 161) were assessed using regression analyses. The fecal microbiotas were characterized by the presence of three distinctive enterotypes, with enterotype 1 represented in both Pacific and NZE women (39 and 61%, respectively), enterotype 2 mainly in Pacific women (84 and 16%) and enterotype 3 mainly in NZE women (13 and 87%). Enterotype 1 was characterized mainly by the relative abundances of butyrate-producing species, Eubacterium rectale and Faecalibacterium prausnitzii, enterotype 2 by the relative abundances of lactic acid-producing species, Bifidobacterium adolescentis, Bifidobacterium bifidum, and Lactobacillus ruminis, and enterotype 3 by the relative abundances of Subdoligranulum sp., Akkermansia muciniphila, Ruminococcus bromii, and Methanobrevibacter smithii. Enterotypes were also associated with BMI, visceral fat %, and blood cholesterol. Habitual food group intake was estimated using a 5-day nonconsecutive estimated food record and a 30-day, 220-item semi-quantitative Food Frequency Questionnaire. Higher intake of 'egg' and 'dairy' products was associated with enterotype 3, whereas 'non-starchy vegetables', 'nuts and seeds' and 'plant-based fats' were positively associated with enterotype 1. In contrast, these same food groups were inversely associated with enterotype 2. Fecal water content, as a proxy for stool consistency/colonic transit time, was associated with microbiota taxonomic composition and gene pools reflective of particular bacterial biochemical pathways. The fecal microbiotas of women of Pacific and New Zealand European ethnicities are characterized by distinctive enterotypes, most likely due to differential dietary intake and fecal consistency/colonic transit time. These parameters need to be considered in future analyses of human fecal microbiotas. Peer reviewed.

    Dysregulation of secondary bile acid metabolism precedes islet autoimmunity and type 1 diabetes

    The gut microbiota is crucial in the regulation of bile acid (BA) metabolism. However, not much is known about the regulation of BAs during progression to type 1 diabetes (T1D). Here, we analyzed serum and stool BAs in longitudinal samples collected at 3, 6, 12, 18, 24, and 36 months of age from children who developed a single islet autoantibody (AAb) (P1Ab; n = 23) or multiple islet AAbs (P2Ab; n = 13) and controls (CTRs; n = 38) who remained AAb negative. We also analyzed the stool microbiome in a subgroup of these children. Factor analysis showed that age had the strongest impact on both BA and microbiome profiles. We found that at an early age, systemic BAs and microbial secondary BA pathways were altered in the P2Ab group compared with the P1Ab and CTR groups. Our findings thus suggest that dysregulated BA metabolism in early life may contribute to the risk and pathogenesis of T1D. Peer reviewed.