
    Physics‐constrained non‐Gaussian probabilistic learning on manifolds

    An extension of the probabilistic learning on manifolds (PLoM), recently introduced by the authors, is presented: in addition to the initial data set given for performing the probabilistic learning, constraints are given, which correspond to statistics of experiments or of physical models. We consider a non-Gaussian random vector whose unknown probability distribution has to satisfy these constraints. The method consists of constructing a generator using the PLoM and the classical Kullback-Leibler minimum cross-entropy principle. The resulting optimization problem is reformulated using Lagrange multipliers associated with the constraints. The optimal values of the Lagrange multipliers are computed with an efficient iterative algorithm; at each iteration, the Markov chain Monte Carlo algorithm developed for the PLoM is used, which consists of solving an Itô stochastic differential equation projected on a diffusion-maps basis. The method and the algorithm are efficient and allow the construction of probabilistic models for high-dimensional problems from small initial data sets and for which an arbitrary number of constraints is specified. The first application is simple enough to be easily reproduced; the second concerns a stochastic elliptic boundary value problem in high dimension.
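
    In generic notation (the symbols below are illustrative, not taken from the paper), the construction amounts to a minimum cross-entropy problem under moment-type constraints, whose Lagrangian solution is an exponential tilt of the distribution learned by PLoM:

```latex
% Minimum Kullback-Leibler cross-entropy under statistical constraints (illustrative notation)
p^{\ast} \;=\; \arg\min_{p} \int p(\mathbf{x})\,\ln\frac{p(\mathbf{x})}{p_{0}(\mathbf{x})}\, d\mathbf{x}
\quad \text{subject to} \quad
\int \mathbf{h}(\mathbf{x})\, p(\mathbf{x})\, d\mathbf{x} = \mathbf{b},
% whose solution is the exponential tilt
p^{\ast}(\mathbf{x}) \;\propto\; p_{0}(\mathbf{x})\,\exp\!\big(-\boldsymbol{\lambda}^{\top}\mathbf{h}(\mathbf{x})\big).
```

    Here p_0 denotes the distribution learned by PLoM from the initial data set, h collects the constraint statistics with target values b, and the multipliers λ are found by the iterative algorithm, each step drawing samples with the projected Itô-SDE MCMC.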

    Bayesian inference for an illness-death model for stroke with cognition as a latent time-dependent risk factor.

    Longitudinal data can be used to estimate the transition intensities between healthy and unhealthy states prior to death. An illness-death model for history of stroke is presented, where time-dependent transition intensities are regressed on a latent variable representing cognitive function. The change of this function over time is described by a linear growth model with random effects. Occasion-specific cognitive function is measured by an item response model for longitudinal scores on the Mini-Mental State Examination, a questionnaire used to screen for cognitive impairment. The illness-death model is used to identify and explore the relationship between occasion-specific cognitive function and stroke. Combining a multi-state model with the latent growth model defines a joint model which extends current statistical inference regarding disease progression and cognitive function. Markov chain Monte Carlo methods are used for Bayesian inference. Data stem from the Medical Research Council Cognitive Function and Ageing Study in the UK (1991-2005).
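
    In schematic notation (the symbols and link functions below are illustrative, not the authors' exact specification), the joint model couples three components through a latent cognitive trajectory:

```latex
% Illustrative joint structure: latent growth, item response measurement, illness-death intensities
U_i(t) = \beta_0 + \beta_1 t + b_{0i} + b_{1i} t                                        % latent cognitive function
\Pr\!\big(Y_{ijt} = 1 \mid U_i(t)\big) = \operatorname{logit}^{-1}\!\big(a_j U_i(t) - d_j\big)   % MMSE item j
q_{rs}(t) = q_{rs,0}\,\exp\!\big(\gamma_{rs} U_i(t) + \boldsymbol{\eta}_{rs}^{\top}\mathbf{z}_i\big) % transition r -> s
```

    All three parts share U_i(t), so the MCMC sampler integrates jointly over the random effects and the latent states.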

    Empirical Bayesian Mixture Models for Medical Image Translation

    Automatically generating one medical imaging modality from another is known as medical image translation and has numerous interesting applications. This paper presents an interpretable generative modelling approach to medical image translation. By allowing a common model for group-wise normalisation and segmentation of brain scans to handle missing data, the approach can predict entirely missing modalities from one, or a few, MR contrasts. Furthermore, the model can be trained on a fairly small number of subjects. The proposed model is validated on three clinically relevant scenarios. Results appear promising and show that a principled, probabilistic model of the relationship between multi-channel signal intensities can be used to infer missing modalities -- both MR contrasts and CT images. Comment: Accepted to the Simulation and Synthesis in Medical Imaging (SASHIMI) workshop at MICCAI 201
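
    The paper's generative model is more elaborate, but the core idea of inferring a missing channel from a joint model of multi-channel intensities can be sketched with an ordinary Gaussian mixture; the channel names, shapes, and synthetic data below are assumptions for illustration only:

```python
# Sketch (not the paper's model): impute a missing imaging channel per voxel using a
# multi-channel Gaussian mixture fitted on data where all channels are present.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
train = rng.normal(size=(5000, 3))              # voxels x channels, e.g. [T1w, T2w, CT]
train[:, 2] = 0.6 * train[:, 0] - 0.3 * train[:, 1] + 0.1 * rng.normal(size=5000)

gmm = GaussianMixture(n_components=4, covariance_type="full", random_state=0).fit(train)

def impute_missing(x_obs, obs_idx, mis_idx, gmm):
    """E[x_mis | x_obs] under the fitted mixture, via standard Gaussian conditioning."""
    resp, cond_means = [], []
    for pi, mu, cov in zip(gmm.weights_, gmm.means_, gmm.covariances_):
        mu_o, mu_m = mu[obs_idx], mu[mis_idx]
        S_oo = cov[np.ix_(obs_idx, obs_idx)]
        S_mo = cov[np.ix_(mis_idx, obs_idx)]
        resp.append(pi * multivariate_normal(mu_o, S_oo).pdf(x_obs))   # component responsibility
        cond_means.append(mu_m + S_mo @ np.linalg.solve(S_oo, x_obs - mu_o))
    resp = np.asarray(resp) / np.sum(resp)
    return resp @ np.asarray(cond_means)

# Predict the "CT" channel of a voxel from its two "MR" channels
print(impute_missing(np.array([0.5, -1.0]), obs_idx=[0, 1], mis_idx=[2], gmm=gmm))
```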

    iQuantitator: A tool for protein expression inference using iTRAQ

    Background: Isobaric Tags for Relative and Absolute Quantitation (iTRAQ™) [Applied Biosystems] have seen increased application in differential protein expression analysis. To facilitate the growing need to analyze iTRAQ data, especially for cases involving multiple iTRAQ experiments, we have developed a modeling approach, statistical methods, and tools for estimating the relative changes in protein expression under various treatments and experimental conditions. Results: This modeling approach provides a unified analysis of data from multiple iTRAQ experiments and links the observed quantity (reporter ion peak area) to the experimental design and the calculated quantity of interest (treatment-dependent protein and peptide fold change) through an additive model under log transformation. Others have demonstrated this modeling approach through a case study and noted the computational challenges of parameter inference in the unbalanced data sets typical of multiple iTRAQ experiments. Here we present an inference approach, based on hierarchical regression with batching of regression coefficients and Markov chain Monte Carlo (MCMC) methods, that overcomes some of these challenges. In addition to discussing the underlying method, we present our implementation of the software, simulation results, experimental results, and sample output from the resulting analysis report. Conclusion: iQuantitator's process-based modeling approach overcomes limitations in current methods and allows for application in a variety of experimental designs. Additionally, hypertext-linked documents produced by the tool aid in the interpretation and exploration of results.
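
    The additive log-scale model referred to above can be sketched as follows; the indices and effect names are illustrative rather than iQuantitator's exact parameterisation:

```latex
% Illustrative additive model linking reporter ion peak areas to the experimental design
\log(y_{epqt}) = \mu + E_{e} + P_{p} + \mathit{pep}_{q(p)} + T_{t} + (PT)_{pt} + \varepsilon_{epqt}
```

    Here y is the reporter ion peak area, E an experiment (batch) effect, P and pep_{q(p)} protein and nested peptide effects, T the treatment effect, and (PT)_{pt} the treatment-dependent protein fold change of interest; hierarchical priors batch related coefficients and MCMC supplies the posterior over the unbalanced design.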

    Improving quality indicator report cards through Bayesian modeling

    Background: The National Database for Nursing Quality Indicators® (NDNQI®) was established in 1998 to assist hospitals in monitoring indicators of nursing quality (e.g., falls and pressure ulcers). Hospitals participating in NDNQI transmit data from nursing units to an NDNQI data repository. Data are summarized and published in reports that allow participating facilities to compare the results for their units with those from other units across the nation. A disadvantage of this reporting scheme is that the sampling variability is not explicit. For example, suppose a small nursing unit has 2 of 10 patients (a rate of 20%) with pressure ulcers. Should the nursing unit immediately undertake a quality improvement plan because its rate differs from the national average (7%)? Methods: In this paper, we propose approximating 95% credible intervals (CrIs) for unit-level data using statistical models that account for the variability in unit rates for report cards. Results: Bayesian CrIs communicate the level of uncertainty of estimates to decision makers more clearly than significance tests do. Conclusion: A benefit of this approach is that nursing units would be better able to distinguish problematic or beneficial trends from fluctuations likely due to chance.
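
    As a minimal sketch of the idea behind the example above, a conjugate Beta-Binomial model (the uniform prior is an assumption here, not NDNQI's actual model) already shows how wide the credible interval for a small unit is:

```python
# 95% credible interval for a unit's pressure-ulcer rate under a Beta-Binomial model.
from scipy import stats

events, n = 2, 10            # 2 of 10 patients with pressure ulcers (rate 20%)
a0, b0 = 1.0, 1.0            # uniform Beta(1, 1) prior -- an illustrative assumption

posterior = stats.beta(a0 + events, b0 + n - events)
lo, hi = posterior.ppf([0.025, 0.975])
print(f"observed rate: {events / n:.0%}")
print(f"95% CrI: ({lo:.1%}, {hi:.1%})")
# The interval is wide and contains the 7% national average, so the apparent gap
# between 20% and 7% may well be sampling fluctuation rather than a quality problem.
```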

    Uncertainty in the Tail of the Variant Creutzfeldt-Jakob Disease Epidemic in the UK

    Despite low case numbers, the variant Creutzfeldt-Jakob disease (vCJD) epidemic poses many challenges for public health planning owing to remaining uncertainties in disease biology and transmission routes. We develop a stochastic model for vCJD transmission, taking into account the known transmission routes (food and red-cell transfusion), to assess the remaining uncertainty in the epidemic. We use Bayesian methods to obtain scenarios consistent with current data. Our results show a potentially long but uncertain tail in the epidemic, with a peak annual incidence of around 11 cases and a 95% credibility interval of 1 to 65 cases. These cases are predicted to arise from past food-borne transmissions in previously mostly unaffected genotypes and from transmissions via blood transfusion in all genotypes. However, we also show that the latter are unlikely to be identifiable as transfusion-associated cases by case-linking. Regardless of the number of future cases, even in the absence of any further control measures, we do not find any self-sustaining epidemics.

    Identifier mapping performance for integrating transcriptomics and proteomics experimental results

    Background: Studies integrating transcriptomic data with proteomic data can illuminate the proteome more clearly than either can separately. Such integromic studies can deepen understanding of the dynamic, complex regulatory relationship between the transcriptome and the proteome. Integrating these data requires a reliable mapping between the identifier nomenclatures produced by the two high-throughput platforms. However, this kind of analysis is well known to be hampered by the lack of standardization of identifier nomenclature among proteins, genes, and microarray probe sets. Data integration may therefore also play a role in critiquing the fallible gene identifications that both platforms emit. Results: We compared three freely available internet-based identifier mapping resources for mapping UniProt accessions (ACCs) to Affymetrix probeset identifications (IDs): DAVID, EnVision, and NetAffx. Liquid chromatography-tandem mass spectrometry analyses of 91 endometrial cancer and 7 noncancer samples generated 11,879 distinct ACCs. For each ACC, we compared the retrieval sets of probeset IDs from each mapping resource and confirmed a high level of discrepancy among the resources. Because mRNA expression was available for the same samples, we evaluated the quality of each ACC-to-probeset match by calculating proteome-transcriptome correlations, on the presumption that better identifier mapping should yield a higher proportion of mapped pairs with strong inter-platform correlations. A mixture model for the correlations fitted well and supported regression analysis, providing a window into the performance of the mapping resources. The resources have added and dropped matches over two years, but their overall performance has not changed. Conclusions: The methods presented here provide concrete, context-specific insight to support well-informed decisions in choosing an ID mapping strategy for "omic" data merging.
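
    The evaluation idea (score each mapping resource by the proteome-transcriptome correlations of its mapped pairs, then fit a two-component mixture to those correlations) can be sketched roughly as follows; the data, shapes, and variable names are placeholders, not the study's actual pipeline:

```python
# Sketch: per-pair Spearman correlations between protein and mRNA profiles, followed by a
# two-component mixture fit separating well-mapped from poorly mapped identifier pairs.
import numpy as np
from scipy.stats import spearmanr
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
n_pairs, n_samples = 500, 98                     # e.g. 91 cancer + 7 noncancer samples
protein = rng.normal(size=(n_pairs, n_samples))                  # placeholder protein abundances
mrna = 0.4 * protein + rng.normal(size=(n_pairs, n_samples))     # placeholder mRNA signal

# One correlation per mapped ACC-to-probeset pair, computed across samples
cors = np.array([spearmanr(protein[i], mrna[i])[0] for i in range(n_pairs)])

# Two components: a higher-correlation "well-mapped" group and a near-zero "mismapped" group
gm = GaussianMixture(n_components=2, random_state=0).fit(cors.reshape(-1, 1))
print("component means:", gm.means_.ravel(), "weights:", gm.weights_)
# A resource placing more weight on the higher-mean component maps identifiers better.
```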

    Measuring Global Credibility with Application to Local Sequence Alignment

    Computational biology is replete with high-dimensional (high-D) discrete prediction and inference problems, including sequence alignment, RNA structure prediction, phylogenetic inference, motif finding, prediction of pathways, and model selection problems in statistical genetics. Even though prediction and inference in these settings are uncertain, little attention has been focused on the development of global measures of uncertainty. Regardless of the procedure employed to produce a prediction, when a procedure delivers a single answer, that answer is a point estimate selected from the solution ensemble, the set of all possible solutions. For high-D discrete spaces, these ensembles are immense, and thus there is considerable uncertainty. We recommend the use of Bayesian credibility limits to describe this uncertainty, where a (1−α)%, 0≤α≤1, credibility limit is the minimum Hamming distance radius of a hyper-sphere containing (1−α)% of the posterior distribution. Because sequence alignment is arguably the most extensively used procedure in computational biology, we employ it here to make these general concepts more concrete. The maximum similarity estimator (i.e., the alignment that maximizes the likelihood) and the centroid estimator (i.e., the alignment that minimizes the mean Hamming distance from the posterior weighted ensemble of alignments) are used to demonstrate the application of Bayesian credibility limits to alignment estimators. Application of Bayesian credibility limits to the alignment of 20 human/rodent orthologous sequence pairs and 125 orthologous sequence pairs from six Shewanella species shows that credibility limits of the alignments of promoter sequences of these species vary widely, and that centroid alignments dependably have tighter credibility limits than traditional maximum similarity alignments.
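
    A minimal sketch of the credibility-limit computation, assuming posterior alignment samples are already encoded as binary indicator vectors (the encoding and the toy sampler below are assumptions for illustration):

```python
# Credibility limit: the smallest Hamming-distance radius around a point estimate whose
# ball contains a (1 - alpha) fraction of the posterior sample of alignments.
import numpy as np

def centroid_estimate(samples):
    """Majority vote across posterior samples; minimizes expected Hamming distance."""
    return (samples.mean(axis=0) > 0.5).astype(int)

def credibility_radius(samples, estimate, alpha=0.05):
    """Smallest radius r such that at least (1 - alpha) of samples lie within Hamming distance r."""
    dists = np.sort((samples != estimate).sum(axis=1))
    k = int(np.ceil((1.0 - alpha) * len(dists))) - 1
    return int(dists[k])

# Toy posterior: 1000 sampled "alignments" over 200 binary indicators
rng = np.random.default_rng(1)
truth = rng.integers(0, 2, size=200)
samples = (rng.random((1000, 200)) < np.where(truth, 0.9, 0.1)).astype(int)

centroid = centroid_estimate(samples)
print("95% credibility radius (centroid):", credibility_radius(samples, centroid))
```

    The same radius can be computed around the maximum similarity alignment, and comparing the two radii mirrors the comparison of estimators described in the abstract.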

    Exposure of neonatal rats to maternal cafeteria feeding during suckling alters hepatic gene expression and DNA methylation in the insulin signalling pathway

    Nutrition in early life is a determinant of lifelong physiological and metabolic function. Diseases that are associated with ageing may, therefore, have their antecedents in maternal nutrition during pregnancy and lactation. Rat mothers were fed either a standard laboratory chow diet (C) or a cafeteria diet (O), based upon a varied panel of highly palatable human foods, during lactation. Their offspring were then weaned onto chow or cafeteria diet, giving four groups of animals (CC, CO, OC, OO; n=9-10). Livers were harvested 10 weeks post-weaning for assessment of gene and protein expression and DNA methylation. Cafeteria feeding post-weaning impaired glucose tolerance and was associated with sex-specific altered mRNA expression of peroxisome proliferator-activated receptor gamma (PPARg) and components of the insulin-signalling pathway (Irs2, Akt1 and IrB). Exposure to the cafeteria diet during the suckling period modified the later response to the dietary challenge: post-weaning cafeteria feeding down-regulated IrB only when associated with cafeteria feeding during suckling (group OO; interaction of diet in weaning and lactation, P=0.041). Responses to the cafeteria diet during both phases of the experiment varied between males and females. Global DNA methylation was altered in the liver following cafeteria feeding in the post-weaning period in males but not females. Methylation of the IrB promoter was increased in group OC, but not OO (P=0.036). The findings of this study add to a growing evidence base suggesting that tissue function across the lifespan is a product of cumulative modifications to the epigenome and transcriptome, which may be both tissue- and sex-specific.

    Performance of CMS muon reconstruction in pp collision events at sqrt(s) = 7 TeV

    The performance of muon reconstruction, identification, and triggering in CMS has been studied using 40 inverse picobarns of data collected in pp collisions at sqrt(s) = 7 TeV at the LHC in 2010. A few benchmark sets of selection criteria covering a wide range of physics analysis needs have been examined. For all considered selections, the efficiency to reconstruct and identify a muon with a transverse momentum pT larger than a few GeV is above 95% over the whole region of pseudorapidity covered by the CMS muon system, abs(eta) < 2.4, while the probability to misidentify a hadron as a muon is well below 1%. The efficiency to trigger on single muons with pT above a few GeV is higher than 90% over the full eta range, and typically substantially better. The overall momentum scale is measured to a precision of 0.2% with muons from Z decays. The transverse momentum resolution varies from 1% to 6% depending on pseudorapidity for muons with pT below 100 GeV and, using cosmic rays, it is shown to be better than 10% in the central region up to pT = 1 TeV. Observed distributions of all quantities are well reproduced by the Monte Carlo simulation.