
    Nonparametric inference of doubly stochastic Poisson process data via the kernel method

    Doubly stochastic Poisson processes, also known as Cox processes, frequently occur in various scientific fields. In this article, motivated primarily by analyzing Cox process data in biophysics, we propose a nonparametric kernel-based inference method. We conduct a detailed study, including an asymptotic analysis, of the proposed method, and provide guidelines for its practical use, introducing a fast and stable regression method for bandwidth selection. We apply our method to real photon arrival data from recent single-molecule biophysical experiments, investigating proteins' conformational dynamics. Our results show that conformational fluctuation is widely present in protein systems, and that the fluctuation covers a broad range of time scales, highlighting the dynamic and complex nature of proteins' structure. Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/10-AOAS352 by the Institute of Mathematical Statistics (http://www.imstat.org)
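As a rough illustration of kernel smoothing applied to point-process data, the sketch below estimates an arrival-rate curve from event times with a Gaussian kernel. It is not the estimator studied in the article, and the regression-based bandwidth selection is not reproduced. The function name `kernel_intensity`, the fixed bandwidth, and the toy arrival times are all assumptions made for illustration.

```python
import math

def kernel_intensity(arrivals, t, bandwidth):
    """Gaussian-kernel estimate of the arrival intensity at time t.

    Sums one Gaussian bump of width `bandwidth` per observed arrival, so
    the estimate integrates (over the whole real line) to len(arrivals).
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    return sum(norm * math.exp(-0.5 * ((t - a) / bandwidth) ** 2) for a in arrivals)

# Toy arrival times (hypothetical): a dense burst near t = 1, sparse later.
arrivals = [0.9, 1.0, 1.05, 1.1, 1.2, 3.0, 5.5]
rate_in_burst = kernel_intensity(arrivals, 1.0, bandwidth=0.2)
rate_in_gap = kernel_intensity(arrivals, 4.0, bandwidth=0.2)
```

A small bandwidth follows bursts closely but gives a noisy curve, while a large one smooths over real fluctuations, which is why bandwidth selection matters in practice.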

    Synthesizing electronic health records for predictive models in low-middle-income countries (LMICs)

    The spread of machine learning models, coupled with the growing adoption of electronic health records (EHRs), has opened the door for developing clinical decision support systems. However, despite the great promise of machine learning for healthcare in low-middle-income countries (LMICs), many data-specific limitations, such as small dataset size and irregular sampling, hinder progress in such applications. Recently, deep generative models have been proposed to generate realistic-looking synthetic data, including EHRs, by learning the underlying data distribution without compromising patient privacy. In this study, we first use a deep generative model to generate synthetic data based on a small dataset (364 patients) from an LMIC setting. Next, we use synthetic data to build models that predict the onset of hospital-acquired infections based on minimal information collected at patient ICU admission. The diagnostic model trained on the synthetic data outperformed models trained on the original data and on data oversampled using techniques such as SMOTE. We also experiment with varying the size of the synthetic data and observe the impact on the performance and interpretability of the models. Our results show the promise of using deep generative models in enabling healthcare data owners to develop and validate models that serve their needs and applications, despite limitations in dataset size.
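The abstract names SMOTE as an oversampling baseline. The sketch below shows only the core idea usually associated with SMOTE: a synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority-class neighbours. The function `smote_like`, its parameters, and the toy data are hypothetical and are not taken from the study or from any library.

```python
import math
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Nearest neighbours of x among the other minority samples.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(x, p))
        neighbour = rng.choice(others[:k])
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(x, neighbour)])
    return synthetic

# Hypothetical two-feature minority class (not data from the study).
minority = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.0], [1.2, 3.0]]
new_points = smote_like(minority, n_new=5)
```

Because every synthetic point is a convex combination of two real points, this kind of oversampling cannot leave the region already spanned by the minority class, which is one contrast with a learned generative model.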

    Physics-Driven ML-Based Modelling for Correcting Inverse Estimation

    When deploying machine learning estimators in science and engineering (SAE) domains, it is critical to avoid failed estimations that can have disastrous consequences, e.g., in aero engine design. This work focuses on detecting and correcting failed state estimations before adopting them in SAE inverse problems, by utilizing simulations and performance metrics guided by physical laws. We suggest flagging a machine learning estimation when its physical model error exceeds a feasible threshold, and propose a novel approach, GEESE, to correct it through optimization, aiming at delivering both low error and high efficiency. The key designs of GEESE include (1) a hybrid surrogate error model to provide fast error estimations to reduce simulation cost and to enable gradient-based backpropagation of error feedback, and (2) two generative models to approximate the probability distributions of the candidate states for simulating the exploitation and exploration behaviours. All three models are constructed as neural networks. GEESE is tested on three real-world SAE inverse problems and compared to a number of state-of-the-art optimization/search approaches. Results show that it fails the least number of times in terms of finding a feasible state correction, and requires physical evaluations less frequently in general. Comment: 19 pages; the paper is accepted by NeurIPS 2023 as a spotlight
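The flag-then-correct idea in this abstract can be sketched in a few lines. This is not GEESE: the surrogate error model and the two generative models are replaced here by a plain random local search, and the forward model, threshold, and all names (`physical_error`, `flag_and_correct`) are assumptions chosen for illustration.

```python
import random

def physical_error(state, observation, forward_model):
    """Discrepancy between the simulated and the observed measurement."""
    return abs(forward_model(state) - observation)

def flag_and_correct(state, observation, forward_model, threshold,
                     step=0.5, max_evals=2000, seed=0):
    """Return (state, error, n_evals); search only if the estimate is flagged."""
    rng = random.Random(seed)
    best, best_err = state, physical_error(state, observation, forward_model)
    evals = 1
    # The estimate is flagged while its physical error exceeds the threshold.
    while best_err > threshold and evals < max_evals:
        candidate = best + rng.uniform(-step, step)
        err = physical_error(candidate, observation, forward_model)
        evals += 1
        if err < best_err:  # keep only improving candidates
            best, best_err = candidate, err
    return best, best_err, evals

# Hypothetical forward model and observation, for illustration only.
forward = lambda x: x ** 3 + x
observed = 10.0          # produced by the true state x = 2
ml_estimate = 1.2        # a deliberately poor estimate
corrected, err, n_evals = flag_and_correct(ml_estimate, observed, forward, threshold=0.05)
```

Each loop iteration costs one call to the forward model, which is the expense that a surrogate error model is meant to reduce when the physical simulation is slow.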

    Vertical Changes in Soil Physical Structure and Water Flow

    Previous microplastic research under laboratory conditions has focused on microplastics that are homogeneously mixed into test media, in order to maximize test reproducibility and uniform bio-accessibility. Here we specifically focused on testing the idea that microplastics in soil could affect adjacent soil layers not containing microplastics themselves. We included two different microplastics (low-density polyethylene films and polyacrylonitrile fibers) and carried out a soil column test consisting of three different vertical layers (0–3 cm, top, control soil; 3–6 cm, middle, microplastic-containing soil; 6–9 cm, bottom, control soil). Our study shows that microplastic-containing soil layers can act as an anthropogenic barrier in the soil column, interrupting the vertical water flow. These changes directly affected the water content of adjacent layers, and changes in the proportion of soil aggregate sizes occurred for each depth of the soil columns. We also observed that these physical changes trigger changes in soil respiration, but do not translate to effects on enzyme activities. These results imply that the soil environment in non-contaminated parts of the soil can be altered by microplastic contamination in adjacent layers, as might occur for example during ploughing on agricultural fields. More generally, our results highlight the need to further examine effects of microplastic in experiments that do not treat this kind of pollution as uniformly distributed.

    Split tolerance permits safe Ad5-GUCY2C-PADRE vaccine-induced T-cell responses in colon cancer patients.

    Background: The colorectal cancer antigen GUCY2C exhibits unique split tolerance, evoking antigen-specific CD8+, but not CD4+, T-cell responses that deliver anti-tumor immunity without autoimmunity in mice. Here, the cancer vaccine Ad5-GUCY2C-PADRE was evaluated in a first-in-man phase I clinical study of patients with early-stage colorectal cancer to assess its safety and immunological efficacy. Methods: Ten patients with surgically-resected stage I or stage II (pN0) colon cancer received a single intramuscular injection of 10^11 viral particles (vp) of Ad5-GUCY2C-PADRE. Safety assessment and immunomonitoring were carried out for 6 months following immunization. This trial employed continual monitoring of both efficacy and toxicity of subjects as joint primary outcomes. Results: All patients receiving Ad5-GUCY2C-PADRE completed the study and none developed adverse events greater than grade 1. Antibody responses to GUCY2C were detected in 10% of patients, while 40% exhibited GUCY2C-specific T-cell responses. GUCY2C-specific responses were exclusively CD8+ cytotoxic T cells, mimicking pre-clinical studies in mice in which GUCY2C-specific CD4+ T cells are eliminated by self-tolerance, while CD8+ T cells escape tolerance and mediate antitumor immunity. Moreover, pre-existing neutralizing antibodies (NAbs) to the Ad5 vector were associated with poor vaccine-induced responses, suggesting that Ad5 NAbs oppose GUCY2C immune responses to the vaccine in patients, a finding supported by mouse studies. Conclusions: Split tolerance to GUCY2C in cancer patients can be exploited to safely generate antigen-specific cytotoxic CD8+, but not autoimmune CD4+, T cells by Ad5-GUCY2C-PADRE in the absence of pre-existing NAbs to the viral vector. Trial registration: This trial (NCT01972737) was registered at ClinicalTrials.gov on October 30th, 2013. https://clinicaltrials.gov/ct2/show/NCT01972737

    Consistent Differential Expression Pattern (CDEP) on microarray to identify genes related to metastatic behavior

    Background: To utilize the large volume of gene expression information generated from different microarray experiments, several meta-analysis techniques have been developed. Despite these efforts, there remain significant challenges to effectively increasing the statistical power and decreasing the Type I error rate while pooling the heterogeneous datasets from public resources. The objective of this study is to develop a novel meta-analysis approach, Consistent Differential Expression Pattern (CDEP), to identify genes with common differential expression patterns across different datasets. Results: We combined False Discovery Rate (FDR) estimation and the non-parametric RankProd approach to estimate the Type I error rate in each microarray dataset of the meta-analysis. These Type I error rates from all datasets were then used to identify genes with common differential expression patterns. Our simulation study showed that CDEP achieved higher statistical power and maintained low Type I error rate when compared with two recently proposed meta-analysis approaches. We applied CDEP to analyze microarray data from different laboratories that compared transcription profiles between metastatic and primary cancer of different types. Many genes identified as differentially expressed consistently across different cancer types are in pathways related to metastatic behavior, such as ECM-receptor interaction, focal adhesion, and blood vessel development. We also identified novel genes such as AMIGO2, Gem, and CXCL11 that have not been shown to associate with, but may play roles in, metastasis. Conclusions: CDEP is a flexible approach that borrows information from each dataset in a meta-analysis in order to identify genes being differentially expressed consistently. We have shown that CDEP can gain higher statistical power than other existing approaches under a variety of settings considered in the simulation study, suggesting its robustness and insensitivity to data variation commonly associated with microarray experiments. Availability: CDEP is implemented in R and freely available at: http://genomebioinfo.musc.edu/CDEP/ Contact: [email protected]
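The rank-product statistic behind the RankProd approach mentioned in this abstract can be illustrated briefly. The sketch below ranks genes by fold change within each dataset and takes the geometric mean of each gene's ranks. It is not CDEP and not the RankProd package: the permutation-based error estimation and the cross-dataset combination step are omitted, and the function `rank_products` and the toy values are hypothetical.

```python
import math

def rank_products(fold_changes):
    """fold_changes[g][d] is the log fold change of gene g in dataset d.

    Within each dataset, genes are ranked so that rank 1 is the most
    up-regulated; the rank product is the geometric mean of a gene's ranks.
    """
    n_genes = len(fold_changes)
    n_sets = len(fold_changes[0])
    ranks = [[0] * n_sets for _ in range(n_genes)]
    for d in range(n_sets):
        order = sorted(range(n_genes), key=lambda g: fold_changes[g][d], reverse=True)
        for rank, g in enumerate(order, start=1):
            ranks[g][d] = rank
    return [math.exp(sum(math.log(r) for r in row) / n_sets) for row in ranks]

# Hypothetical log fold changes: 4 genes (rows) in 3 datasets (columns).
log_fc = [
    [2.1, 1.8, 2.5],    # consistently up-regulated
    [0.1, -0.2, 0.3],
    [-1.5, 0.9, -0.4],
    [0.4, 0.2, 0.1],
]
rp = rank_products(log_fc)
```

A gene that ranks near the top in every dataset gets a small rank product, so the statistic rewards consistency across datasets rather than one large effect in a single experiment.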

    Signaling network prediction by the Ontology Fingerprint enhanced Bayesian network

    Background: Despite large amounts of available genomic and proteomic data, predicting the structure and response of signaling networks is still a significant challenge. While statistical methods such as Bayesian networks have been explored to meet this challenge, employing existing biological knowledge for network prediction is difficult. The objective of this study is to develop a novel approach that integrates prior biological knowledge in the form of the Ontology Fingerprint to infer cell-type-specific signaling networks via data-driven Bayesian network learning, and to further use the trained model to predict cellular responses. Results: We applied our novel approach to address the Predictive Signaling Network Modeling challenge of the fourth (2009) Dialogue for Reverse Engineering Assessments and Methods (DREAM4) competition. The challenge results showed that our method accurately captured signal transduction of a network of protein kinases and phosphoproteins, in that the predicted protein phosphorylation levels under all experimental conditions were highly correlated (R^2 = 0.93) with the observed results. Based on the evaluation of the DREAM4 organizer, our team was ranked as one of the top five best performers in predicting network structure and protein phosphorylation activity under test conditions. Conclusions: Bayesian networks can be used to simulate the propagation of signals in cellular systems. Incorporating the Ontology Fingerprint as prior biological knowledge allows us to efficiently infer concise signaling network structure and to accurately predict cellular responses. http://deepblue.lib.umich.edu/bitstream/2027.42/109490/1/12918_2012_Article_989.pd