31 research outputs found

    Multiple Testing of Submatrices of a Precision Matrix With Applications to Identification of Between Pathway Interactions

    No full text
    Making accurate inference for gene regulatory networks, including inference about pathway-by-pathway interactions, is an important and difficult task. Motivated by such genomic applications, we consider multiple testing for conditional dependence between subgroups of variables. Under a Gaussian graphical model framework, the problem is translated into simultaneous testing for a collection of submatrices of a high-dimensional precision matrix, with each submatrix summarizing the dependence structure between two subgroups of variables.
    A novel multiple testing procedure is proposed, and both its theoretical and numerical properties are investigated. The asymptotic null distribution of the test statistic for an individual hypothesis is established, and the proposed multiple testing procedure is shown to asymptotically control the false discovery rate (FDR) and false discovery proportion (FDP) at the prespecified level under regularity conditions. Simulations show that the procedure works well in controlling the FDR and has good power in detecting the true interactions. The procedure is applied to a breast cancer gene expression study to identify between-pathway interactions. Supplementary materials for this article are available online.
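For orientation only, the textbook Benjamini-Hochberg step-up rule is one standard way to control the FDR over a collection of p-values. The sketch below is not the procedure proposed in the abstract above, whose test statistics are not described in this listing; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the BH step-up rule rejects.

    Illustrative stand-in only; this is NOT the submatrix testing
    procedure from the abstract.
    """
    m = len(p_values)
    # Indices of the hypotheses sorted by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= alpha * k / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one per pair of variable subgroups.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
example_rejected = benjamini_hochberg(example_p, alpha=0.05)
```

In the setting of the abstract, each p-value would correspond to one submatrix of the precision matrix, that is, to one pair of variable subgroups.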

    DataSheet1.pdf

    No full text
    Many strains of mice are utilized in mouse models of cerebrovascular diseases. Variations in vascular anatomy between these strains have been documented and may influence the phenotype in stroke models. To address inter-strain variations in the circle of Willis anatomy, the diameters of internal carotid, posterior communicating, anterior cerebral, and middle cerebral arteries in 144 mice from 32 inbred strains were measured. Arterial diameters were analyzed as a function of animal weight, age, and strain. Variations in the structure of the circle of Willis across strains were observed and noted. While right-sided anterior cerebral arteries were significantly greater in diameter than their left-sided counterparts across most strains, variations in arterial diameter are strain-specific. Adult mouse weight was not found to be associated with arterial diameter across strains, suggesting that cerebral artery size is associated with strain independently of weight. This study demonstrates strain-dependent variations in the murine circle of Willis, which should be taken into consideration when studying mouse models of cerebrovascular diseases.

    Federated Offline Reinforcement Learning

    No full text
    Evidence-based or data-driven dynamic treatment regimes are essential for personalized medicine and can benefit from offline reinforcement learning (RL). Although massive healthcare data are available across medical institutions, privacy constraints prohibit sharing them. In addition, heterogeneity exists across sites. As a result, federated offline RL algorithms are necessary and promising for addressing these problems. In this paper, we propose a multi-site Markov decision process model that allows for both homogeneous and heterogeneous effects across sites. The proposed model makes the analysis of site-level features possible. We design the first federated policy optimization algorithm for offline RL with a sample complexity guarantee. The proposed algorithm is communication-efficient, requiring only a single round of communication in which summary statistics are exchanged. We give a theoretical guarantee for the proposed algorithm, under which the suboptimality of the learned policies is comparable to the rate attainable if the data were not distributed. Extensive simulations demonstrate the effectiveness of the proposed algorithm. The method is applied to a sepsis dataset from multiple sites to illustrate its use in clinical settings.

    Assessing the Most Vulnerable Subgroup to Type II Diabetes Associated with Statin Usage: Evidence from Electronic Health Record Data

    No full text
    There have been increased concerns that the use of statins, one of the most commonly prescribed drugs for treating coronary artery disease, is potentially associated with an increased risk of new-onset type II diabetes (T2D). Nevertheless, to date, there is no robust evidence as to whether, and in which populations, statin users are indeed vulnerable to developing T2D. In this case study, leveraging the biobank and electronic health record data in the Partner Health System, we introduce a new data analysis pipeline and a novel statistical methodology that address existing limitations by (i) designing a rigorous causal framework that systematically examines the causal effects of statin usage on T2D risk in observational data, (ii) uncovering which patient subgroup is most vulnerable to developing T2D after taking statins, and (iii) assessing the replicability and statistical significance of the most vulnerable subgroup via a bootstrap calibration procedure. Our proposed approach delivers asymptotically sharp confidence intervals and a debiased estimate for the treatment effect of the most vulnerable subgroup in the presence of high-dimensional covariates. With our proposed approach, we find that females with high T2D genetic risk are at the highest risk of developing T2D due to statin usage.

    Obstetric outcomes among women with and without suicidal behavior during delivery hospitalizations (N = 23,507,597).

    No full text
    Obstetric outcomes among women with and without suicidal behavior during delivery hospitalizations (N = 23,507,597).

    Characteristics of hospitals where women with and without suicidal behavior related-hospitalizations being hospitalized (N = 23,507,597).

    No full text
    Characteristics of hospitals where women with and without suicidal behavior related-hospitalizations being hospitalized (N = 23,507,597).

    Socio-demographic and baseline characteristics of women with and without suicidal behavior at delivery hospitalizations (N = 23,507,597).

    No full text
    Socio-demographic and baseline characteristics of women with and without suicidal behavior at delivery hospitalizations (N = 23,507,597).

    Clustering sequence data with mixture Markov chains with covariates using multiple simplex constrained optimization routine (MSiCOR)

    No full text
    The Mixture Markov Model (MMM) is a widely used tool for clustering sequences of events drawn from a finite state space. However, because the MMM likelihood is multimodal, maximizing it remains a challenge. Although the Expectation-Maximization (EM) algorithm remains one of the most popular ways to estimate the MMM parameters, its convergence is not always guaranteed. Given the computational challenges of maximizing the mixture likelihood on the constrained parameter space, we develop a pattern search-based global optimization technique that can optimize any objective function on a collection of simplexes, and we use it to maximize the MMM likelihood. This technique is shown to outperform other related global optimization techniques. In simulation experiments, the proposed method is shown to outperform the EM algorithm in terms of MMM estimation performance. The proposed method is applied to cluster multiple sclerosis (MS) patients based on their treatment sequences of disease-modifying therapies (DMTs). We also propose a novel method to cluster people with MS based on DMT prescriptions and associated clinical features (covariates) using an MMM with covariates. Based on the analysis, we divided MS patients into 3 clusters. Further cluster-specific summaries of relevant covariates indicate patient differences among the clusters.
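To make the objective concrete, here is a minimal sketch of the log-likelihood of a mixture of first-order Markov chains without covariates. It shows only the function that would be maximized, not the pattern-search optimizer from the abstract. All names (`sequence_log_prob`, `mixture_log_likelihood`) and the toy parameters are hypothetical, and every probability is assumed to be strictly positive.

```python
import math


def sequence_log_prob(seq, init, trans):
    """Log-probability of one state sequence under a first-order Markov chain.

    init[s] is the initial probability of state s and trans[a][b] is the
    probability of moving from state a to state b (assumed > 0 here).
    """
    logp = math.log(init[seq[0]])
    for a, b in zip(seq, seq[1:]):
        logp += math.log(trans[a][b])
    return logp


def mixture_log_likelihood(sequences, weights, inits, transitions):
    """Sum over sequences of log( sum_k weights[k] * P_k(sequence) )."""
    total = 0.0
    for seq in sequences:
        comp = [math.log(w) + sequence_log_prob(seq, pi, P)
                for w, pi, P in zip(weights, inits, transitions)]
        # Log-sum-exp over components for numerical stability.
        top = max(comp)
        total += top + math.log(sum(math.exp(c - top) for c in comp))
    return total


# Toy two-state example with a single mixture component (hypothetical values).
toy_init = [0.5, 0.5]
toy_trans = [[0.9, 0.1], [0.2, 0.8]]
toy_ll = mixture_log_likelihood([[0, 0, 1]], [1.0], [toy_init], [toy_trans])
```

The mixture weights, each initial distribution, and each row of each transition matrix are probability vectors, so the parameter space is a collection of simplexes, which is the constraint set the abstract refers to.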

    Additional file 1: Figure S1. of Identification of subjects with polycystic ovary syndrome using electronic health records

    No full text
    Datamart calibration. The circles represent A) the initial broad datamart identified using codified data, B) the second refined datamart in which electronic notes with the words polycystic ovary syndrome or PCOS were found, and C) patients from the entire Research Population Data Registry database, without codified exclusion criteria. The overlap represents patients that were found using both codified data and with a PCOS term in the note (AXB) or patients with a PCOS term in the note and without exclusion criteria (BXC). Of note, patients without exclusion criteria are also found in A and AXB, but are not shown here for clarity. The numbers in the orange circles represent the number of charts with a confirmed PCOS diagnosis over the total number of charts reviewed by an expert (CKW) and the percentage confirmed. The white box indicates the patients with evaluable charts who were not included in the broad definition datamart (no codified terms identified) but who did have a PCOS term in their note and were included in the refined datamart. Table S1. ICD 9 codes for diagnoses and procedures and laboratory values used for inclusion and exclusion in the broad PCOS datamart. Patients were all female, 18-74 years of age (current), with any of the listed parameters measured at Massachusetts General Hospital or Brigham and Women’s Hospital. Table S2. Inclusion and exclusion criteria used to create the second refined PCOS datamart. Patients were all female, 18-40 years of age at first identification of any listed parameter from records at Massachusetts General Hospital or Brigham and Women’s Hospital. (DOCX 36 kb)