    SwISS: A Scalable Markov chain Monte Carlo Divide-and-Conquer Strategy

    Divide-and-conquer strategies for Monte Carlo algorithms are an increasingly popular approach to making Bayesian inference scalable to large data sets. In the simplest form, the data are partitioned across multiple computing cores, and a separate Markov chain Monte Carlo algorithm on each core targets the associated partial posterior distribution, which we refer to as a sub-posterior: the posterior given only the data from the segment of the partition associated with that core. Divide-and-conquer techniques reduce computational, memory and disk bottlenecks, but make it difficult to recombine the sub-posterior samples. We propose SwISS (Sub-posteriors with Inflation, Scaling and Shifting), a new approach for recombining the sub-posterior samples that is simple to apply, scales to high-dimensional parameter spaces and accurately approximates the original posterior distribution through affine transformations of the sub-posterior samples. We prove that our transformation is asymptotically optimal across a natural set of affine transformations and illustrate the efficacy of SwISS against competing algorithms on synthetic and real-world data sets.
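
    To make the idea of recombining sub-posterior samples by an affine map concrete, here is a minimal one-dimensional sketch. It is not the SwISS transformation from the paper, whose exact form the abstract does not give. It assumes each sub-posterior is roughly Gaussian and that the full posterior is proportional to the product of the sub-posteriors (for example, under a flat or fractionated prior), so precisions add and the target mean is precision-weighted. The function name `affine_recombine` and the simulated draws are hypothetical stand-ins for real MCMC output.

```python
import random
import statistics


def affine_recombine(sub_samples):
    """Shift and scale each 1-D sub-posterior sample set onto a common target.

    The target is the Gaussian obtained by multiplying Gaussian
    approximations of the sub-posteriors (an assumption of this sketch).
    """
    means = [statistics.fmean(s) for s in sub_samples]
    variances = [statistics.pvariance(s) for s in sub_samples]

    # Product of Gaussians: precisions add, means are precision-weighted.
    precision = sum(1.0 / v for v in variances)
    target_var = 1.0 / precision
    target_mean = target_var * sum(m / v for m, v in zip(means, variances))

    pooled = []
    for samples, m, v in zip(sub_samples, means, variances):
        scale = (target_var / v) ** 0.5  # shrink the spread to the target
        pooled.extend(target_mean + scale * (x - m) for x in samples)
    return pooled, target_mean, target_var


rng = random.Random(0)
# Simulated draws standing in for MCMC output from three cores.
sub_samples = [[rng.gauss(mu, 1.0) for _ in range(2000)] for mu in (0.8, 1.0, 1.3)]
pooled, target_mean, target_var = affine_recombine(sub_samples)
```

    Every transformed sample is kept, so the pooled set has as many draws as all cores combined, and its mean and variance match the Gaussian-product target by construction.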

    Semi-Exact Control Functionals From Sard's Method

    A novel control variate technique is proposed for post-processing of Markov chain Monte Carlo output, based on both Stein's method and an approach to numerical integration due to Sard. The resulting estimators of posterior expected quantities of interest are proven to be polynomially exact in the Gaussian context, while empirical results suggest the estimators approximate a Gaussian cubature method near the Bernstein-von-Mises limit. The main theoretical result establishes a bias-correction property in settings where the Markov chain does not leave the posterior invariant. Empirical results are presented across a selection of Bayesian inference tasks. All methods used in this paper are available in the R package ZVCV.

    Respondent-Driven Sampling in a Study of Drug Users in New York City: Notes from the Field

    Beth Israel Medical Center (BIMC), in collaboration with the Centers for Disease Control (CDC) and the New York State Department of Health (NYSDOH), used respondent-driven sampling (RDS) in a study of HIV seroprevalence among drug users in New York City in 2004. We report here on operational issues with RDS, including recruitment, coupon distribution, storefront operations, police and community relations, and the overall lessons we learned. Project staff recruited eight seeds from a syringe exchange in Lower Manhattan to serve as the initial study participants. Upon completion of an interview lasting approximately 1 hour and a blood draw, each seed was given three coupons to recruit three drug users into the study. Each of the subsequent eligible participants was also given three coupons to recruit three of their drug-using acquaintances. To be eligible, participants had to have injected, smoked or snorted an illicit drug other than marijuana in the last 6 months, be aged 18 or older, and have adequate English language knowledge to give informed consent and complete the questionnaire. From April to July 2004, 618 drug users were interviewed, including 263 (43%) current injectors, 119 (19%) former injectors, and 236 (38%) never injectors. Four hundred sixty-nine (76%) participants were men, 147 (24%) were women, and two (<1%) were transgender. By race/ethnicity, 285 (46%) were black, 218 (35%) Hispanic, 88 (14%) white, 23 (4%) mixed/not specified, and four (<1%) Native American. Interviews were initially done on a drop-in basis, but this system changed to appointments 1 month into the study due to the large volume of subjects coming in for interviews. Data collection was originally proposed to last for 1 year with a target recruitment of 500 drug users. Utilizing RDS, we were able to recruit and interview 118 more drug users than originally proposed in one quarter of the time. RDS was efficient with respect to time and economics (we did not have to hire an outreach worker) and effective in recruiting a diverse sample of drug users.

    Using a summary measure for multiple quality indicators in primary care: the Summary QUality InDex (SQUID)

    BACKGROUND: Assessing the quality of primary care is becoming a priority in national healthcare agendas. Audit and feedback on healthcare quality performance indicators can help improve the quality of care provided. In some instances, a smaller number of more comprehensive indicators may be preferable. This paper describes the use of the Summary Quality Index (SQUID) in tracking quality of care among patients and primary care practices that use an electronic medical record (EMR). All practices are part of the Practice Partner Research Network, representing over 100 ambulatory care practices throughout the United States. METHODS: The SQUID comprises 36 process and outcome measures, all of which are obtained from the EMR. This paper describes algorithms for the SQUID calculations, various statistical properties, and use of the SQUID within the context of a multi-practice quality improvement (QI) project. RESULTS: At any given time point, the patient-level SQUID reflects the proportion of recommended care received, while the practice-level SQUID reflects the average proportion of recommended care received by that practice's patients. Using quarterly reports, practice- and patient-level SQUIDs are provided routinely to practices within the network. The SQUID is responsive, exhibiting highly significant (p < 0.0001) increases during a major QI initiative, and its internal consistency is excellent (Cronbach's alpha = 0.93). Feedback from physicians has been extremely positive, providing a high degree of face validity. CONCLUSION: The SQUID algorithm is feasible and straightforward, and provides a useful QI tool. Its statistical properties and clear interpretation make it appealing to providers, health plans, and researchers.
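
    The two levels of the index described in the RESULTS can be sketched as follows. This is an illustration of the stated definitions only, not the published SQUID algorithm. The measure names are hypothetical, and treating a measure a patient is not eligible for as `None` and excluding it from the denominator is an assumption of this sketch.

```python
from statistics import fmean


def patient_squid(indicators):
    """Proportion of recommended care received by one patient.

    `indicators` maps a measure name to True (met), False (not met)
    or None (patient not eligible; assumed to be excluded).
    """
    eligible = [met for met in indicators.values() if met is not None]
    if not eligible:
        return None  # no applicable measures, so no score
    return sum(eligible) / len(eligible)


def practice_squid(patients):
    """Average of the patient-level scores across a practice's patients."""
    scores = [s for s in map(patient_squid, patients) if s is not None]
    return fmean(scores) if scores else None


# Hypothetical EMR extract for a two-patient practice.
patient_a = {"bp_controlled": True, "hba1c_tested": False, "flu_vaccine": None}
patient_b = {"bp_controlled": True, "hba1c_tested": True, "flu_vaccine": True}

score_a = patient_squid(patient_a)
score_b = patient_squid(patient_b)
practice_score = practice_squid([patient_a, patient_b])
```

    Averaging patient-level proportions, rather than pooling all measures across patients, gives each patient equal weight regardless of how many measures apply to them.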

    Mutations in REEP6 Cause Autosomal-Recessive Retinitis Pigmentosa

    Retinitis pigmentosa (RP) is the most frequent form of inherited retinal dystrophy. RP is genetically heterogeneous and the genes identified to date encode proteins involved in a wide range of functional pathways, including photoreceptor development, phototransduction, the retinoid cycle, cilia, and outer segment development. Here we report the identification of biallelic mutations in Receptor Expression Enhancer Protein 6 (REEP6) in seven individuals with autosomal-recessive RP from five unrelated families. REEP6 is a member of the REEP/Yop1 family of proteins that influence the structure of the endoplasmic reticulum but is relatively unstudied. The six variants identified include three frameshift variants, two missense variants, and a genomic rearrangement that disrupts exon 1. Human 3D organoid optic cups were used to investigate REEP6 expression and confirmed the expression of a retina-specific isoform REEP6.1, which is specifically affected by one of the frameshift mutations. Expression of the two missense variants (c.383C>T [p.Pro128Leu] and c.404T>C [p.Leu135Pro]) and the REEP6.1 frameshift mutant in cultured cells suggests that these changes destabilize the protein. Furthermore, CRISPR-Cas9-mediated gene editing was used to produce Reep6 knock-in mice with the p.Leu135Pro RP-associated variant identified in one RP-affected individual. The homozygous knock-in mice mimic the clinical phenotypes of RP, including progressive photoreceptor degeneration and dysfunction of the rod photoreceptors. Therefore, our study implicates REEP6 in retinal homeostasis and highlights a pathway previously uncharacterized in retinal dystrophy.

    Prevalence and architecture of de novo mutations in developmental disorders

    The genomes of individuals with severe, undiagnosed developmental disorders are enriched in damaging de novo mutations (DNMs) in developmentally important genes. Here we have sequenced the exomes of 4,293 families containing individuals with developmental disorders, and meta-analysed these data with data from another 3,287 individuals with similar disorders. We show that the most important factors influencing the diagnostic yield of DNMs are the sex of the affected individual, the relatedness of their parents, whether close relatives are affected and the parental ages. We identified 94 genes enriched in damaging DNMs, including 14 that previously lacked compelling evidence of involvement in developmental disorders. We have also characterized the phenotypic diversity among these disorders. We estimate that 42% of our cohort carry pathogenic DNMs in coding sequences; approximately half of these DNMs disrupt gene function and the remainder result in altered protein function. We estimate that developmental disorders caused by DNMs have an average prevalence of 1 in 213 to 1 in 448 births, depending on parental age. Given current global demographics, this equates to almost 400,000 children born per year.