26 research outputs found

    Improving dermatology classifiers across populations using images generated by large diffusion models

    Dermatological classification algorithms developed without sufficiently diverse training data may generalize poorly across populations. While intentional data collection and annotation offer the best means for improving representation, new computational approaches for generating training data may also aid in mitigating the effects of sampling bias. In this paper, we show that DALL·E 2, a large-scale text-to-image diffusion model, can produce photorealistic images of skin disease across skin types. Using the Fitzpatrick 17k dataset as a benchmark, we demonstrate that augmenting training data with DALL·E 2-generated synthetic images improves classification of skin disease overall and especially for underrepresented groups. Comment: NeurIPS 2022 Workshop on Synthetic Data for Empowering ML Researc

    Model-based analysis of two-color arrays (MA2C)

    A normalization method based on probe GC content for two-color tiling arrays and an algorithm for detecting peak regions are presented. They are available in a stand-alone Java program

    Enriched protein screening of human bone marrow mesenchymal stromal cell secretions reveals MFAP5 and PENK as novel IL-10 modulators

    The secreted proteins from a cell constitute a natural biologic library that can offer significant insight into human health and disease. Discovering new secreted proteins from cells is bounded by the limitations of traditional separation and detection tools to physically fractionate and analyze samples. Here, we present a new method to systematically identify bioactive cell-secreted proteins that circumvents traditional proteomic methods by first enriching for protein candidates by differential gene expression profiling. The bone marrow stromal cell secretome was analyzed using enriched gene expression datasets in combination with potency assay testing. Four proteins expressed by stromal cells with previously unknown anti-inflammatory properties were identified, two of which provided a significant survival benefit to mice challenged with lethal endotoxic shock. More than 85% of secreted factors that were otherwise undetected by proteomic methods were recaptured, and remarkable hit rates of 18% in vitro and 9% in vivo were achieved.

    Adaptation and validation of the ACMG/AMP variant classification framework for MYH7-associated inherited cardiomyopathies: recommendations by ClinGen’s Inherited Cardiomyopathy Expert Panel

    Purpose: Integrating genomic sequencing in clinical care requires standardization of variant interpretation practices. The Clinical Genome Resource has established expert panels to adapt the American College of Medical Genetics and Genomics/Association for Molecular Pathology classification framework for specific genes and diseases. The Cardiomyopathy Expert Panel selected MYH7, a key contributor to inherited cardiomyopathies, as a pilot gene to develop a broadly applicable approach. Methods: Expert revisions were tested with 60 variants using a structured double review by pairs of clinical and diagnostic laboratory experts. Final consensus rules were established via iterative discussions. Results: Adjustments represented disease-/gene-informed specifications (12) or strength adjustments of existing rules (5). Nine rules were deemed not applicable. Key specifications included quantitative frameworks for minor allele frequency thresholds, the use of segregation data, and a semiquantitative approach to counting multiple independent variant occurrences where fully controlled case-control studies are lacking. Initial inter-expert classification concordance was 93%. Internal data from participating diagnostic laboratories changed the classification of 20% of the variants (n = 12), highlighting the critical importance of data sharing. Conclusion: These adapted rules provide increased specificity for use in MYH7-associated disorders in combination with expert review and clinical judgment and serve as a stepping stone for genes and disorders with similar genetic and clinical characteristics.

    Statistical foundations for precision medicine

    Thesis: Ph. D., Harvard-MIT Program in Health Sciences and Technology, 2015. Cataloged from PDF version of thesis. Includes bibliographical references. Physicians must often diagnose their patients using disease archetypes that are based on symptoms as opposed to underlying pathophysiology. The growing concept of "precision medicine" addresses this challenge by recognizing the vast yet fractured state of biomedical data, and calls for a patient-centered view of data in which molecular, clinical, and environmental measurements are stored in large shareable databases. Such efforts have already enabled large-scale knowledge advancement, but they also risk enabling large-scale misuse. In this thesis, I explore several statistical opportunities and challenges central to clinical decision-making and knowledge advancement with these resources. I use the inherited heart disease hypertrophic cardiomyopathy (HCM) to illustrate these concepts. HCM has proven tractable to genomic sequencing, which guides risk stratification for family members and tailors therapy for some patients. However, these benefits carry risks. I show how genomic misclassifications can disproportionately affect African Americans, amplifying healthcare disparities. These findings highlight the value of diverse population sequencing data, which can prevent variant misclassifications by identifying ancestry informative yet clinically uninformative markers. As decision-making for the individual patient follows from knowledge discovery by the community, I introduce a new quantity called the "dataset positive predictive value" (dPPV) to quantify reproducibility when many research teams separately mine a shared dataset, a growing practice that mirrors genomic testing in scale but not synchrony. I address only a few of the many challenges of delivering sound interpretation of genetic variation in the clinic and the challenges of knowledge discovery with shared "big data." These examples nonetheless serve to illustrate the need for grounded statistical approaches to reliably use these powerful new resources. By Arjun Kumar Manrai. Ph. D.

    The Geometry of Multisite Phosphorylation

    Reversible protein phosphorylation on multiple sites is a key regulatory mechanism in most cellular processes. We consider here a kinase-phosphatase-substrate system with two sites, under mass-action kinetics, with no restrictions on the order of phosphorylation or dephosphorylation. We show that the concentrations of the four phosphoforms at steady state satisfy an algebraic formula—an invariant—that is independent of the other chemical species, such as free enzymes or enzyme-substrate complexes, and holds irrespective of the starting conditions and the total amounts of enzymes and substrate. Such invariants allow stringent quantitative predictions to be made without requiring any knowledge of site-specific parameter values. We introduce what we believe are novel methods from algebraic geometry—Gröbner bases, rational curves—to calculate invariants. These methods are particularly significant because they make it possible to treat parameters symbolically without having to specify their numerical values, and thereby allow us to sidestep the parameter problem. We anticipate that this approach will have much wider applications in biological modeling

    Assessment of hepatic fibrosis in patients with rheumatoid arthritis on long-term methotrexate therapy using transient elastography

    Background: Methotrexate (MTX) has been associated with hepatotoxicity including hepatic fibrosis; however, the incidence of severe hepatic fibrosis or cirrhosis with MTX use has remained a controversial issue. The gold standard test for detecting liver fibrosis has been liver biopsy, which is an invasive procedure with potentially serious complications. Transient elastography (TE) is a noninvasive method of assessing hepatic fibrosis. The primary objective of this study was to assess the prevalence of hepatic fibrosis associated with long-term MTX therapy in patients with rheumatoid arthritis (RA), and the secondary objective was to assess the correlation of cumulative MTX dose with hepatic fibrosis as assessed by TE using FibroScan. Methods: In this cross-sectional study, patients with RA who had been on MTX treatment for >5 years were included. Hepatic fibrosis was determined by measuring hepatic stiffness in kilopascals (kPa) by TE (FibroScan) in study patients. The hepatic stiffness of the patient group was compared with that of healthy controls. Results: A total of 160 patients and 63 healthy controls were included in the study. The mean age of the patients was 51±10.9 years and there were 139 female and 21 male patients. The median duration of MTX use was 317.5 weeks (range 260 to 1302 weeks). Median MTX cumulative dose was 4225 mg (range 2340 to 18,200 mg). Mean hepatic stiffness was 4.8 kPa (SD 1.35) in the patient group and 4.7 kPa (SD 1.07) in the control group (P = 0.550). Neither cumulative dose nor duration of MTX treatment correlated with hepatic fibrosis. Conclusions: Severe hepatic fibrosis or cirrhosis as detected by TE using FibroScan was uncommon with high cumulative doses of MTX when administered in the low-dose weekly schedule. The cumulative dose of MTX did not correlate with hepatic fibrosis as assessed by FibroScan.

    Potential Excessive Testing at Scale
