Improving dermatology classifiers across populations using images generated by large diffusion models
Dermatological classification algorithms developed without sufficiently
diverse training data may generalize poorly across populations. While
intentional data collection and annotation offer the best means for improving
representation, new computational approaches for generating training data may
also aid in mitigating the effects of sampling bias. In this paper, we show
that DALLE 2, a large-scale text-to-image diffusion model, can produce
photorealistic images of skin disease across skin types. Using the Fitzpatrick
17k dataset as a benchmark, we demonstrate that augmenting training data with
DALLE 2-generated synthetic images improves classification of skin
disease overall and especially for underrepresented groups. Comment: NeurIPS 2022 Workshop on Synthetic Data for Empowering ML Researc
Model-based analysis of two-color arrays (MA2C)
A normalization method based on probe GC content for two-color tiling arrays and an algorithm for detecting peak regions are presented. They are available in a stand-alone Java program
Enriched protein screening of human bone marrow mesenchymal stromal cell secretions reveals MFAP5 and PENK as novel IL-10 modulators
The secreted proteins from a cell constitute a natural biologic library that can offer significant insight into human health and disease. Discovering new secreted proteins from cells is bounded by the limitations of traditional separation and detection tools to physically fractionate and analyze samples. Here, we present a new method to systematically identify bioactive cell-secreted proteins that circumvents traditional proteomic methods by first enriching for protein candidates by differential gene expression profiling. The bone marrow stromal cell secretome was analyzed using enriched gene expression datasets in combination with potency assay testing. Four proteins expressed by stromal cells with previously unknown anti-inflammatory properties were identified, two of which provided a significant survival benefit to mice challenged with lethal endotoxic shock. Greater than 85% of secreted factors were recaptured that were otherwise undetected by proteomic methods, and remarkable hit rates of 18% in vitro and 9% in vivo were achieved
Adaptation and validation of the ACMG/AMP variant classification framework for MYH7-associated inherited cardiomyopathies: recommendations by ClinGen’s Inherited Cardiomyopathy Expert Panel
Purpose: Integrating genomic sequencing in clinical care requires standardization of variant interpretation practices. The Clinical Genome Resource has established expert panels to adapt the American College of Medical Genetics and Genomics/Association for Molecular Pathology classification framework for specific genes and diseases. The Cardiomyopathy Expert Panel selected MYH7, a key contributor to inherited cardiomyopathies, as a pilot gene to develop a broadly applicable approach. Methods: Expert revisions were tested with 60 variants using a structured double review by pairs of clinical and diagnostic laboratory experts. Final consensus rules were established via iterative discussions. Results: Adjustments represented disease-/gene-informed specifications (12) or strength adjustments of existing rules (5). Nine rules were deemed not applicable. Key specifications included quantitative frameworks for minor allele frequency thresholds, the use of segregation data, and a semiquantitative approach to counting multiple independent variant occurrences where fully controlled case-control studies are lacking. Initial inter-expert classification concordance was 93%. Internal data from participating diagnostic laboratories changed the classification of 20% of the variants (n = 12), highlighting the critical importance of data sharing. Conclusion: These adapted rules provide increased specificity for use in MYH7-associated disorders in combination with expert review and clinical judgment and serve as a stepping stone for genes and disorders with similar genetic and clinical characteristics
Statistical foundations for precision medicine
Thesis: Ph. D., Harvard-MIT Program in Health Sciences and Technology, 2015. Cataloged from PDF version of thesis. Includes bibliographical references. Physicians must often diagnose their patients using disease archetypes that are based on symptoms as opposed to underlying pathophysiology. The growing concept of "precision medicine" addresses this challenge by recognizing the vast yet fractured state of biomedical data, and calls for a patient-centered view of data in which molecular, clinical, and environmental measurements are stored in large shareable databases. Such efforts have already enabled large-scale knowledge advancement, but they also risk enabling large-scale misuse. In this thesis, I explore several statistical opportunities and challenges central to clinical decision-making and knowledge advancement with these resources. I use the inherited heart disease hypertrophic cardiomyopathy (HCM) to illustrate these concepts. HCM has proven tractable to genomic sequencing, which guides risk stratification for family members and tailors therapy for some patients. However, these benefits carry risks. I show how genomic misclassifications can disproportionately affect African Americans, amplifying healthcare disparities. These findings highlight the value of diverse population sequencing data, which can prevent variant misclassifications by identifying ancestry informative yet clinically uninformative markers. As decision-making for the individual patient follows from knowledge discovery by the community, I introduce a new quantity called the "dataset positive predictive value" (dPPV) to quantify reproducibility when many research teams separately mine a shared dataset, a growing practice that mirrors genomic testing in scale but not synchrony. I address only a few of the many challenges of delivering sound interpretation of genetic variation in the clinic and the challenges of knowledge discovery with shared "big data." These examples nonetheless serve to illustrate the need for grounded statistical approaches to reliably use these powerful new resources. By Arjun Kumar Manrai. Ph. D
The Geometry of Multisite Phosphorylation
Reversible protein phosphorylation on multiple sites is a key regulatory mechanism in most cellular processes. We consider here a kinase-phosphatase-substrate system with two sites, under mass-action kinetics, with no restrictions on the order of phosphorylation or dephosphorylation. We show that the concentrations of the four phosphoforms at steady state satisfy an algebraic formula—an invariant—that is independent of the other chemical species, such as free enzymes or enzyme-substrate complexes, and holds irrespective of the starting conditions and the total amounts of enzymes and substrate. Such invariants allow stringent quantitative predictions to be made without requiring any knowledge of site-specific parameter values. We introduce what we believe are novel methods from algebraic geometry—Gröbner bases, rational curves—to calculate invariants. These methods are particularly significant because they make it possible to treat parameters symbolically without having to specify their numerical values, and thereby allow us to sidestep the parameter problem. We anticipate that this approach will have much wider applications in biological modeling
Systematic correlation of environmental exposure and physiological and self-reported behaviour factors with leukocyte telomere length
Abstract Background: It is hypothesized that environmental exposures and behaviour influence telomere length, an indicator of cellular ageing. We systematically associated 461 indicators of environmental exposures, physiology and self-reported behaviour with telomere length in data from the US National Health and Nutrition Examination Survey (NHANES) in 1999–2002. Further, we tested whether factors identified in the NHANES participants are also correlated with gene expression of telomere length modifying genes. Methods: We correlated 461 environmental exposures, behaviours and clinical variables with telomere length, using survey-weighted linear regression, adjusting for sex, age, age squared, race/ethnicity, poverty level, education and born outside the USA, and estimated the false discovery rate to adjust for multiple hypotheses. We conducted a secondary analysis to investigate the correlation between identified environmental variables and gene expression levels of telomere-associated genes in publicly available gene expression samples. Results: After correlating 461 variables with telomere length, we found 22 variables significantly associated with telomere length after adjustment for multiple hypotheses. Of these variables, 14 were associated with longer telomeres, including biomarkers of polychlorinated biphenyls [PCBs; 0.1 to 0.2 standard deviation (SD) increase for 1 SD increase in PCB level, P < 0.002] and a form of vitamin A, retinyl stearate. Eight variables were associated with shorter telomeres, including biomarkers of cadmium, C-reactive protein and lack of physical activity. We could not conclude that PCBs are correlated with gene expression of telomere-associated genes. Conclusions: Both environmental exposures and chronic disease-related risk factors may play a role in telomere length. Our secondary analysis found no evidence of association between PCBs/smoking and gene expression of telomere-associated genes. 
All correlations between exposures, behaviours and clinical factors and changes in telomere length will require further investigation regarding biological influence of exposure
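The abstract above reports testing 461 variables and controlling the false discovery rate, but it does not name the procedure used. As an illustration only, the sketch below assumes the Benjamini-Hochberg step-up procedure, one common way to control the FDR; the function name `benjamini_hochberg`, the threshold `alpha=0.05`, and the p-values in the usage lines are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Flag hypotheses rejected by the Benjamini-Hochberg step-up procedure.

    Assumption: this is a generic FDR sketch, not the study's actual pipeline.
    Returns a list of booleans in the same order as p_values.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank

    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for ten variables (illustrative, not study data).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(example_p, alpha=0.05)
```

With these made-up inputs only the two smallest p-values pass, even though five of them are below 0.05 individually, which is the kind of tightening a multiple-testing adjustment over hundreds of variables is meant to provide.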
Assessment of hepatic fibrosis in patients with rheumatoid arthritis on long-term methotrexate therapy using transient elastography
Background: Methotrexate (MTX) has been associated with hepatotoxicity including hepatic fibrosis; however, the incidence of severe hepatic fibrosis or cirrhosis with MTX use has remained a controversial issue. The gold standard test for detecting liver fibrosis has been a liver biopsy, which is an invasive procedure with potentially serious complications. Transient elastography (TE) is a noninvasive method of assessing hepatic fibrosis. The primary objective of this study was to assess the prevalence of hepatic fibrosis associated with long-term MTX therapy in patients with RA and the secondary objective was to assess the correlation of cumulative MTX dose with hepatic fibrosis as assessed by TE using Fibroscan.
Methods: In this cross-sectional study patients with RA who had been on MTX treatment for >5 years were included. Hepatic fibrosis was determined by measuring the hepatic stiffness by TE method (by FibroScan) in kilopascal (kPa) in study patients. The hepatic stiffness of the patient group was compared with that of healthy controls.
Results: A total of 160 patients and 63 healthy controls were included in the study. The mean age of the patients was 51±10.9 years and there were 139 female and 21 male patients. The median duration of MTX use was 317.5 weeks (range 260, 1302 weeks). Median MTX cumulative dose was 4225 mg (range 2340, 18,200 mg). Mean hepatic stiffness was 4.8 kPa (SD 1.35) in the patient group and 4.7 kPa (SD 1.07) in the control group (P = 0.550). Cumulative dose or duration of MTX treatment did not correlate with hepatic fibrosis.
Conclusions: Severe hepatic fibrosis or cirrhosis as detected by the TE using Fibroscan was uncommon with high cumulative dose of MTX when administered in the low-dose weekly schedule. The cumulative dose of MTX did not correlate with hepatic fibrosis as assessed by FibroScan
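The group comparison reported in the Results (4.8 kPa, SD 1.35, n = 160 versus 4.7 kPa, SD 1.07, n = 63) can be roughly checked from the summary statistics alone. The abstract does not say which test produced P = 0.550, so the sketch below assumes a Welch two-sample t-test and uses a normal approximation for the two-sided p-value; the function name `welch_t_from_summary` is hypothetical, and because the published means are rounded the result is not expected to reproduce the reported value exactly.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic, degrees of freedom and an approximate p-value.

    Assumption: the study's actual test is not stated; this is only a
    plausibility check computed from the published summary statistics.
    """
    v1 = sd1 ** 2 / n1  # squared standard error of the first group mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second group mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # Two-sided p-value from the standard normal distribution, a reasonable
    # approximation to the t distribution when df is large.
    p_approx = math.erfc(abs(t) / math.sqrt(2))
    return t, df, p_approx


# Summary statistics as reported in the abstract (patients vs. controls).
t, df, p_approx = welch_t_from_summary(4.8, 1.35, 160, 4.7, 1.07, 63)
```

Under these assumptions the t statistic is a little under 0.6 and the approximate p-value falls well above 0.05, in line with the abstract's conclusion that hepatic stiffness did not differ between patients and controls.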