The Role of Epidermal Enhancer 923 in the Chromatin Architecture and Transcriptional Regulation of the Epidermal Differentiation Complex
The epidermis covers the surface of the skin and provides a functional barrier across the entire body. Epidermal cells, or keratinocytes, proliferate in the innermost basal layer and migrate upward into the suprabasal spinous and granular layers as they differentiate, and finally into the terminally differentiated outermost layer, the stratum corneum. Keratinocytes undergoing terminal differentiation are marked by the tissue-specific, concomitant expression of genes encoded in the Epidermal Differentiation Complex (EDC) locus. The EDC genes are organized into four gene families (S100, Sprr, Lce, and Flg-like), which are coordinately expressed upon activation of the terminal differentiation program in keratinocytes. The molecular mechanisms that govern activation of the EDC during epidermal differentiation are poorly understood. The synteny and colinearity of the locus across multiple mammalian species, together with the coordinate expression of EDC genes upon keratinocyte differentiation, suggest molecular mechanisms operating at the chromatin level. I hypothesize that the EDC is coordinately activated by an enhancer regulatory element. Enhancers are non-coding regulatory DNA sequences that, upon binding specific transcription factors, can increase expression of a proximal or distal target gene. Previous work in our lab identified an epidermal-specific enhancer, CNE 923, that was active in cell-based luciferase assays and in transgenic mice. Here, I examine the function of the 923 enhancer in epidermal differentiation. Using an independent transgenic mouse line, I identified spatiotemporal sensitivity of the 923 enhancer that correlated with the patterning of epidermal barrier formation during mouse embryonic development. To determine whether 923 forms chromatin interactions with EDC gene promoters, I performed chromosome conformation capture (3C) assays in proliferating and differentiated primary mouse keratinocytes.
The 3C studies identified physiologically sensitive chromatin interactions between 923 and EDC gene promoters. The data support a dynamic EDC chromatin topology during keratinocyte differentiation. Chromatin immunoprecipitation, 3C, and RNA-seq upon pharmacological inhibition of AP-1 binding further established a requirement for c-Jun/AP-1 in 923-mediated EDC chromatin remodeling and in normal EDC gene expression during keratinocyte differentiation. To further determine the function of 923 in vivo, I generated a series of mutant alleles using CRISPR/Cas9 genome editing in mice. Cas9 nuclease activity, targeted to the flanking ends of the 923 enhancer in mouse zygotes by a pair of guide RNAs and coupled with homologous recombination-mediated loxP insertion, generated four alleles of the 923 enhancer: one floxed (923flox), two independent deletions (923delA, 923delB), and one partial deletion (923pdel). My results from the 923 knockout mice identified decreased expression of the nearby Smcp, Lce6a, and involucrin genes, decreased expression of distal Crnn and Lce gene family members, and a correlative increase in expression of Sprr gene family members. To identify the chromatin interactions of the 923 enhancer on a genome-wide scale, I performed high-throughput circular chromosome conformation capture (4C-seq) assays from the 923 enhancer and an additional Flg promoter viewpoint in proliferating and differentiated keratinocytes and in P5424 T cells. My results revealed 923 enhancer-mediated chromatin interactions indicative of a topologically associating domain encompassing the EDC. However, 923-mediated chromatin interactions within the EDC were enriched in keratinocytes relative to T cells, specifically between the 923 enhancer and the Sprr and Lce gene families, and with non-coding regions in the gene desert between the S100 and Sprr gene families.
Of note was a 923 interaction with another putative enhancer near Crct1, enriched specifically in proliferating keratinocytes, suggesting cross-talk between enhancers. Keratinocyte-specific trans-interactions identified by the MACS and GREAT algorithms included genes important for epidermal function, such as Trp63, an important regulator of keratinocyte differentiation. Together, my 4C-seq data identify unique chromatin architectures of the EDC in keratinocytes and T cells, including keratinocyte-specific enhancer-enhancer cross-talk in cis and interactions between transcriptionally active loci in trans. My studies identify, for the first time, a link between the 923 enhancer and proximal (Ivl, Smcp, Lce6a) and distal (Crnn, distal Lce family) genes, the loss of which coincides with upregulation of other epidermal differentiation genes (Sprr family) to maintain skin barrier function. Overall, my work has identified 923 as an epidermal-specific enhancer that participates in a chromatin looping network to co-regulate expression of genes important for epidermal development, as a mechanism for maintaining skin barrier integrity.
Machine learning for modeling the progression of Alzheimer disease dementia using clinical data: A systematic literature review
OBJECTIVE: Alzheimer disease (AD) is the most common cause of dementia, a syndrome characterized by cognitive impairment severe enough to interfere with activities of daily life. We aimed to conduct a systematic literature review (SLR) of studies that applied machine learning (ML) methods to clinical data derived from electronic health records in order to model risk for progression of AD dementia.
MATERIALS AND METHODS: We searched for articles published between January 1, 2010, and May 31, 2020, in PubMed, Scopus, ScienceDirect, IEEE Xplore Digital Library, Association for Computing Machinery Digital Library, and arXiv. We used predefined criteria to select relevant articles and summarized them according to key components of ML analysis such as data characteristics, computational algorithms, and research focus.
RESULTS: There has been a considerable rise over the past 5 years in the number of research papers using ML-based analysis for AD dementia modeling. We reviewed 64 relevant articles in our SLR. The results suggest that the majority of existing research has focused on predicting progression of AD dementia using publicly available datasets containing both neuroimaging and clinical data (neurobehavioral status exam scores, patient demographics, neuroimaging data, and laboratory test values).
DISCUSSION: Identifying individuals at risk for progression of AD dementia could potentially help personalize disease management and plan future care. Clinical data consisting of both structured data tables and clinical notes can be effectively used in ML-based approaches to model risk for AD dementia progression. Data sharing and reproducibility of results can enhance the impact, adaptation, and generalizability of this research.
Extraction of clinical phenotypes for Alzheimer's disease dementia from clinical notes using natural language processing
OBJECTIVES: There is much interest in utilizing clinical data for developing prediction models for Alzheimer's disease (AD) risk, progression, and outcomes. Existing studies have mostly utilized curated research registries, image analysis, and structured electronic health record (EHR) data. However, much critical information resides in relatively inaccessible unstructured clinical notes within the EHR.
MATERIALS AND METHODS: We developed a natural language processing (NLP)-based pipeline to extract AD-related clinical phenotypes, documenting strategies for success and assessing the utility of mining unstructured clinical notes. We evaluated the pipeline against gold-standard manual annotations performed by 2 clinical dementia experts for AD-related clinical phenotypes including medical comorbidities, biomarkers, neurobehavioral test scores, behavioral indicators of cognitive decline, family history, and neuroimaging findings.
RESULTS: Documentation rates for each phenotype varied in the structured versus unstructured EHR. Interannotator agreement was high (Cohen's kappa = 0.72-1) and positively correlated with the NLP-based phenotype extraction pipeline's performance (average F1-score = 0.65-0.99) for each phenotype.
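Interannotator agreement above is reported as Cohen's kappa, which corrects the observed agreement p_o for the agreement p_e expected by chance: κ = (p_o − p_e) / (1 − p_e). The following is a minimal sketch of that calculation, not the authors' code; the annotator labels in the usage line are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotators must label the same, non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined,
        # so treat it as perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical phenotype-present labels from two annotators on six notes.
expert_1 = ["yes", "yes", "no", "no", "yes", "no"]
expert_2 = ["yes", "no", "no", "no", "yes", "no"]
print(cohens_kappa(expert_1, expert_2))  # p_o = 5/6, p_e = 1/2, so kappa = 2/3
```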
DISCUSSION: We developed an automated NLP-based pipeline to extract informative phenotypes that may improve the performance of eventual machine learning predictive models for AD. In the process, we examined documentation practices for each phenotype relevant to the care of AD patients and identified factors for success.
CONCLUSION: Success of our NLP-based phenotype extraction pipeline depended on domain-specific knowledge and a focus on a specific clinical domain rather than on maximizing generalizability.
Leveraging GPT-4 for identifying cancer phenotypes in electronic health records: A performance comparison between GPT-4, GPT-3.5-turbo, Flan-T5, Llama-3-8B, and spaCy's rule-based and machine learning-based methods
OBJECTIVE: Accurately identifying clinical phenotypes from Electronic Health Records (EHRs) provides additional insights into patients' health, especially when such information is unavailable in structured data. This study evaluates the application of OpenAI's Generative Pre-trained Transformer (GPT)-4 model to identify clinical phenotypes from EHR text in non-small cell lung cancer (NSCLC) patients. The goal was to identify disease stages, treatments, and progression using GPT-4, and to compare its performance against GPT-3.5-turbo, Flan-T5-xl, Flan-T5-xxl, Llama-3-8B, and two rule-based and machine learning-based methods, namely scispaCy and medspaCy.
MATERIALS AND METHODS: Phenotypes such as initial cancer stage, initial treatment, evidence of cancer recurrence, and affected organs during recurrence were identified from 13,646 clinical notes for 63 NSCLC patients from Washington University in St. Louis, Missouri. The performance of the GPT-4 model was evaluated against GPT-3.5-turbo, Flan-T5-xxl, Flan-T5-xl, Llama-3-8B, medspaCy, and scispaCy by comparing precision, recall, and micro-F1 scores.
RESULTS: GPT-4 achieved higher F1 score, precision, and recall compared to Flan-T5-xl, Flan-T5-xxl, Llama-3-8B, medspaCy, and scispaCy's models. GPT-3.5-turbo performed similarly to GPT-4. GPT, Flan-T5, and Llama models were not constrained by explicit rule requirements for contextual pattern recognition. spaCy models relied on predefined patterns, leading to their suboptimal performance.
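Micro-averaging, as used for the micro-F1 scores above, pools true positives (TP), false positives (FP), and false negatives (FN) across all phenotype classes before computing precision and recall, so frequent phenotypes weigh more heavily; the pooled F1 reduces to 2TP / (2TP + FP + FN). A minimal sketch, assuming per-phenotype counts are already available; the phenotype names and counts are hypothetical.

```python
def micro_prf(counts):
    """Micro-averaged precision, recall, and F1.

    counts maps each phenotype class to a (tp, fp, fn) tuple; the counts are
    summed across classes before any ratio is taken.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical (tp, fp, fn) counts for two phenotypes.
example_counts = {
    "initial_stage": (40, 10, 5),
    "initial_treatment": (30, 5, 15),
}
print(micro_prf(example_counts))  # pooled: TP=70, FP=15, FN=20, so F1 = 140/175 = 0.8
```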
DISCUSSION AND CONCLUSION: GPT-4 improves clinical phenotype identification due to its robust pre-training and strong pattern recognition capability on the embedded tokens. It demonstrates data-driven effectiveness even with limited context in the input. While rule-based models remain useful for some tasks, GPT models offer improved contextual understanding of the text and robust clinical phenotype extraction.
Pathogenic variants in CRX have distinct cis-regulatory effects on enhancers and silencers in photoreceptors
Dozens of variants in the gene for the homeodomain transcription factor (TF) cone-rod homeobox
Association between socioeconomic factors, race, and use of a specialty memory clinic
BACKGROUND AND OBJECTIVES: The capacity of specialty memory clinics in the United States is very limited. If lower socioeconomic status or belonging to a minoritized racial group is associated with reduced use of memory clinics, this could exacerbate health care disparities, especially if more effective treatments for Alzheimer disease become available. We aimed to understand how use of a memory clinic is associated with neighborhood-level measures of socioeconomic factors and the intersectionality of race.
METHODS: We conducted an observational cross-sectional study using electronic health record data to compare the neighborhood advantage of patients seen at the Washington University Memory Diagnostic Center with the catchment area using a geographical information system. Furthermore, we compared the severity of dementia at the initial visit between patients who self-identified as Black or White. We used a multinomial logistic regression model to assess the Clinical Dementia Rating at the initial visit and
RESULTS: A total of 4,824 patients seen at the memory clinic between 2008 and 2018 were included in this study (mean age 72.7 [SD 11.0] years, 2,712 [56%] female, 543 [11%] Black). Most of the memory clinic patients lived in more advantaged neighborhoods within the overall catchment area. The percentage of patients self-identifying as Black (11%) was lower than the average percentage of Black individuals by census tract in the catchment area (16%) (
DISCUSSION: This study demonstrates that patients living in less affluent neighborhoods were less likely to be seen in one large memory clinic. Black patients were under-represented in the clinic and had more severe dementia at their initial visit. These findings suggest that patients with a lower socioeconomic status and who identify as Black are less likely to be seen in memory clinics, which are likely to be a major point of access for any new Alzheimer disease treatments that may become available.
Contribution of copy number variants to schizophrenia from a genome-wide study of 41,321 subjects
Copy number variants (CNVs) have been strongly implicated in the genetic etiology of schizophrenia (SCZ). However, genome-wide investigation of the contribution of CNVs to risk has been hampered by limited sample sizes. We sought to address this obstacle by applying a centralized analysis pipeline to a SCZ cohort of 21,094 cases and 20,227 controls. A global enrichment of CNV burden was observed in cases (OR = 1.11, P = 5.7 × 10⁻¹⁵), which persisted after excluding loci implicated in previous studies (OR = 1.07, P = 1.7 × 10⁻⁶). CNV burden was enriched for genes associated with synaptic function (OR = 1.68, P = 2.8 × 10⁻¹¹) and neurobehavioral phenotypes in mouse (OR = 1.18, P = 7.3 × 10⁻⁵). Genome-wide significant evidence was obtained for eight loci, including 1q21.1, 2p16.3 (NRXN1), 3q29, 7q11.2, 15q13.3, distal 16p11.2, proximal 16p11.2, and 22q11.2. Suggestive support was found for eight additional candidate susceptibility and protective loci, which consisted predominantly of CNVs mediated by non-allelic homologous recombination.
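To make the reported odds ratios concrete: an OR of 1.11 means the odds of carrying the CNV burden in question are 11% higher in cases than in controls. The abstract does not say how its ORs were estimated (burden analyses typically use regression with covariates), so the sketch below shows only the simplest unadjusted 2x2 odds ratio with a Wald interval on the log scale; the carrier counts are hypothetical.

```python
import math


def odds_ratio_wald(case_carriers, n_cases, control_carriers, n_controls, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval (95% for z=1.96).

    2x2 table:
                  carrier   non-carrier
      cases          a           b
      controls       c           d
    """
    a = case_carriers
    b = n_cases - case_carriers
    c = control_carriers
    d = n_controls - control_carriers
    if min(a, b, c, d) <= 0:
        raise ValueError("every cell must be positive for the Wald interval")
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper


# Hypothetical counts: 30 of 1,000 cases vs. 15 of 1,000 controls carry a CNV.
print(odds_ratio_wald(30, 1000, 15, 1000))  # OR = (30 * 985) / (970 * 15), about 2.03
```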
No Reliable Association between Runs of Homozygosity and Schizophrenia in a Well-Powered Replication Study
It is well known that inbreeding increases the risk of recessive monogenic diseases, but it is less certain whether it contributes to the etiology of complex diseases such as schizophrenia. One way to estimate the effects of inbreeding is to examine the association between disease diagnosis and genome-wide autozygosity estimated using runs of homozygosity (ROH) in genome-wide single nucleotide polymorphism arrays. Using data for schizophrenia from the Psychiatric Genomics Consortium (n = 21,868), Keller et al. (2012) estimated that the odds of developing schizophrenia increased by approximately 17% for every additional percent of the genome that is autozygous (β = 16.1, CI(β) = [6.93, 25.7], Z = 3.44, p = 0.0006). Here we describe replication results from 22 independent schizophrenia case-control datasets from the Psychiatric Genomics Consortium (n = 39,830). Using the same ROH calling thresholds and procedures as Keller et al. (2012), we were unable to replicate the significant association between ROH burden and schizophrenia in the independent PGC phase II data, although the effect was in the predicted direction, and the combined (original + replication) dataset yielded an attenuated but significant relationship between F_ROH and schizophrenia (β = 4.86, CI(β) = [0.90, 8.83], Z = 2.40, p = 0.02). Since Keller et al. (2012), several studies have reported inconsistent associations of ROH burden with complex traits, particularly in case-control data. These conflicting results might suggest that the effects of autozygosity are confounded by various factors, such as socioeconomic status, education, urbanicity, and religiosity, which may be associated with both real inbreeding and the outcome measures of interest.
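The link between β = 16.1 and "approximately 17% per percent" can be checked directly, assuming β is a logistic-regression log-odds coefficient on F_ROH expressed as a fraction (0 to 1): one percentage point is 0.01, so the odds multiplier is exp(16.1 × 0.01) = exp(0.161) ≈ 1.17. By the same arithmetic, the combined estimate β = 4.86 corresponds to roughly 5% higher odds per percent. The sketch below also computes F_ROH as total ROH length over an assumed autosomal length; the choice of denominator and the 2.7 Gb value are assumptions for illustration, not taken from the study.

```python
import math


def f_roh(roh_lengths_bp, autosomal_length_bp):
    """F_ROH: fraction of the autosomal genome covered by runs of homozygosity.

    roh_lengths_bp: lengths (bp) of one individual's called ROH segments.
    autosomal_length_bp: denominator; which length to use (e.g. total autosomal
    length vs. length spanned by array SNPs) is an analysis choice, assumed here.
    """
    return sum(roh_lengths_bp) / autosomal_length_bp


def odds_ratio_per_percent(beta):
    """Odds multiplier for a +1 percentage-point increase in F_ROH.

    Assumes beta is a log-odds coefficient on F_ROH measured as a fraction,
    so one percentage point corresponds to 0.01.
    """
    return math.exp(beta * 0.01)


# Hypothetical individual with two ROH segments and an assumed 2.7 Gb denominator.
print(f_roh([1_500_000, 3_000_000], 2_700_000_000))  # about 0.00167
print(odds_ratio_per_percent(16.1))  # about 1.17: roughly 17% higher odds per percent
print(odds_ratio_per_percent(4.86))  # about 1.05
```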
Gene expression imputation across multiple brain regions provides insights into schizophrenia risk
Transcriptomic imputation approaches combine eQTL reference panels with large-scale genotype data to test associations between disease and gene expression. These genic associations could elucidate signals in complex genome-wide association study (GWAS) loci and may disentangle the role of different tissues in disease development. We used the largest eQTL reference panel for the dorsolateral prefrontal cortex (DLPFC) to create a set of gene expression predictors and demonstrate their utility. We applied the DLPFC and 12 GTEx brain predictors to 40,299 schizophrenia cases and 65,264 matched controls in a large transcriptomic imputation study of schizophrenia. We identified 413 genic associations across 13 brain regions. Stepwise conditioning identified 67 non-MHC genes, of which 14 did not fall within previous GWAS loci. We identified 36 significantly enriched pathways, including hexosaminidase-A deficiency and multiple porphyric disorder pathways. We investigated developmental expression patterns among the 67 non-MHC genes and identified specific groups with pre- and postnatal expression patterns.
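In transcriptomic imputation, a gene's "predictor" is commonly a set of SNP weights learned from an eQTL reference panel (here, for example, DLPFC or a GTEx brain tissue), and each genotyped individual's predicted expression is the weighted sum of their allele dosages; predicted expression is then tested for association with case status. The abstract does not specify the predictor model or association test, so this is a minimal sketch assuming a linear weighted-sum predictor. The Welch t statistic stands in for the covariate-adjusted regression a real study would use, and all SNP IDs, weights, and genotypes are hypothetical.

```python
import math
import statistics


def predict_expression(dosages, weights):
    """Genetically predicted expression of one gene for one individual.

    dosages: {snp_id: alternate-allele dosage, 0 to 2}
    weights: {snp_id: predictor weight learned from an eQTL reference panel}
    Raises KeyError if a predictor SNP is missing from the genotypes.
    """
    return sum(w * dosages[snp] for snp, w in weights.items())


def welch_t(x, y):
    """Welch's t statistic for mean predicted expression in x vs. y."""
    mx, my = statistics.mean(x), statistics.mean(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    return (mx - my) / math.sqrt(vx / len(x) + vy / len(y))


# Hypothetical two-SNP predictor and genotypes.
weights = {"rs_a": 0.5, "rs_b": -0.2}
cases = [{"rs_a": 2, "rs_b": 0}, {"rs_a": 1, "rs_b": 0}, {"rs_a": 2, "rs_b": 1}]
controls = [{"rs_a": 0, "rs_b": 2}, {"rs_a": 1, "rs_b": 2}, {"rs_a": 0, "rs_b": 1}]
case_expr = [predict_expression(g, weights) for g in cases]
control_expr = [predict_expression(g, weights) for g in controls]
print(welch_t(case_expr, control_expr))  # positive: higher predicted expression in cases
```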
- …