
    A guide to evaluating linkage quality for the analysis of linked data.

    Linked datasets are an important resource for epidemiological and clinical studies, but linkage error can lead to biased results. For data security reasons, linkage of personal identifiers is often performed by a third party, making it difficult for researchers to assess the quality of the linked dataset in the context of specific research questions. This is compounded by a lack of guidance on how to determine the potential impact of linkage error. We describe how linkage quality can be evaluated and provide widely applicable guidance for both data providers and researchers. Using an illustrative example of a linked dataset of maternal and baby hospital records, we demonstrate three approaches for evaluating linkage quality: applying the linkage algorithm to a subset of gold standard data to quantify linkage error; comparing characteristics of linked and unlinked data to identify potential sources of bias; and evaluating the sensitivity of results to changes in the linkage procedure. These approaches can inform our understanding of the potential impact of linkage error and provide an opportunity to select the most appropriate linkage procedure for a specific analysis. Evaluating linkage quality in this way will improve the quality and transparency of epidemiological and clinical research using linked data.
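
The first approach (running the linkage algorithm on a gold-standard subset) can be sketched as a set comparison, assuming the true mother-baby pairs in that subset are known and that each link is a single record-ID pair. The record IDs, the function name `linkage_error_rates`, and the rate definitions used here (false matches over all links made; missed matches over all true matches) are illustrative assumptions, not necessarily the exact measures used in the paper.

```python
from typing import Dict, Set, Tuple

# Hypothetical record-ID pairs: (mother_record_id, baby_record_id)
Pair = Tuple[str, str]


def linkage_error_rates(linked: Set[Pair], gold: Set[Pair]) -> Dict[str, float]:
    """Compare algorithm output with gold-standard true matches.

    false_match_rate  = links that are not true matches / all links made
    missed_match_rate = true matches the algorithm did not link / all true matches
    """
    false_matches = linked - gold   # linked, but not a true match
    missed_matches = gold - linked  # a true match, but not linked
    return {
        "false_match_rate": len(false_matches) / len(linked) if linked else 0.0,
        "missed_match_rate": len(missed_matches) / len(gold) if gold else 0.0,
    }


if __name__ == "__main__":
    gold = {("m1", "b1"), ("m2", "b2"), ("m3", "b3"), ("m4", "b4")}
    linked = {("m1", "b1"), ("m2", "b2"), ("m3", "b9")}  # one wrong link, two missed
    print(linkage_error_rates(linked, gold))
```

These rates only describe the gold-standard subset; the other two approaches in the abstract (comparing linked and unlinked records, and sensitivity analysis) target bias that the rates alone do not reveal.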

    MRI-localized biopsies reveal subtype-specific differences in molecular and cellular composition at the margins of glioblastoma

    Glioblastomas (GBMs) diffusely infiltrate the brain, making complete surgical resection impossible. The mixture of neoplastic and nonneoplastic cells that remains after surgery forms the biological context for adjuvant therapeutic intervention and recurrence. We performed RNA sequencing (RNA-seq) and histological analysis on radiographically guided biopsies taken from different regions of GBMs and showed that tissue within the contrast-enhancing (CE) core of tumors has different cellular and molecular compositions compared with tissue from the nonenhancing (NE) margins of tumors. Comparisons with The Cancer Genome Atlas dataset showed that samples from CE regions resembled the proneural, classical, or mesenchymal subtypes of GBM, whereas samples from NE regions predominantly resembled the neural subtype. Computational deconvolution of the RNA-seq data revealed that contributions from nonneoplastic brain cells significantly influence the expression pattern in the NE samples. Gene ontology analysis showed that the cell type-specific expression patterns were functionally distinct and highly enriched in genes associated with the corresponding cell phenotypes. Comparing the RNA-seq data from the GBM samples with those from nonneoplastic brain revealed that the differentially expressed genes are distributed across multiple cell types. Notably, the patterns of cell type-specific alterations varied between GBM subtypes: the NE regions of proneural tumors were enriched in oligodendrocyte progenitor genes, whereas the NE regions of mesenchymal GBMs were enriched in astrocytic and microglial genes. These subtype-specific patterns provide new insights into the molecular and cellular composition of the infiltrative margins of GBM.

    Labeling poststorm coastal imagery for machine learning: measurement of interrater agreement

    © The Author(s), 2021. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Goldstein, E. B., Buscombe, D., Lazarus, E. D., Mohanty, S. D., Rafique, S. N., Anarde, K. A., Ashton, A. D., Beuzen, T., Castagno, K. A., Cohn, N., Conlin, M. P., Ellenson, A., Gillen, M., Hovenga, P. A., Over, J.-S. R., Palermo, R., Ratliff, K. M., Reeves, I. R. B., Sanborn, L. H., Straub, J. A., Taylor, L. A., Wallace, E. J., Warrick, J., Wernette, P., Williams, H. E. Labeling poststorm coastal imagery for machine learning: measurement of interrater agreement. Earth and Space Science, 8(9), (2021): e2021EA001896, https://doi.org/10.1029/2021EA001896.

    Classifying images using supervised machine learning (ML) relies on labeled training data (for example, classes or text descriptions associated with each image). Data-driven models are only as good as their training data, which underscores the importance of high-quality labeled data for developing an ML model with predictive skill. Labeling data is typically a time-consuming, manual process. Here, we investigate the process of labeling data, with a specific focus on coastal aerial imagery captured in the wake of hurricanes that affected the Atlantic and Gulf Coasts of the United States. The imagery data set is a rich observational record of storm impacts and coastal change, but the imagery requires labeling to render that information accessible. We created an online interface that served labelers a stream of images and a fixed set of questions. A total of 1,600 images were each labeled by between two and seven coastal scientists. We used the resulting data set to investigate interrater agreement: the extent to which labelers labeled each image similarly. Interrater agreement scores, assessed with percent agreement and Krippendorff's alpha, are higher when the questions posed to labelers are relatively simple, when the labelers are provided with a user manual, and when images are smaller. Experiments in interrater agreement point toward the benefit of multiple labelers for understanding the uncertainty in labeling data for machine learning research.

    The authors gratefully acknowledge support from the U.S. Geological Survey (G20AC00403 to EBG and SDM), NSF (1953412 to EBG and SDM; 1939954 to EBG), Microsoft AI for Earth (to EBG and SDM), The Leverhulme Trust (RPG-2018-282 to EDL and EBG), and an Early Career Research Fellowship from the Gulf Research Program of the National Academies of Sciences, Engineering, and Medicine (to EBG). U.S. Geological Survey researchers (DB, J-SRO, JW, and PW) were supported by the U.S. Geological Survey Coastal and Marine Hazards and Resources Program as part of the response and recovery efforts under congressional appropriations through the Additional Supplemental Appropriations for Disaster Relief Act, 2019 (Public Law 116-20; 133 Stat. 871).
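
The two agreement measures named in the abstract can be sketched for categorical answers to a single question, where each image may carry between two and seven labels. The abstract does not give the exact formulas, so this assumes pairwise percent agreement (the share of agreeing rater pairs per image, averaged over images) and the standard coincidence-matrix form of Krippendorff's alpha for nominal data. The function names and example labels are hypothetical.

```python
from collections import Counter
from itertools import permutations


def _pairable(units):
    # Only images labeled by at least two raters carry agreement information.
    return [labels for labels in units if len(labels) >= 2]


def percent_agreement(units):
    """Mean, over images, of the share of ordered rater pairs giving the same label."""
    scores = []
    for labels in _pairable(units):
        pairs = list(permutations(labels, 2))
        scores.append(sum(a == b for a, b in pairs) / len(pairs))
    if not scores:
        raise ValueError("need at least one image with two or more labels")
    return sum(scores) / len(scores)


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels, built from the coincidence matrix."""
    coincidences = Counter()
    for labels in _pairable(units):
        m = len(labels)
        # Each ordered pair of labels from different raters adds 1 / (m - 1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)
    n_c = Counter()
    for (a, _), weight in coincidences.items():
        n_c[a] += weight
    n = sum(n_c.values())
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(n_c[a] * n_c[b] for a in n_c for b in n_c if a != b)
    if expected == 0:
        return float("nan")  # only one category was ever used: alpha is undefined
    # alpha = 1 - D_o / D_e, with D_o = observed / n and D_e = expected / (n (n - 1))
    return 1.0 - (n - 1) * observed / expected


if __name__ == "__main__":
    # Hypothetical yes/no answers; each inner list holds one image's labels.
    images = [["yes", "yes"], ["no", "no", "no"], ["yes", "no"]]
    print(percent_agreement(images), krippendorff_alpha_nominal(images))
```

Unlike raw percent agreement, alpha discounts the agreement expected from how often each category is used overall, which is one reason studies often report both.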

    "May I Buy a Pack of Marlboros, Please?" A Systematic Review of Evidence to Improve the Validity and Impact of Youth Undercover Buy Inspections

    Most smokers become addicted to tobacco products before they are legally able to purchase them. We systematically reviewed the literature on protocols for assessing underage purchase and on their ecological validity. We conducted a systematic search of PubMed and PsycINFO in May 2015 and independently screened records for inclusion. We conducted a narrative review and examined the implications of two types of legal authority for the protocols that govern underage buy enforcement in the United States: criminal (state-level laws prohibiting sales to youth) and administrative (federal regulations prohibiting sales to youth). Ten studies experimentally assessed underage buy protocols, and 44 studies assessed the association between youth characteristics and tobacco sales. Protocols that mimicked real-world youth behaviors were consistently associated with a substantially greater likelihood of a sale to a youth. Many of the tested protocols appear to have been designed for compliance with criminal law rather than administrative enforcement, in ways that limited their ecological validity. This may be due to concerns about entrapment. For administrative enforcement in particular, entrapment may be less of an issue than commonly thought. Commonly used underage buy protocols poorly represent the reality of youths' access to tobacco from retailers. Compliance check programs should allow youth to present themselves naturally and should attempt to match the community's demographic makeup.

    Identification of Novel Genes and Pathways Regulating SREBP Transcriptional Activity

    BACKGROUND: Lipid metabolism in mammals is orchestrated by a family of transcription factors called sterol regulatory element-binding proteins (SREBPs), which control the expression of genes required for the uptake and synthesis of cholesterol, fatty acids, and triglycerides. SREBPs are thus essential for insulin-induced lipogenesis and for cellular membrane homeostasis and biogenesis. Although multiple players that control the expression and activation of SREBPs have been identified, gaps remain in our understanding of how SREBPs are coordinated with other physiological pathways. METHODOLOGY: To identify novel regulators of SREBPs, we performed a genome-wide cDNA overexpression screen for proteins that might modulate transcription of a luciferase gene driven by an SREBP-specific promoter. The results were verified through secondary biological assays, and expression data were analyzed with a novel application of the Gene Set Enrichment Analysis (GSEA) method. CONCLUSIONS/SIGNIFICANCE: We screened 10,000 different cDNAs and identified a number of genes and pathways not previously implicated in SREBP control or cellular cholesterol homeostasis. These findings further our understanding of lipid biology and should lead to new insights into lipid-associated disorders.
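
For readers unfamiliar with GSEA, its core statistic is a running sum over a ranked gene list that steps up at genes in a set and down at genes outside it. The sketch below shows that standard weighted enrichment score only; it is not the novel application described in the abstract, and it omits the permutation-based significance testing that GSEA normally uses. The gene names, correlations, and the function name are hypothetical.

```python
from typing import List, Set, Tuple


def enrichment_score(ranked: List[Tuple[str, float]], gene_set: Set[str], p: float = 1.0) -> float:
    """Standard GSEA running-sum enrichment score.

    ranked: (gene, correlation-with-phenotype) pairs sorted from most positive
    to most negative. p = 1 weights hits by |correlation|; p = 0 gives the
    unweighted Kolmogorov-Smirnov-like statistic.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_hits = sum(hits)
    n_misses = len(ranked) - n_hits
    if n_hits == 0 or n_misses == 0:
        raise ValueError("gene set must be a non-empty proper subset of the ranked genes")
    n_r = sum(abs(r) ** p for (_, r), is_hit in zip(ranked, hits) if is_hit)
    if n_r == 0:
        raise ValueError("all genes in the set have zero correlation")
    running, es = 0.0, 0.0
    for (_, r), is_hit in zip(ranked, hits):
        # Step up by the hit's normalized weight, or down by a constant miss penalty.
        running += abs(r) ** p / n_r if is_hit else -1.0 / n_misses
        if abs(running) > abs(es):
            es = running  # keep the maximum deviation from zero, with its sign
    return es


if __name__ == "__main__":
    ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0)]
    print(enrichment_score(ranked, {"A", "B"}), enrichment_score(ranked, {"D", "E"}))
```

A positive score indicates that the set is concentrated at the top of the ranking and a negative score that it is concentrated at the bottom.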

    Post translational changes to α-synuclein control iron and dopamine trafficking: a concept for neuron vulnerability in Parkinson's disease

    Parkinson's disease is a multifactorial neurodegenerative disorder whose aetiology remains elusive. Its primary clinical feature, progressively impaired motor control, is caused by a loss of midbrain substantia nigra dopamine neurons that have a high α-synuclein (α-syn) and iron content. α-Syn is a neuronal protein that is extensively modified post-translationally and is central to the Lewy body neuropathology of the disease. This review provides an overview of findings on the role that post-translational modifications of α-syn play in membrane binding and intracellular vesicle trafficking. Furthermore, we propose a concept in which acetylation and phosphorylation of α-syn modulate endocytic import of iron and vesicle transport of dopamine during normal physiology. Dysregulated phosphorylation and oxidation of α-syn mediate iron- and dopamine-dependent oxidative stress through impaired cellular localization and increase the propensity for α-syn aggregation. The proposition highlights a connection between α-syn, iron, and dopamine, three pathological components associated with disease progression in sporadic Parkinson's disease.

    Genome-wide Analyses Identify KIF5A as a Novel ALS Gene

    To identify novel genes associated with ALS, we undertook two lines of investigation. We carried out a genome-wide association study comparing 20,806 ALS cases and 59,804 controls. Independently, we performed a rare variant burden analysis comparing 1,138 index familial ALS cases and 19,494 controls. Through both approaches, we identified kinesin family member 5A (KIF5A) as a novel gene associated with ALS. Interestingly, mutations predominantly in the N-terminal motor domain of KIF5A are causative for two neurodegenerative diseases: hereditary spastic paraplegia (SPG10) and Charcot-Marie-Tooth type 2 (CMT2). In contrast, ALS-associated mutations are located primarily in the C-terminal cargo-binding tail domain, and patients harboring loss-of-function mutations displayed extended survival relative to typical ALS cases. Taken together, these results broaden the phenotypic spectrum resulting from mutations in KIF5A and strengthen the role of cytoskeletal defects in the pathogenesis of ALS.

    Peer reviewed

    New genetic loci link adipose and insulin biology to body fat distribution.

    Body fat distribution is a heritable trait and a well-established predictor of adverse metabolic outcomes, independent of overall adiposity. To increase our understanding of the genetic basis of body fat distribution and its molecular links to cardiometabolic traits, here we conduct genome-wide association meta-analyses of traits related to waist and hip circumferences in up to 224,459 individuals. We identify 49 loci (33 new) associated with waist-to-hip ratio adjusted for body mass index (BMI), and an additional 19 loci newly associated with related waist and hip circumference measures (P < 5 × 10⁻⁸). In total, 20 of the 49 loci associated with waist-to-hip ratio adjusted for BMI show significant sexual dimorphism, 19 of which display a stronger effect in women. The identified loci were enriched for genes expressed in adipose tissue and for putative regulatory elements in adipocytes. Pathway analyses implicated adipogenesis, angiogenesis, transcriptional regulation, and insulin resistance as processes affecting fat distribution, providing insight into potential pathophysiological mechanisms.

    SARS-CoV-2 susceptibility and COVID-19 disease severity are associated with genetic variants affecting gene expression in a variety of tissues

    Variability in SARS-CoV-2 susceptibility and COVID-19 disease severity between individuals is partly due to genetic factors. Here, we identify 4 genomic loci with suggestive associations for SARS-CoV-2 susceptibility and 19 for COVID-19 disease severity. Four of these 23 loci likely have an ethnicity-specific component. Genome-wide association study (GWAS) signals in 11 loci colocalize with expression quantitative trait loci (eQTLs) associated with the expression of 20 genes in 62 tissues/cell types (range: 1-43 tissues/gene), including lung, brain, heart, muscle, and skin, as well as the digestive and immune systems. We perform genetic fine mapping to compute 99% credible SNP sets, which identify 10 GWAS loci that have eight or fewer SNPs in the credible set, including three loci with a single likely causal SNP. Our study suggests that the diverse symptoms and disease severity of COVID-19 observed between individuals are associated with variants across the genome that affect gene expression levels in a wide variety of tissue types.
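
A 99% credible SNP set is the smallest group of variants whose posterior probabilities of being causal sum to at least 0.99. The abstract does not state which fine-mapping method was used, so this sketch assumes one common approach: Wakefield approximate Bayes factors computed from each SNP's effect estimate and standard error, a single causal variant per locus, equal prior odds across SNPs, and an assumed prior effect SD of 0.2. The SNP names and effect sizes are made up for illustration.

```python
import math
from typing import Dict, List, Tuple


def log_abf(beta: float, se: float, prior_sd: float = 0.2) -> float:
    """Wakefield approximate log Bayes factor for association versus no association."""
    v = se ** 2          # sampling variance of the effect estimate
    w = prior_sd ** 2    # assumed prior variance of the true effect
    r = w / (w + v)
    z = beta / se
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def credible_set(snps: Dict[str, Tuple[float, float]], coverage: float = 0.99) -> List[Tuple[str, float]]:
    """Smallest set of SNPs whose posterior probabilities sum to at least `coverage`.

    Assumes one causal SNP in the locus and equal prior odds, so each SNP's
    posterior is its Bayes factor divided by the sum of Bayes factors in the locus.
    """
    labf = {snp: log_abf(beta, se) for snp, (beta, se) in snps.items()}
    top = max(labf.values())
    weights = {snp: math.exp(value - top) for snp, value in labf.items()}  # numerically stable
    total = sum(weights.values())
    posteriors = sorted(((snp, w / total) for snp, w in weights.items()),
                        key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for snp, pp in posteriors:  # add SNPs from most to least probable
        selected.append((snp, pp))
        cumulative += pp
        if cumulative >= coverage:
            break
    return selected


if __name__ == "__main__":
    # Hypothetical locus: SNP name -> (effect estimate, standard error)
    locus = {"snp1": (0.5, 0.05), "snp2": (0.05, 0.05), "snp3": (0.0, 0.05)}
    print(credible_set(locus))
```

A locus whose credible set shrinks to one SNP, as in this toy example, corresponds to the abstract's "single likely causal SNP"; SNPs in strong linkage disequilibrium with similar evidence would all remain in the set.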