5 research outputs found

    Check-COVID: Fact-Checking COVID-19 News Claims with Scientific Evidence

    We present a new fact-checking benchmark, Check-COVID, that requires systems to verify claims about COVID-19 from news using evidence from scientific articles. This approach to fact-checking is particularly challenging because it requires checking internet text written in everyday language against evidence from journal articles written in formal academic language. Check-COVID contains 1,504 expert-annotated news claims about the coronavirus paired with sentence-level evidence from scientific journal articles and veracity labels. It includes both extracted (journalist-written) and composed (annotator-written) claims. Experiments using both a fact-checking-specific system and GPT-3.5, which respectively achieve F1 scores of 76.99 and 69.90 on this task, reveal the difficulty of automatically fact-checking both claim types and the importance of in-domain data for good performance. Our data and models are released publicly at https://github.com/posuer/Check-COVID. Comment: Accepted as ACL 2023 Findings

    State-of-the-art methods for exposure-health studies: Results from the exposome data challenge event

    The exposome recognizes that individuals are exposed simultaneously to a multitude of different environmental factors and takes a holistic approach to the discovery of etiological factors for disease. However, challenges arise when trying to quantify the health effects of complex exposure mixtures. Analytical challenges include dealing with high dimensionality, studying the combined effects of these exposures and their interactions, integrating causal pathways, and integrating high-throughput omics layers. To tackle these challenges, the Barcelona Institute for Global Health (ISGlobal) held a data challenge event open to researchers from all over the world and from all areas of expertise. Analysts had a chance to compete and apply state-of-the-art methods on a common partially simulated exposome dataset (based on real case data from the HELIX project) with multiple correlated exposure variables (P > 100 exposure variables) arising from general and personal environments at different time points, biological molecular data (multi-omics: DNA methylation, gene expression, proteins, metabolomics) and multiple clinical phenotypes in 1301 mother–child pairs. Most of the methods presented included feature selection or feature reduction to deal with the high dimensionality of the exposome dataset. Several approaches explicitly searched for combined effects of exposures and/or their interactions using linear index models or response surface methods, including Bayesian methods. Other methods dealt with the multi-omics dataset in mediation analyses using multiple-step approaches. Here we discuss features of the statistical models used and provide the data and code used, so that analysts have examples of implementation and can learn how to use these methods. Overall, the exposome data challenge presented a unique opportunity for researchers from different disciplines to create and share state-of-the-art analytical methods, setting a new standard for open science in the exposome and environmental health field.

    Additional file 1: Figure S1. of Effect of personal exposure to black carbon on changes in allergic asthma gene methylation measured 5 days later in urban children: importance of allergic sensitization

    Conserved promoter regions. Black lines mark loci that are conserved between human and mouse in the promoter regions of IL4, IFNγ, and ARG2. White areas are not conserved. Conserved regions were identified using Standard Nucleotide BLAST (blastn for more dissimilar regions; https://blast.ncbi.nlm.nih.gov/Blast.cgi) for the 400 nucleotides upstream of the transcriptional start site (TSS) in the human sequence. The NOS2A promoter region under investigation is not conserved between mouse and human. Figure S2: Schematic demonstration of collected measures. Numbers in the boxes represent the number of participants. N:n = number of repeat subjects: number of observations. A grey dotted box indicates that two measures (both Time 1 and Time 2, 6 months apart) are available; a white box indicates that only one measure (Time 1) is available. N = 10 participants were dropped due to invalid personal or residential air pollution measures. N = 17 participants were further excluded from the analysis due to missing total IgE (N = 16) or invalid DNA methylation data caused by technical failures in the laboratory (N = 1), resulting in a final sample size of N = 136. Figure S3: Correlations between day 1 and day 6 buccal cell DNA methylation of (a) IL4 (CpG−326, CpG−48), (b) IFNγ (CpG−186, CpG−54), (c) NOS2A (CpG+5099, CpG+5106), and (d) ARG2 (average methylation of CpG−32, CpG−30, and CpG−26); Spearman correlation coefficients are presented. (DOCX 466 kb)