9 research outputs found

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Get PDF
    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.Peer reviewe

    Global burden and strength of evidence for 88 risk factors in 204 countries and 811 subnational locations, 1990–2021: a systematic analysis for the Global Burden of Disease Study 2021

    Get PDF
    Background: Understanding the health consequences associated with exposure to risk factors is necessary to inform public health policy and practice. To systematically quantify the contributions of risk factor exposures to specific health outcomes, the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2021 aims to provide comprehensive estimates of exposure levels, relative health risks, and attributable burden of disease for 88 risk factors in 204 countries and territories and 811 subnational locations, from 1990 to 2021. Methods: The GBD 2021 risk factor analysis used data from 54 561 total distinct sources to produce epidemiological estimates for 88 risk factors and their associated health outcomes for a total of 631 risk–outcome pairs. Pairs were included on the basis of data-driven determination of a risk–outcome association. Age-sex-location-year-specific estimates were generated at global, regional, and national levels. Our approach followed the comparative risk assessment framework predicated on a causal web of hierarchically organised, potentially combinative, modifiable risks. Relative risks (RRs) of a given outcome occurring as a function of risk factor exposure were estimated separately for each risk–outcome pair, and summary exposure values (SEVs), representing risk-weighted exposure prevalence, and theoretical minimum risk exposure levels (TMRELs) were estimated for each risk factor. These estimates were used to calculate the population attributable fraction (PAF; ie, the proportional change in health risk that would occur if exposure to a risk factor were reduced to the TMREL). The product of PAFs and disease burden associated with a given outcome, measured in disability-adjusted life-years (DALYs), yielded measures of attributable burden (ie, the proportion of total disease burden attributable to a particular risk factor or combination of risk factors). Adjustments for mediation were applied to account for relationships involving risk factors that act indirectly on outcomes via intermediate risks. Attributable burden estimates were stratified by Socio-demographic Index (SDI) quintile and presented as counts, age-standardised rates, and rankings. To complement estimates of RR and attributable burden, newly developed burden of proof risk function (BPRF) methods were applied to yield supplementary, conservative interpretations of risk–outcome associations based on the consistency of underlying evidence, accounting for unexplained heterogeneity between input data from different studies. Estimates reported represent the mean value across 500 draws from the estimate's distribution, with 95% uncertainty intervals (UIs) calculated as the 2·5th and 97·5th percentile values across the draws. Findings: Among the specific risk factors analysed for this study, particulate matter air pollution was the leading contributor to the global disease burden in 2021, contributing 8·0% (95% UI 6·7–9·4) of total DALYs, followed by high systolic blood pressure (SBP; 7·8% [6·4–9·2]), smoking (5·7% [4·7–6·8]), low birthweight and short gestation (5·6% [4·8–6·3]), and high fasting plasma glucose (FPG; 5·4% [4·8–6·0]). For younger demographics (ie, those aged 0–4 years and 5–14 years), risks such as low birthweight and short gestation and unsafe water, sanitation, and handwashing (WaSH) were among the leading risk factors, while for older age groups, metabolic risks such as high SBP, high body-mass index (BMI), high FPG, and high LDL cholesterol had a greater impact. From 2000 to 2021, there was an observable shift in global health challenges, marked by a decline in the number of all-age DALYs broadly attributable to behavioural risks (decrease of 20·7% [13·9–27·7]) and environmental and occupational risks (decrease of 22·0% [15·5–28·8]), coupled with a 49·4% (42·3–56·9) increase in DALYs attributable to metabolic risks, all reflecting ageing populations and changing lifestyles on a global scale. Age-standardised global DALY rates attributable to high BMI and high FPG rose considerably (15·7% [9·9–21·7] for high BMI and 7·9% [3·3–12·9] for high FPG) over this period, with exposure to these risks increasing annually at rates of 1·8% (1·6–1·9) for high BMI and 1·3% (1·1–1·5) for high FPG. By contrast, the global risk-attributable burden and exposure to many other risk factors declined, notably for risks such as child growth failure and unsafe water source, with age-standardised attributable DALYs decreasing by 71·5% (64·4–78·8) for child growth failure and 66·3% (60·2–72·0) for unsafe water source. We separated risk factors into three groups according to trajectory over time: those with a decreasing attributable burden, due largely to declining risk exposure (eg, diet high in trans-fat and household air pollution) but also to proportionally smaller child and youth populations (eg, child and maternal malnutrition); those for which the burden increased moderately in spite of declining risk exposure, due largely to population ageing (eg, smoking); and those for which the burden increased considerably due to both increasing risk exposure and population ageing (eg, ambient particulate matter air pollution, high BMI, high FPG, and high SBP). Interpretation: Substantial progress has been made in reducing the global disease burden attributable to a range of risk factors, particularly those related to maternal and child health, WaSH, and household air pollution. Maintaining efforts to minimise the impact of these risk factors, especially in low SDI locations, is necessary to sustain progress. Successes in moderating the smoking-related burden by reducing risk exposure highlight the need to advance policies that reduce exposure to other leading risk factors such as ambient particulate matter air pollution and high SBP. Troubling increases in high FPG, high BMI, and other risk factors related to obesity and metabolic syndrome indicate an urgent need to identify and implement interventions. Funding: Bill & Melinda Gates Foundation

    Pan-cancer analysis of whole genomes

    No full text
    Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale. Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4-5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter; identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation; analyses timings and patterns of tumour evolution; describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity; and evaluates a range of more-specialized features of cancer genomes
    corecore