10 research outputs found

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Get PDF
    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.Peer reviewe

    Rate of Detection of Advanced Neoplasms in Proximal Colon by Simulated Sigmoidoscopy vs Fecal Immunochemical Tests

    No full text

    NEOTROPICAL ALIEN MAMMALS: a data set of occurrence and abundance of alien mammals in the Neotropics

    No full text
    Biological invasion is one of the main threats to native biodiversity. For a species to become invasive, it must be voluntarily or involuntarily introduced by humans into a nonnative habitat. Mammals were among first taxa to be introduced worldwide for game, meat, and labor, yet the number of species introduced in the Neotropics remains unknown. In this data set, we make available occurrence and abundance data on mammal species that (1) transposed a geographical barrier and (2) were voluntarily or involuntarily introduced by humans into the Neotropics. Our data set is composed of 73,738 historical and current georeferenced records on alien mammal species of which around 96% correspond to occurrence data on 77 species belonging to eight orders and 26 families. Data cover 26 continental countries in the Neotropics, ranging from Mexico and its frontier regions (southern Florida and coastal-central Florida in the southeast United States) to Argentina, Paraguay, Chile, and Uruguay, and the 13 countries of Caribbean islands. Our data set also includes neotropical species (e.g., Callithrix sp., Myocastor coypus, Nasua nasua) considered alien in particular areas of Neotropics. The most numerous species in terms of records are from Bos sp. (n = 37,782), Sus scrofa (n = 6,730), and Canis familiaris (n = 10,084); 17 species were represented by only one record (e.g., Syncerus caffer, Cervus timorensis, Cervus unicolor, Canis latrans). Primates have the highest number of species in the data set (n = 20 species), partly because of uncertainties regarding taxonomic identification of the genera Callithrix, which includes the species Callithrix aurita, Callithrix flaviceps, Callithrix geoffroyi, Callithrix jacchus, Callithrix kuhlii, Callithrix penicillata, and their hybrids. This unique data set will be a valuable source of information on invasion risk assessments, biodiversity redistribution and conservation-related research. There are no copyright restrictions. Please cite this data paper when using the data in publications. We also request that researchers and teachers inform us on how they are using the data

    Impact of age- and gender-specific cut-off values for the fecal immunochemical test for hemoglobin in colorectal cancer screening

    No full text

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    No full text

    International Nosocomial Infection Control Consortiu (INICC) report, data summary of 43 countries for 2007-2012. Device-associated module

    No full text
    We report the results of an International Nosocomial Infection Control Consortium (INICC) surveillance study from January 2007-December 2012 in 503 intensive care units (ICUs) in Latin America, Asia, Africa, and Europe. During the 6-year study using the Centers for Disease Control and Prevention's (CDC) U.S. National Healthcare Safety Network (NHSN) definitions for device-associated health care–associated infection (DA-HAI), we collected prospective data from 605,310 patients hospitalized in the INICC's ICUs for an aggregate of 3,338,396 days. Although device utilization in the INICC's ICUs was similar to that reported from ICUs in the U.S. in the CDC's NHSN, rates of device-associated nosocomial infection were higher in the ICUs of the INICC hospitals: the pooled rate of central line–associated bloodstream infection in the INICC's ICUs, 4.9 per 1,000 central line days, is nearly 5-fold higher than the 0.9 per 1,000 central line days reported from comparable U.S. ICUs. The overall rate of ventilator-associated pneumonia was also higher (16.8 vs 1.1 per 1,000 ventilator days) as was the rate of catheter-associated urinary tract infection (5.5 vs 1.3 per 1,000 catheter days). Frequencies of resistance of Pseudomonas isolates to amikacin (42.8% vs 10%) and imipenem (42.4% vs 26.1%) and Klebsiella pneumoniae isolates to ceftazidime (71.2% vs 28.8%) and imipenem (19.6% vs 12.8%) were also higher in the INICC's ICUs compared with the ICUs of the CDC's NHSN

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    No full text
    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical science. © The Author(s) 2019. Published by Oxford University Press
    corecore