
    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.
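As a rough illustration of one of the baseline methods named in the abstract above, the sketch below scores candidate documents against a seed article with a common textbook form of Okapi BM25. It is not the consortium's implementation: the function name `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, the whitespace tokenization and the toy titles are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(seed_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each candidate document against a seed with Okapi BM25.

    Hypothetical helper: k1 and b are conventional defaults, not values
    taken from the abstract.
    """
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in docs_tokens:
        df.update(set(doc))

    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in set(seed_tokens):
            if term not in tf:
                continue
            # Smoothed inverse document frequency (always positive).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


# Toy titles, invented for illustration only.
seed = "protein folding prediction".split()
candidates = [
    "protein folding with deep learning".split(),
    "orchid conservation genetics".split(),
    "protein structure prediction".split(),
]
for title, score in zip(candidates, bm25_scores(seed, candidates)):
    print(" ".join(title), round(score, 3))
```

A candidate that shares no terms with the seed scores zero, and among candidates with equally informative matches the shorter one ranks higher because of the length normalization.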

    Combining current knowledge of Cypripedium calceolus with a new analysis of genetic variation in Italian populations to provide guidelines for conservation actions

    The split between conservation science and real-world application is an ongoing issue despite several calls for unification. Researchers are empowered to partially bridge the research-implementation gap by making their findings more accessible. Cypripedium calceolus is the most recognizable orchid of the European flora, and is currently facing habitat change and fragmentation, in addition to threats from collectors and illegal traders. Although several studies have focused on the ecological and genetic features of the species, a comprehensive account of how such aspects can be translated into concrete conservation recommendations is still missing. In this study, we describe microsatellite genetic variation in 188 individuals from different Italian populations of C. calceolus. Our results indicate the need for immediate conservation action for the most isolated populations in the Central Apennines and north-western Italy. Although our genetic findings are specific to the Italian populations, our aim is to review ecological and population genetic aspects in C. calceolus and their implications for conservation against the existing threats. Therefore, our detailed guidelines for translocation, habitat management and post-translocation monitoring can be used to inform conservation strategies in threatened populations of C. calceolus across its range. The genetic part of the present study was partially funded by Natural England (UK).

    The application of the Ten Group classification system (TGCS) in caesarean delivery case mix adjustment. A multicenter prospective study.

    BACKGROUND: Caesarean delivery (CD) rates are commonly used as an indicator of quality in obstetric care, and risk adjustment is recommended when assessing inter-institutional variations. The aim of this study was to evaluate whether the Ten Group classification system (TGCS) can be used in case-mix adjustment. METHODS: Standardized data on 15,255 deliveries from 11 different regional centers were prospectively collected. Crude Risk Ratios of CDs were calculated for each center. Two multiple logistic regression models were considered: Model 1 used maternal variables (age, Body Mass Index), obstetric variables (gestational age, fetal presentation, single or multiple, previous scar, parity, neonatal birth weight) and presence of risk factors; Model 2 used TGCS either with or without maternal characteristics and presence of risk factors. Receiver Operating Characteristic (ROC) curves of the multivariate logistic regression analyses were used to assess the diagnostic accuracy of each model. The null hypothesis that Areas under ROC Curve (AUC) were not different from each other was tested with a Chi Square test and post hoc pairwise comparisons using a Bonferroni correction. RESULTS: Crude evaluation of CD rates showed all centers had significantly higher Risk Ratios than the referent. Both multiple logistic regression models reduced these variations. However, the two methods ranked institutions differently: Model 1 and Model 2 (adjusted for TGCS) identified nine and eight centers, respectively, with significantly higher CD rates than the referent, with slightly different AUCs (0.8758 and 0.8929 respectively). In the model adjusted for TGCS and maternal characteristics/presence of risk factors, three centers had CD rates similar to the referent, with the best AUC (0.9024). CONCLUSIONS: The TGCS might be considered as a reliable variable to adjust CD rates. The addition of maternal characteristics and risk factors to TGCS substantially increases the predictive discrimination of the risk adjusted model.
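The crude comparison step described in the METHODS above can be sketched as follows. This is a minimal illustration, not the study's analysis: the abstract does not say how confidence intervals were obtained, so the log-scale (Katz) interval used here is an assumption, the function name `crude_risk_ratio` is hypothetical, and the counts are invented rather than taken from the study.

```python
import math


def crude_risk_ratio(cd_center, n_center, cd_ref, n_ref, z=1.96):
    """Crude risk ratio of caesarean delivery, center versus referent.

    Hypothetical helper. The interval uses the standard log-scale (Katz)
    approximation, which is an assumption, not the abstract's stated method.
    """
    risk_center = cd_center / n_center
    risk_ref = cd_ref / n_ref
    rr = risk_center / risk_ref
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(
        1 / cd_center - 1 / n_center + 1 / cd_ref - 1 / n_ref
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Invented counts for illustration only (not data from the study):
# 30 CDs in 100 deliveries at a center, 20 in 100 at the referent.
rr, lower, upper = crude_risk_ratio(30, 100, 20, 100)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

A center would be flagged as having a significantly higher crude CD rate than the referent when the lower bound of its interval exceeds 1; the adjusted comparisons in the abstract replace this crude ratio with estimates from the logistic regression models.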