    A distantly supervised dataset for automated data extraction from diagnostic studies

    Systematic reviews are important in evidence-based medicine, but are expensive to produce. Automating or semi-automating the data extraction of index test, target condition, and reference standard from articles has the potential to decrease the cost of conducting systematic reviews of diagnostic test accuracy, but relevant training data is not available. We create a distantly supervised dataset of approximately 90,000 sentences, and let two experts manually annotate a small subset of around 1,000 sentences for evaluation. We evaluate the performance of BioBERT and logistic regression for ranking the sentences, and compare the performance for distant and direct supervision. Our results suggest that distant supervision can work as well as, or better than, direct supervision on this problem, and that distantly trained models can perform as well as, or better than, human annotators.