Unreported links between trial registrations and published articles were
identified using document similarity measures in a cross-sectional analysis
of ClinicalTrials.gov
Objectives: Trial registries can be used to measure reporting biases and
support systematic reviews, but 45% of registrations do not provide a link to
the article reporting on the trial. We evaluated the use of document similarity
methods to identify unreported links between ClinicalTrials.gov and PubMed.
Study Design and Setting: We extracted terms and concepts from a dataset of
72,469 ClinicalTrials.gov registrations and 276,307 PubMed articles, and tested
methods for ranking articles across 16,005 reported links and 90
manually-identified unreported links. Performance was measured by the median
rank of matching articles, and the proportion of unreported links that could be
found by screening ranked candidate articles in order. Results: The best
performing concept-based representation produced a median rank of 3 (IQR 1-21)
for reported links and 3 (IQR 1-19) for the manually-identified unreported
links, and term-based representations produced a median rank of 2 (IQR 1-20) for
reported links and 2 (IQR 1-12) for unreported links. The matching article was
ranked first for 40% of registrations, and screening 50 candidate articles per
registration identified 86% of the unreported links. Conclusions: Leveraging
the growth in the corpus of reported links between ClinicalTrials.gov and
PubMed, we found that document similarity methods can assist in the
identification of unreported links between trial registrations and
corresponding articles.
Comment: 8 pages, 3 figures
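
The evaluation described above (rank candidate articles by similarity to a registration, then report the median rank of the matching article and the proportion of links found within the top k candidates) can be illustrated with a minimal sketch. It assumes a term-based representation with TF-IDF weighting and cosine similarity, which is a common choice but not necessarily the exact weighting used in the study. The function names, the tokenizer, and the toy registration and article texts are all hypothetical and serve only to make the example runnable.

```python
import math
from collections import Counter
from statistics import median


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a simplification)."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency (an assumed weighting scheme).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()}
            for toks in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_of_match(reg_vec, article_vecs, match_index):
    """1-based rank of the matching article; ties are resolved optimistically."""
    scores = [cosine(reg_vec, v) for v in article_vecs]
    target = scores[match_index]
    return 1 + sum(1 for s in scores if s > target)


def evaluate(registrations, articles, links, k=50):
    """Return (median rank, proportion of links ranked within the top k)."""
    vecs = tfidf_vectors(registrations + articles)
    reg_vecs = vecs[:len(registrations)]
    art_vecs = vecs[len(registrations):]
    ranks = [rank_of_match(reg_vecs[r], art_vecs, a) for r, a in links]
    return median(ranks), sum(r <= k for r in ranks) / len(ranks)


# Hypothetical toy data: two registrations, three candidate articles.
registrations = [
    "Metformin versus placebo for type 2 diabetes in adults",
    "Inhaled corticosteroid dose in childhood asthma",
]
articles = [
    "A randomized trial of metformin in adults with type 2 diabetes",
    "Corticosteroid dosing for asthma in children: a randomized trial",
    "Statin therapy after myocardial infarction",
]
# Known links as (registration index, article index) pairs.
links = [(0, 0), (1, 1)]

median_rank, recall_at_k = evaluate(registrations, articles, links, k=50)
print(median_rank, recall_at_k)
```

In this sketch the known links play the role of the reported links in the study: they supply the ground truth against which a ranking method is scored. Setting k to 50 mirrors the screening depth quoted in the results.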