Semantic indexing of biomedical literature is usually done at the level of
MeSH descriptors, representing topics of interest for the biomedical community.
Several related but distinct biomedical concepts are often grouped together in
a single coarse-grained descriptor and are treated as a single topic for
semantic indexing. This study proposes a new method for the automated
refinement of subject annotations at the level of concepts, investigating deep
learning approaches. Lacking labelled data for this task, our method relies on
weak supervision based on concept occurrence in the abstract of an article. The
proposed approach is evaluated on an extended large-scale retrospective
scenario, taking advantage of concepts that eventually become MeSH descriptors,
for which annotations become available in MEDLINE/PubMed. The results suggest
that concept occurrence is a strong heuristic for automated subject annotation
refinement and can be further enhanced when combined with dictionary-based
heuristics. In addition, such heuristics can be useful as weak supervision for
developing deep learning models that can achieve further improvement in some
cases.

Comment: 48 pages, 5 figures, 9 tables, 1 algorithm
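
As an illustration of the concept-occurrence heuristic described above, the following minimal sketch assigns a weak label to an article for every concept that has at least one term occurring in the abstract. The function name `weak_labels`, the `concept_terms` dictionary, the placeholder concept identifiers, and the whole-word case-insensitive matching rule are all assumptions made for this example; the study's actual matching procedure may differ.

```python
import re


def weak_labels(abstract, concept_terms):
    """Return the set of concept IDs with at least one term found in the abstract.

    `concept_terms` maps a (hypothetical) concept ID to a list of its terms.
    Matching is whole-word and case-insensitive, which is an assumption of
    this sketch rather than a documented detail of the method.
    """
    text = abstract.lower()
    labels = set()
    for concept_id, terms in concept_terms.items():
        for term in terms:
            pattern = r"\b" + re.escape(term.lower()) + r"\b"
            if re.search(pattern, text):
                labels.add(concept_id)
                break  # one matching term is enough for this concept
    return labels


# Hypothetical concepts grouped under one coarse-grained descriptor.
concept_terms = {
    "concept_A": ["alpha synuclein", "alpha-synuclein"],
    "concept_B": ["beta synuclein"],
}
abstract = "We study Alpha-Synuclein aggregation in neurons."
print(weak_labels(abstract, concept_terms))
```

Labels produced this way are noisy: a concept can be the subject of an article without being named in its abstract, and can be named without being the subject. This is why they serve only as weak supervision for training a model.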