In evidence-based medicine, the relevance of medical literature is determined by
predefined relevance conditions. The conditions are defined based on PICO
elements, namely Patient, Intervention, Comparator, and Outcome. Hence, PICO
annotations in medical literature are essential for automatically filtering
relevant documents. However, defining the boundaries of text spans for PICO
elements is not straightforward. In this paper, we study the agreement of PICO
annotations made by multiple human annotators, including both experts and
non-experts.
Agreement is estimated using a standard span agreement (i.e., matching both the
labels and the boundaries of text spans) and two types of relaxed span agreement
(i.e., matching labels without requiring the span boundaries to match).
Based on this analysis, we report two observations: (i) the boundaries of PICO
span annotations made by individual human annotators are highly diverse; and
(ii) despite this disagreement in span boundaries, annotators broadly agree on
the general areas of the span annotations. Our results suggest that applying a
standard agreement alone may understate the agreement on PICO spans, and that
adopting both standard and relaxed agreement is more suitable for evaluating
PICO spans.
Comment: Accepted in SIGIR 2019 (Short paper)