Structured chemical reaction information plays a vital role for chemists
engaged in laboratory work and advanced endeavors such as computer-aided drug
design. Despite the importance of extracting structured reactions from
scientific literature, data annotation for this purpose is cost-prohibitive due
to the significant labor required from domain experts. Consequently, the
scarcity of training data poses an obstacle to the progress of
related models in this domain. In this paper, we propose ReactIE, which
combines two weakly supervised approaches for pre-training. Our method utilizes
frequent patterns within the text as linguistic cues to identify specific
characteristics of chemical reactions. Additionally, we adopt synthetic data
from patent records as distant supervision to incorporate domain knowledge into
the model. Experiments demonstrate that ReactIE achieves substantial
improvements and outperforms all existing baselines.

Comment: Findings of ACL 2023, Short Paper