Recently, the structural reading comprehension (SRC) task on web pages has
attracted increasing research interest. Although previous SRC work has
leveraged extra information such as HTML tags or XPaths, the informative
topology of web pages is not effectively exploited. In this work, we propose a
Topological Information Enhanced model (TIE), which transforms the token-level
task into a tag-level task by introducing a two-stage process (i.e., node
locating and answer refining). Building on this, TIE integrates a Graph Attention
Network (GAT) and a Pre-trained Language Model (PLM) to leverage the topological
information of both logical structures and spatial structures. Experimental
results demonstrate that our model outperforms strong baselines and achieves
state-of-the-art performance on the web-based SRC benchmark WebSRC at the time
of writing. The code of TIE will be publicly available at
https://github.com/X-LANCE/TIE.

Comment: Accepted to NAACL 202