In this work, we introduce a new algorithm for analyzing a diagram, which
contains visual and textual information in an abstract and integrated way.
Although diagrams contain richer information than image-only or
language-only data, no adequate solution for understanding them
automatically has yet been proposed, owing to their inherent multimodality
and arbitrary layouts. To tackle this problem, we
propose a unified diagram-parsing network for generating knowledge from
diagrams based on an object detector and a recurrent neural network designed
for graph-structured data. Specifically, we propose a dynamic graph-generation
network based on dynamic memory and graph theory. We also explore the
dynamics of information in a diagram through the gate activations of the
gated recurrent unit (GRU) cells. On publicly available diagram datasets, our
model achieves state-of-the-art results, outperforming the baseline methods.
Moreover, further experiments on question answering show the potential of
the proposed method for various applications.
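The abstract refers to reading the dynamics of information from the gate activations of GRU cells. As an illustration only, the sketch below implements a standard GRU cell in pure Python whose step also returns its update and reset gate values, so that they can be logged over a sequence. It is not the authors' dynamic graph-generation network: the class name `GRUCell`, the method `step`, the sizes, the random initialization, and the input sequence are all hypothetical choices made for this example.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(m, v):
    # Matrix-vector product for a list-of-rows matrix.
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def add(*vs):
    # Element-wise sum of equally sized vectors.
    return [sum(xs) for xs in zip(*vs)]

class GRUCell:
    """Standard GRU cell (hypothetical sketch) that also exposes its gate activations."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        # Input weights W, recurrent weights U, and biases b for the
        # update gate z, reset gate r, and candidate state n.
        self.W = {g: mat(hidden_size, input_size) for g in "zrn"}
        self.U = {g: mat(hidden_size, hidden_size) for g in "zrn"}
        self.b = {g: [0.0] * hidden_size for g in "zrn"}

    def step(self, x, h):
        z = [sigmoid(a) for a in add(matvec(self.W["z"], x), matvec(self.U["z"], h), self.b["z"])]
        r = [sigmoid(a) for a in add(matvec(self.W["r"], x), matvec(self.U["r"], h), self.b["r"])]
        # The reset gate scales how much of the previous state feeds the candidate.
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a) for a in add(matvec(self.W["n"], x), matvec(self.U["n"], rh), self.b["n"])]
        # Convention used here: z near 1 writes the candidate n, z near 0 keeps the old state h.
        h_new = [(1 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]
        return h_new, {"update": z, "reset": r}

cell = GRUCell(input_size=3, hidden_size=4)
h = [0.0] * 4
# Hypothetical per-step input features (e.g., one vector per detected element).
sequence = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5], [0.3, 0.3, 0.3]]
gate_log = []
for x in sequence:
    h, gates = cell.step(x, h)
    # Record the mean update-gate activation at each step.
    gate_log.append(sum(gates["update"]) / len(gates["update"]))
print([round(g, 3) for g in gate_log])
```

Averaging the update gate per step is just one simple way to summarize how strongly each input overwrites the hidden state; per-unit or reset-gate traces could be logged in the same way.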