18,239 research outputs found
Detecting Visual Relationships with Deep Relational Networks
Relationships among objects play a crucial role in image understanding.
Despite the great success of deep learning techniques in recognizing individual
objects, reasoning about the relationships among objects remains a challenging
task. Previous methods often treat this as a classification problem,
considering each type of relationship (e.g. "ride") or each distinct visual
phrase (e.g. "person-ride-horse") as a category. Such approaches are faced with
significant difficulties caused by the high diversity of visual appearance for
each kind of relationships or the large number of distinct visual phrases. We
propose an integrated framework to tackle this problem. At the heart of this
framework is the Deep Relational Network, a novel formulation designed
specifically for exploiting the statistical dependencies between objects and
their relationships. On two large datasets, the proposed method achieves
substantial improvement over state-of-the-art.Comment: To be appeared in CVPR 2017 as an oral pape
Dynamic Graph Generation Network: Generating Relational Knowledge from Diagrams
In this work, we introduce a new algorithm for analyzing a diagram, which
contains visual and textual information in an abstract and integrated way.
Whereas diagrams contain richer information compared with individual
image-based or language-based data, proper solutions for automatically
understanding them have not been proposed due to their innate characteristics
of multi-modality and arbitrariness of layouts. To tackle this problem, we
propose a unified diagram-parsing network for generating knowledge from
diagrams based on an object detector and a recurrent neural network designed
for a graphical structure. Specifically, we propose a dynamic graph-generation
network that is based on dynamic memory and graph theory. We explore the
dynamics of information in a diagram with activation of gates in gated
recurrent unit (GRU) cells. On publicly available diagram datasets, our model
demonstrates a state-of-the-art result that outperforms other baselines.
Moreover, further experiments on question answering shows potentials of the
proposed method for various applications
- …