Detecting Visual Relationships with Deep Relational Networks
Relationships among objects play a crucial role in image understanding.
Despite the great success of deep learning techniques in recognizing individual
objects, reasoning about the relationships among objects remains a challenging
task. Previous methods often treat this as a classification problem,
considering each type of relationship (e.g. "ride") or each distinct visual
phrase (e.g. "person-ride-horse") as a category. Such approaches are faced with
significant difficulties caused by the high diversity of visual appearance for
each kind of relationship or the large number of distinct visual phrases. We
propose an integrated framework to tackle this problem. At the heart of this
framework is the Deep Relational Network, a novel formulation designed
specifically for exploiting the statistical dependencies between objects and
their relationships. On two large datasets, the proposed method achieves
substantial improvement over the state of the art.
Comment: To appear in CVPR 2017 as an oral paper
Do Deep Neural Networks Suffer from Crowding?
Crowding is a visual effect suffered by humans, in which an object that can
be recognized in isolation can no longer be recognized when other objects,
called flankers, are placed close to it. In this work, we study the effect of
crowding in artificial Deep Neural Networks for object recognition. We analyze
both standard deep convolutional neural networks (DCNNs) as well as a new
version of DCNNs that 1) is multi-scale and 2) has convolution filters whose
size changes with eccentricity with respect to the center of fixation.
Such networks, which we call eccentricity-dependent, are a computational model
of the feedforward path of the primate visual cortex. Our results reveal that
the eccentricity-dependent model, trained on target objects in isolation, can
recognize such targets in the presence of flankers, if the targets are near the
center of the image, whereas DCNNs cannot. Also, for all tested networks, when
trained on targets in isolation, we find that recognition accuracy of the
networks decreases the closer the flankers are to the target and the more
flankers there are. We find that visual similarity between the target and
flankers also plays a role and that pooling in early layers of the network
leads to more crowding. Additionally, we show that incorporating the flankers
into the images of the training set does not improve performance with crowding.
Comment: CBMM mem