Visual Confusion Label Tree For Image Classification
Convolutional neural network models are widely used in image classification
tasks. However, their running time is too long to meet the strict real-time
requirements of mobile devices. To optimize these models and meet that
requirement, we propose a method that replaces the fully-connected layers of
convolutional neural network models with a tree classifier. Specifically, we
construct a Visual Confusion Label Tree based on the output of the
convolutional neural network models, and use a
multi-kernel SVM plus classifier with hierarchical constraints to train the
tree classifier. Focusing on confusion subsets instead of the entire set
of categories makes the tree classifier more discriminative, and replacing
the fully-connected layers reduces the running time. Experiments
show that our tree classifier obtains a significant improvement over the
state-of-the-art tree classifier, by 4.3% and 2.4% in top-1 accuracy on the
CIFAR-100 and ImageNet datasets, respectively. Additionally, our method
achieves 124x and 115x speedups compared with the fully-connected layers on
AlexNet and VGG16, respectively, without a decline in accuracy.
Comment: 9 pages, 5 figures, conferenc
Human-Machine CRFs for Identifying Bottlenecks in Holistic Scene Understanding
Recent trends in image understanding have pushed for holistic scene
understanding models that jointly reason about various tasks such as object
detection, scene recognition, shape analysis, contextual reasoning, and local
appearance-based classifiers. In this work, we are interested in understanding
the roles of these different tasks in improved scene understanding, in
particular semantic segmentation, object detection and scene recognition.
Towards this goal, we "plug in" human subjects for each of the various
components in a state-of-the-art conditional random field model. Comparisons
among various hybrid human-machine CRFs give us indications of how much
"headroom" there is to improve scene understanding by focusing research
efforts on various individual tasks.