169 research outputs found
Human-Machine CRFs for Identifying Bottlenecks in Holistic Scene Understanding
Recent trends in image understanding have pushed for holistic scene
understanding models that jointly reason about various tasks such as object
detection, scene recognition, shape analysis, contextual reasoning, and local
appearance based classifiers. In this work, we are interested in understanding
the roles of these different tasks in improved scene understanding, in
particular semantic segmentation, object detection and scene recognition.
Towards this goal, we "plug-in" human subjects for each of the various
components in a state-of-the-art conditional random field model. Comparisons
among various hybrid human-machine CRFs give us indications of how much "head
room" there is to improve scene understanding by focusing research efforts on
various individual tasks.
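The comparison described above can be sketched in a toy form: each component contributes a unary potential to the model's score, and replacing one machine component with an oracle ("human") potential estimates the head room from perfecting that component. Everything below (component names, noise levels, data) is illustrative, not the paper's actual CRF.

```python
import numpy as np

# Toy score over image regions: sum of per-component unary potentials.
# Swapping one machine component for an oracle ("human") potential gives a
# rough head-room estimate for that component, in the spirit of the hybrid
# human-machine comparison. Pairwise CRF terms are omitted for simplicity.

rng = np.random.default_rng(0)
n_regions, n_classes = 50, 4
truth = rng.integers(0, n_classes, n_regions)

def machine_potential(noise):
    # Noisy unary scores imitating an imperfect component.
    return np.eye(n_classes)[truth] + noise * rng.standard_normal((n_regions, n_classes))

def oracle_potential():
    # "Human" plug-in: perfect scores derived from ground truth.
    return np.eye(n_classes)[truth]

def accuracy(unaries):
    # MAP decoding with unary terms only.
    return float((unaries.argmax(axis=1) == truth).mean())

seg = machine_potential(noise=1.0)   # weak component, e.g. local appearance
det = machine_potential(noise=0.3)   # stronger component, e.g. detection
baseline = accuracy(seg + det)
headroom_seg = accuracy(oracle_potential() + det) - baseline
print(f"baseline={baseline:.2f}, head-room from perfecting 'seg'={headroom_seg:.2f}")
```

The interesting quantity is the gap, not the absolute numbers: a large gap for a component suggests research effort there would pay off.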
Deep Structured Multi-Task Learning for Computer Vision in Autonomous Driving
The field of computer vision is currently dominated by deep learning. Convolutional
Neural Networks (CNNs) have become the predominant tool for solving almost any computer
vision task, and state-of-the-art systems are built on their predictive capabilities.
Many of those systems use a simple encoder–decoder design, where an off-the-shelf
CNN architecture is combined with a task-specific decoder and loss function in order
to create an end-to-end trainable model. This ultimately raises the question of
whether these kinds of models are the future of computer vision.
In this thesis we argue that this is not the case. We start off by discussing three limitations
of simple end-to-end training. We proceed by showing how it is possible to overcome those
limitations by using an approach that we call structured modelling. The idea is to use CNNs
to compute a rich semantic intermediate representation which is then used to solve the actual
problem by applying a geometric and task-related structure.
In this work we solve the localization, segmentation and landmark recognition tasks
using structured modelling, and we show that this approach can improve generalization,
interpretability and robustness. We also discuss how this approach is particularly useful
for real-time applications such as autonomous driving. Visual perception is a multi-module
problem that requires several different computer vision tasks to be solved. We discuss how,
by sharing computations, we can improve not only the inference speed but also the prediction
performance by using the structural relationship between the tasks. Lastly, we demonstrate
that structured modelling is able to achieve state-of-the-art performance, making it a very
relevant approach for solving current and future computer vision problems.
Trinity College, ESPCR, Qualcomm
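The structured-modelling idea sketched in this abstract can be illustrated in miniature: a network predicts a rich intermediate representation (here, a landmark heatmap), and the final answer is obtained by applying task structure (a differentiable soft-argmax decoder) rather than regressing coordinates end-to-end. The shapes, the synthetic heatmap, and the decoder choice are illustrative assumptions, not the thesis's actual architecture.

```python
import numpy as np

def soft_argmax(heatmap):
    """Decode a 2-D heatmap into (row, col) coordinates via soft-argmax,
    a geometric, task-related structure applied on top of the CNN output."""
    probs = np.exp(heatmap - heatmap.max())
    probs /= probs.sum()
    rows, cols = np.indices(heatmap.shape)
    return (probs * rows).sum(), (probs * cols).sum()

# Pretend CNN output: a heatmap sharply peaked at the true landmark (10, 14).
h, w = 32, 32
rows, cols = np.indices((h, w))
heatmap = 20.0 * np.exp(-((rows - 10) ** 2 + (cols - 14) ** 2) / 4.0)

r, c = soft_argmax(heatmap)
print(f"decoded landmark ~ ({r:.1f}, {c:.1f})")
```

Because the decoding step is differentiable and geometrically meaningful, errors stay interpretable in image coordinates, which is one of the benefits the thesis claims for structured modelling.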
Efficient Hybrid Transformer: Learning Global-local Context for Urban Scene Segmentation
Semantic segmentation of fine-resolution urban scene images plays a vital
role in extensive practical applications, such as land cover mapping, urban
change detection, environmental protection and economic assessment. Driven by
rapid developments in deep learning technologies, the convolutional neural
network (CNN) has dominated the semantic segmentation task for many years.
Convolutional neural networks adopt hierarchical feature representation,
demonstrating strong local information extraction. However, the local property
of the convolution layer limits the network from capturing global context that
is crucial for precise segmentation. Recently, the Transformer has become a hot topic
in the computer vision domain. Transformers demonstrate a great capability for
global information modelling, boosting many vision tasks, such as image
classification, object detection and especially semantic segmentation. In this
paper, we propose an efficient hybrid Transformer (EHT) for real-time urban
scene segmentation. The EHT adopts a hybrid structure with a CNN-based
encoder and a transformer-based decoder, learning global-local context with
lower computation. Extensive experiments demonstrate that our EHT has faster
inference speed with competitive accuracy compared with state-of-the-art
lightweight models. Specifically, the proposed EHT achieves a 66.9% mIoU on the
UAVid test set and outperforms other benchmark networks significantly. The code
will be available soon.
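The global-local division of labour described here can be sketched with plain numpy: a small convolution extracts local features (the CNN side), a single-head self-attention block mixes global context across all spatial positions (the transformer side), and a per-pixel head produces the segmentation. Random weights, tensor shapes, and the single-head design are illustrative only; this is the data flow, not the EHT model.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv3x3(x, w):
    """3x3 convolution with zero padding followed by ReLU (local context)."""
    h, wd, _ = x.shape
    cout = w.shape[0]
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, wd, cout))
    for i in range(3):
        for j in range(3):
            out += xp[i:i + h, j:j + wd] @ w[:, i, j].T
    return np.maximum(out, 0.0)

def self_attention(tokens, wq, wk, wv):
    """Single-head attention over all spatial tokens (global context)."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    att = q @ k.T / np.sqrt(q.shape[-1])
    att = np.exp(att - att.max(axis=-1, keepdims=True))
    att /= att.sum(axis=-1, keepdims=True)
    return att @ v

h, w, cin, c, n_cls = 16, 16, 3, 8, 4
img = rng.standard_normal((h, w, cin))
feat = conv3x3(img, rng.standard_normal((c, 3, 3, cin)) * 0.1)    # CNN encoder
tokens = feat.reshape(h * w, c)
ctx = self_attention(tokens, *(rng.standard_normal((c, c)) * 0.1 for _ in range(3)))
logits = (tokens + ctx) @ rng.standard_normal((c, n_cls))         # per-pixel head
seg = logits.argmax(-1).reshape(h, w)
print(seg.shape)  # (16, 16)
```

The design intuition is the one stated in the abstract: convolutions supply cheap local detail, while attention runs on the reduced-resolution feature map, keeping the quadratic attention cost low enough for real-time use.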
- …