12 research outputs found
Automated Search for Resource-Efficient Branched Multi-Task Networks
The multi-modal nature of many vision problems calls for neural network
architectures that can perform multiple tasks concurrently. Typically, such
architectures have been handcrafted in the literature. However, given the size
and complexity of the problem, this manual architecture exploration likely
exceeds human design abilities. In this paper, we propose a principled
approach, rooted in differentiable neural architecture search, to automatically
define branching (tree-like) structures in the encoding stage of a multi-task
neural network. To allow flexibility within resource-constrained environments,
we introduce a proxyless, resource-aware loss that dynamically controls the
model size. Evaluations across a variety of dense prediction tasks show that
our approach consistently finds high-performing branching structures within
limited resource budgets.
Comment: British Machine Vision Conference (BMVC) 202
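A minimal sketch of how a resource-aware term can be attached to a differentiable architecture search objective. It is not the paper's loss: the softmax relaxation over branching options, the hinge on a budget, and the names `expected_cost`, `resource_aware_loss`, `budget`, and `weight` are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_cost(arch_logits, option_costs):
    """Expected resource cost of one branching decision.

    arch_logits: learnable scores, one per option (e.g. share vs. branch).
    option_costs: resource cost of each option (hypothetical units).
    """
    probs = softmax(arch_logits)
    return sum(p * c for p, c in zip(probs, option_costs))


def resource_aware_loss(task_loss, layer_logits, layer_costs, budget, weight=1.0):
    """Task loss plus a penalty on expected cost above the budget.

    The hinge form is an assumption; it only penalizes architectures
    whose expected cost exceeds the budget.
    """
    total_cost = sum(
        expected_cost(logits, costs)
        for logits, costs in zip(layer_logits, layer_costs)
    )
    penalty = max(0.0, total_cost - budget)
    return task_loss + weight * penalty, total_cost


if __name__ == "__main__":
    # Two layers, each choosing between sharing (cost 1.0) and branching (cost 2.0).
    logits = [[0.0, 0.0], [0.0, 0.0]]
    costs = [[1.0, 2.0], [1.0, 2.0]]
    loss, cost = resource_aware_loss(1.0, logits, costs, budget=2.5, weight=2.0)
    print(loss, cost)
```

Because the expected cost is a smooth function of the architecture scores, a gradient-based search can trade task performance against model size directly, without training a separate proxy model.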
Prompt Guided Transformer for Multi-Task Dense Prediction
Task-conditional architectures offer an advantage in parameter efficiency but
fall short in performance compared to state-of-the-art multi-decoder methods.
Trading off performance against model parameters is an important and difficult
problem. In this paper, we introduce a simple and lightweight task-conditional
model called Prompt Guided Transformer (PGT) to address this challenge. Our
approach designs a Prompt-conditioned Transformer block, which incorporates
task-specific prompts in the self-attention mechanism to achieve global
dependency modeling and parameter-efficient feature adaptation across multiple
tasks. This block is integrated into both the shared encoder and decoder,
enhancing the capture of intra- and inter-task features. Moreover, we design a
lightweight decoder to further reduce parameter usage, which accounts for only
2.7% of the total model parameters. Extensive experiments on two multi-task
dense prediction benchmarks, PASCAL-Context and NYUD-v2, demonstrate that our
approach achieves state-of-the-art results among task-conditional methods while
using fewer parameters, and maintains a favorable balance between performance
and parameter size.
Comment: 10 page
Decomposition Ascribed Synergistic Learning for Unified Image Restoration
Learning to restore multiple image degradations within a single model is
quite beneficial for real-world applications. Nevertheless, existing works
typically treat each degradation independently, leaving the relationships
among them underexploited for synergistic learning. To
this end, we revisit the diverse degradations through the lens of singular
value decomposition, with the observation that the decomposed singular vectors
and singular values naturally undertake the different types of degradation
information, dividing various restoration tasks into two groups, i.e., singular
vector dominated and singular value dominated. The above analysis renders a
more unified perspective to ascribe the diverse degradations, compared to
previous task-level independent learning. The dedicated optimization of
degraded singular vectors and singular values inherently utilizes the potential
relationship among diverse restoration tasks, giving rise to Decomposition
Ascribed Synergistic Learning (DASL). Specifically, DASL comprises two
effective operators, namely, Singular VEctor Operator (SVEO) and Singular VAlue
Operator (SVAO), to favor the decomposed optimization, which can be lightly
integrated into existing convolutional image restoration backbones. Moreover,
a congruous decomposition loss is devised as an auxiliary objective. Extensive
experiments on five blended image restoration tasks, including image
deraining, image dehazing, image denoising, image deblurring, and low-light
image enhancement, demonstrate the effectiveness of our method.
Comment: 13 page
Learning to Relate Depth and Semantics for Unsupervised Domain Adaptation
We present an approach for encoding visual task relationships to improve
model performance in an Unsupervised Domain Adaptation (UDA) setting. Semantic
segmentation and monocular depth estimation are shown to be complementary
tasks; in a multi-task learning setting, a proper encoding of their
relationships can further improve performance on both tasks. Motivated by this
observation, we propose a novel Cross-Task Relation Layer (CTRL), which encodes
task dependencies between the semantic and depth predictions. To capture the
cross-task relationships, we propose a neural network architecture that
contains task-specific and cross-task refinement heads. Furthermore, we propose
an Iterative Self-Learning (ISL) training scheme, which exploits semantic
pseudo-labels to provide extra supervision on the target domain. We
experimentally observe improvements in both tasks' performance because the
complementary information present in these tasks is better captured.
Specifically, we show that: (1) our approach improves performance on all tasks
when they are complementary and mutually dependent; (2) the CTRL helps to
improve both semantic segmentation and depth estimation performance in
the challenging UDA setting; (3) the proposed ISL training scheme further
improves the semantic segmentation performance. The implementation is available
at https://github.com/susaha/ctrl-uda.
Comment: Accepted at CVPR 2021; updated results according to the released
source cod
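A minimal sketch of how semantic pseudo-labels for self-training on an unlabeled target domain are commonly produced. It is not the ISL scheme from the released code: the confidence threshold, the `IGNORE_LABEL` value, and the function name `make_pseudo_labels` are assumptions made for illustration.

```python
IGNORE_LABEL = 255  # assumed marker for pixels excluded from the loss


def make_pseudo_labels(pixel_probs, threshold=0.9):
    """Turn per-pixel class probabilities into pseudo-labels.

    pixel_probs: list of per-pixel probability lists (one entry per class).
    threshold: assumed confidence cut-off; pixels whose top probability is
               below it receive IGNORE_LABEL instead of a class index.
    """
    labels = []
    for probs in pixel_probs:
        best = max(range(len(probs)), key=lambda k: probs[k])
        labels.append(best if probs[best] >= threshold else IGNORE_LABEL)
    return labels


if __name__ == "__main__":
    # Three target-domain pixels, two classes.
    predictions = [[0.95, 0.05], [0.60, 0.40], [0.02, 0.98]]
    print(make_pseudo_labels(predictions))
```

In an iterative scheme, the model would be retrained with these labels as extra supervision and the pseudo-labels regenerated from the improved predictions.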
Domain Expansion via Network Adaptation for Solving Inverse Problems
Deep learning-based methods deliver state-of-the-art performance for solving
inverse problems that arise in computational imaging. These methods can be
broadly divided into two groups: (1) learn a network to map measurements to the
signal estimate, which is known to be fragile; (2) learn a prior for the signal
to use in an optimization-based recovery. Despite the impressive results from
the latter approach, many of these methods also lack robustness to shifts in
data distribution, measurements, and noise levels. Such domain shifts result in
a performance gap and in some cases introduce undesired artifacts in the
estimated signal. In this paper, we explore the qualitative and quantitative
effects of various domain shifts and propose a flexible and parameter-efficient
framework that adapts pretrained networks to such shifts. We demonstrate the
effectiveness of our method for a number of natural image, MRI, and CT
reconstruction tasks under domain, measurement model, and noise-level shifts.
Our experiments demonstrate that our method provides significantly better
performance and parameter efficiency compared to existing domain adaptation
techniques
A Survey on Continual Semantic Segmentation: Theory, Challenge, Method and Application
Continual learning, also known as incremental learning or life-long learning,
stands at the forefront of deep learning and AI systems. It breaks through the
obstacle of one-way training on closed sets and enables continuous adaptive
learning under open-set conditions. In the recent decade, continual learning has
been explored and applied in multiple fields, especially in computer vision,
covering classification, detection and segmentation tasks. Continual semantic
segmentation (CSS) is a challenging, intricate and burgeoning task owing to
its dense-prediction nature. In this paper, we present a review
of CSS, committing to building a comprehensive survey on problem formulations,
primary challenges, universal datasets, neoteric theories and multifarious
applications. Concretely, we begin by elucidating the problem definitions and
primary challenges. Based on an in-depth investigation of relevant approaches,
we sort out and categorize current CSS models into two main branches:
data-replay and data-free. In each branch, the
corresponding approaches are clustered by similarity and thoroughly
analyzed, followed by qualitative comparison and quantitative reproductions on
relevant datasets. Besides, we also introduce four CSS specialities with
diverse application scenarios and development tendencies. Furthermore, we
develop a benchmark for CSS encompassing representative references, evaluation
results and reproductions, which is available
at https://github.com/YBIO/SurveyCSS. We hope this survey can serve as a
reference-worthy and stimulating contribution to the advancement of the
life-long learning field, while also providing valuable perspectives for
related fields.
Comment: 20 pages, 12 figures. Undergoing Revie
Visionary Ophthalmics: Confluence of Computer Vision and Deep Learning for Ophthalmology
Ophthalmology is a medical field ripe with opportunities for meaningful application of computer vision algorithms. The field utilizes data from multiple disparate imaging techniques, ranging from conventional cameras to tomography, comprising a diverse set of computer vision challenges. Computer vision has a rich history of techniques that can adequately meet many of these challenges. However, the field has undergone something of a revolution in recent times as deep learning techniques have sprung into the forefront following advances in GPU hardware. This development raises important questions regarding how to best leverage insights from both modern deep learning approaches and more classical computer vision approaches for a given problem. In this dissertation, we tackle challenging computer vision problems in ophthalmology using methods all across this spectrum. Perhaps our most significant work is a highly successful iris registration algorithm for use in laser eye surgery. This algorithm relies on matching features extracted from the structure tensor and a Gabor wavelet – a classically driven approach that does not utilize modern machine learning. However, drawing on insight from the deep learning revolution, we demonstrate successful application of backpropagation to optimize the registration significantly faster than the alternative of relying on finite differences. Towards the other end of the spectrum, we also present a novel framework for improving RANSAC segmentation algorithms by utilizing a convolutional neural network (CNN) trained on a RANSAC-based loss function. Finally, we apply state-of-the-art deep learning methods to solve the problem of pathological fluid detection in optical coherence tomography images of the human retina, using a novel retina-specific data augmentation technique to greatly expand the data set. 
Altogether, our work demonstrates benefits of applying a holistic view of computer vision, which leverages deep learning and associated insights without neglecting techniques and insights from the previous era
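A minimal sketch of why an analytic (backpropagated) gradient can be faster than finite differences when optimizing a registration-style objective. The quadratic objective below is a stand-in and not the dissertation's iris registration cost; the function names and the hand-derived gradient are assumptions made for illustration.

```python
def objective(params, targets):
    """Stand-in alignment cost: sum of squared parameter errors."""
    return sum((p - t) ** 2 for p, t in zip(params, targets))


def analytic_gradient(params, targets):
    """Closed-form gradient, the kind of result backpropagation yields
    in a single backward pass regardless of the number of parameters."""
    return [2.0 * (p - t) for p, t in zip(params, targets)]


def finite_difference_gradient(f, params, eps=1e-6):
    """Central-difference gradient; needs two evaluations of f per parameter."""
    grad = []
    evaluations = 0
    for i in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[i] += eps
        minus[i] -= eps
        grad.append((f(plus) - f(minus)) / (2.0 * eps))
        evaluations += 2
    return grad, evaluations


if __name__ == "__main__":
    params = [1.0, -2.0, 0.5]
    targets = [0.0, 0.0, 0.0]
    fd_grad, n_evals = finite_difference_gradient(
        lambda p: objective(p, targets), params
    )
    print(analytic_gradient(params, targets), fd_grad, n_evals)
```

The number of objective evaluations for finite differences grows linearly with the number of parameters, whereas the analytic gradient costs roughly one extra pass, which is where the speed-up comes from.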
Reparameterizing Convolutions for Incremental Multi-Task Learning without Task Interference
ISSN: 0302-9743; ISSN: 1611-334