14 research outputs found
TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts
Learning discriminative task-specific features simultaneously for multiple
distinct tasks is a fundamental problem in multi-task learning. Recent
state-of-the-art models consider directly decoding task-specific features from
one shared task-generic feature (e.g., feature from a backbone layer), and
utilize carefully designed decoders to produce multi-task features. However,
because the input feature is fully shared and each task decoder also shares
decoding parameters across different input samples, the feature decoding
process is static, producing less discriminative task-specific
representations. To tackle
this limitation, we propose TaskExpert, a novel multi-task mixture-of-experts
model that enables learning multiple representative task-generic feature spaces
and decoding task-specific features in a dynamic manner. Specifically,
TaskExpert introduces a set of expert networks to decompose the backbone
feature into several representative task-generic features. Then, the
task-specific features are decoded by using dynamic task-specific gating
networks operating on the decomposed task-generic features. Furthermore, to
establish long-range modeling of the task-specific representations from
different layers of TaskExpert, we design a multi-task feature memory that
updates at each layer and acts as an additional feature expert for dynamic
task-specific feature decoding. Extensive experiments demonstrate that our
TaskExpert clearly outperforms previous best-performing methods on all 9
metrics of two competitive multi-task learning benchmarks for visual scene
understanding (i.e., PASCAL-Context and NYUD-v2). Codes and models will be made
publicly available at https://github.com/prismformore/Multi-Task-Transformer
Comment: Accepted by ICCV 2023
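The mixture-of-experts decoding described above can be sketched in a few lines. This is a didactic stand-in, not the authors' implementation: the "experts" are plain linear maps, the gating networks are linear-plus-softmax, and all names and shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

num_experts, feat_dim, num_tasks = 4, 16, 3
backbone_feat = rng.normal(size=feat_dim)  # shared task-generic feature

# Each "expert" (here a simple linear map) decomposes the backbone feature
# into one representative task-generic feature.
experts = [rng.normal(size=(feat_dim, feat_dim)) for _ in range(num_experts)]
expert_feats = np.stack([W @ backbone_feat for W in experts])  # (E, D)

# A per-task gating network produces sample-dependent mixing weights, so the
# decoded task-specific feature changes dynamically with the input.
gates = [rng.normal(size=(feat_dim, num_experts)) for _ in range(num_tasks)]
task_feats = []
for G in gates:
    w = softmax(G.T @ backbone_feat)      # (E,) gating weights for this task
    task_feats.append(w @ expert_feats)   # dynamic mixture of experts, (D,)
```

Because the gating weights depend on `backbone_feat` itself, two different input samples yield different expert mixtures, which is the contrast with a fully static decoder.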
Video Logo Retrieval Based on Local Features
Estimation of the frequency and duration of logos in videos is important and
challenging in the advertisement industry as a way of estimating the impact of
ad purchases. Since logos occupy only a small area in the videos, the popular
methods of image retrieval could fail. This paper develops an algorithm called
Video Logo Retrieval (VLR), an image-to-video retrieval algorithm that uses
the spatial distribution of local image descriptors to measure the distance
between the query image (the logo) and a collection of video frames. VLR uses
local features to overcome the weakness of global-feature-based models such as
convolutional neural networks (CNNs). Meanwhile, VLR is flexible and requires
no training beyond setting a few hyper-parameters. The performance
of VLR is evaluated on two challenging open benchmark tasks (SoccerNet and
Stanford I2V), and compared with other state-of-the-art logo retrieval or
detection algorithms. Overall, VLR shows significantly higher accuracy compared
with the existing methods.
Comment: Accepted by ICIP 2020. Contact author: Bochen Guan ([email protected])
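The local-descriptor matching at the heart of such retrieval can be illustrated with a nearest-neighbour ratio test (Lowe's ratio test). The descriptors below are random stand-ins; a real system would extract SIFT/ORB keypoints from the logo and the video frames, and VLR additionally uses their spatial distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
query_desc = rng.normal(size=(20, 32))   # 20 local descriptors from the logo, 32-D
frame_desc = rng.normal(size=(200, 32))  # descriptors from one video frame

def match_ratio_test(q, f, ratio=0.8):
    """For each query descriptor, return the index of its matched frame
    descriptor if it passes the ratio test, else -1."""
    matches = []
    for d in q:
        dist = np.linalg.norm(f - d, axis=1)
        i1, i2 = np.argsort(dist)[:2]    # best and second-best neighbours
        matches.append(i1 if dist[i1] < ratio * dist[i2] else -1)
    return np.array(matches)

matches = match_ratio_test(query_desc, frame_desc)
score = (matches >= 0).mean()            # fraction of confident matches
```

A frame's `score` can then serve as a simple per-frame retrieval signal; aggregating it over frames gives an estimate of how often and how long the logo appears.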
Self-Refining Deep Symmetry Enhanced Network for Rain Removal
Rain removal aims to remove rain streaks from rainy images. The
state-of-the-art methods are mostly based on Convolutional Neural Networks
(CNNs). However, as CNNs are not equivariant to object rotation, these
methods are unsuitable for dealing with tilted rain streaks. To tackle this
problem, we propose the Deep Symmetry Enhanced Network (DSEN), which
explicitly extracts rotation-equivariant features from rain images. In
addition, we design a self-refining mechanism that removes accumulated rain
streaks in a coarse-to-fine manner. This mechanism reuses DSEN with a novel
information link that passes the gradient flow to the higher stages. Extensive
experiments on both synthetic and real-world rain images show that our
self-refining DSEN yields the top performance.
Comment: Accepted by ICIP 2019. Corresponding and contact author: Hanrong Ye
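Rotation equivariance over the C4 group of 90-degree rotations, the general idea behind symmetry-enhanced feature extraction, can be demonstrated in a toy setting: one filter is applied in four orientations, so rotating the input rotates each response map and cyclically shifts the orientation channels rather than producing unrelated responses. This is a didactic sketch, not the DSEN architecture.

```python
import numpy as np

rng = np.random.default_rng(2)
image = rng.normal(size=(8, 8))
kernel = rng.normal(size=(3, 3))

def corr_valid(img, k):
    """Plain 'valid'-mode cross-correlation of a 2-D image with a 2-D kernel."""
    kh, kw = k.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = (img[i:i + kh, j:j + kw] * k).sum()
    return out

def c4_features(img, k):
    """Stack responses of the kernel rotated by 0/90/180/270 degrees."""
    return np.stack([corr_valid(img, np.rot90(k, r)) for r in range(4)])

feats = c4_features(image, kernel)                # (4, 6, 6)
feats_rot = c4_features(np.rot90(image), kernel)  # features of rotated input

# Equivariance check: each channel of the rotated input's features equals a
# rotated copy of a (cyclically shifted) channel of the original features.
expected = np.stack([np.rot90(feats[(r - 1) % 4]) for r in range(4)])
print(np.allclose(feats_rot, expected))
```

Because tilted rain streaks are (approximately) rotated copies of vertical ones, features with this property respond consistently to streaks at different angles.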
Robust estimation of bacterial cell count from optical density
Optical density (OD) is widely used to estimate the density of cells in liquid culture, but it cannot be compared between instruments without a standardized calibration protocol and is challenging to relate to actual cell count. We address this with an interlaboratory study comparing three simple, low-cost, and highly accessible OD calibration protocols across 244 laboratories, applied to eight strains of constitutive GFP-expressing E. coli. Based on our results, we recommend calibrating OD to estimated cell count using serial dilution of silica microspheres, which produces highly precise calibration (95.5% of residuals <1.2-fold), is easily assessed for quality control, also measures an instrument's effective linear range, and can be combined with fluorescence calibration to obtain units of Molecules of Equivalent Fluorescein (MEFL) per cell, allowing direct comparison and data fusion with flow cytometry measurements. In our study, fluorescence-per-cell measurements showed only a 1.07-fold mean difference between plate reader and flow cytometry data.
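The microsphere calibration reduces to a simple proportional fit: a serial dilution with known particle counts gives paired (count, OD) points, and a least-squares slope converts any blank-subtracted OD reading into an estimated count. The numbers below are made up for illustration and are not data from the study.

```python
import numpy as np

# Hypothetical 2-fold serial dilution of microspheres with known counts,
# and the corresponding blank-subtracted OD readings from one instrument.
particles = np.array([3e8, 1.5e8, 7.5e7, 3.75e7, 1.875e7])
od = np.array([0.80, 0.41, 0.20, 0.099, 0.051])

# Least-squares slope through the origin: particles ~ slope * OD.
slope = (particles @ od) / (od @ od)

def od_to_count(reading):
    """Convert a blank-subtracted OD reading to an estimated particle count."""
    return slope * reading

est = od_to_count(od)
# Fold-error of the calibration at each dilution point (>= 1 by construction).
fold_error = np.maximum(est / particles, particles / est)
```

Checking `fold_error` against a threshold (the study reports 95.5% of residuals below 1.2-fold) is the kind of quality-control assessment the protocol enables, and deviations at high OD reveal the limits of the instrument's linear range.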
Inverted Pyramid Multi-task Transformer for Dense Scene Understanding
Multi-task dense scene understanding is a thriving research domain that
requires simultaneous perception and reasoning on a series of correlated tasks
with pixel-wise prediction. Most existing works encounter a severe limitation
of modeling in the locality due to heavy utilization of convolution operations,
while learning interactions and inference in a global spatial-position and
multi-task context is critical for this problem. In this paper, we propose a
novel end-to-end Inverted Pyramid multi-task Transformer (InvPT) to perform
simultaneous modeling of spatial positions and multiple tasks in a unified
framework. To the best of our knowledge, this is the first work that explores
designing a transformer structure for multi-task dense prediction for scene
understanding. Besides, it is widely demonstrated that a higher spatial
resolution is remarkably beneficial for dense prediction, while it is very
challenging for existing transformers to go deeper at higher resolutions due
to the huge computational complexity of large spatial sizes. InvPT presents an efficient
UP-Transformer block to learn multi-task feature interaction at gradually
increased resolutions, which also incorporates effective self-attention message
passing and multi-scale feature aggregation to produce task-specific prediction
at a high resolution. Our method achieves superior multi-task performance on
the NYUD-v2 and PASCAL-Context datasets and significantly outperforms
previous state-of-the-art methods. The code is available at
https://github.com/prismformore/InvPT
Comment: To appear in ECCV 2022.
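The pattern of refining features at gradually increased resolution can be sketched schematically: at each stage, tokens interact through plain scaled dot-product self-attention, and the feature map is then upsampled 2x. This illustrates the general inverted-pyramid pattern only, under assumed toy shapes; it is not the actual UP-Transformer block.

```python
import numpy as np

rng = np.random.default_rng(3)

def self_attention(x):
    """x: (N, D) tokens -> (N, D) after scaled dot-product self-attention."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)   # row-wise softmax attention weights
    return w @ x

def upsample2x(feat):
    """feat: (H, W, D) -> (2H, 2W, D) by nearest-neighbour repetition."""
    return feat.repeat(2, axis=0).repeat(2, axis=1)

feat = rng.normal(size=(4, 4, 8))       # low-resolution feature map
for _ in range(2):                      # two stages: 4x4 -> 8x8 -> 16x16
    h, w, d = feat.shape
    tokens = self_attention(feat.reshape(h * w, d))
    feat = upsample2x(tokens.reshape(h, w, d))
```

The cost tension the abstract describes is visible here: the attention matrix is (H*W) x (H*W), so doubling the resolution quadruples the token count and multiplies attention cost by sixteen, which is why an efficient block is needed at the upper, high-resolution stages.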
Contrastive Multi-Task Dense Prediction
This paper targets the problem of multi-task dense prediction, which aims to
achieve simultaneous learning and inference on multiple dense prediction
tasks in a single framework. A core design objective is how to effectively
model cross-task interactions to achieve a comprehensive improvement on
different tasks based on their inherent complementarity and consistency.
Existing works typically design extra expensive distillation modules to
perform explicit interaction computations among different task-specific
features in both training and inference, which makes adaptation to different
task sets difficult and reduces efficiency due to the clearly increased size
of multi-task models. In contrast, we introduce feature-wise contrastive
consistency into modeling the cross-task interactions for multi-task dense
prediction. We propose a novel multi-task contrastive regularization method
based on this consistency to effectively boost the representation learning of
the different sub-tasks; it can also be easily generalized to different
multi-task dense prediction frameworks and incurs no additional computation
at inference. Extensive experiments on two challenging datasets (i.e.,
NYUD-v2 and PASCAL-Context) clearly demonstrate the superiority of the
proposed multi-task contrastive learning approach for dense prediction,
establishing new state-of-the-art performance.
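A feature-wise contrastive consistency term can be illustrated with an InfoNCE-style loss between two task branches: features of the same spatial location across tasks are pulled together (positives on the diagonal), while features of different locations are pushed apart. This toy formulation is an assumption for illustration, not the paper's exact regularizer.

```python
import numpy as np

rng = np.random.default_rng(4)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def contrastive_consistency(fa, fb, temperature=0.1):
    """fa, fb: (N, D) task-specific features at the same N locations.
    Returns an InfoNCE-style loss; lower means more cross-task consistency."""
    fa, fb = l2_normalize(fa), l2_normalize(fb)
    logits = fa @ fb.T / temperature             # (N, N) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # positives on the diagonal

n, d = 32, 16
task_a = rng.normal(size=(n, d))
# Nearly identical branch features -> low loss; unrelated features -> high loss.
aligned = contrastive_consistency(task_a, task_a + 0.01 * rng.normal(size=(n, d)))
random_b = contrastive_consistency(task_a, rng.normal(size=(n, d)))
```

Because the term is a training-time regularizer on features the network computes anyway, dropping it at test time is what makes the approach free of extra inference cost.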