Deep Image Harmonization
Compositing is one of the most common operations in photo editing. To
generate realistic composites, the appearances of foreground and background
need to be adjusted to make them compatible. Previous approaches to harmonize
composites have focused on learning statistical relationships between
hand-crafted appearance features of the foreground and background, which is
unreliable, especially when the contents of the two layers are vastly different.
In this work, we propose an end-to-end deep convolutional neural network for
image harmonization, which can capture both the context and semantic
information of the composite images during harmonization. We also introduce an
efficient way to collect large-scale and high-quality training data that can
facilitate the training process. Experiments on the synthesized dataset and
real composite images show that the proposed network outperforms previous
state-of-the-art methods.
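The hand-crafted appearance-statistics approaches that this abstract contrasts with can be illustrated by a simple per-channel mean/variance transfer from background to foreground. A minimal numpy sketch of such a baseline (illustrative only; `harmonize_by_stats` is a hypothetical helper, not the paper's network):

```python
import numpy as np

def harmonize_by_stats(composite, fg_mask, eps=1e-6):
    """Match the per-channel mean/std of the foreground region to the
    background region -- a classical appearance-statistics baseline,
    not the learned CNN proposed in the paper."""
    out = composite.astype(np.float64).copy()
    fg = fg_mask.astype(bool)
    bg = ~fg
    for c in range(out.shape[2]):
        ch = out[:, :, c]                       # view: edits propagate to out
        fg_mean, fg_std = ch[fg].mean(), ch[fg].std() + eps
        bg_mean, bg_std = ch[bg].mean(), ch[bg].std() + eps
        # standardize the foreground, then rescale to background statistics
        ch[fg] = (ch[fg] - fg_mean) / fg_std * bg_std + bg_mean
    return np.clip(out, 0.0, 255.0)
```

Because it only matches global statistics, such a baseline fails exactly in the situation the abstract describes: when foreground and background contents are semantically very different.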
Accurate and lightweight dehazing via multi-receptive-field non-local network and novel contrastive regularization
Recently, deep learning-based methods have dominated the image dehazing domain.
Although very competitive dehazing performance has been achieved with
sophisticated models, effective solutions for extracting useful features are
still under-explored. In addition, the non-local network, which has made a
breakthrough in many vision tasks, has not been appropriately applied to image
dehazing. Thus, a multi-receptive-field non-local network (MRFNLN) consisting
of the multi-stream feature attention block (MSFAB) and cross non-local block
(CNLB) is presented in this paper. We start with extracting richer features for
dehazing. Specifically, we design a multi-stream feature extraction (MSFE)
sub-block, which contains three parallel convolutions with different receptive
fields for extracting multi-scale
features. Following MSFE, we employ an attention sub-block to make the model
adaptively focus on important channels/regions. The MSFE and attention
sub-blocks constitute our MSFAB. Then, we design a cross non-local block
(CNLB), which can capture long-range dependencies beyond the query. Unlike the
query branch, which takes a single input source, the key and value branches are
enhanced by fusing more preceding features. CNLB is computation-friendly, leveraging a
spatial pyramid down-sampling (SPDS) strategy to reduce the computation and
memory consumption without sacrificing the performance. Last but not least, a
novel detail-focused contrastive regularization (DFCR) is presented by
emphasizing the low-level details and ignoring the high-level semantic
information in the representation space. Comprehensive experimental results
demonstrate that the proposed MRFNLN model outperforms recent state-of-the-art
dehazing methods with fewer than 1.5 million parameters. Comment: submitted to IEEE TCYB for possible publication.
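The MSFE-plus-attention structure described above can be sketched as parallel convolutions with different kernel sizes, concatenated and then reweighted by a squeeze-style channel gate. The kernel sizes (1, 3, 5) below are placeholders (the published sizes were lost in extraction), and the whole block is a toy numpy illustration, not the authors' implementation:

```python
import numpy as np

def conv2d_same(x, k):
    """Naive single-channel 'same' cross-correlation with zero padding,
    as used in deep-learning convolutions."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=np.float64)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = (xp[i:i + kh, j:j + kw] * k).sum()
    return out

def msfab(x, rng):
    """Sketch of a multi-stream feature-attention block: three parallel
    convolutions with different receptive fields, stacked along a channel
    axis, then reweighted by a channel-attention gate."""
    streams = []
    for ksize in (1, 3, 5):  # assumed kernel sizes, for illustration only
        k = rng.standard_normal((ksize, ksize)) / ksize
        streams.append(conv2d_same(x, k))
    feats = np.stack(streams, axis=0)          # (3, H, W)
    # channel attention: global average pool -> per-stream sigmoid gate
    gap = feats.mean(axis=(1, 2))
    gate = 1.0 / (1.0 + np.exp(-gap))
    return feats * gate[:, None, None]
```

The gate lets the model emphasize whichever receptive-field stream carries the most useful haze-relevant features, which is the role the attention sub-block plays in the MSFAB.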
Broadcasting Quantum Fisher Information
It is well known that classical information can be cloned, but non-orthogonal
quantum states cannot be cloned, and non-commuting quantum states cannot be
broadcast. We conceive a scenario in which the object we want to broadcast is
the statistical distinguishability, as quantified by quantum Fisher
information, about a signal parameter encoded in quantum states. We show that
quantum Fisher information cannot be cloned, whilst it might be broadcast even
when the input states are non-commuting. This situation interpolates between
cloning of classical information and no-broadcasting of quantum information,
and indicates a hybrid way of information broadcasting which is of particular
significance from both practical and theoretical perspectives. Comment: 5 pages. Improved version. Any comments are welcome.
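For context, the distinguishability measure in question has a standard textbook definition: for a smooth family of states $\rho_\theta$, the quantum Fisher information is defined through the symmetric logarithmic derivative (SLD). This is the general definition, not anything specific to the paper's broadcasting setup:

```latex
% Symmetric logarithmic derivative L_\theta (implicit definition)
\partial_\theta \rho_\theta
  = \tfrac{1}{2}\left( L_\theta \rho_\theta + \rho_\theta L_\theta \right)

% Quantum Fisher information of the family \rho_\theta
F_Q(\theta) = \operatorname{Tr}\!\left[ \rho_\theta L_\theta^{2} \right]

% It bounds estimation precision via the quantum Cram\'er--Rao bound
% (n independent copies of the state):
\operatorname{Var}(\hat{\theta}) \ge \frac{1}{n\, F_Q(\theta)}
```

Because $F_Q$ quantifies how well $\theta$ can be estimated from $\rho_\theta$, asking whether it can be cloned or broadcast is asking whether this statistical distinguishability can be duplicated across subsystems.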
AIMS: All-Inclusive Multi-Level Segmentation
Despite progress in image segmentation toward accurate visual entity
segmentation, meeting the diverse requirements of image editing applications
for region-of-interest selection at different levels remains unsolved. In this
paper, we propose a new task, All-Inclusive Multi-Level Segmentation (AIMS),
which segments visual regions into three levels: part, entity, and relation
(two entities with some semantic relationships). We also build a unified AIMS
model through multi-dataset multi-task training to address the two major
challenges of annotation inconsistency and task correlation. Specifically, we
propose task complementarity, task association, and a prompt mask encoder for
the three-level predictions. Extensive experiments demonstrate the effectiveness
and generalization capacity of our method compared to other state-of-the-art
methods on a single dataset or the concurrent work on segmenting anything. We
will make our code and trained model publicly available. Comment: Technical Report.
High-Quality Entity Segmentation
Dense image segmentation tasks (e.g., semantic, panoptic) are useful for image
editing, but existing methods can hardly generalize well in an in-the-wild
setting where there are unrestricted image domains, classes, and image
resolution and quality variations. Motivated by these observations, we
construct a new entity segmentation dataset, with a strong focus on
high-quality dense segmentation in the wild. The dataset contains images
spanning diverse image domains and entities, along with plentiful
high-resolution images and high-quality mask annotations for training and
testing. Given the high-quality and -resolution nature of the dataset, we
propose CropFormer which is designed to tackle the intractability of
instance-level segmentation on high-resolution images. It improves mask
prediction by fusing high-res image crops that provide more fine-grained image
details and the full image. CropFormer is the first query-based Transformer
architecture that can effectively fuse mask predictions from multiple image
views, by learning queries that effectively associate the same entities across
the full image and its crop. With CropFormer, we achieve a significant AP gain
on the challenging entity segmentation task. Furthermore, CropFormer
consistently improves the accuracy of traditional segmentation tasks and
datasets. The dataset and code will be released at
http://luqi.info/entityv2.github.io/. Comment: The project website: http://luqi.info/entityv2.github.io
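CropFormer learns queries that associate the same entity between the full image and its crops; the fusion idea can be approximated with a heuristic IoU-based association. A hypothetical numpy sketch (`fuse_crop_predictions` is illustrative, not the paper's learned mechanism):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def fuse_crop_predictions(full_masks, crop_masks, crop_box, thr=0.5):
    """Illustrative fusion: place each crop-level soft mask back into the
    full frame, match it to a full-image mask by IoU, and average the two
    predictions inside the crop region."""
    y0, x0, y1, x1 = crop_box
    fused = [m.astype(np.float64).copy() for m in full_masks]
    for cm in crop_masks:
        canvas = np.zeros_like(fused[0])
        canvas[y0:y1, x0:x1] = cm
        # associate crop mask with the best-overlapping full-image mask
        scores = [iou(canvas > 0.5, f > 0.5) for f in fused]
        j = int(np.argmax(scores))
        if scores[j] >= thr:
            fused[j][y0:y1, x0:x1] = 0.5 * (fused[j][y0:y1, x0:x1] + cm)
    return fused
```

The point of the learned query association in CropFormer is precisely to replace this brittle IoU matching: queries can track the same entity across views even when crop-level and full-image masks overlap poorly.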
CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation
To improve instance-level detection/segmentation performance, existing
self-supervised and semi-supervised methods extract either task-unrelated or
task-specific training signals from unlabeled data. We show that these two
approaches, at the two extreme ends of the task-specificity spectrum, are
suboptimal for the task performance. Utilizing too little task-specific
training signals causes underfitting to the ground-truth labels of downstream
tasks, while the opposite causes overfitting to the ground-truth labels. To
this end, we propose a novel Class-Agnostic Semi-Supervised Learning (CA-SSL)
framework to achieve a more favorable task-specificity balance in extracting
training signals from unlabeled data. CA-SSL has three training stages that act
on either ground-truth labels (labeled data) or pseudo labels (unlabeled data).
This decoupling strategy avoids the complicated scheme in traditional SSL
methods that balances the contributions from both data types. In particular, we
introduce a warmup training stage to achieve a better balance in task
specificity by ignoring class information in the pseudo labels, while
preserving localization training signals. As a result, our warmup model can
better avoid underfitting/overfitting when fine-tuned on the ground-truth
labels in detection and segmentation tasks. Using 3.6M unlabeled data, we
achieve a significant performance gain of 4.7% over ImageNet-pretrained
baseline on FCOS object detection. In addition, our warmup model demonstrates
excellent transferability to other detection and segmentation frameworks. Comment: Appeared in ECCV 2022.
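The warmup stage's key idea, keeping localization signal while discarding noisy class signal in pseudo labels, can be sketched as a simple pseudo-label filter. A hypothetical illustration (the dict field names are assumed, not from the paper):

```python
def class_agnostic_pseudo_labels(preds, score_thr=0.5):
    """Warmup-stage sketch: keep confident localization predictions
    (boxes/masks) from the teacher, but collapse every class to a single
    class-agnostic 'entity' label, so downstream training ignores the
    noisy class information in the pseudo labels."""
    return [
        {"box": p["box"], "score": p["score"], "label": "entity"}
        for p in preds
        if p["score"] >= score_thr
    ]
```

Training the warmup model on such labels supplies localization supervision without committing to possibly wrong classes, which is how the framework steers between underfitting and overfitting to the downstream ground truth.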