75 research outputs found
Dual Progressive Transformations for Weakly Supervised Semantic Segmentation
Weakly supervised semantic segmentation (WSSS), which aims to mine the object
regions by merely using class-level labels, is a challenging task in computer
vision. The current state-of-the-art CNN-based methods usually adopt
Class-Activation-Maps (CAMs) to highlight the potential areas of the object;
however, they may suffer from partial activation. To this end, we make an
early attempt to explore the global feature attention mechanism of the vision
transformer in the WSSS task. However, since the transformer lacks the
inductive bias of CNN models, it cannot boost the performance directly and may
yield over-activation. To tackle these drawbacks, we propose a
Convolutional Neural Networks Refined Transformer (CRT) to mine globally
complete and locally accurate class activation maps. To validate
the effectiveness of our proposed method, extensive experiments are conducted
on the PASCAL VOC 2012 and CUB-200-2011 datasets. Experimental evaluations show
that our proposed CRT achieves new state-of-the-art performance on both the
weakly supervised semantic segmentation task and the weakly supervised object
localization task, outperforming other methods by a large margin.
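
For readers unfamiliar with CAMs, the following is a minimal sketch of the classic CAM computation (a class-weighted sum of the final convolutional feature maps). It is not the CRT method described above; the function name `class_activation_map`, the toy feature maps, and the weights are illustrative assumptions.

```python
def class_activation_map(feature_maps, class_weights):
    """Classic CAM: weighted sum over K feature maps (each H x W).

    feature_maps: list of K maps, each a list of H rows with W floats.
    class_weights: list of K classifier weights for a single class.
    Returns an H x W map, min-max normalized to [0, 1].
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    # Min-max normalization; guard against a constant map.
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in row] for row in cam]


# Toy example: two 2x2 feature maps and hypothetical class weights.
features = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
weights = [0.5, 1.0]
cam = class_activation_map(features, weights)
```

In this toy case the second channel dominates, so the map peaks at the bottom-right location. This mirrors the partial-activation behavior the abstract mentions: only the most discriminative region receives a high score.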
DI-Net : Decomposed Implicit Garment Transfer Network for Digital Clothed 3D Human
3D virtual try-on enjoys many potential applications and hence has attracted
wide attention. However, it remains a challenging task that has not been
adequately solved. Existing 2D virtual try-on methods cannot be directly
extended to 3D since they lack the ability to perceive the depth of each pixel.
Besides, 3D virtual try-on approaches are mostly built on fixed topological
structures and require heavy computation. To deal with these problems, we propose a
Decomposed Implicit garment transfer network (DI-Net), which can effortlessly
reconstruct a 3D human mesh with the new try-on result and preserve the
texture from an arbitrary perspective. Specifically, DI-Net consists of two
modules: 1) A complementary warping module that warps the reference image to
have the same pose as the source image through dense correspondence learning
and sparse flow learning; 2) A geometry-aware decomposed transfer module that
decomposes the garment transfer into image layout based transfer and texture
based transfer, achieving surface and texture reconstruction by constructing
pixel-aligned implicit functions. Experimental results show the effectiveness
and superiority of our method in the 3D virtual try-on task, yielding
higher-quality results than existing methods.
SARA: Controllable Makeup Transfer with Spatial Alignment and Region-Adaptive Normalization
Makeup transfer is a process of transferring the makeup style from a
reference image to the source images, while preserving the source images'
identities. This technique is highly desirable and finds many applications.
However, existing methods lack fine-level control of the makeup style, making
it challenging to achieve high-quality results when dealing with large spatial
misalignments. To address this problem, we propose a novel Spatial Alignment
and Region-Adaptive normalization method (SARA) in this paper. Our method
generates detailed makeup transfer results that can handle large spatial
misalignments and achieve part-specific and shade-controllable makeup transfer.
Specifically, SARA comprises three modules: first, a spatial alignment module
that preserves the spatial context of makeup and provides a target semantic map
for guiding the shape-independent style codes; second, a region-adaptive
normalization module that decouples shape and makeup style using per-region
encoding and normalization, which facilitates the elimination of spatial
misalignments; and third, a makeup fusion module that blends identity features
and makeup style by injecting learned scale and bias parameters. Experimental
results show that our SARA method outperforms existing methods and achieves
state-of-the-art performance on two public datasets.
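
The per-region normalization idea can be illustrated with a minimal sketch: values are normalized within each semantic region, and a per-region scale and bias are then applied. This is an assumption-laden toy, not SARA's implementation. The function name, the region labels (`"lip"`, `"skin"`), and the fixed scale and bias values are hypothetical; in the method described above these parameters are learned.

```python
import math


def region_adaptive_normalize(values, regions, scale, bias, eps=1e-5):
    """Normalize values within each region, then apply that region's scale/bias.

    values: list of floats (one feature value per pixel, flattened).
    regions: list of region labels, same length as values.
    scale, bias: dicts mapping region label -> float (hypothetical parameters).
    """
    out = [0.0] * len(values)
    for region in set(regions):
        idx = [i for i, r in enumerate(regions) if r == region]
        mean = sum(values[i] for i in idx) / len(idx)
        var = sum((values[i] - mean) ** 2 for i in idx) / len(idx)
        std = math.sqrt(var + eps)  # eps avoids division by zero
        for i in idx:
            out[i] = scale[region] * (values[i] - mean) / std + bias[region]
    return out


# Toy input: four pixels belonging to two hypothetical regions.
values = [1.0, 3.0, 10.0, 14.0]
regions = ["lip", "lip", "skin", "skin"]
scale = {"lip": 2.0, "skin": 1.0}
bias = {"lip": 0.5, "skin": -1.0}
styled = region_adaptive_normalize(values, regions, scale, bias)
```

Because the statistics are computed per region, the output for each region depends only on that region's scale and bias and not on where the region sits in the image. This is one way to read the abstract's claim that shape and style are decoupled.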
Semantic-Constraint Matching Transformer for Weakly Supervised Object Localization
Weakly supervised object localization (WSOL) strives to learn to localize
objects with only image-level supervision. Due to the local receptive fields
generated by convolution operations, previous CNN-based methods suffer from
partial activation issues, concentrating on the object's discriminative part
instead of the entire entity scope. Benefiting from the capability of the
self-attention mechanism to acquire long-range feature dependencies, Vision
Transformer has been recently applied to alleviate the local activation
drawbacks. However, since the transformer lacks the inductive localization bias
that is inherent in CNNs, it may cause a divergent activation problem,
resulting in an uncertain distinction between foreground and background. In
this work, we propose a novel Semantic-Constraint Matching Network (SCMN) via
a transformer to converge on the divergent activation. Specifically, we first
propose a local patch shuffle strategy to construct the image pairs, disrupting
local patches while guaranteeing global consistency. The paired images, which
contain the common object in the spatial domain, are then fed into the Siamese
network encoder. We further design a semantic-constraint matching module, which
aims to mine the co-object part by matching the coarse class activation maps
(CAMs) extracted from the paired images, thus implicitly guiding and calibrating
the transformer network to alleviate the divergent activation. Extensive
experiments on two challenging benchmarks, CUB-200-2011 and ILSVRC, show that
our method achieves new state-of-the-art performance and outperforms previous
methods by a large margin.
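
One plausible reading of a local patch shuffle is sketched below: patches are permuted only within small blocks, so local order is disrupted while the coarse layout stays put. The abstract does not specify the exact procedure, so the block-wise scheme, the function name `local_patch_shuffle`, and the `block` and `seed` parameters are assumptions for illustration.

```python
import random


def local_patch_shuffle(patch_grid, block=2, seed=None):
    """Shuffle patches inside each block x block window of a patch grid.

    patch_grid: 2D list where each element stands for one image patch.
    Patches never leave their window, so the global layout is preserved.
    """
    rng = random.Random(seed)
    rows, cols = len(patch_grid), len(patch_grid[0])
    out = [row[:] for row in patch_grid]  # leave the input untouched
    for top in range(0, rows, block):
        for left in range(0, cols, block):
            cells = [
                (y, x)
                for y in range(top, min(top + block, rows))
                for x in range(left, min(left + block, cols))
            ]
            patches = [patch_grid[y][x] for y, x in cells]
            rng.shuffle(patches)
            for (y, x), patch in zip(cells, patches):
                out[y][x] = patch
    return out


# Toy 4x4 grid of patch indices; the original and shuffled grids form a pair.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
shuffled = local_patch_shuffle(grid, block=2, seed=0)
```

The pair `(grid, shuffled)` plays the role of the two inputs to a Siamese encoder: both contain the same patches in roughly the same coarse positions, but the fine arrangement differs.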
Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests
LeCaRDv2: A Large-Scale Chinese Legal Case Retrieval Dataset
As an important component of intelligent legal systems, legal case retrieval
plays a critical role in ensuring judicial justice and fairness. However, the
development of legal case retrieval technologies in the Chinese legal system is
restricted by three problems in existing datasets: limited data size, narrow
definitions of legal relevance, and naive candidate pooling strategies used in
data sampling. To alleviate these issues, we introduce LeCaRDv2, a large-scale
Legal Case Retrieval Dataset (version 2). It consists of 800 queries and 55,192
candidates extracted from 4.3 million criminal case documents. To the best of
our knowledge, LeCaRDv2 is one of the largest Chinese legal case retrieval
datasets, providing extensive coverage of criminal charges. Additionally, we
enrich the existing relevance criteria by considering three key aspects:
characterization, penalty, and procedure. These comprehensive criteria enrich the
dataset and may provide a more holistic perspective. Furthermore, we propose a
two-level candidate set pooling strategy that effectively identifies potential
candidates for each query case. It's important to note that all cases in the
dataset have been annotated by multiple legal experts specializing in criminal
law. Their expertise ensures the accuracy and reliability of the annotations.
We evaluate several state-of-the-art retrieval models on LeCaRDv2,
demonstrating that there is still significant room for improvement in legal
case retrieval. The details of LeCaRDv2 can be found at the anonymous website
https://github.com/anonymous1113243/LeCaRDv2
WMFormer++: Nested Transformer for Visible Watermark Removal via Implicit Joint Learning
Watermarking serves as a widely adopted approach to safeguard media
copyright. In parallel, the research focus has extended to watermark removal
techniques, offering an adversarial means to enhance watermark robustness and
foster advancements in the watermarking field. Existing watermark removal
methods mainly rely on UNet with task-specific decoder branches--one for
watermark localization and the other for background image restoration. However,
watermark localization and background restoration are not isolated tasks;
precise watermark localization inherently implies regions necessitating
restoration, and the background restoration process contributes to more
accurate watermark localization. To holistically integrate information from
both branches, we introduce an implicit joint learning paradigm. This empowers
the network to autonomously navigate the flow of information between implicit
branches through a gate mechanism. Furthermore, we employ cross-channel
attention to facilitate local detail restoration and holistic structural
comprehension, while harnessing nested structures to integrate multi-scale
information. Extensive experiments are conducted on various challenging
benchmarks to validate the effectiveness of our proposed method. The results
demonstrate our approach's remarkable superiority, surpassing existing
state-of-the-art methods by a large margin.
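
As a rough illustration of how a gate can route information between two branches, the sketch below blends two feature vectors with a sigmoid gate. The abstract does not describe the gate's exact form, so this element-wise convex blend, the function names, and the toy inputs are assumptions and not the WMFormer++ design.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(branch_a, branch_b, gate_logits):
    """Blend two branches element-wise: g * a + (1 - g) * b, with g = sigmoid(logit)."""
    fused = []
    for a, b, logit in zip(branch_a, branch_b, gate_logits):
        g = sigmoid(logit)
        fused.append(g * a + (1.0 - g) * b)
    return fused


# Hypothetical features from a localization-like and a restoration-like branch.
localization = [1.0, 2.0]
restoration = [3.0, 6.0]
balanced = gated_fusion(localization, restoration, [0.0, 0.0])
mostly_a = gated_fusion(localization, restoration, [50.0, 50.0])
```

With zero logits the gate weighs both branches equally, while large positive logits pass the first branch through almost unchanged. In a trained network the logits would be predicted from the features themselves.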