Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples
Referring video object segmentation (RVOS), as a supervised learning task,
relies on sufficient annotated data for a given scene. However, in more
realistic scenarios, only minimal annotations are available for a new scene,
which poses significant challenges to existing RVOS methods. With this in mind,
we propose a simple yet effective model with a newly designed cross-modal
affinity (CMA) module based on a Transformer architecture. The CMA module
builds multimodal affinity with a few samples, thus quickly learning new
semantic information, and enabling the model to adapt to different scenarios.
Since the proposed method targets limited samples in new scenes, we generalize
the problem as few-shot referring video object segmentation (FS-RVOS). To
foster research in this direction, we build up a new FS-RVOS benchmark based on
currently available datasets. The benchmark covers a wide range and includes
multiple situations, which can maximally simulate real-world scenarios.
Extensive experiments show that our model adapts well to different scenarios
with only a few samples, reaching state-of-the-art performance on the
benchmark. On Mini-Ref-YouTube-VOS, our model achieves an average performance
of 53.1 J and 54.8 F, which are 10% better than the baselines. Furthermore, we
show impressive results of 77.7 J and 74.8 F on Mini-Ref-SAIL-VOS, which are
significantly better than the baselines. Code is publicly available at
https://github.com/hengliusky/Few_shot_RVOS.
Comment: Accepted by ICCV 2023
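The abstract does not spell out how the CMA module computes affinity; a plausible reading is scaled dot-product cross-attention between visual and linguistic tokens. A minimal NumPy sketch under that assumption (function names, shapes, and the single-head form are illustrative, not the authors' implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_affinity(visual, text):
    """Affinity between visual tokens (N, d) and text tokens (M, d).

    Each visual token aggregates linguistic features weighted by its
    affinity with every word embedding, yielding text-conditioned
    visual features of shape (N, d).
    """
    d_k = visual.shape[-1]
    affinity = visual @ text.T / np.sqrt(d_k)  # (N, M) affinity map
    weights = softmax(affinity, axis=-1)       # normalize over text tokens
    return weights @ text                      # (N, d) fused features

rng = np.random.default_rng(0)
vis = rng.standard_normal((16, 64))   # e.g. 16 spatial tokens
txt = rng.standard_normal((5, 64))    # e.g. 5 word embeddings
fused = cross_modal_affinity(vis, txt)
```

In the few-shot setting, the same affinity computation can be applied between query-frame features and the handful of annotated support samples, which is how such a module can pick up new semantics from limited data.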
Thick branes with a nonminimally coupled bulk-scalar field
In this paper, we investigate thick branes with a nonminimally coupled
background scalar field, whose solution is a single-kink or a double-kink. The
effects of the nonminimal coupling constant on the structure of the thick
branes and the localization of gravity, fermions, scalars and vectors are
discussed. It is shown that each brane splits into two sub-branes as the
nonminimal coupling constant increases. By investigating the tensor
perturbation equations of gravity and the general covariant Dirac equation of
fermions, we find that both the gravity zero mode and left-chiral fermion zero
mode are localized at the center of the single-kink branes and localized
between the two sub-branes generated by the double-kink, which indicates that
the coupling constant does not affect the localization of these zero modes.
However, the scalar zero mode is localized on each sub-brane (for both
single-kink and double-kink branes) when the coupling constant is larger than
its critical value. The effects of the nonminimal coupling constant on the
resonances of gravity and fermions with finite lifetime on the branes are also
discussed.
Comment: V2: 33 pages, 17 figures, 3 tables, published version
Fermion Resonances on a Thick Brane with a Piecewise Warp Factor
In this paper, we investigate the resonances of massive KK fermions on a thick
brane constructed from a single scalar field, with a piecewise warp factor that
matches smoothly at the boundaries. The distance between the two boundaries and
the other parameters are determined by a single free parameter through three
junction conditions. For the generalized Yukawa coupling with an odd power of
the scalar field, the mass eigenvalues, widths, lifetimes, and maximal
probabilities of the fermion resonances are obtained.
Our numerical calculations show that the brane without internal structure also
favors the appearance of resonant states for both left- and right-handed
fermions. The scalar-fermion coupling and the thickness of the brane influence
the resonant behaviors of the massive KK fermions.
Comment: V3: 15 pages, 7 figures, published version
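The resonance quantities above follow standard conventions; as a hedged sketch, the coupling form and lifetime relation commonly used in thick-brane fermion-localization studies are:

```latex
% Generalized Yukawa coupling between the bulk fermion \Psi and the
% background scalar \phi, with odd k preserving the Z_2 symmetry
% (form assumed from typical thick-brane setups; the paper's exact
% normalization may differ):
S_{\mathrm{int}} = -\int d^5x \,\sqrt{-g}\;\eta\,\bar{\Psi}\,\phi^{k}\,\Psi,
\qquad k \ \text{odd},
% and a quasi-localized resonance of width \Gamma has lifetime
\tau = \frac{1}{\Gamma}.
```

Under these conventions, a narrower width directly implies a longer-lived resonance on the brane, which is why the abstract reports widths and lifetimes together.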
Towards Control-Centric Representations in Reinforcement Learning from Images
Image-based Reinforcement Learning is a practical yet challenging task. A
major hurdle lies in extracting control-centric representations while
disregarding irrelevant information. While approaches that follow the
bisimulation principle exhibit the potential in learning state representations
to address this issue, they still grapple with the limited expressive capacity
of their latent dynamics and poor adaptability to sparse-reward environments. To
address these limitations, we introduce ReBis, which aims to capture
control-centric information by integrating reward-free control information
alongside reward-specific knowledge. ReBis utilizes a transformer architecture
to implicitly model the dynamics and incorporates block-wise masking to
eliminate spatiotemporal redundancy. Moreover, ReBis combines
bisimulation-based loss with asymmetric reconstruction loss to prevent feature
collapse in environments with sparse rewards. Empirical studies on two large
benchmarks, Atari games and the DeepMind Control Suite, demonstrate that ReBis
outperforms existing methods, proving its effectiveness.
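The bisimulation-based loss mentioned above is not given explicitly; in methods of this family it typically regresses the distance between two latent states onto their reward gap plus the discounted distance between their successor latents. A schematic NumPy version (the L1 distances, names, and shapes are assumptions, not ReBis's exact objective):

```python
import numpy as np

def bisimulation_loss(z_i, z_j, r_i, r_j, z_next_i, z_next_j, gamma=0.99):
    """On-policy bisimulation loss over a batch of latent-state pairs.

    The current latent distance is regressed onto a target made of the
    reward difference plus the discounted successor-latent distance
    (an L1 surrogate for the Wasserstein term in the bisimulation metric).
    """
    d = np.abs(z_i - z_j).sum(-1)                  # current latent distance
    d_next = np.abs(z_next_i - z_next_j).sum(-1)   # successor distance
    target = np.abs(r_i - r_j) + gamma * d_next    # bisimulation target
    return np.mean((d - target) ** 2)

rng = np.random.default_rng(1)
z_i, z_j = rng.standard_normal((2, 8, 32))            # 8 pairs, 32-dim latents
z_next_i, z_next_j = rng.standard_normal((2, 8, 32))  # successor latents
r_i, r_j = rng.standard_normal((2, 8))                # per-pair rewards
loss = bisimulation_loss(z_i, z_j, r_i, r_j, z_next_i, z_next_j)
```

In a sparse-reward environment the reward-gap term is almost always zero, so the latent distances can collapse toward zero; this is the failure mode the abstract says the asymmetric reconstruction loss is meant to counteract.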