
    Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples

    Referring video object segmentation (RVOS), as a supervised learning task, relies on sufficient annotated data for a given scene. In more realistic scenarios, however, only minimal annotations are available for a new scene, which poses significant challenges to existing RVOS methods. With this in mind, we propose a simple yet effective model with a newly designed cross-modal affinity (CMA) module based on a Transformer architecture. The CMA module builds multimodal affinity from a few samples, quickly learning new semantic information and enabling the model to adapt to different scenarios. Since the proposed method targets limited samples for new scenes, we generalize the problem as few-shot referring video object segmentation (FS-RVOS). To foster research in this direction, we build a new FS-RVOS benchmark based on currently available datasets. The benchmark covers a wide range of situations, maximally simulating real-world scenarios. Extensive experiments show that our model adapts well to different scenarios with only a few samples, reaching state-of-the-art performance on the benchmark. On Mini-Ref-YouTube-VOS, our model achieves an average performance of 53.1 $\mathcal{J}$ and 54.8 $\mathcal{F}$, 10% better than the baselines. Furthermore, we show impressive results of 77.7 $\mathcal{J}$ and 74.8 $\mathcal{F}$ on Mini-Ref-SAIL-VOS, significantly better than the baselines. Code is publicly available at https://github.com/hengliusky/Few_shot_RVOS.
    Comment: Accepted by ICCV 2023
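    The abstract gives no implementation details of the CMA module. As a rough illustration only, an affinity between visual and text tokens can be computed with scaled dot-product attention; every name, shape, and the plain single-head form below are assumptions, not the paper's actual design:

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        # numerically stable softmax
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def cross_modal_affinity(visual, text):
        """Sketch: visual tokens (N, d) attend over text tokens (M, d).
        The softmax-normalized affinity says how strongly each image
        region matches each word of the referring expression; a few
        annotated samples would be used to tune the projections that
        produce these features (omitted here)."""
        d = visual.shape[-1]
        affinity = softmax(visual @ text.T / np.sqrt(d))  # (N, M), rows sum to 1
        return affinity @ text                            # language-conditioned features (N, d)

    rng = np.random.default_rng(0)
    v = rng.standard_normal((16, 64))  # 16 visual tokens, dim 64
    t = rng.standard_normal((4, 64))   # 4 text tokens, dim 64
    out = cross_modal_affinity(v, t)
    print(out.shape)  # (16, 64)
    ```

    A real Transformer-based module would add learned query/key/value projections, multiple heads, and residual connections; the point here is only the affinity computation itself.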

    Thick branes with a nonminimally coupled bulk-scalar field

    In this paper, we investigate thick branes with a nonminimally coupled background scalar field whose solution is a single kink or a double kink. The effects of the nonminimal coupling constant $\xi$ on the structure of the thick branes and on the localization of gravity, fermions, scalars, and vectors are discussed. It is shown that each brane splits into two sub-branes as the nonminimal coupling constant $\xi$ increases. By investigating the tensor perturbation equations of gravity and the general covariant Dirac equation for fermions, we find that both the gravity zero mode and the left-chiral fermion zero mode are localized at the center of the single-kink branes and between the two sub-branes generated by the double kink, which indicates that the constant $\xi$ does not affect the localization of these zero modes. However, the zero mode of scalars is localized on each sub-brane (for both single-kink and double-kink branes) when $\xi$ is larger than its critical value $\xi_0$. The effects of the nonminimal coupling constant $\xi$ on the resonances of gravity and fermions with finite lifetime on the branes are also discussed.
    Comment: V2: 33 pages, 17 figures, 3 tables, published version

    Fermion Resonances on a Thick Brane with a Piecewise Warp Factor

    In this paper, we investigate resonances of massive KK fermions on a thick brane constructed from a single scalar field with a piecewise warp factor that matches smoothly. The distance between the two boundaries and the other parameters are determined by one free parameter through three junction conditions. For the generalized Yukawa coupling $\eta\bar{\Psi}\phi^{k}\Psi$ with odd $k=1,3,5,\ldots$, the mass eigenvalue $m$, width $\Gamma$, lifetime $\tau$, and maximal probability $P_{\max}$ of the fermion resonances are obtained. Our numerical calculations show that a brane without internal structure also favors the appearance of resonant states for both left- and right-handed fermions. The scalar-fermion coupling and the thickness of the brane influence the resonant behavior of the massive KK fermions.
    Comment: V3: 15 pages, 7 figures, published version

    Towards Control-Centric Representations in Reinforcement Learning from Images

    Image-based reinforcement learning is a practical yet challenging task. A major hurdle lies in extracting control-centric representations while disregarding irrelevant information. While approaches that follow the bisimulation principle show promise in learning state representations that address this issue, they still grapple with the limited expressive capacity of latent dynamics and poor adaptability to sparse-reward environments. To address these limitations, we introduce ReBis, which aims to capture control-centric information by integrating reward-free control information alongside reward-specific knowledge. ReBis utilizes a Transformer architecture to implicitly model the dynamics and incorporates block-wise masking to eliminate spatiotemporal redundancy. Moreover, ReBis combines a bisimulation-based loss with an asymmetric reconstruction loss to prevent feature collapse in environments with sparse rewards. Empirical studies on two large benchmarks, Atari games and the DeepMind Control Suite, demonstrate that ReBis outperforms existing methods, proving its effectiveness.
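    The abstract does not spell out the bisimulation-based loss. As a rough sketch of the general idea behind such losses (DBC-style pairing over a permuted batch; all names, the L1 metric, and the point-estimate stand-in for the transition term are assumptions, not ReBis's actual objective): latent distances are regressed toward a target combining reward differences and discounted differences between predicted next latents.

    ```python
    import numpy as np

    def bisim_loss(z, r, z_next, gamma=0.99):
        """Sketch of a bisimulation-style loss over a batch.
        z:      current latent states, shape (B, d)
        r:      rewards, shape (B,)
        z_next: predicted next latents, shape (B, d)
        Each state is paired with a neighbour via a cyclic shift; the latent
        L1 distance of each pair is pushed toward |r_i - r_j| plus a
        discounted distance between the paired next latents."""
        perm = np.roll(np.arange(len(z)), 1)                       # cyclic pairing
        target = np.abs(r - r[perm]) \
               + gamma * np.abs(z_next - z_next[perm]).sum(axis=1)  # bisimulation target
        latent = np.abs(z - z[perm]).sum(axis=1)                    # current latent distance
        return np.mean((latent - target) ** 2)

    rng = np.random.default_rng(0)
    z = rng.standard_normal((8, 32))
    z_next = rng.standard_normal((8, 32))
    r = rng.standard_normal(8)
    loss = bisim_loss(z, r, z_next)
    print(loss >= 0.0)  # True
    ```

    In a full method the transition term would be a distance between predicted next-state distributions (e.g. a 2-Wasserstein distance between Gaussians) rather than between point estimates, and gradients would flow through a learned encoder.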