2,209 research outputs found

    Bidirectional Correlation-Driven Inter-Frame Interaction Transformer for Referring Video Object Segmentation

    Full text link
    Referring video object segmentation (RVOS) aims to segment the target object in a video sequence described by a language expression. Typical multimodal Transformer based RVOS approaches process video sequence in a frame-independent manner to reduce the high computational cost, which however restricts the performance due to the lack of inter-frame interaction for temporal coherence modeling and spatio-temporal representation learning of the referred object. Besides, the absence of sufficient cross-modal interactions results in weak correlation between the visual and linguistic features, which increases the difficulty of decoding the target information and limits the performance of the model. In this paper, we propose a bidirectional correlation-driven inter-frame interaction Transformer, dubbed BIFIT, to address these issues in RVOS. Specifically, we design a lightweight and plug-and-play inter-frame interaction module in the Transformer decoder to efficiently learn the spatio-temporal features of the referred object, so as to decode the object information in the video sequence more precisely and generate more accurate segmentation results. Moreover, a bidirectional vision-language interaction module is implemented before the multimodal Transformer to enhance the correlation between the visual and linguistic features, thus facilitating the language queries to decode more precise object information from visual features and ultimately improving the segmentation performance. Extensive experimental results on four benchmarks validate the superiority of our BIFIT over state-of-the-art methods and the effectiveness of our proposed modules

    2-Amino-4,6-dimethyl­pyrimidine–benzoic acid (1/1)

    Get PDF
    The crystal of the title compound, C6H9N3·C7H6O2, contains tetra­meric hydrogen-bonded units comprising a central pair of 2-amino­pyrimidine mol­ecules linked across a centre of inversion by N—H⋯N hydrogen bonds and two pendant benzoic acid mol­ecules attached through N—H⋯O and O—H⋯N hydrogen bonds. These hydrogen-bonded units are arranged into layers in (002)

    4-Methyl-6-phenyl­pyrimidin-2-amine

    Get PDF
    The title compound, C11H11N3, was synthesized as part of our research into functionalized pyrimidines. It crystallizes with two independent mol­ecules in the asymmetric unit that differ only in the twist between the two aromatic rings; the torsion angles between the rings are 29.9 (2) and 45.1 (2)°. The crystal packing is dominated by inter­molecular N—H⋯N hydrogen bonds between independent and equivalent mol­ecules, forming an infinite three-dimensional network

    Strategies for Searching Video Content with Text Queries or Video Examples

    Full text link
    The large number of user-generated videos uploaded on to the Internet everyday has led to many commercial video search engines, which mainly rely on text metadata for search. However, metadata is often lacking for user-generated videos, thus these videos are unsearchable by current search engines. Therefore, content-based video retrieval (CBVR) tackles this metadata-scarcity problem by directly analyzing the visual and audio streams of each video. CBVR encompasses multiple research topics, including low-level feature design, feature fusion, semantic detector training and video search/reranking. We present novel strategies in these topics to enhance CBVR in both accuracy and speed under different query inputs, including pure textual queries and query by video examples. Our proposed strategies have been incorporated into our submission for the TRECVID 2014 Multimedia Event Detection evaluation, where our system outperformed other submissions in both text queries and video example queries, thus demonstrating the effectiveness of our proposed approaches

    Defense against Adversarial Cloud Attack on Remote Sensing Salient Object Detection

    Full text link
    Detecting the salient objects in a remote sensing image has wide applications for the interdisciplinary research. Many existing deep learning methods have been proposed for Salient Object Detection (SOD) in remote sensing images and get remarkable results. However, the recent adversarial attack examples, generated by changing a few pixel values on the original remote sensing image, could result in a collapse for the well-trained deep learning based SOD model. Different with existing methods adding perturbation to original images, we propose to jointly tune adversarial exposure and additive perturbation for attack and constrain image close to cloudy image as Adversarial Cloud. Cloud is natural and common in remote sensing images, however, camouflaging cloud based adversarial attack and defense for remote sensing images are not well studied before. Furthermore, we design DefenseNet as a learn-able pre-processing to the adversarial cloudy images so as to preserve the performance of the deep learning based remote sensing SOD model, without tuning the already deployed deep SOD model. By considering both regular and generalized adversarial examples, the proposed DefenseNet can defend the proposed Adversarial Cloud in white-box setting and other attack methods in black-box setting. Experimental results on a synthesized benchmark from the public remote sensing SOD dataset (EORSSD) show the promising defense against adversarial cloud attacks

    Rank Optimization for MIMO systems with RIS: Simulation and Measurement

    Full text link
    Reconfigurable intelligent surface (RIS) is a promising technology that can reshape the electromagnetic environment in wireless networks, offering various possibilities for enhancing wireless channels. Motivated by this, we investigate the channel optimization for multiple-input multiple-output (MIMO) systems assisted by RIS. In this paper, an efficient RIS optimization method is proposed to enhance the effective rank of the MIMO channel for achievable rate improvement. Numerical results are presented to verify the effectiveness of RIS in improving MIMO channels. Additionally, we construct a 2×\times2 RIS-assisted MIMO prototype to perform experimental measurements and validate the performance of our proposed algorithm. The results reveal a significant increase in effective rank and achievable rate for the RIS-assisted MIMO channel compared to the MIMO channel without RIS
    corecore