
    Gongsun Longzi’s “form”: Minimal word meaning

    Inspired by Gongsun Longzi’s “form-naming” idea about word meaning, this paper argues that 1) the internal lexicon contains only a list of word-meaning pairs, with no additional information either as part of word meaning or as a structural level above it; 2) the meaning of a word is a minimal C-Form, the identifying conceptual meaning that individuates a concept; 3) the C-Form is the interface between word meaning and concept meaning; and 4) a sentence has a minimal semantic content, composed of the minimal meanings of its words, which is propositional and truth-evaluable; contextual elements contribute nothing to the meaning of linguistic expressions. The paper adheres to semantic minimalism, while also holding that meaning holism aids semantic inquiry, since reflection on language meaning differs from language meaning itself.
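    As a purely illustrative sketch (not from the paper; every name below is hypothetical), the picture can be modeled as a bare lexicon of word-to-C-Form pairs whose composition alone, with no contextual input, yields a propositional, truth-evaluable sentence content:

```python
# Toy model of the abstract's claims. All names are hypothetical.

# Claim 1: the internal lexicon is nothing but word -> C-Form pairs.
LEXICON = {
    "snow": "SNOW",    # Claim 2: each meaning is a minimal C-Form
    "is": "BE",        # individuating a concept.
    "white": "WHITE",
}

def minimal_content(sentence):
    # Claim 4: sentence meaning is composed from the minimal meanings
    # of its words alone; context contributes nothing.
    return tuple(LEXICON[w] for w in sentence.lower().split())

# The composed content is truth-evaluable against a toy model of facts.
MODEL = {("SNOW", "BE", "WHITE")}

content = minimal_content("Snow is white")
print(content)           # ('SNOW', 'BE', 'WHITE')
print(content in MODEL)  # True
```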

    Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding

    Multimodal transformers exhibit high capacity and flexibility in aligning image and text for visual grounding. However, the existing encoder-only grounding framework (e.g., TransVG) suffers from heavy computation due to the quadratic time complexity of self-attention. To address this issue, we present a new multimodal transformer architecture, coined Dynamic Multimodal DETR (Dynamic MDETR), which decouples the grounding process into encoding and decoding phases. The key observation is that images contain high spatial redundancy. Thus, we devise a new dynamic multimodal transformer decoder that exploits this sparsity prior to speed up visual grounding. Specifically, our dynamic decoder is composed of a 2D adaptive sampling module and a text-guided decoding module. The sampling module selects informative patches by predicting offsets with respect to a reference point, while the decoding module extracts the grounded object's information by performing cross-attention between image features and text features. These two modules are stacked alternately to gradually bridge the modality gap and iteratively refine the reference point of the grounded object, eventually realizing the objective of visual grounding. Extensive experiments on five benchmarks demonstrate that our proposed Dynamic MDETR achieves competitive trade-offs between computation and accuracy. Notably, using only 9% of the feature points in the decoder, we reduce the GFLOPs of the multimodal transformer by ~44% while still obtaining higher accuracy than the encoder-only counterpart. In addition, to verify its generalization ability and scale up Dynamic MDETR, we build the first one-stage CLIP-empowered visual grounding framework and achieve state-of-the-art performance on these benchmarks.
    Comment: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) in October 202
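    A rough sketch of the decoder idea, assuming PyTorch: the layer below predicts 2D sampling offsets around a reference point, gathers only those sparse image features, lets a text-conditioned query attend to them, and then refines the reference point. The hidden size, number of sampled points, the 0.1 offset scale, and all module names are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicDecoderLayer(nn.Module):
    """One sketched decoder layer: 2D adaptive sampling followed by
    text-guided cross-attention. Hyperparameters are assumptions."""

    def __init__(self, dim=256, num_points=32, num_heads=8):
        super().__init__()
        self.num_points = num_points
        # Predict 2D offsets (relative to the reference point) from the query.
        self.offset_head = nn.Linear(dim, num_points * 2)
        # Text-guided decoding: the query attends to sampled image features.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Refine the reference point from the updated query.
        self.ref_head = nn.Linear(dim, 2)

    def forward(self, query, ref_point, img_feats):
        # query:     (B, 1, C)   grounding query carrying text information
        # ref_point: (B, 2)      normalized (x, y) in [0, 1]
        # img_feats: (B, C, H, W) visual feature map
        B = query.size(0)

        # 2D adaptive sampling: offsets with respect to the reference point.
        offsets = self.offset_head(query).view(B, self.num_points, 2).tanh()
        locs = (ref_point.unsqueeze(1) + 0.1 * offsets).clamp(0, 1)

        # Sample only these sparse points (exploiting spatial redundancy).
        grid = locs.unsqueeze(2) * 2 - 1              # (B, P, 1, 2) in [-1, 1]
        sampled = F.grid_sample(img_feats, grid, align_corners=False)
        sampled = sampled.squeeze(-1).transpose(1, 2)  # (B, P, C)

        # Cross-attention between the query and the sampled image features.
        query = query + self.cross_attn(query, sampled, sampled)[0]

        # Iteratively refine the reference point.
        step = self.ref_head(query).squeeze(1).tanh() * 0.1
        ref_point = (ref_point + step).clamp(0, 1)
        return query, ref_point

# Usage: stack layers so sampling and decoding alternate, gradually
# refining the reference point toward the grounded object.
layers = nn.ModuleList([DynamicDecoderLayer() for _ in range(6)])
q = torch.randn(2, 1, 256)
ref = torch.full((2, 2), 0.5)            # start at the image center
feats = torch.randn(2, 256, 20, 20)
for layer in layers:
    q, ref = layer(q, ref, feats)
print(ref)                               # refined (x, y) per sample
```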

    A Supramolecular Strategy to Assemble Multifunctional Viral Nanoparticles

    Using a one-pot approach driven by the supramolecular interaction between β-cyclodextrin and adamantyl moieties, multifunctional viral nanoparticles can be readily formulated for biomedical applications.