2,003 research outputs found

    Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining?

    The multimedia community has shown significant interest in perceiving and representing the physical world with multimodal pretrained neural network models, and among them, vision-language pretraining (VLP) is currently the most captivating topic. However, few endeavors have been dedicated to exploring 1) whether essential linguistic knowledge (e.g., semantics and syntax) can be extracted during VLP, and 2) how such linguistic knowledge impacts or enhances multimodal alignment. In response, we aim to elucidate the impact of comprehensive linguistic knowledge, including semantic expression and syntactic structure, on multimodal alignment. Specifically, we design and release SNARE, the first large-scale multimodal alignment probing benchmark, to detect the vital linguistic components, e.g., lexical, semantic, and syntactic knowledge. It contains four tasks: Semantic structure, Negation logic, Attribute ownership, and Relationship composition. Based on our proposed probing benchmark, our holistic analyses of five advanced VLP models illustrate that the VLP model: i) shows insensitivity towards complex syntax structures and relies on content words for sentence comprehension; ii) demonstrates limited comprehension of combinations between sentences and negations; iii) faces challenges in determining the presence of actions or spatial relationships within visual information and struggles with verifying the correctness of triple combinations. We make our benchmark and code available at https://github.com/WangFei-2019/SNARE/. Comment: [TL;DR] We design and release SNARE, the first large-scale multimodal alignment probing benchmark for current vision-language pretrained models.

    Rasiowa–Sikorski deduction systems in computer science applications

    Abstract: A Rasiowa-Sikorski system is a sequence-type formalization of logics. The system uses invertible decomposition rules which decompose a formula into sequences of simpler formulae whose validity is equivalent to the validity of the original formula. There may also be expansion rules which close indecomposable sequences under certain properties of relations appearing in the formulae, like symmetry or transitivity. Proofs are finite decomposition trees with leaves having “fundamental”, valid labels. The author describes a general method of applying the R-S formalism to develop complete deduction systems for various brands of CS and AI logic, including a logic for reasoning about relative similarity, a three-valued software specification logic with McCarthy's connectives and Kleene quantifiers, a logic for nondeterministic specifications, many-sorted FOL with possibly empty carriers of some sorts, and a three-valued logic for reasoning about concurrency.
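    The decomposition idea in this abstract can be illustrated with a minimal sketch for classical propositional logic only. This is not the author's system: it omits expansion rules, quantifiers, and the three-valued logics mentioned above, and the tuple encoding and every function name (`neg`, `decompose`, `is_valid`, and so on) are assumptions made for illustration. A sequence is read as the disjunction of its formulae, and it is treated as fundamental when it contains some formula together with its negation.

```python
# Formulae are nested tuples (hypothetical encoding):
#   ("var", "p"), ("not", f), ("and", f, g), ("or", f, g), ("imp", f, g)

def var(name):
    return ("var", name)

def neg(f):
    return ("not", f)

def is_literal(f):
    # A literal is a variable or a negated variable.
    return f[0] == "var" or (f[0] == "not" and f[1][0] == "var")

def is_fundamental(seq):
    # Fundamental sequence: contains some formula and its negation.
    return any(neg(f) in seq for f in seq)

def decompose(seq):
    """Apply one invertible rule to the first non-literal formula.

    Returns the list of child sequences (one or two), or None when
    every formula in the sequence is a literal (indecomposable).
    """
    for i, f in enumerate(seq):
        if is_literal(f):
            continue
        left, right = seq[:i], seq[i + 1:]
        op = f[0]
        if op == "or":
            return [left + (f[1], f[2]) + right]
        if op == "and":
            return [left + (f[1],) + right, left + (f[2],) + right]
        if op == "imp":
            return [left + (neg(f[1]), f[2]) + right]
        # Remaining case: a negation of a compound formula.
        g = f[1]
        if g[0] == "not":
            return [left + (g[1],) + right]
        if g[0] == "or":
            return [left + (neg(g[1]),) + right, left + (neg(g[2]),) + right]
        if g[0] == "and":
            return [left + (neg(g[1]), neg(g[2])) + right]
        if g[0] == "imp":
            return [left + (g[1],) + right, left + (neg(g[2]),) + right]
    return None

def is_valid(seq):
    """True if the decomposition tree of seq has only fundamental leaves."""
    seq = tuple(seq)
    if is_fundamental(seq):
        return True
    children = decompose(seq)
    if children is None:
        return False  # indecomposable and not fundamental
    return all(is_valid(child) for child in children)

p, q = var("p"), var("q")
peirce = ("imp", ("imp", ("imp", p, q), p), p)
print(is_valid([peirce]))          # Peirce's law
print(is_valid([("or", p, q)]))    # not a tautology
```

    Because each rule replaces a formula with strictly smaller ones, the tree is finite, which mirrors the "finite decomposition trees" of the abstract in this restricted propositional setting.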

    The discourse structure of video games: A multimodal discourse semantics approach to game tutorials

    The article proposes a multimodal discourse semantics approach to the analysis of video game tutorials that provides a discourse-pragmatic analysis of the game canvases in these tutorials. The study mainly builds on linguistic approaches to formal dynamic discourse semantics that have already been successfully applied to other multimodal artefacts. The article showcases the application of the resulting ‘logic of multimodal discourse interpretation’ to two specific cases of video game tutorials. This outlines particular discourse relations holding between events and segments in the tutorials as distinctive features of this video game genre and shows the discursive patterns of these instructions.

    Epic Stories: Sequence Fiction, Young Readers, And The Aesthetics Of World Building

    This study theorizes the world building processes that sequence fiction engages within a framework of intratextual structuralism and cognitive aesthetic stage theory. The study begins with an interdisciplinary overview of fictional and possible worlds theory before proposing a structural adaptation of this lens that explains the developmental, aesthetic benefits of the genre for young readers. Chapter II is an application of the adapted lens to a canonical epic, the His Dark Materials sequence by Philip Pullman. I interpret the intentional structure of the story world across novels to discuss how these engage readers at different aesthetic milestones and encourage a deeper imaginative construct as a result. Chapter III is a similar application of the proposed theory to a popular television story world: Nickelodeon’s animated epic, The Last Airbender by Michael DiMartino and Bryan Konietzko. The examination of this story world includes a discussion of how media and different forms of literacy disrupt and encourage specific aesthetic responses to a story world. The final chapter begins with an observational discussion of my two children and their experiences engaging with fictional worlds. My analysis of their responses to a popular sequence proposes that the children have an intuitive reading process that revolves around play and multimodal engagement with fiction, which enhances the internalization of a story world. The chapter concludes with a discussion of how similar methods in an adult classroom can benefit adult students who struggle with reading engagement.

    Intelligent Feature Extraction, Data Fusion and Detection of Concrete Bridge Cracks: Current Development and Challenges

    As a common appearance defect of concrete bridges, cracks are important indices for bridge structure health assessment. Although there has been much research on crack identification, research on the evolution mechanism of bridge cracks is still far from practical application. In this paper, the state-of-the-art research on intelligent theories and methodologies for feature extraction, data fusion and crack detection based on data-driven approaches is comprehensively reviewed. The research is discussed from three aspects: the feature extraction level of the multimodal parameters of bridge cracks, the description level, and the diagnosis level of the bridge crack damage states. We focus on previous research concerning the quantitative characterization problems of multimodal parameters of bridge cracks and their implementation in crack identification, while highlighting some of their major drawbacks. In addition, the current challenges and potential future research directions are discussed. Comment: Published at Intelligence & Robotics; copyright belongs to the authors.