2,003 research outputs found
Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining?
The multimedia community has shown a significant interest in perceiving and
representing the physical world with multimodal pretrained neural network
models, and among them, vision-language pretraining (VLP) is currently the
most captivating topic. However, there have been few endeavors dedicated to the
exploration of 1) whether essential linguistic knowledge (e.g., semantics and
syntax) can be extracted during VLP, and 2) how such linguistic knowledge
impacts or enhances multimodal alignment. In response, here we aim to
elucidate the impact of comprehensive linguistic knowledge, including semantic
expression and syntactic structure, on multimodal alignment. Specifically, we
design and release SNARE, the first large-scale multimodal alignment
probing benchmark, which detects vital linguistic components (e.g., lexical,
semantic, and syntactic knowledge) through four tasks: Semantic structure,
Negation logic, Attribute ownership, and Relationship composition. Based on our
proposed probing benchmark, our holistic analyses of five advanced VLP models
show that these models: i) are insensitive to complex syntactic structures
and rely on content words for sentence comprehension; ii) demonstrate
limited comprehension of combinations between sentences and negations;
iii) face challenges in determining the presence of actions or spatial
relationships within visual information and struggle with verifying
the correctness of triple combinations. We make our benchmark and code
available at https://github.com/WangFei-2019/SNARE/.
Comment: [TL;DR] We design and release SNARE, the first large-scale
multimodal alignment probing benchmark for current vision-language pretrained
models.
Rasiowa–Sikorski deduction systems in computer science applications
A Rasiowa-Sikorski system is a sequence-type formalization of logics. The system uses invertible decomposition rules which decompose a formula into sequences of simpler formulae whose validity is equivalent to validity of the original formula. There may also be expansion rules which close indecomposable sequences under certain properties of relations appearing in the formulae, like symmetry or transitivity. Proofs are finite decomposition trees with leaves having “fundamental”, valid labels. The author describes a general method of applying the R-S formalism to develop complete deduction systems for various brands of C.S. and A.I. logic, including a logic for reasoning about relative similarity, a three-valued software specification logic with McCarthy's connectives and Kleene quantifiers, a logic for nondeterministic specifications, many-sorted FOL with possibly empty carriers of some sorts, and a three-valued logic for reasoning about concurrency.
The discourse structure of video games: A multimodal discourse semantics approach to game tutorials
The article proposes a multimodal discourse semantics approach to the analysis of video game tutorials that provides a discourse pragmatic analysis of the game canvases in these tutorials. The study mainly builds on linguistic approaches to formal dynamic discourse semantics that have already been successfully applied to other multimodal artefacts. The article will showcase the application of the resulting “logic of multimodal discourse interpretation” to two specific cases of video game tutorials. This will outline particular discourse relations holding between events and segments in the tutorials as distinctive features of this video game genre and show the discursive patterns of these instructions.
Epic Stories: Sequence Fiction, Young Readers, And The Aesthetics Of World Building
This study theorizes the world building processes that sequence fiction engages within a framework of intratextual structuralism and cognitive aesthetic stage theory. The study begins with an interdisciplinary overview of fictional and possible worlds theory before proposing a structural adaptation of this lens that explains the developmental, aesthetic benefits of the genre for young readers. Chapter II is an application of the adapted lens to a canonical epic, the His Dark Materials sequence by Philip Pullman. I interpret the intentional structure of the story world across novels to discuss how these engage readers at different aesthetic milestones and encourage a deeper imaginative construct as a result. Chapter III is a similar application of the proposed theory for the popular television story world: Nickelodeon's animated epic, The Last Airbender by Michael DiMartino and Bryan Konietzko. The examination of this story world includes a discussion of how media and different forms of literacy disrupt and encourage specific aesthetic responses to a story world. The final chapter begins with an observational discussion of my two children and their experiences engaging with fictional worlds. My analysis of their responses to a popular sequence proposes the children have an intuitive reading process that revolves around play and multimodal engagement with fiction that enhances the internalization of a story world. The chapter concludes with a discussion of how similar methods in an adult classroom can benefit adult students who struggle with reading engagement.
Intelligent Feature Extraction, Data Fusion and Detection of Concrete Bridge Cracks: Current Development and Challenges
As a common appearance defect of concrete bridges, cracks are important
indices for bridge structure health assessment. Although there has been much
research on crack identification, research on the evolution mechanism of bridge
cracks is still far from practical applications. In this paper, the
state-of-the-art research on intelligent theories and methodologies for
intelligent feature extraction, data fusion and crack detection based on
data-driven approaches is comprehensively reviewed. The research is discussed
from three aspects: the feature extraction level of the multimodal parameters
of bridge cracks, the description level and the diagnosis level of the bridge
crack damage states. We focus on previous research concerning the quantitative
characterization problems of multimodal parameters of bridge cracks and their
implementation in crack identification, while highlighting some of their major
drawbacks. In addition, the current challenges and potential future research
directions are discussed.
Comment: Published at Intelligence & Robotics; its copyright belongs to the
author.