3,276 research outputs found

    VTKG: A Vision Transformer Model with Integration of Knowledge Graph for Enhanced Image Captioning

    Get PDF
    The Transformer model has exhibited impressive results in machine translation tasks. In this research, we utilize the Transformer model to improve the performance of image captioning. In this paper, we tackle the image captioning task from a novel sequence-to-sequence perspective and present VTKG, a Vision Transformer model with an integrated Knowledge Graph: a comprehensive Transformer network that substitutes the CNN in the encoder section with a convolution-free Transformer encoder. Subsequently, to enhance the generation of meaningful captions and address the issue of mispredictions, we introduce a novel approach to integrate common-sense knowledge extracted from a knowledge graph. This significantly improves the overall adaptability of our captioning model. Through the combination of the aforementioned strategies, we attain exceptional performance on multiple established evaluation metrics, outperforming existing benchmarks. Experimental results demonstrate improvements of 1.32%, 1.7%, 1.25%, 1.14%, 2.8% and 2.5% in BLEU-1, BLEU-2, BLEU-4, METEOR, ROUGE-L and CIDEr scores, respectively, when compared to state-of-the-art methods.
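
    The sketch below illustrates the general shape of such a captioner under stated assumptions: a convolution-free encoder over linearly projected image patches, a Transformer decoder over caption tokens, and knowledge-graph concept embeddings appended to the visual memory. Class names, dimensions and the fusion-by-concatenation scheme are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal PyTorch sketch of a ViT-encoder + Transformer-decoder captioner that
# appends knowledge-graph concept embeddings to the visual memory.
# All names, sizes and the fusion scheme are assumptions for illustration only.
import torch
import torch.nn as nn

class ViTEncoder(nn.Module):
    """Convolution-free encoder: flattened image patches -> Transformer encoder."""
    def __init__(self, patch_dim=768, d_model=512, n_heads=8, n_layers=6, n_patches=196):
        super().__init__()
        self.patch_proj = nn.Linear(patch_dim, d_model)            # linear patch embedding, no CNN
        self.pos_emb = nn.Parameter(torch.zeros(1, n_patches, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, patches):                                    # patches: (B, N, patch_dim)
        return self.encoder(self.patch_proj(patches) + self.pos_emb)

class KGCaptioner(nn.Module):
    """Decoder attends over visual tokens concatenated with KG concept embeddings."""
    def __init__(self, vocab_size=10000, kg_dim=300, d_model=512, n_heads=8, n_layers=6):
        super().__init__()
        self.encoder = ViTEncoder(d_model=d_model)
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.kg_proj = nn.Linear(kg_dim, d_model)                  # project KG concept vectors
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, patches, kg_concepts, caption_ids):
        vis = self.encoder(patches)                                # (B, N, d_model)
        kg = self.kg_proj(kg_concepts)                             # (B, K, d_model)
        memory = torch.cat([vis, kg], dim=1)                       # fuse image and knowledge tokens
        tgt = self.tok_emb(caption_ids)
        T = caption_ids.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.lm_head(out)                                   # next-token logits
```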

    Towards automated knowledge-based mapping between individual conceptualisations to empower personalisation of Geospatial Semantic Web

    No full text
    The geospatial domain is characterised by vagueness, especially in the semantic disambiguation of its concepts, which makes defining a universally accepted geo-ontology an onerous task. This is compounded by the lack of appropriate methods and techniques by which individual semantic conceptualisations can be captured and compared with each other. With multiple user conceptualisations, efforts towards a reliable Geospatial Semantic Web therefore require personalisation, where user diversity can be incorporated. The work presented in this paper is part of our ongoing research on applying commonsense reasoning to elicit and maintain models that represent users' conceptualisations. Such user models will enable taking the users' perspective of the real world into account and will empower personalisation algorithms for the Semantic Web. Intelligent information processing over the Semantic Web can be achieved if different conceptualisations can be integrated in a semantic environment and mismatches between different conceptualisations can be outlined. In this paper, a formal approach for detecting mismatches between a user's and an expert's conceptual model is outlined. The formalisation is used as the basis to develop algorithms to compare models defined in OWL. The algorithms are illustrated in a geographical domain using concepts from the SPACE ontology, developed as part of NASA's SWEET suite of ontologies for the Semantic Web, and are evaluated by comparing test cases of possible user misconceptions.
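
    As a rough illustration of what such a comparison can look like, the hedged Python sketch below uses rdflib (not necessarily the tooling used in the paper) to flag subsumption mismatches between a user's and an expert's OWL models by diffing their rdfs:subClassOf assertions; the file names are placeholders.

```python
# Hedged sketch (not the paper's algorithm): flag subsumption mismatches between a
# user's and an expert's OWL model by comparing their rdfs:subClassOf assertions.
from rdflib import Graph, RDFS, URIRef

def subclass_pairs(g: Graph):
    """Collect (subclass, superclass) pairs asserted in the graph."""
    return {(s, o) for s, o in g.subject_objects(RDFS.subClassOf)
            if isinstance(s, URIRef) and isinstance(o, URIRef)}

def detect_mismatches(user_owl: str, expert_owl: str):
    user, expert = Graph(), Graph()
    user.parse(user_owl)        # serialisation format inferred from the file extension
    expert.parse(expert_owl)
    u, e = subclass_pairs(user), subclass_pairs(expert)
    return {
        "missing_in_user": e - u,   # expert asserts the subsumption, user does not
        "extra_in_user":   u - e,   # user asserts a subsumption the expert does not
    }

# Usage (placeholder file names):
# mismatches = detect_mismatches("user_space.owl", "expert_space.owl")
```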

    Detecting Mismatches between a User's and an Expert's Conceptualisations

    No full text
    The work presented in this paper is part of our ongoing research on applying commonsense reasoning to elicit and maintain models that represent users' conceptualisations. Such user models will enable taking into account the users' perspective of the world and will empower personalisation algorithms for the Semantic Web. A formal approach for detecting mismatches between a user's and an expert's conceptual model is outlined. The formalisation is used as the basis to develop algorithms to compare two conceptualisations defined in OWL. The algorithms are illustrated in a geographical domain using a space ontology developed at NASA, and have been tested by simulating possible user misconceptions.

    A survey on knowledge-enhanced multimodal learning

    Full text link
    Multimodal learning has been a field of increasing interest, aiming to combine various modalities in a single joint representation. Especially in the area of visiolinguistic (VL) learning, multiple models and techniques have been developed, targeting a variety of tasks that involve images and text. VL models have reached unprecedented performance by extending the idea of Transformers, so that both modalities can learn from each other. Massive pre-training procedures enable VL models to acquire a certain level of real-world understanding, although many gaps remain: the limited comprehension of commonsense, factual, temporal and other everyday knowledge calls into question the extensibility of VL tasks. Knowledge graphs and other knowledge sources can fill those gaps by explicitly providing missing information, unlocking novel capabilities of VL models. At the same time, knowledge graphs enhance the explainability, fairness and validity of decision making, issues of utmost importance for such complex implementations. The current survey aims to unify the fields of VL representation learning and knowledge graphs, and provides a taxonomy and analysis of knowledge-enhanced VL models.
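
    As a toy illustration of one knowledge-injection pattern that such surveys cover, the sketch below verbalises retrieved commonsense triples and prepends them to the textual input that a VL model would encode alongside the image; the tiny in-memory triple store and the function names are hypothetical.

```python
# Illustrative sketch of a common knowledge-injection pattern: verbalise retrieved
# KG triples and prepend them to the text side of a visiolinguistic model's input.
# The triple store and retrieval function are hypothetical placeholders.
TRIPLES = [
    ("umbrella", "UsedFor", "staying dry"),
    ("umbrella", "RelatedTo", "rain"),
    ("dog", "CapableOf", "bark"),
]

def retrieve_facts(entities, k=2):
    """Return up to k commonsense triples mentioning any detected entity."""
    hits = [t for t in TRIPLES if t[0] in entities or t[2] in entities]
    return hits[:k]

def knowledge_enhanced_prompt(question, detected_entities):
    facts = retrieve_facts(detected_entities)
    verbalised = ". ".join(f"{s} {r} {o}" for s, r, o in facts)
    # The enriched text is what a knowledge-enhanced VL model would encode
    # together with the image features.
    return f"Facts: {verbalised}. Question: {question}"

print(knowledge_enhanced_prompt("Why is the person carrying an umbrella?", {"umbrella"}))
```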

    Commonsense for Zero-Shot Natural Language Video Localization

    Full text link
    Zero-shot Natural Language-Video Localization (NLVL) methods have exhibited promising results in training NLVL models exclusively with raw video data by dynamically generating video segments and pseudo-query annotations. However, existing pseudo-queries often lack grounding in the source video, resulting in unstructured and disjointed content. In this paper, we investigate the effectiveness of commonsense reasoning in zero-shot NLVL. Specifically, we present CORONET, a zero-shot NLVL framework that leverages commonsense to bridge the gap between videos and generated pseudo-queries via a commonsense enhancement module. CORONET employs Graph Convolution Networks (GCN) to encode commonsense information extracted from a knowledge graph, conditioned on the video, and cross-attention mechanisms to enhance the encoded video and pseudo-query representations prior to localization. Through empirical evaluations on two benchmark datasets, we demonstrate that CORONET surpasses both zero-shot and weakly supervised baselines, achieving improvements of up to 32.13% across various recall thresholds and up to 6.33% in mIoU. These results underscore the significance of leveraging commonsense reasoning for zero-shot NLVL. Comment: Accepted to AAAI 202
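
    The hedged sketch below shows the two ingredients the abstract names, a graph convolution over KG-derived commonsense nodes and cross-attention from video features into the encoded graph; the dimensions, wiring and residual fusion are assumptions rather than CORONET's exact design.

```python
# Rough sketch of a commonsense enhancement module: a single GCN step over
# knowledge-graph nodes, then cross-attention from video features into those nodes.
# Shapes and the residual fusion are illustrative assumptions.
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One graph-convolution step: H' = ReLU(A_norm @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, node_feats, adj):
        # adj: (N, N) normalised adjacency with self-loops; node_feats: (N, in_dim)
        return torch.relu(adj @ self.lin(node_feats))

class CommonsenseEnhancer(nn.Module):
    """Video features attend over GCN-encoded commonsense nodes."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.gcn = SimpleGCNLayer(d_model, d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, video_feats, node_feats, adj):
        # video_feats: (B, T, d); node_feats: (N, d); adj: (N, N)
        cs = self.gcn(node_feats, adj).unsqueeze(0).expand(video_feats.size(0), -1, -1)
        enhanced, _ = self.cross_attn(query=video_feats, key=cs, value=cs)
        return video_feats + enhanced          # residual commonsense enhancement
```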

    Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

    Full text link
    Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the semantic web community's exploration into multi-modal dimensions unlocking new avenues for innovation. In this survey, we carefully review over 300 articles, focusing on KG-aware research in two principal aspects: KG-driven Multi-Modal (KG4MM) learning, where KGs support multi-modal tasks, and Multi-Modal Knowledge Graph (MM4KG), which extends KG studies into the MMKG realm. We begin by defining KGs and MMKGs, then explore their construction progress. Our review includes two primary task categories: KG-aware multi-modal learning tasks, such as Image Classification and Visual Question Answering, and intrinsic MMKG tasks like Multi-modal Knowledge Graph Completion and Entity Alignment, highlighting specific research trajectories. For most of these tasks, we provide definitions, evaluation benchmarks, and additionally outline essential insights for conducting relevant research. Finally, we discuss current challenges and identify emerging trends, such as progress in Large Language Modeling and Multi-modal Pre-training strategies. This survey aims to serve as a comprehensive reference for researchers already involved in or considering delving into KG and multi-modal learning research, offering insights into the evolving landscape of MMKG research and supporting future work. Comment: Ongoing work; 41 pages (Main Text), 55 pages (Total), 11 Tables, 13 Figures, 619 citations; Paper list is available at https://github.com/zjukg/KG-MM-Surve
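
    As a toy example of one intrinsic MMKG task listed above, multi-modal knowledge graph completion, the sketch below scores a (head, relation, tail) triple with a TransE-style distance over entity representations fused from structural and image embeddings; the fusion rule and dimensions are illustrative assumptions, not any surveyed model in particular.

```python
# Toy multi-modal KG completion scorer: TransE-style distance over entity
# representations that additively fuse a structural embedding with a projected
# image feature. Dimensions and the fusion rule are illustrative assumptions.
import torch
import torch.nn as nn

class MultiModalTransE(nn.Module):
    def __init__(self, n_entities, n_relations, dim=128, img_dim=2048):
        super().__init__()
        self.ent = nn.Embedding(n_entities, dim)        # structural entity embedding
        self.rel = nn.Embedding(n_relations, dim)
        self.img_proj = nn.Linear(img_dim, dim)         # project visual features

    def entity_repr(self, ent_ids, img_feats):
        # Simple additive fusion of structure and vision.
        return self.ent(ent_ids) + self.img_proj(img_feats)

    def score(self, h_ids, r_ids, t_ids, h_img, t_img):
        h = self.entity_repr(h_ids, h_img)
        t = self.entity_repr(t_ids, t_img)
        r = self.rel(r_ids)
        # Lower distance => more plausible triple (TransE: h + r is close to t).
        return torch.norm(h + r - t, p=2, dim=-1)
```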