530 research outputs found
Framework for Enhanced Ontology Alignment using BERT-Based
This framework combines a few approaches to improve ontology alignment by using the data mining method with BERT. The method utilizes data mining techniques to identify the optimal characteristics for picking the data attributes of instances to match ontologies. Furthermore, this framework was developed to improve current precision and recall measures for ontology matching techniques. Since knowledge integration began, the main requirement for ontology alignment has always been syntactic and structural matching. This article presents a new approach that employs advanced methods like data mining and BERT embeddings to produce more expansive and contextually aware ontology alignment. The proposed system exploits contextual representation of BERT, semantic understanding, feature extraction, and pattern recognition through data mining techniques. The objective is to combine data-driven insights with semantic representation advantages to enhance accuracy and efficiency in the ontology alignment process. The evaluation conducted using annotated datasets as well as traditional approaches demonstrates how effective and adaptable, according to domains, our proposed framework is across several domains
Knowledge Extraction from Textual Resources through Semantic Web Tools and Advanced Machine Learning Algorithms for Applications in Various Domains
Nowadays there is a tremendous amount of unstructured data, often represented by texts, which is created and stored in variety of forms in many domains such as patients' health records, social networks comments, scientific publications, and so on. This volume of data represents an invaluable source of knowledge, but unfortunately it is challenging its mining for machines. At the same time, novel tools as well as advanced methodologies have been introduced in several domains, improving the efficacy and the efficiency of data-based services.
Following this trend, this thesis shows how to parse data from text with Semantic Web based tools, feed data into Machine Learning methodologies, and produce services or resources to facilitate the execution of some tasks. More precisely, the use of Semantic Web technologies powered by Machine Learning algorithms has been investigated in the Healthcare and E-Learning domains through not yet experimented methodologies. Furthermore, this thesis investigates the use of some state-of-the-art tools to move data from texts to graphs for representing the knowledge contained in scientific literature. Finally, the use of a Semantic Web ontology and novel heuristics to detect insights from biological data in form of graph are presented. The thesis contributes to the scientific literature in terms of results and resources. Most of the material presented in this thesis derives from research papers published in international journals or conference proceedings
BioLORD-2023: Semantic Textual Representations Fusing LLM and Clinical Knowledge Graph Insights
In this study, we investigate the potential of Large Language Models to
complement biomedical knowledge graphs in the training of semantic models for
the biomedical and clinical domains. Drawing on the wealth of the UMLS
knowledge graph and harnessing cutting-edge Large Language Models, we propose a
new state-of-the-art approach for obtaining high-fidelity representations of
biomedical concepts and sentences, consisting of three steps: an improved
contrastive learning phase, a novel self-distillation phase, and a weight
averaging phase. Through rigorous evaluations via the extensive BioLORD testing
suite and diverse downstream tasks, we demonstrate consistent and substantial
performance improvements over the previous state of the art (e.g. +2pts on
MedSTS, +2.5pts on MedNLI-S, +6.1pts on EHR-Rel-B). Besides our new
state-of-the-art biomedical model for English, we also distill and release a
multilingual model compatible with 50+ languages and finetuned on 7 European
languages. Many clinical pipelines can benefit from our latest models. Our new
multilingual model enables a range of languages to benefit from our
advancements in biomedical semantic representation learning, opening a new
avenue for bioinformatics researchers around the world. As a result, we hope to
see BioLORD-2023 becoming a precious tool for future biomedical applications.Comment: Preprint of upcoming journal articl
Open-Vocabulary Affordance Detection using Knowledge Distillation and Text-Point Correlation
Affordance detection presents intricate challenges and has a wide range of
robotic applications. Previous works have faced limitations such as the
complexities of 3D object shapes, the wide range of potential affordances on
real-world objects, and the lack of open-vocabulary support for affordance
understanding. In this paper, we introduce a new open-vocabulary affordance
detection method in 3D point clouds, leveraging knowledge distillation and
text-point correlation. Our approach employs pre-trained 3D models through
knowledge distillation to enhance feature extraction and semantic understanding
in 3D point clouds. We further introduce a new text-point correlation method to
learn the semantic links between point cloud features and open-vocabulary
labels. The intensive experiments show that our approach outperforms previous
works and adapts to new affordance labels and unseen objects. Notably, our
method achieves the improvement of 7.96% mIOU score compared to the baselines.
Furthermore, it offers real-time inference which is well-suitable for robotic
manipulation applications.Comment: 8 page
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
Knowledge Graphs (KGs) play a pivotal role in advancing various AI
applications, with the semantic web community's exploration into multi-modal
dimensions unlocking new avenues for innovation. In this survey, we carefully
review over 300 articles, focusing on KG-aware research in two principal
aspects: KG-driven Multi-Modal (KG4MM) learning, where KGs support multi-modal
tasks, and Multi-Modal Knowledge Graph (MM4KG), which extends KG studies into
the MMKG realm. We begin by defining KGs and MMKGs, then explore their
construction progress. Our review includes two primary task categories:
KG-aware multi-modal learning tasks, such as Image Classification and Visual
Question Answering, and intrinsic MMKG tasks like Multi-modal Knowledge Graph
Completion and Entity Alignment, highlighting specific research trajectories.
For most of these tasks, we provide definitions, evaluation benchmarks, and
additionally outline essential insights for conducting relevant research.
Finally, we discuss current challenges and identify emerging trends, such as
progress in Large Language Modeling and Multi-modal Pre-training strategies.
This survey aims to serve as a comprehensive reference for researchers already
involved in or considering delving into KG and multi-modal learning research,
offering insights into the evolving landscape of MMKG research and supporting
future work.Comment: Ongoing work; 41 pages (Main Text), 55 pages (Total), 11 Tables, 13
Figures, 619 citations; Paper list is available at
https://github.com/zjukg/KG-MM-Surve
A survey on knowledge-enhanced multimodal learning
Multimodal learning has been a field of increasing interest, aiming to
combine various modalities in a single joint representation. Especially in the
area of visiolinguistic (VL) learning multiple models and techniques have been
developed, targeting a variety of tasks that involve images and text. VL models
have reached unprecedented performances by extending the idea of Transformers,
so that both modalities can learn from each other. Massive pre-training
procedures enable VL models to acquire a certain level of real-world
understanding, although many gaps can be identified: the limited comprehension
of commonsense, factual, temporal and other everyday knowledge aspects
questions the extendability of VL tasks. Knowledge graphs and other knowledge
sources can fill those gaps by explicitly providing missing information,
unlocking novel capabilities of VL models. In the same time, knowledge graphs
enhance explainability, fairness and validity of decision making, issues of
outermost importance for such complex implementations. The current survey aims
to unify the fields of VL representation learning and knowledge graphs, and
provides a taxonomy and analysis of knowledge-enhanced VL models
NLP-Based Techniques for Cyber Threat Intelligence
In the digital era, threat actors employ sophisticated techniques for which,
often, digital traces in the form of textual data are available. Cyber Threat
Intelligence~(CTI) is related to all the solutions inherent to data collection,
processing, and analysis useful to understand a threat actor's targets and
attack behavior. Currently, CTI is assuming an always more crucial role in
identifying and mitigating threats and enabling proactive defense strategies.
In this context, NLP, an artificial intelligence branch, has emerged as a
powerful tool for enhancing threat intelligence capabilities. This survey paper
provides a comprehensive overview of NLP-based techniques applied in the
context of threat intelligence. It begins by describing the foundational
definitions and principles of CTI as a major tool for safeguarding digital
assets. It then undertakes a thorough examination of NLP-based techniques for
CTI data crawling from Web sources, CTI data analysis, Relation Extraction from
cybersecurity data, CTI sharing and collaboration, and security threats of CTI.
Finally, the challenges and limitations of NLP in threat intelligence are
exhaustively examined, including data quality issues and ethical
considerations. This survey draws a complete framework and serves as a valuable
resource for security professionals and researchers seeking to understand the
state-of-the-art NLP-based threat intelligence techniques and their potential
impact on cybersecurity
Progress and Opportunities of Foundation Models in Bioinformatics
Bioinformatics has witnessed a paradigm shift with the increasing integration
of artificial intelligence (AI), particularly through the adoption of
foundation models (FMs). These AI techniques have rapidly advanced, addressing
historical challenges in bioinformatics such as the scarcity of annotated data
and the presence of data noise. FMs are particularly adept at handling
large-scale, unlabeled data, a common scenario in biological contexts due to
the time-consuming and costly nature of experimentally determining labeled
data. This characteristic has allowed FMs to excel and achieve notable results
in various downstream validation tasks, demonstrating their ability to
represent diverse biological entities effectively. Undoubtedly, FMs have
ushered in a new era in computational biology, especially in the realm of deep
learning. The primary goal of this survey is to conduct a systematic
investigation and summary of FMs in bioinformatics, tracing their evolution,
current research status, and the methodologies employed. Central to our focus
is the application of FMs to specific biological problems, aiming to guide the
research community in choosing appropriate FMs for their research needs. We
delve into the specifics of the problem at hand including sequence analysis,
structure prediction, function annotation, and multimodal integration,
comparing the structures and advancements against traditional methods.
Furthermore, the review analyses challenges and limitations faced by FMs in
biology, such as data noise, model explainability, and potential biases.
Finally, we outline potential development paths and strategies for FMs in
future biological research, setting the stage for continued innovation and
application in this rapidly evolving field. This comprehensive review serves
not only as an academic resource but also as a roadmap for future explorations
and applications of FMs in biology.Comment: 27 pages, 3 figures, 2 table
- …