Graph-Embedding Empowered Entity Retrieval
In this research, we improve upon the current state of the art in entity
retrieval by re-ranking the result list using graph embeddings. The paper shows
that graph embeddings are useful for entity-oriented search tasks. We
demonstrate empirically that encoding information from the knowledge graph into
(graph) embeddings yields a greater improvement in the effectiveness of entity
retrieval than using plain word embeddings. We analyze the impact of the
accuracy of the entity linker on overall retrieval effectiveness. Our analysis
further deploys the cluster hypothesis to explain the observed advantages of
graph embeddings over the more widely used word embeddings for user tasks
involving ranking entities.
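The re-ranking step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the interpolation weight `alpha`, the cosine-similarity measure, and the single query-entity embedding are all assumptions.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rerank(results, query_entity_emb, entity_embs, alpha=0.7):
    """Re-rank a retrieval result list by interpolating each entity's
    original retrieval score with its graph-embedding similarity to an
    entity linked in the query. `results` is a list of (entity, score);
    `entity_embs` maps entity -> embedding. Illustrative only."""
    rescored = []
    for entity, score in results:
        sim = cosine(entity_embs[entity], query_entity_emb)
        rescored.append((entity, alpha * score + (1 - alpha) * sim))
    return sorted(rescored, key=lambda x: x[1], reverse=True)
```

With `alpha = 0.5`, a candidate whose graph embedding is close to the query's linked entity can overtake a candidate with a slightly higher original retrieval score.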
Inductive Entity Representations from Text via Link Prediction
Knowledge Graphs (KG) are of vital importance for multiple applications on
the web, including information retrieval, recommender systems, and metadata
annotation. Regardless of whether they are built manually by domain experts or
with automatic pipelines, KGs are often incomplete. Recent work has begun to
explore the use of textual descriptions available in knowledge graphs to learn
vector representations of entities in order to perform link prediction.
However, the extent to which these representations learned for link prediction
generalize to other tasks is unclear. This is important given the cost of
learning such representations. Ideally, we would prefer representations that do
not need to be trained again when transferring to a different task, while
retaining reasonable performance.
In this work, we propose a holistic evaluation protocol for entity
representations learned via a link prediction objective. We consider the
inductive link prediction and entity classification tasks, which involve
entities not seen during training. We also consider an information retrieval
task for entity-oriented search. We evaluate an architecture based on a
pretrained language model, that exhibits strong generalization to entities not
observed during training, and outperforms related state-of-the-art methods (22%
MRR improvement in link prediction on average). We further provide evidence
that the learned representations transfer well to other tasks without
fine-tuning. In the entity classification task we obtain an average improvement
of 16% in accuracy compared with baselines that also employ pre-trained models.
In the information retrieval task, we obtain significant improvements of up to
8.8% in NDCG@10 for natural language queries. We thus show that the learned
representations are not limited to KG-specific tasks, and have greater
generalization properties than evaluated in previous work.
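The inductive setup described above can be sketched as follows: because each entity is represented by encoding its textual description rather than by a lookup table, a triple involving an entity unseen during training can still be scored. The hash-based `encode` function is a toy stand-in for the pretrained language model, and the TransE-style distance score is an assumption, not necessarily the paper's scoring function.

```python
import hashlib

def encode(text, dim=16):
    """Toy stand-in for a pretrained language-model encoder: hashes each
    token of an entity description into a fixed-size bag-of-words vector.
    A real system would use contextual embeddings instead."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    return vec

def score_triple(head_desc, rel_emb, tail_desc):
    """TransE-style plausibility score: negative L1 distance between
    (head + relation) and tail in embedding space; higher is more
    plausible. Works for entities never seen in training, since only
    their descriptions are needed."""
    h, t = encode(head_desc), encode(tail_desc)
    return -sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, rel_emb, t))
```

A triple whose head and tail descriptions align under the relation scores near zero; mismatched descriptions score lower.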
Unifying Large Language Models and Knowledge Graphs: A Roadmap
Large language models (LLMs), such as ChatGPT and GPT-4, are making new waves
in the field of natural language processing and artificial intelligence, owing
to their emergent abilities and generalizability. However, LLMs are black-box
models, which often fall short of capturing and accessing factual knowledge. In
contrast, Knowledge Graphs (KGs), Wikipedia and Huapu for example, are
structured knowledge models that explicitly store rich factual knowledge. KGs
can enhance LLMs by providing external knowledge for inference and
interpretability. Meanwhile, KGs are difficult to construct and evolve by
nature, which challenges existing KG methods to generate new facts and
represent unseen knowledge. Unifying LLMs and KGs is therefore complementary,
allowing the advantages of both to be leveraged simultaneously. In this
article, we
present a forward-looking roadmap for the unification of LLMs and KGs. Our
roadmap consists of three general frameworks, namely, 1) KG-enhanced LLMs,
which incorporate KGs during the pre-training and inference phases of LLMs, or
for the purpose of enhancing understanding of the knowledge learned by LLMs; 2)
LLM-augmented KGs, that leverage LLMs for different KG tasks such as embedding,
completion, construction, graph-to-text generation, and question answering; and
3) Synergized LLMs + KGs, in which LLMs and KGs play equal roles and work in a
mutually beneficial way to enhance both LLMs and KGs for bidirectional
reasoning driven by both data and knowledge. We review and summarize existing
efforts within these three frameworks in our roadmap and pinpoint their future
research directions.
Comment: 29 pages, 25 figures
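The KG-enhanced-LLM framework sketched in the roadmap (incorporating KG facts at inference time) can be illustrated minimally as follows. The function name, the string-matching entity lookup, and the prompt format are all illustrative assumptions, not any method from the paper.

```python
def kg_context(query, kg, max_facts=3):
    """Minimal sketch of KG-enhanced LLM inference: find triples whose
    head entity is mentioned in the query and serialize them as textual
    context to prepend to the LLM prompt. `kg` is a list of
    (head, relation, tail) triples."""
    mentioned = [t for t in kg if t[0].lower() in query.lower()]
    facts = [f"{h} {r} {t}." for h, r, t in mentioned[:max_facts]]
    return "Known facts: " + " ".join(facts) + "\nQuestion: " + query
```

A production system would use a proper entity linker and subgraph retrieval rather than substring matching, but the flow (retrieve structured facts, verbalize, condition the LLM) is the same.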
UKnow: A Unified Knowledge Protocol for Common-Sense Reasoning and Vision-Language Pre-training
This work presents a unified knowledge protocol, called UKnow, which
facilitates knowledge-based studies from the perspective of data. Particularly
focusing on visual and linguistic modalities, we categorize data knowledge into
five unit types, namely, in-image, in-text, cross-image, cross-text, and
image-text. Following this protocol, we collect, from public international
news, a large-scale multimodal knowledge graph dataset that consists of
1,388,568 nodes (with 571,791 vision-related ones) and 3,673,817 triplets. The
dataset is also annotated with rich event tags, including 96 coarse labels and
9,185 fine labels, expanding its potential usage. To further verify that UKnow
can serve as a standard protocol, we set up an efficient pipeline to help
reorganize existing datasets under UKnow format. Finally, we benchmark the
performance of some widely-used baselines on the tasks of common-sense
reasoning and vision-language pre-training. Results on both our new dataset and
the reformatted public datasets demonstrate the effectiveness of UKnow in
knowledge organization and method evaluation. Code, dataset, conversion tool,
and baseline models will be made public.
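The five knowledge unit types in the protocol above follow a simple pattern over node modalities, which can be sketched as follows. The function name and signature are illustrative, not from the paper.

```python
def uknow_unit_type(src_modality, dst_modality, same_item):
    """Hedged sketch of UKnow's five knowledge unit types: a relation
    inside one image or text is 'in-image'/'in-text'; a relation across
    items of the same modality is 'cross-image'/'cross-text'; a relation
    across modalities is 'image-text'."""
    if src_modality != dst_modality:
        return "image-text"
    prefix = "in-" if same_item else "cross-"
    return prefix + ("image" if src_modality == "image" else "text")
```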
A SURVEY ON DOMAIN-SPECIFIC ENTITY LINKING WITH HETEROGENEOUS INFORMATION NETWORKS
Entity linking is the task of connecting entity mentions in a text collection to the corresponding entries in a knowledge base, assigning unique identities to entities such as locations, individuals, and companies. A knowledge base (KB) is used to optimize the collection, organization, and retrieval of information. Heterogeneous information networks (HINs) comprise interlinked objects of multiple types connected by various kinds of relationships; increasingly popular examples include bibliographic networks and social media networks, as well as typical relational database data. Entity linking over a HIN maps entities mentioned in unstructured web text to the corresponding entities in the network. This task is important and challenging because of ambiguity and limited available knowledge. Some HINs can be regarded as domain-specific KBs. Current entity linking (EL) systems are aimed at corpora containing heterogeneous web information and perform sub-optimally on domain-specific corpora. EL systems link against one or more general or domain-specific knowledge bases such as DBpedia, Wikipedia, Freebase, IMDB, YAGO, WordNet, and MKB. This paper presents a survey on domain-specific entity linking with HINs, covering datasets, types, and examples along with related concepts.
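The candidate-generation and disambiguation steps that entity linking systems perform can be sketched minimally as follows. The KB layout (`id -> {"name", "description"}`) and the word-overlap disambiguation heuristic are illustrative assumptions; real systems use learned similarity models.

```python
def link_entity(mention, context, kb):
    """Toy sketch of entity linking: generate candidates whose name
    matches the mention, then disambiguate by word overlap between the
    mention's context and each candidate's KB description. Returns the
    best candidate's id, or None if no candidate exists."""
    candidates = [eid for eid, e in kb.items()
                  if e["name"].lower() == mention.lower()]
    if not candidates:
        return None
    ctx = set(context.lower().split())
    return max(candidates, key=lambda eid:
               len(ctx & set(kb[eid]["description"].lower().split())))
```

The ambiguity problem the survey highlights is visible even here: the same surface form "Paris" resolves to different entities depending on context.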
MolFM: A Multimodal Molecular Foundation Model
Molecular knowledge resides within three different modalities of information
sources: molecular structures, biomedical documents, and knowledge bases.
Effective incorporation of molecular knowledge from these modalities holds
paramount significance in facilitating biomedical research. However, existing
multimodal molecular foundation models exhibit limitations in capturing
intricate connections between molecular structures and texts, and more
importantly, none of them attempt to leverage a wealth of molecular expertise
derived from knowledge graphs. In this study, we introduce MolFM, a multimodal
molecular foundation model designed to facilitate joint representation learning
from molecular structures, biomedical texts, and knowledge graphs. We propose
cross-modal attention between atoms of molecular structures, neighbors of
molecule entities and semantically related texts to facilitate cross-modal
comprehension. We provide a theoretical analysis showing that our cross-modal
pre-training captures local and global molecular knowledge by minimizing the
feature-space distance between different modalities of the same molecule, as
well as between molecules sharing similar structures or functions. MolFM
achieves state-of-the-art performance on various downstream tasks. On
cross-modal retrieval, MolFM outperforms existing models with 12.13% and 5.04%
absolute gains under the zero-shot and fine-tuning settings, respectively.
Furthermore, qualitative analysis showcases MolFM's implicit ability to provide
grounding from molecular substructures and knowledge graphs. Code and models
are available at https://github.com/BioFM/OpenBioMed.
Comment: 31 pages, 15 figures, and 15 tables
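The "minimizing the distance between modalities of the same molecule" idea above can be sketched with an InfoNCE-style contrastive loss. This is a generic stand-in under stated assumptions (dot-product similarity, temperature `temp`), not MolFM's actual pre-training objective.

```python
import math

def contrastive_loss(struct_embs, text_embs, temp=0.1):
    """InfoNCE-style cross-modal alignment sketch: for each molecule i,
    treat (struct_embs[i], text_embs[i]) as the matched pair and every
    other text embedding as a negative. Minimizing this loss pulls a
    molecule's structure and text embeddings together and pushes
    mismatched pairs apart."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    loss, n = 0.0, len(struct_embs)
    for i in range(n):
        logits = [dot(struct_embs[i], t) / temp for t in text_embs]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_z - logits[i]  # negative log-softmax of matched pair
    return loss / n
```

Correctly paired embeddings give a lower loss than the same embeddings with their pairings shuffled, which is the property the alignment objective relies on.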