Table and Image Generation for Investigating Knowledge of Entities in Pre-trained Vision and Language Models
In this paper, we propose a table and image generation task to verify how knowledge about entities acquired from natural language is retained in Vision & Language (V&L) models. The task consists of two parts: the first is to generate a table containing knowledge about an entity and its related image, and the second is to generate an image from an entity with a caption and a table containing related knowledge of the entity. In both parts, the model must know the entities in order to perform the generation properly. To support the proposed tasks, we created the Wikipedia Table and Image Generation (WikiTIG) dataset from about 200,000 infoboxes in English Wikipedia articles. We evaluated performance on the tasks with respect to the above research question using the V&L model OFA, which has achieved state-of-the-art results in multiple tasks. Experimental results show that OFA forgets part of its entity knowledge during pre-training, as a trade-off for improving performance on image-related tasks.

Comment: Accepted at ACL 202
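To make the two sub-tasks concrete, here is a minimal sketch of what one WikiTIG-style example might look like. The field names and values are illustrative assumptions for this sketch, not the dataset's actual schema.

```python
# Hypothetical WikiTIG-style record; keys and values are illustrative
# assumptions, not the dataset's actual schema.
example = {
    "entity": "Mount Fuji",
    "table": {                      # infobox-style knowledge of the entity
        "elevation": "3,776 m",
        "location": "Honshu, Japan",
    },
    "caption": "Mount Fuji seen from Lake Kawaguchi",
    "image": "mount_fuji.jpg",      # related image file
}

# Part 1 (table generation): entity (+ image) -> table
# Part 2 (image generation): entity + caption + table -> image
```

Both directions require the model to recall entity knowledge rather than infer it from the immediate input, which is what makes the task a probe of retained knowledge.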
ERG-associated protein with SET domain (ESET)-Oct4 interaction regulates pluripotency and represses the trophectoderm lineage.
BACKGROUND: Pluripotency, the capacity for indefinite self-renewal and differentiation into diverse cell types, is a unique state exhibited by embryonic stem (ES) cells. Transcriptional regulators such as Oct4 are critical for pluripotency, but the role of epigenetic modifiers remains to be fully elucidated.

RESULTS: Here, we show that ERG-associated protein with SET domain (ESET), a histone methyltransferase enzyme, maintains pluripotency through repression of Cdx2, a key trophectoderm determinant, by histone H3 lysine 9 trimethylation (H3K9me3) of its promoter region. Notably, this repression is mediated through the synergistic function of small ubiquitin-related modifier (SUMO)ylated ESET and Oct4. ESET localises to promyelocytic leukaemia (PML) nuclear bodies and is SUMOylated in ES cells. Interaction of ESET with Oct4 depends on a SUMO-interacting motif (SIM) in Oct4, which is critical for the repression of Cdx2.

CONCLUSION: Loss of ESET or Oct4 results in strikingly similar phenotypes, both in ES cells, which differentiate into trophectoderm cells, and in early embryos, where the pluripotent inner cell mass (ICM) of blastocysts fails to develop. We propose that the SUMOylated ESET-Oct4 complex is critical for both the initiation and maintenance of pluripotency through repression of differentiation, particularly of the trophectoderm lineage, by epigenetic silencing of Cdx2.
Model-based Subsampling for Knowledge Graph Completion
Subsampling is effective in Knowledge Graph Embedding (KGE) for reducing overfitting caused by the sparsity of Knowledge Graph (KG) datasets. However, current subsampling approaches consider only the frequencies of queries that consist of entities and their relations. Thus, existing subsampling potentially underestimates the appearance probabilities of infrequent queries even if the frequencies of their entities or relations are high. To address this problem, we propose Model-based Subsampling (MBS) and Mixed Subsampling (MIX), which estimate these appearance probabilities through the predictions of KGE models (see the sketch after this entry). Evaluation results on the FB15k-237, WN18RR, and YAGO3-10 datasets showed that our proposed subsampling methods improve KG completion performance for the popular KGE models RotatE, TransE, HAKE, ComplEx, and DistMult.

Comment: Accepted by AACL 2023; 9 pages, 3 figures, 5 tables
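As a rough illustration of the idea, the sketch below contrasts count-based subsampling weights with a model-based estimate. Here `kge_score` is a hypothetical stand-in for a trained auxiliary KGE model, and the inverse-probability weighting is a simplified assumption rather than the paper's exact formulation.

```python
from collections import Counter

# Toy triples (head, relation, tail); real KGs have millions of these.
triples = [
    ("tokyo", "capital_of", "japan"),
    ("paris", "capital_of", "france"),
    ("paris", "located_in", "france"),
]

# --- Conventional count-based subsampling ---
# Estimate a triple's appearance probability from the raw frequencies
# of its queries (h, r) and (r, t).
query_counts = Counter()
for h, r, t in triples:
    query_counts[(h, r)] += 1
    query_counts[(r, t)] += 1

def count_based_weight(h, r, t, alpha=0.5):
    freq = query_counts[(h, r)] + query_counts[(r, t)]
    return (1.0 / freq) ** alpha  # rare queries are up-weighted

# --- Model-based subsampling (MBS-style sketch) ---
# Replace raw counts with probabilities predicted by an auxiliary KGE
# model, so infrequent-but-plausible queries are not underestimated.
def kge_score(h, r, t):
    return 0.5  # placeholder; a trained model (e.g. ComplEx) goes here

def model_based_weight(h, r, t, alpha=0.5):
    p = max(kge_score(h, r, t), 1e-9)  # estimated appearance probability
    return (1.0 / p) ** alpha

for triple in triples:
    print(triple, count_based_weight(*triple), model_based_weight(*triple))
```

The design choice is simply to swap the frequency estimate for a learned one; everything downstream of the weight computation can stay unchanged.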
Does Pre-trained Language Model Actually Infer Unseen Links in Knowledge Graph Completion?
Knowledge graphs (KGs) consist of links that describe relationships between entities. Because manually enumerating all relationships between entities is impractical, automatically completing them is essential for KGs. Knowledge Graph Completion (KGC) is the task of inferring unseen relationships between entities in a KG. Traditional embedding-based KGC methods, such as RESCAL, TransE, DistMult, ComplEx, RotatE, HAKE, and HousE, infer missing links using only the knowledge in the training data. In contrast, recent Pre-trained Language Model (PLM)-based KGC utilizes knowledge obtained during pre-training. Therefore, PLM-based KGC can produce missing links between entities by reusing memorized knowledge from pre-training, without performing any inference. This behaviour is problematic because the goal of building KGC models is to infer unseen links between entities, yet conventional KGC evaluations do not consider inference and memorization abilities separately. Thus, a PLM-based KGC method that achieves high performance in current KGC evaluations may be ineffective in practical applications. To address this issue, we analyze whether PLM-based KGC methods make inferences or merely access memorized knowledge. For this purpose, we propose a method for constructing synthetic datasets tailored to this analysis, and we conclude that PLMs acquire the inference abilities required for KGC through pre-training, even though their performance improvements mostly come from the textual information of entities and relations.

Comment: 15 pages, 10 figures
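One straightforward way to separate memorization from inference, offered here as a minimal sketch under our own assumptions (the paper's actual dataset construction may differ), is to replace entity surface forms with opaque identifiers, so a PLM cannot recall a held-out link from memorized pre-training text and must infer it from the graph structure alone.

```python
import random

def anonymize_kg(triples, seed=0):
    """Map entity names to opaque IDs. With anonymized entities, a PLM
    cannot answer a held-out query from memorized pre-training facts
    and must rely on inference over the training links. Illustrative
    sketch, not the paper's exact construction."""
    entities = sorted({e for h, _, t in triples for e in (h, t)})
    rng = random.Random(seed)
    rng.shuffle(entities)
    mapping = {e: f"ent_{i:04d}" for i, e in enumerate(entities)}
    return [(mapping[h], r, mapping[t]) for h, r, t in triples]

# Held-out links over anonymized entities test inference; the same
# links with original names would also reward memorization.
print(anonymize_kg([("tokyo", "capital_of", "japan")]))
```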