170 research outputs found

    CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information

    Open Information Extraction (OpenIE) methods extract (noun phrase, relation phrase, noun phrase) triples from text, resulting in the construction of large Open Knowledge Bases (Open KBs). The noun phrases (NPs) and relation phrases in such Open KBs are not canonicalized, leading to the storage of redundant and ambiguous facts. Recent research has posed canonicalization of Open KBs as clustering over manually defined feature spaces. Manual feature engineering is expensive and often sub-optimal. In order to overcome this challenge, we propose Canonicalization using Embeddings and Side Information (CESI), a novel approach which performs canonicalization over learned embeddings of Open KBs. CESI extends recent advances in KB embedding by incorporating relevant NP and relation phrase side information in a principled manner. Through extensive experiments on multiple real-world datasets, we demonstrate CESI's effectiveness. Comment: Accepted at WWW 2018.
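
    A minimal sketch of the clustering step described above, assuming NP embeddings have already been learned; random vectors stand in for CESI's side-information-aware embeddings, and the clustering settings are illustrative rather than CESI's actual configuration:

        import numpy as np
        from sklearn.cluster import AgglomerativeClustering  # needs scikit-learn >= 1.2 for `metric`

        # Toy noun phrases with stand-in embeddings; CESI learns these vectors jointly
        # with relation phrase embeddings and side information.
        noun_phrases = ["Barack Obama", "Obama", "President Obama", "NYC", "New York City"]
        embeddings = np.random.rand(len(noun_phrases), 50)

        # Hierarchical agglomerative clustering over cosine distances; noun phrases that
        # receive the same label form one canonical cluster.
        clusterer = AgglomerativeClustering(
            n_clusters=None, distance_threshold=0.8, metric="cosine", linkage="average"
        )
        labels = clusterer.fit_predict(embeddings)
        for phrase, label in zip(noun_phrases, labels):
            print(label, phrase)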

    Open Knowledge Base Canonicalization with Multi-task Unlearning

    The construction of large open knowledge bases (OKBs) is integral to many applications in the field of mobile computing. Noun phrases and relational phrases in OKBs often suffer from redundancy and ambiguity, which calls for investigation into OKB canonicalization. However, in order to meet the requirements of some privacy protection regulations and to ensure the timeliness of the data, the canonicalized OKB often needs to remove some sensitive information or outdated data. Machine unlearning in OKB canonicalization is an excellent solution to this problem. Current solutions address OKB canonicalization by devising advanced clustering algorithms and using knowledge graph embedding (KGE) to further facilitate the canonicalization process. Effective schemes are urgently needed to fully synergise machine unlearning with clustering and KGE learning. To this end, we put forward a multi-task unlearning framework, namely MulCanon, to tackle the machine unlearning problem in OKB canonicalization. Specifically, the noise characteristics of the diffusion model are utilized to achieve the effect of machine unlearning for data in the OKB. MulCanon unifies the learning objectives of the diffusion model, KGE, and clustering algorithms, and adopts a two-step multi-task learning paradigm for training. A thorough experimental study on popular OKB canonicalization datasets validates that MulCanon achieves advanced machine unlearning effects.
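
    The abstract does not spell out the objective, so the following is only an illustration of the multi-task idea; the loss forms, weights, and the negative-sign unlearning term are assumptions, not MulCanon's actual formulation:

        # Combine the three task objectives into one scalar loss. The unlearning term
        # enters with a negative sign so that training increases the loss on data that
        # must be forgotten, a common unlearning heuristic.
        def multitask_loss(kge_loss, clustering_loss, forget_loss,
                           w_kge=1.0, w_clu=0.5, w_forget=0.5):
            return w_kge * kge_loss + w_clu * clustering_loss - w_forget * forget_loss

        # Per-batch losses would come from the KGE model, the clustering module, and the
        # diffusion-based unlearning component; plain numbers keep the sketch runnable.
        print(multitask_loss(kge_loss=0.70, clustering_loss=0.30, forget_loss=0.20))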

    Open Information Extraction: A Review of Baseline Techniques, Approaches, and Applications

    With the abundance of available online and offline text data, there arises a crucial need to extract the relations between phrases and summarize the main content of each document in a few words. For this purpose, there have recently been many studies in Open Information Extraction (OIE). OIE improves upon relation extraction techniques by analyzing relations across different domains and avoiding the need for hand-labeled, pre-specified relations in sentences. This paper surveys recent approaches to OIE and its applications to Knowledge Graphs (KGs), text summarization, and Question Answering (QA). Moreover, the paper describes the basic OIE methods for relation extraction. It briefly discusses the main approaches and the pros and cons of each method. Finally, it gives an overview of challenges, open issues, and future work opportunities for OIE, relation extraction, and OIE applications. Comment: 15 pages, 9 figures.
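
    As a toy illustration of the triple extraction idea surveyed above, the sketch below pulls (subject, relation, object) tuples from dependency parses; it assumes spaCy and its small English model are installed, and real OIE systems are far more sophisticated:

        import spacy

        nlp = spacy.load("en_core_web_sm")

        def extract_triples(text):
            # For each verb, pair its nominal subject with its direct object; the head
            # tokens alone stand in for full noun phrases in this simplified version.
            triples = []
            for sent in nlp(text).sents:
                for token in sent:
                    if token.pos_ == "VERB":
                        subjects = [c for c in token.children if c.dep_ == "nsubj"]
                        objects = [c for c in token.children if c.dep_ in ("dobj", "obj")]
                        for s in subjects:
                            for o in objects:
                                triples.append((s.text, token.lemma_, o.text))
            return triples

        print(extract_triples("Barack Obama visited Paris."))  # e.g. [('Obama', 'visit', 'Paris')]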

    Correcting Knowledge Base Assertions

    The usefulness and usability of knowledge bases (KBs) is often limited by quality issues. One common issue is the presence of erroneous assertions, often caused by lexical or semantic confusion. We study the problem of correcting such assertions, and present a general correction framework which combines lexical matching, semantic embedding, soft constraint mining and semantic consistency checking. The framework is evaluated using DBpedia and an enterprise medical KB.
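
    A rough sketch of two of the framework's signals, lexical matching and embedding similarity, combined to rank replacement candidates for a suspect object; the weighting and the toy data are assumptions, and the paper's full framework additionally uses soft constraint mining and consistency checking:

        from difflib import SequenceMatcher
        import numpy as np

        def lexical_score(a, b):
            # Surface similarity between the erroneous object and a candidate.
            return SequenceMatcher(None, a.lower(), b.lower()).ratio()

        def embedding_score(u, v):
            # Cosine similarity between (stand-in) entity embeddings.
            return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

        def rank_candidates(wrong_object, candidates, wrong_vec, cand_vecs, alpha=0.5):
            scored = [(c, alpha * lexical_score(wrong_object, c)
                          + (1 - alpha) * embedding_score(wrong_vec, v))
                      for c, v in zip(candidates, cand_vecs)]
            return sorted(scored, key=lambda x: x[1], reverse=True)

        rng = np.random.default_rng(0)  # random vectors stand in for learned embeddings
        print(rank_candidates("Pariss", ["Paris", "Parma", "Berlin"], rng.random(10), rng.random((3, 10))))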

    Negative Statements Considered Useful

    Knowledge bases (KBs), pragmatic collections of knowledge about notable entities, are an important asset in applications such as search, question answering and dialogue. Rooted in a long tradition in knowledge representation, all popular KBs only store positive information, while they abstain from taking any stance towards statements not contained in them. In this paper, we make the case for explicitly stating interesting statements which are not true. Negative statements would be important to overcome current limitations of question answering, yet due to their potential abundance, any effort towards compiling them needs a tight coupling with ranking. We introduce two approaches towards compiling negative statements. (i) In peer-based statistical inference, we compare entities with highly related entities in order to derive potential negative statements, which we then rank using supervised and unsupervised features. (ii) In query-log-based text extraction, we use a pattern-based approach for harvesting search engine query logs. Experimental results show that both approaches hold promising and complementary potential. Along with this paper, we publish the first datasets on interesting negative information, containing over 1.1M statements for 100K popular Wikidata entities.
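
    A small sketch of the peer-based inference idea: properties that most of an entity's peers hold but the entity itself lacks become candidate negative statements, ranked by peer frequency. The toy data below is invented for illustration:

        from collections import Counter

        def candidate_negatives(entity_props, peer_props_list):
            counts = Counter(p for props in peer_props_list for p in props)
            n_peers = len(peer_props_list)
            # Candidates are properties absent from the entity, scored by the share of
            # peers that have them.
            candidates = [(prop, count / n_peers) for prop, count in counts.items()
                          if prop not in entity_props]
            return sorted(candidates, key=lambda x: x[1], reverse=True)

        entity = {"occupation: physicist", "educated_at: Oxford"}
        peers = [
            {"occupation: physicist", "award: Nobel Prize in Physics", "educated_at: Cambridge"},
            {"occupation: physicist", "award: Nobel Prize in Physics"},
            {"occupation: physicist", "award: Nobel Prize in Physics", "educated_at: MIT"},
        ]
        print(candidate_negatives(entity, peers))  # top candidate: the Nobel Prize statement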

    Can we predict new facts with open knowledge graph embeddings? A benchmark for open link prediction

    Open Information Extraction systems extract (“subject text”, “relation text”, “object text”) triples from raw text. Some triples are textual versions of facts, i.e., non-canonicalized mentions of entities and relations. In this paper, we investigate whether it is possible to infer new facts directly from the open knowledge graph without any canonicalization or any supervision from curated knowledge. For this purpose, we propose the open link prediction task, i.e., predicting test facts by completing (“subject text”, “relation text”, ?) questions. An evaluation in such a setup raises the question whether a correct prediction is actually a new fact that was induced by reasoning over the open knowledge graph or whether it can be trivially explained. For example, facts can appear in different paraphrased textual variants, which can lead to test leakage. To this end, we propose an evaluation protocol and a methodology for creating the open link prediction benchmark OLPBENCH. We performed experiments with a prototypical knowledge graph embedding model for open link prediction. While the task is very challenging, our results suggest that it is possible to predict genuinely new facts which cannot be trivially explained.
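
    A minimal sketch of open link prediction scoring: embed the uncanonicalized subject and relation mention texts and score every candidate object mention with a DistMult-style product. Random vectors stand in for trained mention embeddings, and DistMult is used purely for illustration rather than as the paper's actual model:

        import numpy as np

        rng = np.random.default_rng(42)
        dim = 16
        mentions = ["barack obama", "obama", "president of the us", "paris", "nyc"]
        relations = ["was born in", "visited"]
        mention_emb = {m: rng.standard_normal(dim) for m in mentions}    # stand-in vectors
        relation_emb = {r: rng.standard_normal(dim) for r in relations}

        def score(subject_text, relation_text, object_text):
            # DistMult scoring: sum of the elementwise product of the three vectors.
            return float(np.sum(mention_emb[subject_text]
                                * relation_emb[relation_text]
                                * mention_emb[object_text]))

        # Answer a ("subject text", "relation text", ?) question by ranking all mentions.
        query = ("obama", "visited")
        print(sorted(mentions, key=lambda o: score(query[0], query[1], o), reverse=True))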