CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information

Authors: Jain Prince
Talukdar Partha
Vashishth Shikhar
Publication venue: Association for Computing Machinery (ACM)
Publication date: 31/01/2019

Open Information Extraction (OpenIE) methods extract (noun phrase, relation phrase, noun phrase) triples from text, resulting in the construction of large Open Knowledge Bases (Open KBs). The noun phrases (NPs) and relation phrases in such Open KBs are not canonicalized, leading to the storage of redundant and ambiguous facts. Recent research has posed canonicalization of Open KBs as clustering over manually defined feature spaces. Manual feature engineering is expensive and often sub-optimal. To overcome this challenge, we propose Canonicalization using Embeddings and Side Information (CESI), a novel approach which performs canonicalization over learned embeddings of Open KBs. CESI extends recent advances in KB embedding by incorporating relevant NP and relation phrase side information in a principled manner. Through extensive experiments on multiple real-world datasets, we demonstrate CESI's effectiveness.

Comment: Accepted at WWW 201

arXiv.org e-Print Archive

Open Access Repository of IISc Research Publications
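The CESI abstract frames canonicalization as clustering noun phrases in a learned embedding space. The sketch below illustrates only that clustering idea, under stated assumptions: the vectors are hand-made toy values standing in for learned embeddings, no side information is used, and the complete-linkage criterion, the 0.8 cosine threshold, and the shortest-phrase representative are illustrative choices rather than details taken from the paper. The helper names `cosine` and `canonicalize` are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def canonicalize(embeddings, threshold=0.8):
    """Cluster phrases with complete-linkage agglomerative clustering and map
    each phrase to a representative of its cluster (hypothetical helper).

    Two clusters merge only if every cross-cluster pair has cosine
    similarity >= threshold; the most similar eligible pair merges first.
    """
    clusters = [[phrase] for phrase in embeddings]
    while True:
        best_link, best_pair = threshold, None
        for i, j in combinations(range(len(clusters)), 2):
            # Complete linkage: a cluster pair is only as similar as its weakest link.
            link = min(cosine(embeddings[a], embeddings[b])
                       for a in clusters[i] for b in clusters[j])
            if link >= best_link:
                best_link, best_pair = link, (i, j)
        if best_pair is None:
            break  # no pair of clusters clears the threshold
        i, j = best_pair  # i < j, so deleting j leaves index i valid
        clusters[i].extend(clusters[j])
        del clusters[j]
    # Representative choice (assumption): the shortest phrase in each cluster.
    return {phrase: min(c, key=len) for c in clusters for phrase in c}

# Hypothetical toy vectors standing in for learned NP embeddings.
toy = {
    "Barack Obama": [0.90, 0.10, 0.00],
    "Obama": [0.88, 0.12, 0.02],
    "President Obama": [0.85, 0.15, 0.05],
    "New York City": [0.05, 0.90, 0.20],
    "NYC": [0.07, 0.92, 0.18],
}

for phrase, canonical in canonicalize(toy).items():
    print(f"{phrase!r} -> {canonical!r}")
```

Raising the threshold produces more, smaller clusters; lowering it merges more aggressively and risks conflating distinct entities.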

Classical Copying versus Quantum Entanglement in Natural Language: The Case of VP-ellipsis

Authors: Sadrzadeh M
Wijnholds G
Publication venue: Open Publishing Association
Publication date: 08/11/2018

In Proceedings CAPNS 2018, arXiv:1811.02701

arXiv.org e-Print Archive

Queen Mary Research Online

DISCOver: DIStributional approach based on syntactic dependencies for discovering COnstructions

Authors: Kovatchev Venelin
Martí Antonín M. Antònia
Salamó Llorente Maria
Taulé Delor Mariona
Publication venue: Walter de Gruyter GmbH
Publication date: 21/10/2020

One of the goals in Cognitive Linguistics is the automatic identification and analysis of constructions, since they are fundamental linguistic units for understanding language. This article presents DISCOver, an unsupervised methodology for the automatic discovery of lexico-syntactic patterns that can be considered candidates for constructions. The methodology follows a distributional semantic approach. Concretely, it is based on our proposed pattern-construction hypothesis: the contexts that are relevant to the definition of a cluster of semantically related words tend to be (part of) lexico-syntactic constructions. Our proposal uses Distributional Semantic Models to model context, taking syntactic dependencies into account. After a clustering process, we link all clusters that have strong relationships and use them as a source of information for deriving lexico-syntactic patterns, obtaining a total of 220,732 candidates from a 100-million-token corpus of Spanish. We evaluated the resulting patterns intrinsically, by applying statistical association measures, and qualitatively, through expert evaluation. Our results were superior to the baseline in both quality and quantity in all cases. Although our experiments were carried out on a Spanish corpus, the methodology is language independent and only requires a large corpus annotated with parts of speech and syntactic dependencies.

Dipòsit Digital de la Universitat de Barcelona
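The DISCOver abstract mentions evaluating candidate patterns with statistical association measures but does not name them here. As an illustration only, the sketch below computes pointwise mutual information (PMI), one widely used association measure, between words and dependency-based contexts. The context labels (such as `obj:tomar`), the counts, and the helper `pmi_scores` are hypothetical toy material, not part of DISCOver.

```python
import math
from collections import Counter

def pmi_scores(pairs):
    """Return PMI(w, c) = log2(P(w, c) / (P(w) * P(c))) for every observed
    (word, context) pair, with probabilities estimated from raw counts."""
    pair_counts = Counter(pairs)
    word_counts = Counter(w for w, _ in pairs)
    ctx_counts = Counter(c for _, c in pairs)
    total = len(pairs)
    # P(w, c) / (P(w) * P(c)) simplifies to n * total / (count(w) * count(c)).
    return {
        (w, c): math.log2(n * total / (word_counts[w] * ctx_counts[c]))
        for (w, c), n in pair_counts.items()
    }

# Hypothetical toy data: (lemma, "relation:head" context) observations.
pairs = (
    [("café", "obj:tomar")] * 3
    + [("té", "obj:tomar")] * 2
    + [("café", "amod:caliente")]
    + [("decisión", "obj:tomar")] * 2
    + [("decisión", "obj:aplazar")] * 2
)

scores = pmi_scores(pairs)
for (word, ctx), score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word:10s} {ctx:14s} {score:.3f}")
```

PMI rewards pairs that co-occur more often than their individual frequencies predict, so (decisión, obj:aplazar) outscores the more frequent (café, obj:tomar). It also tends to overweight rare pairs: the single (café, amod:caliente) observation receives the same high score.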