Assessing the Lexico-Semantic Relational Knowledge Captured by Word and Concept Embeddings

Author: Denaux Ronald
Faruqui Manaal
Kata Gá
Pennington Jeffrey
Riedel Sebastian
Roller Stephen
Shazeer Noam
Shi Baoxu
Publication date: 24/09/2019

Deep learning currently dominates the benchmarks for various NLP tasks. At the basis of such systems, words are frequently represented as embeddings (vectors in a low-dimensional space) learned from large text corpora, and various algorithms have been proposed to learn both word and concept embeddings. One of the claimed benefits of such embeddings is that they capture knowledge about semantic relations. Such embeddings are most often evaluated through tasks such as predicting human-rated similarity and analogy, which test only a few, often ill-defined, relations. In this paper, we propose a method for (i) reliably generating word and concept pair datasets for a wide range of relations by using a knowledge graph and (ii) evaluating to what extent pre-trained embeddings capture those relations. We evaluate the approach against a proprietary and a public knowledge graph and analyze the results, showing which lexico-semantic relational knowledge is captured by current embedding learning approaches.

Comment: Accepted at the 10th International Conference on Knowledge Capture (K-CAP 2019).

arXiv.org e-Print Archive

Crossref

Scalable Cross-lingual Document Similarity through Language-specific Concept Hierarchies

Author: Badenes-Olmedo Carlos
Blei David M
Boyd-Graber Jordan
Hakkani-Tur D
Hearst Marti
Kenter Tom
Luo Wenhan
Pritchard Jonathan K.
Rao C Radhakrishna
Towne W Ben
Wang Chong
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 15/12/2020

With the ongoing growth in the number of digital articles in a wider set of languages and the expanding use of different languages, we need annotation methods that enable browsing multi-lingual corpora. Multilingual probabilistic topic models have recently emerged as a group of semi-supervised machine learning models that can be used to perform thematic explorations on collections of texts in multiple languages. However, these approaches require theme-aligned training data to create a language-independent space. This constraint limits the number of scenarios in which the technique can be applied and makes it difficult to scale up to situations where a huge collection of multi-lingual documents is required during the training phase. This paper presents an unsupervised document similarity algorithm that does not require parallel or comparable corpora, or any other type of translation resource. The algorithm annotates topics automatically created from documents in a single language with cross-lingual labels and describes documents by hierarchies of multi-lingual concepts from independently-trained models. Experiments performed on the English, Spanish and French editions of the JRC-Acquis corpora reveal promising results on classifying and sorting documents by similar content.

Comment: Accepted at the 10th International Conference on Knowledge Capture (K-CAP 2019).

arXiv.org e-Print Archive

Crossref

MAG: A Multilingual, Knowledge-base Agnostic and Deterministic Entity Linking Approach

Author: Bryl Volha
Brümmer Martin
Consoli Sergio
Cucerzan Silviu
Devi Pooja
Erp Marieke Van
Ferreira Thiago Castro
Hoffart Johannes
Juan
Luo Gang
Nuzzolese Andrea-Giovanni
Röder Michael
Steinmetz Nadine
Zhang Lei
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 17/10/2017

Entity linking has recently been the subject of a significant body of research. Currently, the best performing approaches rely on trained mono-lingual models. Porting these approaches to other languages is consequently a difficult endeavor, as it requires corresponding training data and retraining of the models. We address this drawback by presenting a novel multilingual, knowledge-base agnostic and deterministic approach to entity linking, dubbed MAG. MAG is based on a combination of context-based retrieval on structured knowledge bases and graph algorithms. We evaluate MAG on 23 data sets and in 7 languages. Our results show that the best approach trained on English datasets (PBOH) achieves a micro F-measure that is up to 4 times worse on datasets in other languages. MAG, on the other hand, achieves state-of-the-art performance on English datasets and reaches a micro F-measure that is up to 0.6 higher than that of PBOH on non-English languages.

Comment: Accepted in K-CAP 2017: Knowledge Capture Conference
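The abstract above compares systems by micro F-measure. As a reminder of what that metric aggregates, here is a minimal sketch assuming the usual definition: true positives, false positives and false negatives are pooled over all documents before precision and recall are computed. The function name `micro_f1` and the per-document count tuples are illustrative assumptions, not taken from the MAG paper or its evaluation code.

```python
def micro_f1(per_doc_counts):
    """Micro-averaged F-measure.

    per_doc_counts: list of (true_positives, false_positives, false_negatives)
    tuples, one per document (hypothetical input format for illustration).
    """
    counts = list(per_doc_counts)
    # Pool the counts over all documents first (this is what "micro" means).
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of pooled precision and pooled recall.
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Two made-up documents with different numbers of entity mentions.
    example = [(8, 2, 2), (1, 1, 3)]
    print(round(micro_f1(example), 4))
```

Because counts are pooled, documents with many mentions weigh more than documents with few, which is the main difference from a macro average over per-document scores.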

arXiv.org e-Print Archive

Crossref

Learning semantic sentence representations from visually grounded language without lexical knowledge

Author: Frank Stefan
Merkx Danny
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2019

Current approaches to learning semantic representations of sentences often use prior word-level knowledge. The current study aims to leverage visual information in order to capture sentence-level semantics without the need for word embeddings. We use a multimodal sentence encoder trained on a corpus of images with matching text captions to produce visually grounded sentence embeddings. Deep neural networks are trained to map the two modalities to a common embedding space such that for an image the corresponding caption can be retrieved and vice versa. We show that our model achieves results comparable to the current state-of-the-art on two popular image-caption retrieval benchmark data sets: MSCOCO and Flickr8k. We evaluate the semantic content of the resulting sentence embeddings using the data from the Semantic Textual Similarity benchmark task and show that the multimodal embeddings correlate well with human semantic similarity judgements. The system achieves state-of-the-art results on several of these benchmarks, which shows that a system trained solely on multimodal data, without assuming any word representations, is able to capture sentence-level semantics. Importantly, this result shows that we do not need prior knowledge of lexical-level semantics in order to model sentence-level semantics. These findings demonstrate the importance of visual information in semantics.
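Retrieval in a common embedding space, as described in the abstract above, is commonly scored with recall@k: the fraction of images whose matching caption appears among the k nearest captions. The sketch below assumes cosine similarity as the ranking score and assumes that `image_vecs[i]` and `caption_vecs[i]` form a matching pair. Both function names and the toy vectors are illustrative and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall_at_k(image_vecs, caption_vecs, k):
    """Fraction of images whose paired caption (same index) ranks in the top k."""
    hits = 0
    for i, img in enumerate(image_vecs):
        # Rank all caption indices by similarity to this image, best first.
        ranked = sorted(
            range(len(caption_vecs)),
            key=lambda j: cosine(img, caption_vecs[j]),
            reverse=True,
        )
        if i in ranked[:k]:
            hits += 1
    return hits / len(image_vecs)


if __name__ == "__main__":
    # Toy 2-dimensional embeddings standing in for encoder outputs.
    images = [[1.0, 0.0], [1.0, 0.1]]
    captions = [[0.9, 0.1], [0.8, 0.2]]
    print(recall_at_k(images, captions, k=1))
    print(recall_at_k(images, captions, k=2))
```

Caption-to-image retrieval can be scored the same way by swapping the two argument lists.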

arXiv.org e-Print Archive

Radboud Repository

MPG.PuRe

The application of ubiquitous multimodal synchronous data capture in CAD

Author: Kosmadoudi Zoe
Lim Theodore
Liu Ying
Ritchie James Millar
Sivanathan Aparajithan
Sung Raymond
Publication venue: 'Elsevier BV'
Publication date: 11/10/2013

Heriot Watt Pure

Principles in Patterns (PiP) : Project Evaluation Synthesis

Author: Macgregor George
Publication venue: University of Strathclyde
Publication date: 01/07/2012

Evaluation activity found that the technology-supported approach to curriculum design and approval developed by PiP demonstrates high levels of user acceptance, promotes improvements to the quality of curriculum designs, renders aspects of the curriculum approval and quality monitoring process more transparent and efficient, demonstrates process efficacy, and resolves a number of chronic information management difficulties which pervaded the previous state. The creation of a central repository of curriculum designs as the basis for their management as "knowledge assets", thus facilitating re-use and sharing of designs and exposure of tacit curriculum design practice, was also found to be highly advantageous. However, further process improvements remain possible and evidence of system resistance was found in some stakeholder groups. Recommendations arising from the findings and conclusions include the need to improve data collection surrounding the curriculum approval process so that the process and human impact of C-CAP can be monitored and observed. Strategies for improving C-CAP acceptance among the "late majority", the need for C-CAP best practice guidance, and suggested protocols on the knowledge management of curriculum designs are proposed. Opportunities for further process improvements in institutional curriculum approval, including a re-engineering of post-faculty approval processes, are also recommended.

University of Strathclyde Institutional Repository

A methodology for the capture and analysis of hybrid data: a case study of program debugging

Author: A. F. Blackwell
A. R. Jansen
B. Boulay du
Benedict du Boulay
D. Bergantz
D. J. Gilmore
D. Kranzlmüller
D. T. Hountalas
F. Détienne
F. Gabbay
G. Friedrich
I. Vessey
K. Stenning
M. Crosby
M. J. Patel
M. T. H. Chi
N. K. Denzin
N. Pennington
P. Mulholland
P. Romero
Pablo Romero
R. Cox
R. Marzi
Richard Cox
Rudi Lutz
S. Ainsworth
S. P. Davies
S. P. Robertson
Sallyann Bryant
T. Jong de
Y. Papadopoulos
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/05/2007

No description supplied

Crossref

Sussex Research Online