Assessing the Lexico-Semantic Relational Knowledge Captured by Word and Concept Embeddings
Deep learning currently dominates the benchmarks for various NLP tasks. At the
basis of such systems, words are frequently represented as embeddings
(vectors in a low-dimensional space) learned from large text corpora, and
various algorithms have been proposed to learn both word and concept
embeddings. One of the claimed benefits of such embeddings is that they capture
knowledge about semantic relations. Such embeddings are most often evaluated
through tasks such as predicting human-rated similarity and analogy, which only
test a few, often ill-defined, relations. In this paper, we propose a method
for (i) reliably generating word and concept pair datasets for a wide range of
relations by using a knowledge graph and (ii) evaluating to what extent
pre-trained embeddings capture those relations. We evaluate the approach
against a proprietary and a public knowledge graph and analyze the results,
showing which lexico-semantic relational knowledge is captured by current
embedding learning approaches.
Comment: Accepted at the 10th International Conference on Knowledge Capture (K-CAP 2019)
Scalable Cross-lingual Document Similarity through Language-specific Concept Hierarchies
With the ongoing growth in the number of digital articles and the expanding
range of languages in which they are written, we need annotation
methods that enable browsing multi-lingual corpora. Multilingual probabilistic
topic models have recently emerged as a group of semi-supervised machine
learning models that can be used to perform thematic explorations on
collections of texts in multiple languages. However, these approaches require
theme-aligned training data to create a language-independent space. This
constraint limits the number of scenarios in which the technique can be
applied and makes it difficult to scale up to situations where a
huge collection of multi-lingual documents is required during the training
phase. This paper presents an unsupervised document similarity algorithm that
does not require parallel or comparable corpora, or any other type of
translation resource. The algorithm annotates topics automatically created from
documents in a single language with cross-lingual labels and describes
documents by hierarchies of multi-lingual concepts from independently-trained
models. Experiments performed on the English, Spanish and French editions of
JRC-Acquis corpora reveal promising results on classifying and sorting
documents by similar content.
Comment: Accepted at the 10th International Conference on Knowledge Capture (K-CAP 2019)
MAG: A Multilingual, Knowledge-base Agnostic and Deterministic Entity Linking Approach
Entity linking has recently been the subject of a significant body of
research. Currently, the best performing approaches rely on trained
mono-lingual models. Porting these approaches to other languages is
consequently a difficult endeavor as it requires corresponding training data
and retraining of the models. We address this drawback by presenting a novel
multilingual, knowledge-base agnostic and deterministic approach to entity
linking, dubbed MAG. MAG is based on a combination of context-based retrieval
on structured knowledge bases and graph algorithms. We evaluate MAG on 23 data
sets and in 7 languages. Our results show that the best approach trained on
English datasets (PBOH) achieves a micro F-measure that is up to 4 times worse
on datasets in other languages. MAG, on the other hand, achieves
state-of-the-art performance on English datasets and reaches a micro F-measure
that is up to 0.6 higher than that of PBOH on non-English languages.
Comment: Accepted in K-CAP 2017: Knowledge Capture Conference
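The MAG abstract above reports results as a micro F-measure. As a rough illustration of that metric only (not the authors' evaluation code), the sketch below pools true positives, false positives and false negatives over all documents before computing precision, recall and F1. The names `Counts` and `micro_f1` and the sample numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Counts:
    """Per-document entity-linking counts (hypothetical container)."""
    tp: int  # correctly linked mentions
    fp: int  # links produced that are wrong
    fn: int  # gold links that were missed


def micro_f1(per_document: Iterable[Counts]) -> float:
    """Micro F-measure: pool the counts first, then compute P, R and F1."""
    docs = list(per_document)
    tp = sum(c.tp for c in docs)
    fp = sum(c.fp for c in docs)
    fn = sum(c.fn for c in docs)
    if tp == 0:
        # No correct links at all: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    sample = [Counts(tp=8, fp=2, fn=2), Counts(tp=1, fp=4, fn=4)]
    print(f"micro F1 = {micro_f1(sample):.3f}")
```

Because the counts are pooled, documents with many mentions weigh more heavily than in a macro average, which averages per-document scores instead.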
Learning semantic sentence representations from visually grounded language without lexical knowledge
Current approaches to learning semantic representations of sentences often
use prior word-level knowledge. The current study aims to leverage visual
information in order to capture sentence level semantics without the need for
word embeddings. We use a multimodal sentence encoder trained on a corpus of
images with matching text captions to produce visually grounded sentence
embeddings. Deep Neural Networks are trained to map the two modalities to a
common embedding space such that for an image the corresponding caption can be
retrieved and vice versa. We show that our model achieves results comparable to
the current state-of-the-art on two popular image-caption retrieval benchmark
data sets: MSCOCO and Flickr8k. We evaluate the semantic content of the
resulting sentence embeddings using the data from the Semantic Textual
Similarity benchmark task and show that the multimodal embeddings correlate
well with human semantic similarity judgements. The system achieves
state-of-the-art results on several of these benchmarks, which shows that a
system trained solely on multimodal data, without assuming any word
representations, is able to capture sentence level semantics. Importantly, this
result shows that we do not need prior knowledge of lexical level semantics in
order to model sentence level semantics. These findings demonstrate the
importance of visual information in semantics.
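The abstract above describes mapping images and captions into a common embedding space so that each can retrieve the other. A minimal sketch of how such retrieval is commonly scored, assuming cosine similarity and a recall@k measure (the abstract names neither), is given below. The function names and toy vectors are hypothetical, and the code stands in for no part of the authors' model.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recall_at_k(image_vecs: List[Sequence[float]],
                caption_vecs: List[Sequence[float]],
                k: int) -> float:
    """Fraction of images whose matching caption (same index) is in the top k.

    Assumes image i is paired with caption i, one caption per image.
    """
    hits = 0
    for i, image in enumerate(image_vecs):
        # Rank all caption indices by similarity to this image, best first.
        ranked = sorted(range(len(caption_vecs)),
                        key=lambda j: cosine(image, caption_vecs[j]),
                        reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(image_vecs)


if __name__ == "__main__":
    # Toy 2-d embeddings, purely illustrative.
    images = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    captions = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.0]]
    for k in (1, 2, 3):
        print(k, recall_at_k(images, captions, k))
```

Caption-to-image retrieval can be scored the same way by swapping the two arguments.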
Principles in Patterns (PiP) : Project Evaluation Synthesis
Evaluation activity found the technology-supported approach to curriculum design and approval developed by PiP to demonstrate high levels of user acceptance, promote improvements to the quality of curriculum designs, render more transparent and efficient aspects of the curriculum approval and quality monitoring process, demonstrate process efficacy and resolve a number of chronic information management difficulties which pervaded the previous state. The creation of a central repository of curriculum designs as the basis for their management as "knowledge assets", thus facilitating re-use and sharing of designs and exposure of tacit curriculum design practice, was also found to be highly advantageous. However, further process improvements remain possible and evidence of system resistance was found in some stakeholder groups. Recommendations arising from the findings and conclusions include the need to improve data collection surrounding the curriculum approval process so that the process and human impact of C-CAP can be monitored and observed. Strategies for improving C-CAP acceptance among the "late majority", the need for C-CAP best practice guidance, and suggested protocols on the knowledge management of curriculum designs are proposed. Opportunities for further process improvements in institutional curriculum approval, including a re-engineering of post-faculty approval processes, are also recommended.
A methodology for the capture and analysis of hybrid data: a case study of program debugging
No description supplied