52 research outputs found
KnowNER: Incremental Multilingual Knowledge in Named Entity Recognition
KnowNER is a multilingual Named Entity Recognition (NER) system that
leverages different degrees of external knowledge. A novel modular framework
divides the knowledge into four categories according to the depth of knowledge
they convey. Each category consists of a set of features automatically
generated from different information sources (such as a knowledge-base, a list
of names or document-specific semantic annotations) and is used to train a
conditional random field (CRF). Since those information sources are usually
multilingual, KnowNER can be easily trained for a wide range of languages. In
this paper, we show that the incorporation of deeper knowledge systematically
boosts accuracy and compare KnowNER with state-of-the-art NER approaches across
three languages (i.e., English, German and Spanish), performing amongst
state-of-the-art systems in all of them.
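The feature-generation idea can be illustrated with a small sketch (ours, not the authors' implementation; the gazetteer and type map below are invented examples). Each token receives features drawn from progressively deeper knowledge sources, which a CRF would then consume as input:

```python
# Hedged sketch of knowledge-category features per token (hypothetical data).
GAZETTEER = {"berlin", "germany", "obama"}          # a list of names (shallow knowledge)
KB_TYPES = {"berlin": "LOC", "germany": "LOC"}      # knowledge-base entity types (deeper)

def token_features(tokens, i):
    """Features for token i, from shallowest to deepest knowledge category."""
    tok = tokens[i]
    low = tok.lower()
    return {
        # knowledge-agnostic surface features
        "word.lower": low,
        "word.istitle": tok.istitle(),
        # name-list (gazetteer) features
        "in_gazetteer": low in GAZETTEER,
        # knowledge-base type feature ("O" when the token is unknown)
        "kb_type": KB_TYPES.get(low, "O"),
    }

sent = ["Angela", "visited", "Berlin"]
X = [token_features(sent, i) for i in range(len(sent))]
```

In the paper's framing, each knowledge category contributes such a feature set, and a CRF is trained on their combination; deeper categories (e.g., document-specific semantic annotations) would add further entries to the dictionary.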
Rethinking Economic Energy Policy Research – Developing Qualitative Scenarios to Identify Feasible Energy Policies
To accelerate deep decarbonisation in the energy sector, the discipline of economics should focus on identifying feasible instead of optimal policies. To do so, economic analysis should include four features: complexity (a), non-economic aspects (b), uncertainty (c) and stakeholders (d). The aim of this paper is to show that qualitative scenario analysis represents a promising alternative to conventional optimisation approaches and meets these requirements. This paper develops qualitative scenarios for the case study of gas infrastructure modifications with hydrogen and carbon capture and storage technologies in Germany. In the results, the six socio-economic qualitative scenarios are described in more detail. A comparison between the case study and a conventional approach reveals three limitations of the latter and highlights the value of qualitative scenario development. The authors distil the advantages of qualitative scenario analysis and discuss challenges and opportunities that go beyond the case study.
In conclusion, developing socio-economic scenarios has large potential to improve economic policy assessment. It also allows economic research to catch up with the rethinking of energy research taking place in other disciplines.
Same but Different: Distant Supervision for Predicting and Understanding Entity Linking Difficulty
Entity Linking (EL) is the task of automatically identifying entity mentions
in a piece of text and resolving them to a corresponding entity in a reference
knowledge base like Wikipedia. There is a large number of EL tools available
for different types of documents and domains, yet EL remains a challenging task
where the lack of precision on particularly ambiguous mentions often spoils the
usefulness of automated disambiguation results in real applications. A priori
approximations of the difficulty to link a particular entity mention can
facilitate flagging of critical cases as part of semi-automated EL systems,
while detecting latent factors that affect the EL performance, like
corpus-specific features, can provide insights on how to improve a system based
on the special characteristics of the underlying corpus. In this paper, we
first introduce a consensus-based method to generate difficulty labels for
entity mentions on arbitrary corpora. The difficulty labels are then exploited
as training data for a supervised classification task able to predict the EL
difficulty of entity mentions using a variety of features. Experiments over a
corpus of news articles show that EL difficulty can be estimated with high
accuracy, revealing also latent features that affect EL performance. Finally,
evaluation results demonstrate the effectiveness of the proposed method to
inform semi-automated EL pipelines.
Comment: Preprint of a paper accepted for publication in the 34th ACM/SIGAPP
Symposium on Applied Computing (SAC 2019).
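The consensus-based labeling step can be sketched as follows (our simplification, not the paper's exact procedure; the system outputs are invented): a mention is labeled difficult when a large fraction of independent EL systems fail to link it to the gold entity.

```python
# Hedged sketch: consensus difficulty labels from multiple EL systems' outputs.
def difficulty_label(system_links, gold, threshold=0.5):
    """Label a mention by the fraction of systems that missed the gold entity."""
    wrong = sum(1 for link in system_links if link != gold)
    error_rate = wrong / len(system_links)
    return "difficult" if error_rate >= threshold else "easy"

# Three hypothetical EL systems disambiguating the mention "Paris":
links = ["Paris", "Paris_Hilton", "Paris,_Texas"]
label = difficulty_label(links, gold="Paris")   # 2 of 3 systems are wrong
```

Labels produced this way then serve as training data for the supervised difficulty classifier described in the abstract.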
Evaluating the Impact of Knowledge Graph Context on Entity Disambiguation Models
Pretrained Transformer models have emerged as state-of-the-art approaches
that learn contextual information from text to improve the performance of
several NLP tasks. These models, albeit powerful, still require specialized
knowledge in specific scenarios. In this paper, we argue that context derived
from a knowledge graph (in our case: Wikidata) provides enough signals to
inform pretrained transformer models and improve their performance for named
entity disambiguation (NED) on the Wikidata KG. We further hypothesize that our
proposed KG context can be standardized for Wikipedia, and we evaluate the
impact of KG context on a state-of-the-art NED model for the Wikipedia knowledge
base. Our empirical results validate that the proposed KG context can be
generalized (for Wikipedia), and providing KG context in transformer
architectures considerably outperforms the existing baselines, including the
vanilla transformer models.
Comment: To appear in the proceedings of CIKM 2020.
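The general mechanism can be sketched in a few lines (a minimal illustration of the idea, not the paper's pipeline; the Wikidata facts below are invented rather than fetched from the KG): facts about each candidate entity are verbalized and appended to the mention's context before the sequence is fed to a pretrained transformer.

```python
# Hedged sketch: building a transformer input that carries KG context.
def build_ned_input(mention, sentence, candidate_facts):
    """Concatenate sentence, mention, and verbalized KG triples into one sequence."""
    context = " ; ".join(f"{p} {o}" for p, o in candidate_facts)
    return f"{sentence} [SEP] {mention} [SEP] {context}"

# Hypothetical facts about the candidate entity for "Berlin":
facts = [("instance of", "city"), ("country", "Germany")]
seq = build_ned_input("Berlin", "She moved to Berlin in 2010.", facts)
```

The transformer then scores each candidate's enriched sequence, so the KG signals directly inform disambiguation.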
MAG: A Multilingual, Knowledge-base Agnostic and Deterministic Entity Linking Approach
Entity linking has recently been the subject of a significant body of
research. Currently, the best performing approaches rely on trained
mono-lingual models. Porting these approaches to other languages is
consequently a difficult endeavor as it requires corresponding training data
and retraining of the models. We address this drawback by presenting a novel
multilingual, knowledge-base agnostic and deterministic approach to entity
linking, dubbed MAG. MAG is based on a combination of context-based retrieval
on structured knowledge bases and graph algorithms. We evaluate MAG on 23 data
sets and in 7 languages. Our results show that the best approach trained on
English datasets (PBOH) achieves a micro F-measure that is up to 4 times worse
on datasets in other languages. MAG, on the other hand, achieves
state-of-the-art performance on English datasets and reaches a micro F-measure
that is up to 0.6 higher than that of PBOH on non-English languages.
Comment: Accepted at K-CAP 2017: Knowledge Capture Conference.
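The graph-algorithm step of such an approach can be illustrated with a deterministic sketch (our simplification, not MAG's actual pipeline; the candidate graph is invented): candidates retrieved for each mention are connected via the knowledge base, and a PageRank-style ranking favors the best-connected candidate.

```python
# Hedged sketch: ranking candidate entities with plain power-iteration PageRank.
def pagerank(graph, damping=0.85, iters=50):
    """PageRank over an adjacency dict {node: [neighbors]}."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n, nbrs in graph.items():
            if nbrs:
                share = damping * rank[n] / len(nbrs)
                for m in nbrs:
                    new[m] += share
            else:  # dangling node: spread its rank uniformly
                for m in nodes:
                    new[m] += damping * rank[n] / len(nodes)
        rank = new
    return rank

# Hypothetical candidate graph: "Paris_(city)" is well connected to the other
# mentions' candidates; "Paris_Hilton" is isolated.
g = {
    "Paris_(city)": ["France", "Seine"],
    "France": ["Paris_(city)"],
    "Seine": ["Paris_(city)", "France"],
    "Paris_Hilton": [],
}
ranks = pagerank(g)
best = max(["Paris_(city)", "Paris_Hilton"], key=ranks.get)
```

Because the ranking is purely graph-based and deterministic, no training data is needed, which is what makes the approach portable across languages.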
Multi-task Neural Network for Non-discrete Attribute Prediction in Knowledge Graphs
Many popular knowledge graphs such as Freebase, YAGO or DBPedia maintain a
list of non-discrete attributes for each entity. Intuitively, these attributes
such as height, price or population count are able to richly characterize
entities in knowledge graphs. This additional source of information may help to
alleviate the inherent sparsity and incompleteness problems that are prevalent
in knowledge graphs. Unfortunately, many state-of-the-art relational learning
models ignore this information due to the challenging nature of dealing with
non-discrete data types in the inherently binary-natured knowledge graphs. In
this paper, we propose a novel multi-task neural network approach for both
encoding and prediction of non-discrete attribute information in a relational
setting. Specifically, we train a neural network for triplet prediction along
with a separate network for attribute value regression. Via multi-task
learning, we are able to learn representations of entities, relations and
attributes that encode information about both tasks. Moreover, such attributes
are not only central to many predictive tasks as an information source but also
as a prediction target. Therefore, models that are able to encode, incorporate
and predict such information in a relational learning context are highly
attractive as well. We show that our approach outperforms many state-of-the-art
methods for the tasks of relational triplet classification and attribute value
prediction.
Comment: Accepted at CIKM 2017.
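The core multi-task idea can be sketched in a toy form (not the paper's architecture; dimensions and parameters are arbitrary): the entity embeddings are shared between a triplet-scoring head and an attribute-regression head, so a combined loss would shape one representation from both tasks.

```python
import numpy as np

# Hedged sketch: shared embeddings feeding two task heads.
rng = np.random.default_rng(0)
dim = 8
E = rng.normal(size=(4, dim))   # shared entity embeddings
R = rng.normal(size=(2, dim))   # relation embeddings (triplet task)
w = rng.normal(size=dim)        # regression head for one attribute, e.g. height

def triplet_score(h, r, t):
    """DistMult-style score for the triplet (h, r, t) over shared embeddings."""
    return float(np.sum(E[h] * R[r] * E[t]))

def attribute_pred(e):
    """Linear head predicting a non-discrete attribute value for entity e."""
    return float(E[e] @ w)

# Both heads read the same row of E, so a multi-task loss
# L = L_triplet + L_regression updates E from both tasks.
s = triplet_score(0, 1, 2)
v = attribute_pred(0)
```

The design choice here is the shared matrix E: gradients from attribute regression regularize the relational embeddings, which is how non-discrete attributes help against sparsity.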
RECON: Relation Extraction using Knowledge Graph Context in a Graph Neural Network
In this paper, we present a novel method named RECON, that automatically
identifies relations in a sentence (sentential relation extraction) and aligns
to a knowledge graph (KG). RECON uses a graph neural network to learn
representations of both the sentence as well as facts stored in a KG, improving
the overall extraction quality. These facts, including entity attributes
(label, alias, description, instance-of) and factual triples, have not been
collectively used in state-of-the-art methods. We evaluate the effect of
various forms of representing the KG context on the performance of RECON. The
empirical evaluation on two standard relation extraction datasets shows that
RECON significantly outperforms all state-of-the-art methods on the NYT Freebase
and Wikidata datasets. RECON reports an 87.23 F1 score (vs. the 82.29 baseline)
on the Wikidata dataset, whereas on NYT Freebase the reported values are
87.5 (P@10) and 74.1 (P@30), compared to the previous baseline scores of
81.3 (P@10) and 63.1 (P@30).
Comment: The Web Conference 2021 (WWW'21) full paper.
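The context-aggregation idea can be sketched without the actual GNN (a simplified stand-in; the encoder and the entity facts below are invented for illustration): each entity's representation is enriched by pooling embeddings of its attribute facts (label, alias, description, instance-of) before relation classification.

```python
import hashlib
import numpy as np

dim = 6

def embed(text):
    """Stand-in for a learned text encoder: deterministic pseudo-embedding."""
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2 ** 32)
    return np.random.default_rng(seed).normal(size=dim)

def entity_representation(attributes):
    """Aggregate attribute-fact embeddings (mean pooling as a GNN stand-in)."""
    vecs = [embed(f"{k}: {v}") for k, v in attributes.items()]
    return np.mean(vecs, axis=0)

# Hypothetical Wikidata-style attribute facts for one entity:
berlin = entity_representation({
    "label": "Berlin",
    "alias": "Berlin, Germany",
    "description": "capital of Germany",
    "instance of": "city",
})
```

In RECON proper, a graph neural network replaces the mean pooling and the sentence representation is learned jointly; the sketch only shows where the KG facts enter the model.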
Large Process Models: Business Process Management in the Age of Generative AI
The continued success of Large Language Models (LLMs) and other generative
artificial intelligence approaches highlights the advantages that large
information corpora can have over rigidly defined symbolic models, but also
serves as a proof-point of the challenges that purely statistics-based
approaches have in terms of safety and trustworthiness. As a framework for
contextualizing the potential, as well as the limitations of LLMs and other
foundation model-based technologies, we propose the concept of a Large Process
Model (LPM) that combines the correlation power of LLMs with the analytical
precision and reliability of knowledge-based systems and automated reasoning
approaches. LPMs are envisioned to directly utilize the wealth of process
management experience that experts have accumulated, as well as process
performance data of organizations with diverse characteristics, e.g., regarding
size, region, or industry. In this vision, the proposed LPM would allow
organizations to receive context-specific (tailored) process and other business
models, analytical deep-dives, and improvement recommendations. As such, it
would make it possible to substantially decrease the time and effort required
for business transformation, while also enabling deeper, more impactful, and more
actionable insights than previously possible. We argue that implementing an LPM
is feasible, but also highlight limitations and research challenges that need
to be solved to implement particular aspects of the LPM vision.
Mining and Leveraging Background Knowledge for Improving Named Entity Linking
Knowledge-rich Information Extraction (IE) methods aspire towards combining classical IE with background knowledge obtained from third-party resources. Linked Open Data repositories that encode billions of machine-readable facts from sources such as Wikipedia play a pivotal role in this development.
The recent growth of Linked Data adoption for Information Extraction tasks has shed light on many data quality issues in these data sources that seriously challenge their usefulness, such as completeness, timeliness, and semantic correctness. Information Extraction methods are, therefore, faced with problems such as name variance and type confusability. If multiple Linked Data sources are used in parallel, additional concerns regarding link stability and entity mappings emerge.
This paper develops methods for integrating Linked Data into Named Entity Linking methods and addresses challenges in regard to mining knowledge from Linked Data, mitigating data quality issues, and adapting algorithms to leverage this knowledge.
Finally, we apply these methods to Recognyze, a graph-based Named Entity Linking (NEL) system, and provide a comprehensive evaluation which compares its performance to other well-known NEL systems, demonstrating the impact of the suggested methods on its entity linking performance.