webLyzard technology gmbh
102 research outputs found
Mining and Leveraging Background Knowledge for Improving Named Entity Linking
Knowledge-rich Information Extraction (IE) methods aspire towards combining classical IE with background knowledge obtained from third-party resources. Linked Open Data repositories that encode billions of machine readable facts from sources such as Wikipedia play a pivotal role in this development.
The recent growth of Linked Data adoption for Information Extraction tasks has shed light on many data quality issues in these data sources, affecting dimensions such as completeness, timeliness and semantic correctness, which seriously challenge their usefulness. Information Extraction methods are, therefore, faced with problems such as name variance and type confusability. If multiple linked data sources are used in parallel, additional concerns regarding link stability and entity mappings emerge.
This paper develops methods for integrating Linked Data into Named Entity Linking methods and addresses challenges in regard to mining knowledge from Linked Data, mitigating data quality issues, and adapting algorithms to leverage this knowledge.
Finally, we apply these methods to Recognyze, a graph-based Named Entity Linking (NEL) system, and provide a comprehensive evaluation which compares its performance to other well-known NEL systems, demonstrating the impact of the suggested methods on its own entity linking performance.
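Name variance and type confusability can be illustrated with a minimal candidate-lookup sketch. It is not Recognyze's algorithm: the `ex:` records, their labels, aliases and types, and the helper names `normalize`, `build_index` and `candidates` are all hypothetical. Surface forms are normalized so that variants share one key, and an optional expected type prunes confusable candidates.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, List, Optional, Set

# Hypothetical background-knowledge records (invented for illustration), loosely
# modeled on Linked Data entries with a label, alternative labels and a type.
RECORDS = [
    {"id": "ex:Vienna", "label": "Vienna", "aliases": ["Wien"], "type": "Place"},
    {"id": "ex:Vienna_Virginia", "label": "Vienna", "aliases": ["Vienna, VA"], "type": "Place"},
    {"id": "ex:Paris", "label": "Paris", "aliases": ["Paris, France"], "type": "Place"},
    {"id": "ex:Paris_Hilton", "label": "Paris Hilton", "aliases": ["Paris"], "type": "Person"},
]


def normalize(name: str) -> str:
    """Fold case and strip diacritics so that surface-form variants share one key."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold().strip()


def build_index(records: List[dict]) -> Dict[str, Set[str]]:
    """Map every normalized label and alias to the IDs of the entities carrying it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for rec in records:
        for name in [rec["label"], *rec["aliases"]]:
            index[normalize(name)].add(rec["id"])
    return index


def candidates(mention: str, index: Dict[str, Set[str]], records: List[dict],
               expected_type: Optional[str] = None) -> List[str]:
    """Return sorted candidate IDs for a mention, optionally filtered by entity type."""
    types = {rec["id"]: rec["type"] for rec in records}
    found = index.get(normalize(mention), set())
    if expected_type is not None:
        found = {eid for eid in found if types[eid] == expected_type}
    return sorted(found)
```

With `index = build_index(RECORDS)`, the mention "WIEN" resolves only to `ex:Vienna`, whereas "Paris" stays ambiguous between `ex:Paris` and `ex:Paris_Hilton` until `expected_type="Person"` narrows it down. A real system would rank the remaining candidates using context rather than stopping at a type filter.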
Framing Named Entity Linking Error Types
Named Entity Linking (NEL) and relation extraction form the backbone of Knowledge Base Population tasks. The recent rise of large open source Knowledge Bases and the continuous focus on improving NEL performance have led to the creation of automated benchmark solutions during the last decade. The benchmarking of NEL systems offers a valuable approach to understanding a NEL system's performance quantitatively. However, an in-depth qualitative analysis that helps improve NEL methods by identifying error causes usually requires a more thorough error analysis. This paper proposes a taxonomy to frame common errors and applies this taxonomy in a survey study to assess the performance of four well-known Named Entity Linking systems on three recent gold standards.
Keywords: Named Entity Linking, Linked Data Quality, Corpora, Evaluation, Error Analysis
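The abstract does not list the taxonomy's categories, so the following is only a minimal sketch of how coarse error types might be assigned automatically before a manual analysis of their causes. The categories (`correct`, `wrong_link`, `boundary`, `spurious`, `missing`) and the `Ann` record with character offsets are assumptions for illustration, not the paper's taxonomy.

```python
from typing import List, NamedTuple, Tuple


class Ann(NamedTuple):
    """A hypothetical annotation: character offsets (end exclusive) plus entity URI."""
    start: int
    end: int
    entity: str


def overlaps(a: Ann, b: Ann) -> bool:
    """True if the two character spans share at least one character."""
    return a.start < b.end and b.start < a.end


def classify(gold: List[Ann], system: List[Ann]) -> List[Tuple[str, Ann]]:
    """Assign an illustrative error type to each system and each unmatched gold annotation."""
    labels: List[Tuple[str, Ann]] = []
    for s in system:
        if s in gold:
            labels.append(("correct", s))      # same span, same entity
        elif any(g.start == s.start and g.end == s.end for g in gold):
            labels.append(("wrong_link", s))   # same span, different entity
        elif any(overlaps(g, s) for g in gold):
            labels.append(("boundary", s))     # overlapping but different span
        else:
            labels.append(("spurious", s))     # no gold annotation at this position
    for g in gold:
        if not any(overlaps(g, s) for s in system):
            labels.append(("missing", g))      # gold annotation the system did not cover
    return labels
```

Separating span errors from linking errors in this way is what makes a qualitative follow-up possible: a `wrong_link` points towards disambiguation or knowledge base issues, whereas a `boundary` error points towards mention detection or annotation guidelines.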
StoryLens: A Multiple Views Corpus for Location and Event Detection
The news media landscape tends to focus on long-running narratives. Correctly processing new information, therefore, requires considering multiple lenses when analyzing media content. Traditionally, it was considered sufficient to extract the topics or entities contained in a text in order to classify it, but today it is also important to look at more sophisticated annotations related to fine-grained geolocation, events, stories and the relations between them. To leverage such lenses, we propose a new corpus that offers a diverse set of annotations over texts collected from multiple media sources. We also showcase the framework used for creating the corpus, as well as how the information from the various lenses can be used to support different use cases in the EU project InVID for verifying the veracity of online videos.
On the Importance of Drill-Down Analysis for Assessing Gold Standards and Named Entity Linking Performance
Rigorous evaluations and analyses of evaluation results are key to improving Named Entity Linking systems. Nevertheless, most current evaluation tools focus on benchmarking and comparative evaluations. Therefore, they provide only aggregated statistics such as precision, recall and F1-measure to assess system performance, and offer no means of conducting detailed analyses down to the level of individual annotations.
This paper addresses the need for transparent benchmarking and fine-grained error analysis by introducing Orbis, an extensible framework that supports drill-down analysis, multiple annotation tasks and resource versioning. Orbis complements approaches like those deployed through the GERBIL and TAC KBP tools and helps developers to better understand and address shortcomings in their Named Entity Linking tools.
We present three use cases to demonstrate the usefulness of Orbis for both research and production systems: (i) improving Named Entity Linking tools; (ii) detecting gold standard errors; and (iii) performing Named Entity Linking evaluations with multiple versions of the included resources.
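To illustrate the difference between aggregated scores and drill-down analysis, here is a minimal sketch (not Orbis's code or API) that computes precision, recall and F1 over exact-match annotations while keeping the individual false positives and false negatives, both for the whole corpus and per document. The tuple layout `(doc_id, start, end, entity)`, the strict matching rule and the function names are assumptions.

```python
from typing import Dict, Set, Tuple

# Hypothetical annotation layout: (document id, start offset, end offset, entity URI).
Annotation = Tuple[str, int, int, str]


def evaluate(gold: Set[Annotation], system: Set[Annotation]) -> dict:
    """Exact-match scores plus the individual errors behind them."""
    tp = gold & system
    precision = len(tp) / len(system) if system else 0.0
    recall = len(tp) / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positives": sorted(system - gold),  # annotations to inspect one by one
        "false_negatives": sorted(gold - system),
    }


def per_document(gold: Set[Annotation], system: Set[Annotation]) -> Dict[str, dict]:
    """Drill down from corpus-level scores to one evaluation per document."""
    docs = sorted({a[0] for a in gold | system})
    return {
        d: evaluate({a for a in gold if a[0] == d}, {a for a in system if a[0] == d})
        for d in docs
    }
```

In a toy corpus with three gold annotations, one of them linked to the wrong entity, the corpus-level F1 is 2/3. Only the per-document view shows that the error is confined to a single document and names the offending annotation, which is the kind of information aggregate-only tools do not expose.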
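Data as the Gold of Modern Times: Optimizing Information and Decision-Making Processes with Big Data (original German title: Daten als Gold der Moderne - Optimierung von Informations- und Entscheidungsprozessen mittels Big Data)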
uComp Language Quiz - A Game with a Purpose for Multilingual Language Resource Acquisition
This paper presents the uComp Language Quiz, an online application in the tradition of games with a purpose for language resource acquisition. It is based on the human computation framework developed within the uComp research project (www.ucomp.eu), which provides multi-channel deployment and social logins, a viral notification system, quality control mechanisms, and a CrowdFlower data interface to publish game elements as Human Intelligence Tasks.
Torpedo: Improving the State-of-the-Art RDF Dataset Slicing
Over the last few years, the amount of data published as Linked Data on the Web has grown enormously. Despite the high availability of Linked Data, organizations still face an accessibility challenge when consuming it, mostly due to the large size of some of the datasets published as Linked Data. The core observation behind this work is that a subset of these datasets suffices to address the needs of most organizations. In this paper, we introduce Torpedo, an approach for efficiently selecting and extracting relevant subsets from RDF datasets. In particular, Torpedo adds optimization techniques that reduce the cost of seek operations, as well as support for multi-join graph patterns and SPARQL FILTERs that enable more granular data selection. We compare the performance of our approach with existing solutions on nine different queries against four datasets. Our results show that our approach is highly scalable and is up to 26% faster than the current state-of-the-art RDF dataset slicing approach.
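The basic operation behind RDF dataset slicing, selecting the triples that satisfy a multi-join graph pattern and a FILTER condition, can be sketched as follows. This is a naive nested-loop join over an in-memory list with invented `ex:` data and hypothetical function names; it does not reproduce Torpedo's seek-cost optimizations and only shows what a slice is.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """SPARQL-style variables start with '?'."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend the binding so that the pattern matches the triple, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if extended.get(p, t) != t:   # variable already bound to another value
                return None
            extended[p] = t
        elif p != t:                      # constant term does not match
            return None
    return extended


def slice_dataset(triples: List[Triple], patterns: List[Triple],
                  keep: Callable[[Binding], bool] = lambda b: True) -> List[Triple]:
    """Return the triples used by solutions of the joined patterns that pass `keep`."""
    # Each partial solution pairs a variable binding with the triples it used.
    solutions: List[Tuple[Binding, List[Triple]]] = [({}, [])]
    for pattern in patterns:              # join one triple pattern at a time
        next_solutions = []
        for binding, used in solutions:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append((extended, used + [triple]))
        solutions = next_solutions
    selected: Set[Triple] = set()
    for binding, used in solutions:
        if keep(binding):                 # plays the role of a SPARQL FILTER
            selected.update(used)
    return sorted(selected)
```

For instance, joining `("?s", "rdf:type", "ex:Person")` with `("?s", "ex:age", "?age")` and keeping only bindings where `int(b["?age"]) >= 18` extracts just the type and age triples of adult persons. Scanning the full triple list for every pattern is exactly the cost that dedicated slicing approaches try to avoid.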
Mitigating linked data quality issues in knowledge-intense information extraction methods
Advances in research areas such as named entity linking and sentiment analysis have triggered the emergence of knowledge-intensive information extraction methods that combine classical information extraction with background knowledge from the Web. Despite data quality concerns, linked data sources such as DBpedia, GeoNames and Wikidata, which encode facts in a standardized structured format, are particularly attractive for such applications. This paper addresses the problem of data quality by introducing a framework that elaborates on linked data quality issues relevant to different stages of the background knowledge acquisition process, their impact on information extraction performance and applicable mitigation strategies. Applying this framework to named entity linking and data enrichment demonstrates the potential of the introduced mitigation strategies to lessen the impact of different kinds of data quality problems. An industrial use case that aims at the automatic generation of image metadata from image descriptions illustrates both the successful deployment of knowledge-intensive information extraction in real-world applications and the constraints introduced by data quality concerns.
Aspect-Based Extraction and Analysis of Affective Knowledge from Social Media Streams
Extracting and analyzing affective knowledge from social media in a structured manner is a challenging task. Decision makers require insights into the public perception of a company's products and services, as a strategic feedback channel to guide communication campaigns, and as an early warning system to quickly react in the case of unforeseen events. The approach presented in this paper goes beyond bipolar metrics of sentiment. It combines factual and affective knowledge extracted from rich public knowledge bases to analyze emotions expressed towards specific entities (targets) in social media. We obtain common and common-sense domain knowledge from DBpedia and ConceptNet to identify potential sentiment targets. We employ affective knowledge about emotional categories available from SenticNet to assess how those targets and their aspects (e.g. specific product features) are perceived in social media. An evaluation shows the usefulness and correctness of the extracted domain knowledge, which is used in a proof-of-concept data analytics application to investigate the perception of car brands on social media in the period between September and November 2015.
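The pairing of sentiment targets and aspects with emotion categories can be sketched with a simple lexicon lookup. Everything here is an assumption for illustration: the `TARGETS` and `EMOTIONS` dictionaries stand in for knowledge that the paper derives from DBpedia, ConceptNet and SenticNet, their entries are invented, and the sketch assigns every emotion word in a post to every target and aspect mentioned in it, a much cruder rule than an actual aspect-based method would use.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical domain knowledge: sentiment targets (car brands) and their aspects.
TARGETS: Dict[str, List[str]] = {
    "volkswagen": ["engine", "price"],
    "tesla": ["battery", "price"],
}
# Hypothetical emotion lexicon; words and categories are invented, NOT SenticNet data.
EMOTIONS: Dict[str, str] = {
    "love": "joy", "great": "joy", "angry": "anger", "scandal": "anger", "worried": "fear",
}


def analyze(posts: List[str]) -> Dict[Tuple[str, Optional[str]], Counter]:
    """Count emotion categories per (target, aspect) pair co-occurring in a post."""
    counts: Dict[Tuple[str, Optional[str]], Counter] = defaultdict(Counter)
    for post in posts:
        tokens = re.findall(r"[a-z]+", post.lower())
        emotions = [EMOTIONS[t] for t in tokens if t in EMOTIONS]
        for target, aspects in TARGETS.items():
            if target not in tokens:
                continue
            # Fall back to the target itself (aspect None) if no aspect is mentioned.
            mentioned = [a for a in aspects if a in tokens] or [None]
            for aspect in mentioned:
                counts[(target, aspect)].update(emotions)
    return counts
```

The output is a distribution over emotion categories per target and aspect rather than a single positive/negative score, which is the sense in which such an analysis goes beyond bipolar sentiment metrics.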
Semantic Systems and Visual Tools to Support Environmental Communication
Given the intense attention that environmental topics such as climate change attract in news and social media coverage, scientists and communication professionals want to know how different stakeholders perceive observable threats and policy options, how specific media channels react to new insights, and how journalists present scientific knowledge to the public. This paper investigates the potential of semantic technologies to address these questions. After summarizing methods to extract and disambiguate context information, we present visualization techniques to explore the lexical, geospatial, and relational context of topics and entities referenced in such content repositories. The examples stem from three applications that aggregate environmental resources from a wide range of online sources: the Media Watch on Climate Change, the Climate Resilience Toolkit and the NOAA Media Watch. These systems not only show the value of providing comprehensive information to the public, but have also helped to develop a novel communication success metric that goes beyond bipolar assessments of sentiment.