9 research outputs found
On the Use of Parsing for Named Entity Recognition
[Abstract] Parsing is a core natural language processing technique that can be used to obtain the structure underlying sentences in human languages. Named entity recognition (NER) is the task of identifying the entities that appear in a text. NER is a challenging natural language processing task that is essential to extract knowledge from texts in multiple domains, ranging from financial to medical. It is intuitive that the structure of a text can be helpful to determine whether or not a certain portion of it is an entity and, if so, to establish its concrete limits. However, parsing has been a relatively little-used technique in NER systems, since most of them have chosen to consider shallow approaches to deal with text. In this work, we study the characteristics of NER, a task that is far from being solved despite its long history; we analyze the latest advances in parsing that make its use advisable in NER settings; we review the different approaches to NER that make use of syntactic information; and we propose a new way of using parsing in NER based on casting parsing itself as a sequence labeling task.
Funding: This work has been funded by MINECO, AEI and FEDER of UE through the ANSWER-ASAP project (TIN2017-85160-C2-1-R); and by Xunta de Galicia through a Competitive Reference Group grant (ED431C 2020/11). CITIC, as a Research Center of the Galician University System, is funded by the Consellería de Educación, Universidade e Formación Profesional of the Xunta de Galicia through the European Regional Development Fund (ERDF/FEDER) with 80%, the Galicia ERDF 2014-20 Operational Programme, and the remaining 20% from the Secretaría Xeral de Universidades (Ref. ED431G 2019/01). Carlos Gómez-Rodríguez has also received funding from the European Research Council (ERC), under the European Union's Horizon 2020 research and innovation programme (FASTPARSE, Grant No. 714150).
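The closing idea, casting parsing itself as a sequence labeling task, can be sketched as follows: each token receives one discrete label encoding its head's relative offset plus its dependency relation, so an off-the-shelf sequence labeler can predict syntax. This is a minimal illustration of the general encoding idea, not the authors' implementation; the label format is made up for the example.

```python
def encode_dependencies(heads, deprels):
    """Encode a dependency tree as one label per token, so a standard
    sequence-labeling model (e.g. a BiLSTM tagger) can predict syntax.
    heads[i] is the 1-based head of token i+1 (0 = artificial root)."""
    labels = []
    for i, (head, rel) in enumerate(zip(heads, deprels), start=1):
        offset = head - i  # relative position of the head w.r.t. this token
        labels.append(f"{offset:+d}@{rel}")
    return labels

# "She reads books": token 1 and 3 depend on token 2, the root.
print(encode_dependencies([2, 0, 2], ["nsubj", "root", "obj"]))
# → ['+1@nsubj', '-2@root', '-1@obj']
```

Decoding simply inverts the arithmetic, which is what makes the labeling view attractive: parsing inherits the speed and simplicity of taggers.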
Bidirectional End-to-End Learning of Retriever-Reader Paradigm for Entity Linking
Entity Linking (EL) is a fundamental task for Information Extraction and
Knowledge Graphs. The general form of EL (i.e., end-to-end EL) aims to first
find mentions in the given input document and then link the mentions to
corresponding entities in a specific knowledge base. Recently, the paradigm of
retriever-reader promotes the progress of end-to-end EL, benefiting from the
advantages of dense entity retrieval and machine reading comprehension.
However, existing studies train the retriever and the reader separately
in a pipeline manner, ignoring the benefit that interaction between
the retriever and the reader can bring to the task. To help the
retriever-reader paradigm perform better on end-to-end EL, we
propose BEER, a Bidirectional End-to-End training framework for Retriever
and Reader. Through our designed bidirectional end-to-end training, BEER
guides the retriever and the reader to learn from each other, make progress
together, and ultimately improve EL performance. Extensive experiments on
benchmarks of multiple domains demonstrate the effectiveness of our proposed
BEER.
Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible.
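The retrieval half of the retriever-reader paradigm reduces, at inference time, to nearest-neighbor search over dense entity embeddings. A toy sketch with illustrative two-dimensional vectors (real systems use learned bi-encoder representations, not hand-written vectors):

```python
import numpy as np

def retrieve(mention_vec, entity_vecs, k=2):
    """Dense retrieval step: rank candidate entities by cosine
    similarity to a mention embedding and return the top-k indices."""
    m = mention_vec / np.linalg.norm(mention_vec)
    E = entity_vecs / np.linalg.norm(entity_vecs, axis=1, keepdims=True)
    scores = E @ m                      # cosine similarity per entity
    return np.argsort(-scores)[:k]      # indices of the k best candidates

entities = np.array([[1.0, 0.0],        # toy entity embeddings
                     [0.8, 0.6],
                     [0.0, 1.0]])
print(retrieve(np.array([0.9, 0.1]), entities))  # top-2 candidate indices
```

A reader module would then score these k candidates against the full document context; BEER's contribution is training the two components jointly rather than in this frozen pipeline.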
Hansel: A Chinese Few-Shot and Zero-Shot Entity Linking Benchmark
Modern Entity Linking (EL) systems entrench a popularity bias, yet there is
no dataset focusing on tail and emerging entities in languages other than
English. We present Hansel, a new benchmark in Chinese that fills this gap in
non-English few-shot and zero-shot EL challenges. The test set of Hansel is
human annotated and reviewed, created with a novel method for collecting
zero-shot EL datasets. It covers 10K diverse documents in news, social media
posts and other web articles, with Wikidata as its target Knowledge Base. We
demonstrate that the existing state-of-the-art EL system performs poorly on
Hansel (R@1 of 36.6% on Few-Shot). We then establish a strong baseline that
scores an R@1 of 46.2% on Few-Shot and 76.6% on Zero-Shot on our dataset. We
also show that our baseline achieves competitive results on TAC-KBP2015 Chinese
Entity Linking task.
Comment: WSDM 202
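The R@1 figures above are Recall@k with k=1: the fraction of mentions whose gold entity is ranked first by the system. A minimal sketch with hypothetical candidate lists and Wikidata-style IDs:

```python
def recall_at_k(ranked_candidates, gold, k=1):
    """R@k: fraction of mentions whose gold entity appears among the
    top-k ranked candidates returned by the linker."""
    hits = sum(g in cands[:k] for cands, g in zip(ranked_candidates, gold))
    return hits / len(gold)

# Three mentions, each with a ranked candidate list (invented IDs).
preds = [["Q1", "Q5"], ["Q9", "Q2"], ["Q3", "Q4"]]
gold = ["Q1", "Q2", "Q3"]
print(recall_at_k(preds, gold, k=1))  # 2 of 3 gold entities ranked first
print(recall_at_k(preds, gold, k=2))  # all 3 appear in the top 2
```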
Three essays on environmental economics
Thesis (Doctoral) -- KDI School: Ph.D. in Development Policy, 2019.
Due to its adverse health effects, particulate matter (PM) pollution has become a critical public policy issue in Northeast Asia. As concerns about PM pollution rise, so does interest in identifying its origins, such as transboundary pollutant sources. Employing daily average PM10 concentration data from Beijing, Shanghai and Seoul during 2014-2016, we estimate the direction and extent of the spillover effect of PM10 density between China and Korea. Estimation outcomes suggest that PM10 density levels in Beijing and Shanghai are Granger causes of PM10 density in Seoul, but not the other way around. PM10 density in Seoul increases by 0.13 ppm and 0.133 ppm in response to a one-ppm increase in PM10 density in Beijing and Shanghai on the previous day, respectively. This cross-border spillover effect from Beijing is reduced by 0.076 ppm from May to October, when the air flow makes it difficult for PM10 sources generated in Beijing to reach Seoul.
Chapter 1. THE CROSS-BORDER SPILLOVER EFFECT OF PARTICULATE MATTER POLLUTION IN KOREA
Chapter 2. FACTORS TO ENHANCE COMPLIANCE WITH ETS IN KOREA BASED ON COMPANY LEVEL DATA
Chapter 3. SUSTAINABLE MANAGEMENT OF CARBON SEQUESTRATION SERVICE IN AREAS WITH HIGH DEVELOPMENT PRESSURE: CONSIDERING LAND USE CHANGES AND CARBON COSTS
Doctoral, published. Hyemin PARK
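The Chapter 1 spillover estimate can be illustrated with a lagged ordinary-least-squares regression on synthetic data generated to mimic the reported 0.13 coefficient; the numbers below are simulated, not the thesis data:

```python
import numpy as np

# Synthetic illustration of the lagged-spillover specification:
# Seoul PM10 on day t regressed on Beijing PM10 on day t-1.
rng = np.random.default_rng(0)
beijing_lag = rng.uniform(50, 300, 200)                  # Beijing PM10, day t-1
seoul = 20 + 0.13 * beijing_lag + rng.normal(0, 5, 200)  # Seoul PM10, day t

X = np.column_stack([np.ones_like(beijing_lag), beijing_lag])  # intercept + lag
beta, *_ = np.linalg.lstsq(X, seoul, rcond=None)
print(round(float(beta[1]), 2))  # slope recovers a value near the simulated 0.13
```

The Granger-causality claim adds lags of Seoul's own PM10 to this regression and tests whether the Beijing lags still carry predictive power; that step is omitted here for brevity.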
The Case of Wikidata
Since its launch in 2012, Wikidata has grown to become the largest open knowledge
base (KB), containing more than 100 million data items and over 6 million registered
users. Wikidata serves as the structured data backbone of Wikipedia, addressing
data inconsistencies and adhering to the motto of "serving anyone anywhere in
the world," a vision realized through the diversity of knowledge. Despite being
a collaboratively contributed platform, the Wikidata community relies heavily on
bots, automated accounts with batch and speedy editing rights, for a majority of
edits. As Wikidata approaches its first decade, the question arises: how close is
Wikidata to achieving its vision of becoming a global KB and how diverse is it in
serving the global population? This dissertation investigates the current status of
Wikidata's diversity, the role of bot interventions on diversity, and how bots can be
leveraged to improve diversity within the context of Wikidata.
The methodologies used in this study are a mapping study and content analysis, which
led to the development of three datasets: 1) Wikidata Research Articles Dataset,
covering the literature on Wikidata from its first decade of existence sourced from
online databases to inspect its current status; 2) Wikidata Requests-for-Permissions
Dataset, based on the pages requesting bot rights on the Wikidata website to explore
bots from a community perspective; and 3) Wikidata Revision History Dataset,
compiled from the edit history of Wikidata to investigate bot editing behavior and
its impact on diversity, all of which are freely available online.
The insights gained from the mapping study reveal the growing popularity of Wikidata
in the research community and its various application areas, indicative of its
progress toward the ultimate goal of reaching the global community. However, there
is currently no research addressing the topic of diversity in Wikidata, which could
shed light on its capacity to serve a diverse global population. To address this gap,
this dissertation proposes a diversity measurement concept that defines diversity in
a KB context in terms of variety, balance, and disparity and is capable of assessing
diversity in a KB from two main angles: user and data. The application of this concept
on the domains and classes of the Wikidata Revision History Dataset exposes
imbalanced content distribution across Wikidata domains, which indicates low data
diversity in Wikidata domains.
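Two of the three proposed diversity components, variety and balance, have standard operationalizations: variety as the number of distinct classes and balance as Shannon evenness. The sketch below uses those standard definitions with invented counts; the disparity component, and the exact formulas used in the dissertation, may differ.

```python
import math

def variety_and_balance(counts):
    """Variety = number of distinct classes with items;
    balance = Shannon evenness (1.0 = perfectly uniform distribution)."""
    total = sum(counts)
    variety = len(counts)
    entropy = -sum(c / total * math.log(c / total) for c in counts if c)
    balance = entropy / math.log(variety) if variety > 1 else 1.0
    return variety, balance

# Item counts per domain: heavily skewed toward one domain → low balance.
v, b = variety_and_balance([9000, 500, 300, 200])
print(v, round(b, 2))  # 4 domains, balance well below 1
```

An imbalanced content distribution of the kind reported above shows up directly as a balance value far from 1.0 even when variety is high.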
Further analysis discloses that bots have been active since the inception of Wikidata,
and the community embraces their involvement in content editing tasks, often
importing data from Wikipedia, which shows a low diversity of sources in bot edits.
Bots and human users engage in similar editing tasks but exhibit distinct editing patterns.
The findings of this thesis confirm that bots possess the potential to influence
diversity within Wikidata by contributing substantial amounts of data to specific
classes and domains, leading to an imbalance. However, this potential can also be
harnessed to enhance coverage in classes with limited content and restore balance,
thus improving diversity. Hence, this study proposes enhancing diversity through
automation and demonstrates the practical implementation of the recommendations
using a specific use case.
In essence, this research enhances our understanding of diversity in relation to a KB,
elucidates the influence of automation on data diversity, and sheds light on diversity
improvement within a KB context through the usage of automation.
Deep learning based semantic textual similarity for applications in translation technology
A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy.
Semantic Textual Similarity (STS) measures the equivalence of meanings
between two textual segments. It is a fundamental task for many natural
language processing applications. In this study, we focus on employing STS in
the context of translation technology. We start by developing models to estimate
STS. We propose a new unsupervised vector aggregation-based STS method
which relies on contextual word embeddings. We also propose a novel Siamese
neural network based on efficient recurrent neural network units. We empirically
evaluate various unsupervised and supervised STS methods, including the
newly proposed ones, on three English STS datasets, two non-English
datasets and a biomedical STS dataset, identifying the best supervised and
unsupervised STS methods.
We then embed these STS methods in translation technology applications.
Firstly we experiment with Translation Memory (TM) systems. We propose a
novel TM matching and retrieval method based on STS methods that outperform
current TM systems. We then utilise the developed STS architectures in
translation Quality Estimation (QE). We show that the proposed methods are
simple yet outperform complex QE architectures, improving the state-of-the-art
results. The implementations of these methods have been released as open
source.
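For context, the fuzzy matching that classic TM systems use, and that the proposed STS-based retrieval aims to outperform, can be sketched with an edit-distance-style similarity; the memory entries below are invented examples.

```python
import difflib

def best_tm_match(query, memory):
    """Baseline TM lookup: return the (source, translation) pair whose
    source segment is most similar to the query under a character-level
    fuzzy-match score, as classic TM systems do."""
    def score(segment):
        return difflib.SequenceMatcher(None, query, segment).ratio()
    return max(memory, key=lambda pair: score(pair[0]))

memory = [("press the green button", "drücken Sie den grünen Knopf"),
          ("open the main menu", "öffnen Sie das Hauptmenü")]
print(best_tm_match("press the red button", memory)[1])
```

An STS-based retriever replaces `score` with a semantic similarity model, so that paraphrases with little surface overlap can still be retrieved.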
Population decline, infrastructure and sustainability
Japan has experienced population decline since 2010 and the situation is expected to become more severe after 2030 with forecasts indicating an expected 30% decline from 2005 to 2055. Many other developed countries such as Germany and Korea are also experiencing depopulation.
These demographic changes are expected to affect society at many levels, such as shrinking labour markets, an increased tax burden to sustain pension systems, and economic stagnation. Little is known, however, about the impacts of population decline on man-made physical infrastructure, such as possible deterioration of current infrastructure or the increased financial burden of sustaining it. Infrastructure can be classified into three categories: point-type (e.g. buildings), point-network type (e.g. water supply) and network type (e.g. roads). The impact of depopulation may vary according to the type of infrastructure. Previous research in this area has been limited in scope (e.g. case studies conducted in a single city focusing on a single type of infrastructure) and method (e.g. most research on the topic has been qualitative).
This thesis presents a new comprehensive study on the impacts of population decline on infrastructure in Japan, taking into account all types of infrastructure and using a quantitative approach. Data collection methods include interviews and two large scale questionnaire surveys, the first conducted with municipalities and the second, a stated preference survey, conducted with members of the public. The goal of sustainable development is relevant even in a depopulated society, and hence a sustainable development framework is applied to the analysis where social, economic, environmental and engineering impacts are investigated.
The main findings indicate that some infrastructure impacts observed and reported in depopulated areas do not seem to be related to population decline; moreover, the preferences of citizens for infrastructure development are very similar between depopulated and non-depopulated areas. The results also suggest that the premises of Barro's overlapping generations model, highly relevant to a discussion of intergenerational decision making and related sustainability, appear to be rejected in this context.
Model-based Specification of RESTful SOA on the Basis of Flexible SOM Business Process Models
Strong dynamics and a continuous increase of complexity characterize a company's environment at present times. In such an environment, the rapid adaptation of the production and delivery of goods and services is a necessary consequence to ensure the survival of a company. A key success factor for the evolutionary adaptation of a business system is the flexibility of its business processes. In the past, flexible business processes generally led to a reduced level of automation in the supported application system, and consequently to inconsistencies in the business information system.
The provision of appropriate solutions for the quick development of application systems and their alignment to changing business requirements is a central task of the system development discipline. Current concepts, tools and IT architectures do not give a methodically adequate answer to the question of a holistic and systematic design and maintenance of application systems, and their consistent alignment with flexible business processes. As an answer to this question, this work designs the SOM-R methodology, a model-based development method based on the Semantic Object Model (SOM) for the holistic development and maintenance of RESTful SOA on the basis of flexible SOM business process models. By applying the architectural style REST to service-oriented architectures (SOA), the RESTful SOA is designed as the target software architecture for flexibly adaptable application systems.
The first main contribution of this research is a methodically consistent way of bridging the gap between the business process layer and the software-technical layers of the RESTful SOA. Defining a common conceptual and architectural framework realizes the mapping of the concepts of SOM business process models to the model-based specification of resources and other modules of the application system. Modeling the structure and behavior of business processes with SOM is an important prerequisite for that. The second main contribution of this work is a model-based approach to supporting the maintenance of business information systems. To this end, various approaches for analyzing the effect of structural changes and deriving assistance information to support application system maintenance extend the SOM-R methodology. The tool-supported provision of this information guides the system developer in adapting a RESTful SOA, or rather the corresponding model system, to structural changes of flexible SOM business process models. A case study demonstrates and explains the practical application of the SOM-R methodology.