9 research outputs found

    On the Use of Parsing for Named Entity Recognition

    Get PDF
    Parsing is a core natural language processing technique that can be used to obtain the structure underlying sentences in human languages. Named entity recognition (NER) is the task of identifying the entities that appear in a text. NER is a challenging natural language processing task that is essential for extracting knowledge from texts in multiple domains, ranging from financial to medical. It is intuitive that the structure of a text can help determine whether or not a certain portion of it is an entity and, if so, establish its concrete limits. However, parsing has been a relatively little-used technique in NER systems, since most of them have opted for shallow approaches to text. In this work, we study the characteristics of NER, a task that is far from being solved despite its long history; we analyze the latest advances in parsing that make its use advisable in NER settings; we review the different approaches to NER that make use of syntactic information; and we propose a new way of using parsing in NER based on casting parsing itself as a sequence labeling task.

    Funding: Xunta de Galicia (ED431C 2020/11; ED431G 2019/01). This work has been funded by MINECO, AEI and FEDER of UE through the ANSWER-ASAP project (TIN2017-85160-C2-1-R), and by Xunta de Galicia through a Competitive Reference Group grant (ED431C 2020/11). CITIC, as a Research Center of the Galician University System, is funded by the Consellería de Educación, Universidade e Formación Profesional of the Xunta de Galicia through the European Regional Development Fund (ERDF/FEDER), covering 80% under the Galicia ERDF 2014-20 Operational Programme, with the remaining 20% from the Secretaría Xeral de Universidades (Ref. ED431G 2019/01). Carlos Gómez-Rodríguez has also received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (FASTPARSE, Grant No. 714150).
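
    The core proposal, casting parsing as sequence labeling, can be illustrated with a small sketch. One common encoding (a minimal variant, not necessarily the paper's exact label scheme) tags each token with the relative offset of its syntactic head plus the dependency relation, so that the same kind of tagger used for BIO-style NER labels can predict a parse tree. The function names and toy sentence below are illustrative assumptions.

    # Minimal sketch: encoding a dependency tree as per-token labels so that
    # parsing becomes a sequence labeling task (hypothetical encoding; the
    # paper's exact label scheme may differ).

    def encode_tree_as_labels(heads, deprels):
        """heads[i] is the 1-based index of token i+1's head (0 = root);
        each label combines the relative head offset and the relation."""
        labels = []
        for i, (head, rel) in enumerate(zip(heads, deprels), start=1):
            offset = head - i  # relative position of the head
            labels.append(f"{offset:+d}_{rel}")
        return labels

    def decode_labels_to_heads(labels):
        """Inverse mapping: recover head indices from the labels."""
        heads = []
        for i, label in enumerate(labels, start=1):
            offset, _, _rel = label.partition("_")
            heads.append(i + int(offset))
        return heads

    # Toy example: "Mary sees entities" with 'sees' as root.
    heads = [2, 0, 2]
    deprels = ["nsubj", "root", "obj"]
    labels = encode_tree_as_labels(heads, deprels)
    print(labels)  # ['+1_nsubj', '-2_root', '-1_obj']
    assert decode_labels_to_heads(labels) == heads

    Once trees are expressed this way, a single sequence labeling architecture can produce both NER tags and syntactic structure, which is what makes the combination attractive.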

    Bidirectional End-to-End Learning of Retriever-Reader Paradigm for Entity Linking

    Full text link
    Entity Linking (EL) is a fundamental task for Information Extraction and Knowledge Graphs. The general form of EL (i.e., end-to-end EL) aims to first find mentions in the given input document and then link the mentions to corresponding entities in a specific knowledge base. Recently, the retriever-reader paradigm has advanced end-to-end EL, benefiting from the advantages of dense entity retrieval and machine reading comprehension. However, existing work trains the retriever and the reader separately in a pipeline manner, ignoring the benefit that interaction between the retriever and the reader can bring to the task. To help the retriever-reader paradigm perform better on end-to-end EL, we propose BEER², a Bidirectional End-to-End training framework for Retriever and Reader. Through our designed bidirectional end-to-end training, BEER² guides the retriever and the reader to learn from each other, make progress together, and ultimately improve EL performance. Extensive experiments on benchmarks of multiple domains demonstrate the effectiveness of our proposed BEER².

    Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
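
    As background, the retriever-reader pipeline this abstract builds on can be sketched in a few lines: a retriever scores candidate entities against a mention representation via dense inner products, and a reader re-scores the shortlist against the full document context. The sketch below is a generic illustration with stand-in encoders; it is not BEER²'s actual architecture or its bidirectional training loop.

    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-ins for learned encoders (assumptions for illustration only);
    # real systems use trained neural encoders over text.
    def encode_mention(mention_ctx, dim=64):
        return rng.standard_normal(dim)

    entity_embeddings = rng.standard_normal((1000, 64))  # dense KB entity index

    def retrieve(mention_vec, k=10):
        """Retriever: inner-product search over the entity index."""
        scores = entity_embeddings @ mention_vec
        topk = np.argsort(-scores)[:k]
        return topk, scores[topk]

    def read(mention_ctx, candidate_ids):
        """Reader: re-score the shortlist with richer (here: fake) features."""
        fine_scores = rng.standard_normal(len(candidate_ids))
        return candidate_ids[np.argmax(fine_scores)]

    mention_vec = encode_mention("... the mention in its document ...")
    candidates, _ = retrieve(mention_vec)
    predicted_entity = read("... the mention in its document ...", candidates)
    print(predicted_entity)

    BEER²'s contribution, per the abstract, is to train these two components jointly so that each benefits from the other's signal, rather than freezing the retriever before training the reader.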

    Hansel: A Chinese Few-Shot and Zero-Shot Entity Linking Benchmark

    Full text link
    Modern Entity Linking (EL) systems entrench a popularity bias, yet there is no dataset focusing on tail and emerging entities in languages other than English. We present Hansel, a new benchmark in Chinese that fills the gap in non-English few-shot and zero-shot EL evaluation. The test set of Hansel is human annotated and reviewed, and was created with a novel method for collecting zero-shot EL datasets. It covers 10K diverse documents in news, social media posts and other web articles, with Wikidata as its target Knowledge Base. We demonstrate that an existing state-of-the-art EL system performs poorly on Hansel (R@1 of 36.6% on Few-Shot). We then establish a strong baseline that scores an R@1 of 46.2% on Few-Shot and 76.6% on Zero-Shot on our dataset. We also show that our baseline achieves competitive results on the TAC-KBP2015 Chinese Entity Linking task.

    Comment: WSDM 2023
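
    For readers unfamiliar with the metric, R@1 (recall at rank 1) is simply the fraction of mentions for which the top-ranked candidate is the gold entity; R@k generalizes this to the top k candidates. A minimal sketch with invented names, not the benchmark's evaluation code:

    def recall_at_k(gold_entities, ranked_candidates, k=1):
        """Fraction of queries whose gold entity appears in the top-k candidates."""
        hits = sum(
            gold in ranked[:k]
            for gold, ranked in zip(gold_entities, ranked_candidates)
        )
        return hits / len(gold_entities)

    # Toy example: 2 of 3 mentions linked correctly at rank 1 -> R@1 = 0.667
    gold = ["Q42", "Q1", "Q7"]
    ranked = [["Q42", "Q5"], ["Q9", "Q1"], ["Q7", "Q3"]]
    print(round(recall_at_k(gold, ranked, k=1), 3))  # 0.667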

    Three essays on environmental economics

    Get PDF
    Thesis (Doctoral) -- KDI School: Ph.D. in Development Policy, 2019.

    Due to its adverse health effects, particulate matter (PM) pollution has become a critical public policy issue in Northeast Asia. As concerns about PM pollution rise, so does interest in identifying its origins, such as transboundary pollutant sources. Employing daily average PM10 concentration level data from Beijing, Shanghai and Seoul during 2014-2016, we estimate the direction and extent of the spillover effect of PM10 density between China and Korea. Estimation outcomes suggest that PM10 density levels in Beijing and Shanghai are Granger causes of PM density in Seoul, but not the other way around. PM10 density in Seoul increases by 0.13 ppm and 0.133 ppm in response to a one ppm increase in PM10 density in Beijing and Shanghai on the previous day, respectively. This cross-border spillover effect from Beijing is reduced by 0.076 ppm from May to October, when the air flow makes it difficult for PM10 sources generated in Beijing to reach Seoul.

    Chapter 1. The Cross-Border Spillover Effect of Particulate Matter Pollution in Korea
    Chapter 2. Factors to Enhance Compliance with ETS in Korea Based on Company-Level Data
    Chapter 3. Sustainable Management of Carbon Sequestration Service in Areas with High Development Pressure: Considering Land Use Changes and Carbon Costs

    Hyemin PARK
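
    The Granger-causality analysis described here can be reproduced in outline with standard time-series tooling: regress today's Seoul PM10 on its own lag and on yesterday's Beijing PM10, where the coefficient on the Beijing lag (about 0.13 in the thesis) is the spillover estimate. The sketch below uses synthetic data and statsmodels' grangercausalitytests; the variable names and the data-generating process are assumptions for illustration, not the thesis's data.

    import numpy as np
    from statsmodels.tsa.stattools import grangercausalitytests

    rng = np.random.default_rng(42)
    n = 1000

    # Synthetic daily PM10 series: Seoul depends on yesterday's Beijing level
    # with a spillover coefficient of 0.13, mimicking the thesis's estimate.
    beijing = np.empty(n)
    seoul = np.empty(n)
    beijing[0], seoul[0] = 100.0, 60.0
    for t in range(1, n):
        beijing[t] = 0.7 * beijing[t - 1] + rng.normal(30, 10)
        seoul[t] = 0.5 * seoul[t - 1] + 0.13 * beijing[t - 1] + rng.normal(20, 5)

    # Test whether Beijing PM10 Granger-causes Seoul PM10 at lag 1;
    # grangercausalitytests checks if column 2 helps predict column 1.
    data = np.column_stack([seoul, beijing])
    results = grangercausalitytests(data, maxlag=1, verbose=False)
    p_value = results[1][0]["ssr_ftest"][1]
    print(f"p-value (Beijing -> Seoul, lag 1): {p_value:.4f}")  # should be tiny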

    The Case of Wikidata

    Get PDF
    Since its launch in 2012, Wikidata has grown to become the largest open knowledge base (KB), containing more than 100 million data items and over 6 million registered users. Wikidata serves as the structured data backbone of Wikipedia, addressing data inconsistencies and adhering to the motto of "serving anyone anywhere in the world," a vision realized through the diversity of knowledge. Although Wikidata is a collaboratively contributed platform, its community relies heavily on bots, automated accounts with batch and speedy editing rights, for a majority of edits. As Wikidata approaches its first decade, the question arises: how close is Wikidata to achieving its vision of becoming a global KB, and how diverse is it in serving the global population?

    This dissertation investigates the current status of Wikidata's diversity, the role of bot interventions on diversity, and how bots can be leveraged to improve diversity within the context of Wikidata. The methodologies used in this study are a mapping study and content analysis, which led to the development of three datasets: 1) the Wikidata Research Articles Dataset, covering the literature on Wikidata from its first decade of existence, sourced from online databases, to inspect its current status; 2) the Wikidata Requests-for-Permissions Dataset, based on the pages requesting bot rights on the Wikidata website, to explore bots from a community perspective; and 3) the Wikidata Revision History Dataset, compiled from the edit history of Wikidata, to investigate bot editing behavior and its impact on diversity. All three are freely available online.

    The insights gained from the mapping study reveal the growing popularity of Wikidata in the research community and its various application areas, indicative of its progress toward the ultimate goal of reaching the global community. However, there is currently no research addressing the topic of diversity in Wikidata, which could shed light on its capacity to serve a diverse global population. To address this gap, this dissertation proposes a diversity measurement concept that defines diversity in a KB context in terms of variety, balance, and disparity, and that can assess diversity in a KB from two main angles: user and data. Applying this concept to the domains and classes of the Wikidata Revision History Dataset exposes an imbalanced content distribution across Wikidata domains, indicating low data diversity. Further analysis reveals that bots have been active since the inception of Wikidata and that the community embraces their involvement in content editing tasks, often importing data from Wikipedia, which shows a low diversity of sources in bot edits. Bots and human users engage in similar editing tasks but exhibit distinct editing patterns.

    The findings of this thesis confirm that bots can influence diversity within Wikidata by contributing substantial amounts of data to specific classes and domains, leading to an imbalance. However, this potential can also be harnessed to enhance coverage in classes with limited content and restore balance, thus improving diversity. Hence, this study proposes enhancing diversity through automation and demonstrates the practical implementation of the recommendations using a specific use case.
    In essence, this research enhances our understanding of diversity in relation to a KB, elucidates the influence of automation on data diversity, and sheds light on diversity improvement within a KB context through the use of automation.
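
    The variety/balance/disparity framing used here follows a standard diversity-measurement tradition (e.g., Stirling-style frameworks), and two of the three components are easy to illustrate. A hypothetical sketch over per-domain content counts (names and numbers invented for illustration, not the dissertation's data):

    import math

    # Hypothetical counts of Wikidata items (or edits) per domain.
    domain_counts = {"science": 54000, "geography": 31000, "arts": 9000, "sports": 6000}

    def variety(counts):
        """Variety: how many distinct categories are populated."""
        return sum(1 for c in counts.values() if c > 0)

    def balance(counts):
        """Balance: Shannon evenness, 1.0 when all categories are equally
        populated and approaching 0 as one category dominates."""
        total = sum(counts.values())
        shares = [c / total for c in counts.values() if c > 0]
        entropy = -sum(p * math.log(p) for p in shares)
        return entropy / math.log(len(shares))  # normalize by max entropy

    print(variety(domain_counts))            # 4
    print(round(balance(domain_counts), 3))  # < 1.0: content is imbalanced

    Disparity, the third component, additionally requires a distance measure between categories (how unlike two domains are) and is omitted from this sketch.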

    Deep learning based semantic textual similarity for applications in translation technology

    Get PDF
    A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy.

    Semantic Textual Similarity (STS) measures the equivalence of meanings between two textual segments. It is a fundamental task for many natural language processing applications. In this study, we focus on employing STS in the context of translation technology. We start by developing models to estimate STS. We propose a new unsupervised vector aggregation-based STS method which relies on contextual word embeddings. We also propose a novel Siamese neural network based on efficient recurrent neural network units. We empirically evaluate various unsupervised and supervised STS methods, including these newly proposed methods, on three English STS datasets, two non-English datasets and a biomedical STS dataset, to identify the best supervised and unsupervised STS methods. We then embed these STS methods in translation technology applications. First, we experiment with Translation Memory (TM) systems, proposing a novel TM matching and retrieval method based on STS that outperforms current TM systems. We then utilise the developed STS architectures in translation Quality Estimation (QE). We show that the proposed methods are simple but outperform complex QE architectures and improve the state-of-the-art results. The implementations of these methods have been released as open source.
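
    The unsupervised family of methods described here typically reduces to aggregating token embeddings into a sentence vector and comparing vectors with cosine similarity. A generic sketch follows, in which random vectors stand in for real contextual embeddings; this is not the thesis's exact aggregation scheme.

    import numpy as np

    rng = np.random.default_rng(1)

    def embed_tokens(tokens, dim=128):
        """Stand-in for contextual embeddings (e.g., from a pretrained encoder);
        with real embeddings, related sentences would score higher."""
        return rng.standard_normal((len(tokens), dim))

    def sentence_vector(tokens):
        """Simplest aggregation: mean-pool the token embeddings."""
        return embed_tokens(tokens).mean(axis=0)

    def cosine_similarity(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    s1 = "the cat sat on the mat".split()
    s2 = "a cat was sitting on a mat".split()
    print(cosine_similarity(sentence_vector(s1), sentence_vector(s2)))

    The supervised Siamese variant mentioned in the abstract replaces mean pooling with a learned encoder applied identically to both segments, trained so that the similarity of the two outputs matches human STS judgments.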

    Population decline, infrastructure and sustainability

    Get PDF
    Japan has experienced population decline since 2010, and the situation is expected to become more severe after 2030, with forecasts indicating an expected 30% decline from 2005 to 2055. Many other developed countries, such as Germany and Korea, are also experiencing depopulation. These demographic changes are expected to affect society at many levels, such as shrinking labour markets, an increased tax burden to sustain pension systems, and economic stagnation. Little is known, however, about the impacts of population decline on man-made physical infrastructure, such as possible deterioration of current infrastructure or an increased financial burden of sustaining it. Infrastructure can be classified into three categories: point-type (e.g. buildings), point-network type (e.g. water supply) and network type (e.g. roads). The impact of depopulation may vary according to the type of infrastructure. Previous research in this area has been limited in scope (e.g. case studies conducted in a single city focusing on a single type of infrastructure) and method (e.g. most research on the topic has been qualitative). This thesis presents a new comprehensive study of the impacts of population decline on infrastructure in Japan, taking into account all types of infrastructure and using a quantitative approach. Data collection methods include interviews and two large-scale questionnaire surveys, the first conducted with municipalities and the second, a stated preference survey, conducted with members of the public. The goal of sustainable development is relevant even in a depopulating society, and hence a sustainable development framework is applied to the analysis, in which social, economic, environmental and engineering impacts are investigated. The main findings indicate that some infrastructure impacts observed and reported in depopulated areas do not seem to be related to population decline; moreover, the preferences of citizens for infrastructure development are very similar between depopulated and non-depopulated areas. The results also suggest that the premises of Barro's overlapping generations model, very relevant to a discussion of intergenerational decision making and related sustainability, appear to be rejected in this context.

    Model-based Specification of RESTful SOA on the Basis of Flexible SOM Business Process Models

    Get PDF
    Strong dynamics and continuously growing complexity characterize a company's environment at present. In such an environment, the rapid adaptation of the production and delivery of goods and services is a necessary consequence to ensure the competitiveness, and thereby the survival, of a company. A key success factor for the evolutionary adaptation of a business system is the flexibility of its business processes. In the past, however, flexible business processes generally led to a reduced level of automation in the supporting application systems, and consequently to inconsistencies in the business information system. The provision of appropriate solutions for the quick development of application systems and their alignment to changing business requirements is a central task of the system development discipline. Current concepts, tools and IT architectures do not give a methodically adequate answer to the question of a holistic and systematic design and maintenance of application systems, and of their consistent alignment with flexible business processes. As an answer to this question, this work constructs the SOM-R methodology, a model-based development method built on the Semantic Object Model (SOM) for the holistic development and maintenance of RESTful SOA on the basis of flexible SOM business process models. By applying the architectural style REST to service-oriented architectures (SOA), the RESTful SOA is designed as the target software architecture for flexibly adaptable application systems. The first main contribution of this research is a methodically consistent way of bridging the gap between the business process layer and the software-technical layers of the RESTful SOA. Defining a common conceptual system and a uniform architectural framework enables a model-based mapping of the concepts of SOM business process models to the specification of resources and other modules of the application system. Modeling the structure and behavior of business processes with SOM is an important prerequisite for this. The second main contribution of this work is a model-based approach to supporting the maintenance of business information systems. To this end, the SOM-R methodology is extended with a procedure model as well as approaches for analyzing the effects of structural changes and for deriving assistance information to support application system maintenance. The tool-supported provision of this information guides the system developer in adapting a RESTful SOA, or the corresponding model systems, to changes in flexible SOM business process models. A case study demonstrates and explains the practical application of the SOM-R methodology.
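
    To make the "business object to REST resource" mapping concrete, here is a minimal sketch of what a derived resource might look like, using Flask as a stand-in implementation technology. The Order resource, its fields, and its routes are invented for illustration and are not taken from the SOM-R methodology itself.

    # Minimal sketch of a REST resource that a business-process entity
    # (here: a hypothetical "Order" business object) could be mapped to.
    # Flask is a stand-in technology; SOM-R itself is tool-agnostic.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    orders = {1: {"id": 1, "customer": "ACME", "status": "open"}}  # toy store

    @app.route("/orders/<int:order_id>", methods=["GET"])
    def get_order(order_id):
        order = orders.get(order_id)
        return (jsonify(order), 200) if order else ("", 404)

    @app.route("/orders", methods=["POST"])
    def create_order():
        new_id = max(orders) + 1
        orders[new_id] = {"id": new_id, **request.get_json()}
        return jsonify(orders[new_id]), 201

    if __name__ == "__main__":
        app.run()  # GET /orders/1 returns the seeded example order

    The point of the methodology is that such resource specifications are not hand-written ad hoc but derived model-based from the SOM business process model, so that changes in the process model can be propagated to the resources systematically.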