9 research outputs found

    On the Use of Parsing for Named Entity Recognition

    Get PDF
    Parsing is a core natural language processing technique that can be used to obtain the structure underlying sentences in human languages. Named entity recognition (NER) is the task of identifying the entities that appear in a text. NER is a challenging natural language processing task that is essential for extracting knowledge from texts in multiple domains, ranging from financial to medical. It is intuitive that the structure of a text can help determine whether or not a certain portion of it is an entity and, if so, establish its concrete limits. However, parsing has been a relatively little-used technique in NER systems, since most of them have opted for shallow approaches to text. In this work, we study the characteristics of NER, a task that is far from being solved despite its long history; we analyze the latest advances in parsing that make its use advisable in NER settings; we review the different approaches to NER that make use of syntactic information; and we propose a new way of using parsing in NER based on casting parsing itself as a sequence labeling task.

    Funding: Xunta de Galicia (ED431C 2020/11; ED431G 2019/01). This work has been funded by MINECO, AEI and FEDER of UE through the ANSWER-ASAP project (TIN2017-85160-C2-1-R), and by Xunta de Galicia through a Competitive Reference Group grant (ED431C 2020/11). CITIC, as a Research Center of the Galician University System, is funded by the Consellería de Educación, Universidade e Formación Profesional of the Xunta de Galicia through the European Regional Development Fund (ERDF/FEDER), covering 80% under the Galicia ERDF 2014-20 Operational Programme, with the remaining 20% from the Secretaría Xeral de Universidades (Ref. ED431G 2019/01). Carlos Gómez-Rodríguez has also received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (FASTPARSE, Grant No. 714150).
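
    The core proposal, casting parsing as sequence labeling, can be illustrated with a small sketch. One common encoding (a minimal variant, not necessarily the paper's exact label scheme) tags each token with the relative offset of its syntactic head plus the dependency relation, so that the same kind of tagger used for BIO-style NER labels can predict a parse tree. The function names and toy sentence below are illustrative assumptions.

    # Minimal sketch: encoding a dependency tree as per-token labels so that
    # parsing becomes a sequence labeling task (hypothetical encoding; the
    # paper's exact label scheme may differ).

    def encode_tree_as_labels(heads, deprels):
        """heads[i] is the 1-based index of token i+1's head (0 = root);
        each label combines the relative head offset and the relation."""
        labels = []
        for i, (head, rel) in enumerate(zip(heads, deprels), start=1):
            offset = head - i  # relative position of the head
            labels.append(f"{offset:+d}_{rel}")
        return labels

    def decode_labels_to_heads(labels):
        """Inverse mapping: recover head indices from the labels."""
        heads = []
        for i, label in enumerate(labels, start=1):
            offset, _, _rel = label.partition("_")
            heads.append(i + int(offset))
        return heads

    # Toy example: "Mary sees entities" with 'sees' as root.
    heads = [2, 0, 2]
    deprels = ["nsubj", "root", "obj"]
    labels = encode_tree_as_labels(heads, deprels)
    print(labels)  # ['+1_nsubj', '-2_root', '-1_obj']
    assert decode_labels_to_heads(labels) == heads

    Once trees are expressed this way, a single sequence labeling architecture can produce both NER tags and syntactic structure, which is what makes the combination attractive.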

    Bidirectional End-to-End Learning of Retriever-Reader Paradigm for Entity Linking

    Full text link
    Entity Linking (EL) is a fundamental task for Information Extraction and Knowledge Graphs. The general form of EL (i.e., end-to-end EL) aims to first find mentions in the given input document and then link the mentions to corresponding entities in a specific knowledge base. Recently, the retriever-reader paradigm has advanced end-to-end EL, benefiting from the advantages of dense entity retrieval and machine reading comprehension. However, existing work trains the retriever and the reader separately in a pipeline manner, ignoring the benefit that interaction between the retriever and the reader can bring to the task. To help the retriever-reader paradigm perform better on end-to-end EL, we propose BEER², a Bidirectional End-to-End training framework for Retriever and Reader. Through our designed bidirectional end-to-end training, BEER² guides the retriever and the reader to learn from each other, make progress together, and ultimately improve EL performance. Extensive experiments on benchmarks of multiple domains demonstrate the effectiveness of our proposed BEER².

    Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
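
    As background, the retriever-reader pipeline this abstract builds on can be sketched in a few lines: a retriever scores candidate entities against a mention representation via dense inner products, and a reader re-scores the shortlist against the full document context. The sketch below is a generic illustration with stand-in encoders; it is not BEER²'s actual architecture or its bidirectional training loop.

    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-ins for learned encoders (assumptions for illustration only);
    # real systems use trained neural encoders over text.
    def encode_mention(mention_ctx, dim=64):
        return rng.standard_normal(dim)

    entity_embeddings = rng.standard_normal((1000, 64))  # dense KB entity index

    def retrieve(mention_vec, k=10):
        """Retriever: inner-product search over the entity index."""
        scores = entity_embeddings @ mention_vec
        topk = np.argsort(-scores)[:k]
        return topk, scores[topk]

    def read(mention_ctx, candidate_ids):
        """Reader: re-score the shortlist with richer (here: fake) features."""
        fine_scores = rng.standard_normal(len(candidate_ids))
        return candidate_ids[np.argmax(fine_scores)]

    mention_vec = encode_mention("... the mention in its document ...")
    candidates, _ = retrieve(mention_vec)
    predicted_entity = read("... the mention in its document ...", candidates)
    print(predicted_entity)

    BEER²'s contribution, per the abstract, is to train these two components jointly so that each benefits from the other's signal, rather than freezing the retriever before training the reader.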

    Hansel: A Chinese Few-Shot and Zero-Shot Entity Linking Benchmark

    Full text link
    Modern Entity Linking (EL) systems entrench a popularity bias, yet there is no dataset focusing on tail and emerging entities in languages other than English. We present Hansel, a new benchmark in Chinese that fills the gap in non-English few-shot and zero-shot EL evaluation. The test set of Hansel is human annotated and reviewed, and was created with a novel method for collecting zero-shot EL datasets. It covers 10K diverse documents in news, social media posts and other web articles, with Wikidata as its target Knowledge Base. We demonstrate that an existing state-of-the-art EL system performs poorly on Hansel (R@1 of 36.6% on Few-Shot). We then establish a strong baseline that scores an R@1 of 46.2% on Few-Shot and 76.6% on Zero-Shot on our dataset. We also show that our baseline achieves competitive results on the TAC-KBP2015 Chinese Entity Linking task.

    Comment: WSDM 2023
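
    For readers unfamiliar with the metric, R@1 (recall at rank 1) is simply the fraction of mentions for which the top-ranked candidate is the gold entity; R@k generalizes this to the top k candidates. A minimal sketch with invented names, not the benchmark's evaluation code:

    def recall_at_k(gold_entities, ranked_candidates, k=1):
        """Fraction of queries whose gold entity appears in the top-k candidates."""
        hits = sum(
            gold in ranked[:k]
            for gold, ranked in zip(gold_entities, ranked_candidates)
        )
        return hits / len(gold_entities)

    # Toy example: 2 of 3 mentions linked correctly at rank 1 -> R@1 = 0.667
    gold = ["Q42", "Q1", "Q7"]
    ranked = [["Q42", "Q5"], ["Q9", "Q1"], ["Q7", "Q3"]]
    print(round(recall_at_k(gold, ranked, k=1), 3))  # 0.667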

    Three essays on environmental economics

    Get PDF
    Thesis (Doctoral) -- KDI School: Ph.D. in Development Policy, 2019.

    Due to its adverse health effects, particulate matter (PM) pollution has become a critical public policy issue in Northeast Asia. As concerns about PM pollution rise, so does interest in identifying its origins, such as transboundary pollutant sources. Employing daily average PM10 concentration level data from Beijing, Shanghai and Seoul during 2014-2016, we estimate the direction and extent of the spillover effect of PM10 density between China and Korea. Estimation outcomes suggest that PM10 density levels in Beijing and Shanghai are Granger causes of PM density in Seoul, but not the other way around. PM10 density in Seoul increases by 0.13 ppm and 0.133 ppm in response to a one ppm increase in PM10 density in Beijing and Shanghai on the previous day, respectively. This cross-border spillover effect from Beijing is reduced by 0.076 ppm from May to October, when the air flow makes it difficult for PM10 sources generated in Beijing to reach Seoul.

    Chapter 1. The Cross-Border Spillover Effect of Particulate Matter Pollution in Korea
    Chapter 2. Factors to Enhance Compliance with ETS in Korea Based on Company-Level Data
    Chapter 3. Sustainable Management of Carbon Sequestration Service in Areas with High Development Pressure: Considering Land Use Changes and Carbon Costs

    Hyemin PARK
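
    The Granger-causality analysis described here can be reproduced in outline with standard time-series tooling: regress today's Seoul PM10 on its own lag and on yesterday's Beijing PM10, where the coefficient on the Beijing lag (about 0.13 in the thesis) is the spillover estimate. The sketch below uses synthetic data and statsmodels' grangercausalitytests; the variable names and the data-generating process are assumptions for illustration, not the thesis's data.

    import numpy as np
    from statsmodels.tsa.stattools import grangercausalitytests

    rng = np.random.default_rng(42)
    n = 1000

    # Synthetic daily PM10 series: Seoul depends on yesterday's Beijing level
    # with a spillover coefficient of 0.13, mimicking the thesis's estimate.
    beijing = np.empty(n)
    seoul = np.empty(n)
    beijing[0], seoul[0] = 100.0, 60.0
    for t in range(1, n):
        beijing[t] = 0.7 * beijing[t - 1] + rng.normal(30, 10)
        seoul[t] = 0.5 * seoul[t - 1] + 0.13 * beijing[t - 1] + rng.normal(20, 5)

    # Test whether Beijing PM10 Granger-causes Seoul PM10 at lag 1;
    # grangercausalitytests checks if column 2 helps predict column 1.
    data = np.column_stack([seoul, beijing])
    results = grangercausalitytests(data, maxlag=1, verbose=False)
    p_value = results[1][0]["ssr_ftest"][1]
    print(f"p-value (Beijing -> Seoul, lag 1): {p_value:.4f}")  # should be tiny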

    The Case of Wikidata

    Get PDF
    Since its launch in 2012, Wikidata has grown to become the largest open knowledge base (KB), containing more than 100 million data items and over 6 million registered users. Wikidata serves as the structured data backbone of Wikipedia, addressing data inconsistencies and adhering to the motto of "serving anyone anywhere in the world," a vision realized through the diversity of knowledge. Although Wikidata is a collaboratively contributed platform, its community relies heavily on bots, automated accounts with batch and speedy editing rights, for a majority of edits. As Wikidata approaches its first decade, the question arises: how close is Wikidata to achieving its vision of becoming a global KB, and how diverse is it in serving the global population?

    This dissertation investigates the current status of Wikidata's diversity, the role of bot interventions on diversity, and how bots can be leveraged to improve diversity within the context of Wikidata. The methodologies used in this study are a mapping study and content analysis, which led to the development of three datasets: 1) the Wikidata Research Articles Dataset, covering the literature on Wikidata from its first decade of existence, sourced from online databases, to inspect its current status; 2) the Wikidata Requests-for-Permissions Dataset, based on the pages requesting bot rights on the Wikidata website, to explore bots from a community perspective; and 3) the Wikidata Revision History Dataset, compiled from the edit history of Wikidata, to investigate bot editing behavior and its impact on diversity. All three are freely available online.

    The insights gained from the mapping study reveal the growing popularity of Wikidata in the research community and its various application areas, indicative of its progress toward the ultimate goal of reaching the global community. However, there is currently no research addressing the topic of diversity in Wikidata, which could shed light on its capacity to serve a diverse global population. To address this gap, this dissertation proposes a diversity measurement concept that defines diversity in a KB context in terms of variety, balance, and disparity, and that can assess diversity in a KB from two main angles: user and data. Applying this concept to the domains and classes of the Wikidata Revision History Dataset exposes an imbalanced content distribution across Wikidata domains, indicating low data diversity. Further analysis reveals that bots have been active since the inception of Wikidata and that the community embraces their involvement in content editing tasks, often importing data from Wikipedia, which shows a low diversity of sources in bot edits. Bots and human users engage in similar editing tasks but exhibit distinct editing patterns.

    The findings of this thesis confirm that bots can influence diversity within Wikidata by contributing substantial amounts of data to specific classes and domains, leading to an imbalance. However, this potential can also be harnessed to enhance coverage in classes with limited content and restore balance, thus improving diversity. Hence, this study proposes enhancing diversity through automation and demonstrates the practical implementation of the recommendations using a specific use case.
    In essence, this research enhances our understanding of diversity in relation to a KB, elucidates the influence of automation on data diversity, and sheds light on diversity improvement within a KB context through the use of automation.
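
    The variety/balance/disparity framing used here follows a standard diversity-measurement tradition (e.g., Stirling-style frameworks), and two of the three components are easy to illustrate. A hypothetical sketch over per-domain content counts (names and numbers invented for illustration, not the dissertation's data):

    import math

    # Hypothetical counts of Wikidata items (or edits) per domain.
    domain_counts = {"science": 54000, "geography": 31000, "arts": 9000, "sports": 6000}

    def variety(counts):
        """Variety: how many distinct categories are populated."""
        return sum(1 for c in counts.values() if c > 0)

    def balance(counts):
        """Balance: Shannon evenness, 1.0 when all categories are equally
        populated and approaching 0 as one category dominates."""
        total = sum(counts.values())
        shares = [c / total for c in counts.values() if c > 0]
        entropy = -sum(p * math.log(p) for p in shares)
        return entropy / math.log(len(shares))  # normalize by max entropy

    print(variety(domain_counts))            # 4
    print(round(balance(domain_counts), 3))  # < 1.0: content is imbalanced

    Disparity, the third component, additionally requires a distance measure between categories (how unlike two domains are) and is omitted from this sketch.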

    Deep learning based semantic textual similarity for applications in translation technology

    Get PDF
    A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy.

    Semantic Textual Similarity (STS) measures the equivalence of meanings between two textual segments. It is a fundamental task for many natural language processing applications. In this study, we focus on employing STS in the context of translation technology. We start by developing models to estimate STS. We propose a new unsupervised vector aggregation-based STS method which relies on contextual word embeddings. We also propose a novel Siamese neural network based on efficient recurrent neural network units. We empirically evaluate various unsupervised and supervised STS methods, including these newly proposed methods, on three English STS datasets, two non-English datasets and a biomedical STS dataset, to identify the best supervised and unsupervised STS methods. We then embed these STS methods in translation technology applications. First, we experiment with Translation Memory (TM) systems, proposing a novel TM matching and retrieval method based on STS that outperforms current TM systems. We then utilise the developed STS architectures in translation Quality Estimation (QE). We show that the proposed methods are simple but outperform complex QE architectures and improve the state-of-the-art results. The implementations of these methods have been released as open source.
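
    The unsupervised family of methods described here typically reduces to aggregating token embeddings into a sentence vector and comparing vectors with cosine similarity. A generic sketch follows, in which random vectors stand in for real contextual embeddings; this is not the thesis's exact aggregation scheme.

    import numpy as np

    rng = np.random.default_rng(1)

    def embed_tokens(tokens, dim=128):
        """Stand-in for contextual embeddings (e.g., from a pretrained encoder);
        with real embeddings, related sentences would score higher."""
        return rng.standard_normal((len(tokens), dim))

    def sentence_vector(tokens):
        """Simplest aggregation: mean-pool the token embeddings."""
        return embed_tokens(tokens).mean(axis=0)

    def cosine_similarity(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    s1 = "the cat sat on the mat".split()
    s2 = "a cat was sitting on a mat".split()
    print(cosine_similarity(sentence_vector(s1), sentence_vector(s2)))

    The supervised Siamese variant mentioned in the abstract replaces mean pooling with a learned encoder applied identically to both segments, trained so that the similarity of the two outputs matches human STS judgments.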

    Population decline, infrastructure and sustainability

    Get PDF
    Japan has experienced population decline since 2010, and the situation is expected to become more severe after 2030, with forecasts indicating an expected 30% decline from 2005 to 2055. Many other developed countries, such as Germany and Korea, are also experiencing depopulation. These demographic changes are expected to affect society at many levels, such as shrinking labour markets, an increased tax burden to sustain pension systems, and economic stagnation. Little is known, however, about the impacts of population decline on man-made physical infrastructure, such as possible deterioration of current infrastructure or an increased financial burden of sustaining it. Infrastructure can be classified into three categories: point-type (e.g. buildings), point-network type (e.g. water supply) and network type (e.g. roads). The impact of depopulation may vary according to the type of infrastructure. Previous research in this area has been limited in scope (e.g. case studies conducted in a single city focusing on a single type of infrastructure) and method (e.g. most research on the topic has been qualitative). This thesis presents a new comprehensive study of the impacts of population decline on infrastructure in Japan, taking into account all types of infrastructure and using a quantitative approach. Data collection methods include interviews and two large-scale questionnaire surveys, the first conducted with municipalities and the second, a stated preference survey, conducted with members of the public. The goal of sustainable development is relevant even in a depopulating society, and hence a sustainable development framework is applied to the analysis, in which social, economic, environmental and engineering impacts are investigated. The main findings indicate that some infrastructure impacts observed and reported in depopulated areas do not seem to be related to population decline; moreover, the preferences of citizens for infrastructure development are very similar between depopulated and non-depopulated areas. The results also suggest that the premises of Barro's overlapping generations model, very relevant to a discussion of intergenerational decision making and related sustainability, appear to be rejected in this context.

    Model-based Specification of RESTful SOA on the Basis of Flexible SOM Business Process Models

    Get PDF
    Strong dynamics and continuously growing complexity characterize a company's environment at present. In such an environment, the rapid adaptation of the production and delivery of goods and services is a necessary consequence to ensure the competitiveness, and thereby the survival, of a company. A key success factor for the evolutionary adaptation of a business system is the flexibility of its business processes. In the past, however, flexible business processes generally led to a reduced level of automation in the supporting application systems, and consequently to inconsistencies in the business information system. The provision of appropriate solutions for the quick development of application systems and their alignment to changing business requirements is a central task of the system development discipline. Current concepts, tools and IT architectures do not give a methodically adequate answer to the question of a holistic and systematic design and maintenance of application systems, and of their consistent alignment with flexible business processes. As an answer to this question, this work constructs the SOM-R methodology, a model-based development method built on the Semantic Object Model (SOM) for the holistic development and maintenance of RESTful SOA on the basis of flexible SOM business process models. By applying the architectural style REST to service-oriented architectures (SOA), the RESTful SOA is designed as the target software architecture for flexibly adaptable application systems. The first main contribution of this research is a methodically consistent way of bridging the gap between the business process layer and the software-technical layers of the RESTful SOA. Defining a common conceptual system and a uniform architectural framework enables a model-based mapping of the concepts of SOM business process models to the specification of resources and other modules of the application system. Modeling the structure and behavior of business processes with SOM is an important prerequisite for this. The second main contribution of this work is a model-based approach to supporting the maintenance of business information systems. To this end, the SOM-R methodology is extended with a procedure model as well as approaches for analyzing the effects of structural changes and for deriving assistance information to support application system maintenance. The tool-supported provision of this information guides the system developer in adapting a RESTful SOA, or the corresponding model systems, to changes in flexible SOM business process models. A case study demonstrates and explains the practical application of the SOM-R methodology.
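
    To make the "business object to REST resource" mapping concrete, here is a minimal sketch of what a derived resource might look like, using Flask as a stand-in implementation technology. The Order resource, its fields, and its routes are invented for illustration and are not taken from the SOM-R methodology itself.

    # Minimal sketch of a REST resource that a business-process entity
    # (here: a hypothetical "Order" business object) could be mapped to.
    # Flask is a stand-in technology; SOM-R itself is tool-agnostic.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    orders = {1: {"id": 1, "customer": "ACME", "status": "open"}}  # toy store

    @app.route("/orders/<int:order_id>", methods=["GET"])
    def get_order(order_id):
        order = orders.get(order_id)
        return (jsonify(order), 200) if order else ("", 404)

    @app.route("/orders", methods=["POST"])
    def create_order():
        new_id = max(orders) + 1
        orders[new_id] = {"id": new_id, **request.get_json()}
        return jsonify(orders[new_id]), 201

    if __name__ == "__main__":
        app.run()  # GET /orders/1 returns the seeded example order

    The point of the methodology is that such resource specifications are not hand-written ad hoc but derived model-based from the SOM business process model, so that changes in the process model can be propagated to the resources systematically.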