622 research outputs found

BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision

Author: Devlin Jacob
Fries Jason
Giannakopoulos Athanasios
Jiang Haoming
Kingma Diederik P
Pennington Jeffrey
Weischedel Ralph
Yang Zhilin
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 28/06/2020

We study the open-domain named entity recognition (NER) problem under distant supervision. Distant supervision does not require large amounts of manual annotation, but it yields highly incomplete and noisy distant labels via external knowledge bases. To address this challenge, we propose a new computational framework, BOND, which leverages the power of pre-trained language models (e.g., BERT and RoBERTa) to improve the prediction performance of NER models. Specifically, we propose a two-stage training algorithm. In the first stage, we adapt the pre-trained language model to the NER task using the distant labels, which can significantly improve recall and precision. In the second stage, we drop the distant labels and propose a self-training approach to further improve model performance. Thorough experiments on 5 benchmark datasets demonstrate the superiority of BOND over existing distantly supervised NER methods. The code and distantly labeled data have been released at https://github.com/cliang1453/BOND. Comment: Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '20).
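As a rough illustration of why distant labels from a knowledge base are incomplete and noisy, the following minimal sketch tags tokens by longest-match lookup in a small gazetteer. It is not the BOND code: the function name `distant_bio_labels`, the toy gazetteer, and the example sentence are all hypothetical and chosen only to show the labeling step.

```python
from typing import Dict, List, Tuple


def distant_bio_labels(
    tokens: List[str], gazetteer: Dict[Tuple[str, ...], str]
) -> List[str]:
    """Assign BIO tags by longest-match lookup of token spans in a gazetteer."""
    labels = ["O"] * len(tokens)
    max_len = max((len(key) for key in gazetteer), default=0)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so multi-token entries win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(t.lower() for t in tokens[i:i + n])
            if span in gazetteer:
                entity_type = gazetteer[span]
                labels[i] = f"B-{entity_type}"
                for j in range(i + 1, i + n):
                    labels[j] = f"I-{entity_type}"
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return labels


# Hypothetical toy knowledge base: only two location entries.
GAZETTEER = {("new", "york"): "LOC", ("washington",): "LOC"}

tokens = "Washington met Ada Lovelace in New York".split()
labels = distant_bio_labels(tokens, GAZETTEER)
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

In this toy run, "Ada Lovelace" stays unlabeled because the gazetteer has no entry for it (incompleteness), and "Washington" is tagged as a location even though the sentence uses it as a person (noise).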

arXiv.org e-Print Archive

Crossref

UNIMIB@NEEL-IT: Named Entity Recognition and Linking of Italian Tweets

Author: Cecchini F
Fersini E
Manchanda P
Messina E
Nozza D
Palmonari M
Sas C
Publication venue: Torino
Publication date: 01/01/2016

This paper describes the framework proposed by the UNIMIB Team for the task of Named Entity Recognition and Linking of Italian Tweets (NEEL-IT). The proposed pipeline, which represents an entry-level system, is composed of three main steps: (1) Named Entity Recognition using Conditional Random Fields, (2) Named Entity Linking by considering both supervised and neural-network language models, and (3) NIL clustering using a graph-based approach.

PubliCatt

UNIMIB@NEEL-IT: named entity recognition and linking of Italian tweets

Author: Cecchini Flavio Massimiliano
Fersini Elisabetta
Manchanda Pikakshi
Messina Enza
Nozza Debora
Palmonari Matteo
Sas Cezar
Publication venue: 'OpenEdition'
Publication date: 01/01/2016

Archivio istituzionale della Ricerca - Bocconi

Natural Language Processing for Under-resourced Languages: Developing a Welsh Natural Language Toolkit

Author: Cunliffe D
Tudhope D
Vlachidis A
Williams D
Publication venue: 'Elsevier BV'
Publication date: 01/03/2022

Language technology is becoming increasingly important across a variety of application domains which have become commonplace in large, well-resourced languages. However, there is a danger that small, under-resourced languages are being increasingly pushed to the technological margins. Under-resourced languages face significant challenges in delivering the underlying language resources necessary to support such applications. This paper describes the development of a natural language processing toolkit for an under-resourced language, Cymraeg (Welsh). Rather than creating the Welsh Natural Language Toolkit (WNLT) from scratch, the approach involved adapting and enhancing the language processing functionality provided for other languages within an existing framework and making use of external language resources where available. This paper begins by introducing the GATE NLP framework, which was used as the development platform for the WNLT. It then describes each of the core modules of the WNLT in turn, detailing the extensions and adaptations required for Welsh language processing. An evaluation of the WNLT is then reported. Following this, two demonstration applications are presented. The first is a simple text mining application that analyses wedding announcements. The second describes the development of a Twitter NLP application, which extends the core WNLT pipeline. As a relatively small-scale project, the WNLT makes use of existing external language resources where possible, rather than creating new resources. This approach of adaptation and reuse can provide a practical and achievable route to developing language resources for under-resourced languages.

UCL Discovery

MasakhaNER: Named Entity Recognition for African Languages

Author: Abbott J.
Adelani D.
Adewumi T.
Adeyemi M.
Ahia O.
Akinfaderin A.
Akinode V.
Alabi J.
Anebi E.
Anuoluwapo A.
Awokoya A.
Azime I.
Bateesa T.
Buzaaba H.
Chukwuneke C.
David D.
Diallo A.
Diop T.
Dossou B.
D’souza D.
Emezue C.
Ezeani I.
Faye A.
Gebreyohannes D.
Gitau C.
Gwadabe T.
Katusiime M.
Kreutzer J.
Lignos C.
Marengereke T.
Mayhew S.
Mbaye D.
Mboup M.
Muhammad S.
Mukiibi J.
Muriuki G.
Nabagereka D.
Nakatumba-Nabende J.
Neubig G.
Ngom S.
Niyongabo R.
Nwaike K.
Odu N.
Ogayo P.
Ogueji K.
Oloyede T.
Orife I.
Osei S.
Otiende V.
Oyerinde S.
Palen-Michel C.
Rayson P.
Rijhwani S.
Ruder S.
Sibanda B.
Siro C.
Tilaye H.
Wairagala E.
Wambui Y.
Wolde D.
Yimam S.
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2021

MPG.PuRe

Exploration of Approaches to Arabic Named Entity Recognition

Author: Balla Husamelddin
Delaney Sarah Jane
Publication venue: Technological University Dublin
Publication date: 01/01/2020

The Named Entity Recognition (NER) task has attracted significant attention in Natural Language Processing (NLP) as it can enhance the performance of many NLP applications. In this paper, we compare English NER with Arabic NER experimentally to investigate the impact of using different classifiers and sets of features, including language-independent and language-specific features. We explore the features and classifiers on five different datasets. We compare deep neural network architectures for NER with more traditional machine learning approaches to NER. We find that most of the techniques and features used for English NER perform well on Arabic NER. Our results highlight the improvements achieved by using language-specific features in Arabic NER.

Arrow@TUDublin

Linking archival data to location: A case study at the UK National Archives

Author: Amy Warner
Jiayu Tang
Mark M. Hall
Paul Clough
Peter Willett
Publication venue: 'Emerald'
Publication date: 01/01/2011

Purpose: The National Archives (TNA) is the UK Government's official archive. It stores and maintains records spanning over 1,000 years in both physical and digital form. Much of the information held by TNA includes references to place, and user queries to TNA's online catalogue frequently involve searches for location. The purpose of this paper is to illustrate how TNA have extracted the geographic references in their historic data to improve access to the archives. Design/methodology/approach: To be able to quickly enhance the existing archival data with geographic information, existing technologies from Natural Language Processing (NLP) and Geographical Information Retrieval (GIR) have been utilised and adapted to historical archives. Findings: Enhancing the archival records with geographic information has enabled TNA to quickly develop a number of case studies highlighting how geographic information can improve access to large-scale archival collections. The use of existing methods from the GIR domain and technologies, such as OpenLayers, enabled one to quickly implement this process in a way that is easily transferable to other institutions. Practical implications: The methods and technologies described in this paper can be adapted by other archives to similarly enhance access to their historic data. The data-sharing methods described can also be used to enable the integration of knowledge held at different archival institutions. Originality/value: Place is one of the core dimensions for TNA's archival data. Many of the records which are held make reference to place data (wills, legislation, court cases), and approximately one fifth of users' searches involve place names. However, there are still a number of open questions regarding the adaptation of existing GIR methods to the history domain. This paper presents an overview of available GIR methods and the challenges in applying them to historical data.

Crossref

Open Research Online (The Open University)

Edge Hill University Research Information Repository

Distantly Supervised Web Relation Extraction for Knowledge Base Population

Author: Lewis
Suchanek
Vrandečić
Wu
Publication venue: 'IOS Press'
Publication date: 27/05/2016

Extracting information from Web pages for populating large, cross-domain knowledge bases requires methods which are suitable across domains, do not require manual effort to adapt to new domains, are able to deal with noise, and integrate information extracted from different Web pages. Recent approaches have used existing knowledge bases to learn to extract information with promising results, one of those approaches being distant supervision. Distant supervision is an unsupervised method which uses background information from the Linking Open Data cloud to automatically label sentences with relations to create training data for relation classifiers. In this paper we propose the use of distant supervision for relation extraction from the Web. Although the method is promising, existing approaches are still not suitable for Web extraction as they suffer from three main issues: data sparsity, noise and lexical ambiguity. Our approach reduces the impact of data sparsity by making entity recognition tools more robust across domains and extracting relations across sentence boundaries using unsupervised co-reference resolution methods. We reduce the noise caused by lexical ambiguity by employing statistical methods to strategically select training data. To combine information extracted from multiple sources for populating knowledge bases, we present and evaluate several information integration strategies and show that those benefit immensely from additional relation mentions extracted using co-reference resolution, increasing precision by 8%. We further show that strategically selecting training data can increase precision by a further 3%.
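The automatic labeling step of distant supervision can be sketched as follows, assuming the simplest possible heuristic: any sentence that mentions both the subject and the object of a knowledge-base triple is labeled with that triple's relation. This is an illustration only, not the paper's system; the function `label_sentences`, the one-triple knowledge base, and the sentences are hypothetical toy data.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]        # (subject, relation, object)
Example = Tuple[str, str, str, str]  # (sentence, subject, object, relation)


def label_sentences(sentences: List[str], triples: List[Triple]) -> List[Example]:
    """Label each sentence with every relation whose subject and object it mentions."""
    examples: List[Example] = []
    for sentence in sentences:
        lowered = sentence.lower()
        for subj, rel, obj in triples:
            # Plain substring matching; a real system would use entity recognition.
            if subj.lower() in lowered and obj.lower() in lowered:
                examples.append((sentence, subj, obj, rel))
    return examples


# Hypothetical toy knowledge base and corpus.
KB: List[Triple] = [("Ada Lovelace", "birthPlace", "London")]
SENTENCES = [
    "Ada Lovelace was born in London in 1815.",
    "Ada Lovelace visited London often.",
    "Charles Babbage designed the Analytical Engine.",
]

training = label_sentences(SENTENCES, KB)
for sentence, subj, obj, rel in training:
    print(f"{rel}({subj}, {obj}) <- {sentence}")
```

The second sentence is labeled `birthPlace` even though it does not express that relation, which is the kind of noise that selecting training data more strategically is meant to reduce.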

CiteSeerX

Crossref

UCL Discovery

White Rose Research Online