Search CORE

137 research outputs found

Arabic Fine-Grained Entity Recognition

Author: Abdul-Mageed Muhammad
El-Shangiti Ahmed Oumar
Jarrar Mustafa
Khalilia Mohammed
Liqreina Haneen
Publication venue
Publication date: 18/12/2023
Field of study

Traditional NER systems are typically trained to recognize coarse-grained entities, and less attention is given to classifying entities into a hierarchy of fine-grained lower-level subtypes. This article aims to advance Arabic NER with fine-grained entities. We chose to extend Wojood (an open-source Nested Arabic Named Entity Corpus) with subtypes. In particular, four main entity types in Wojood, geopolitical entity (GPE), location (LOC), organization (ORG), and facility (FAC), are extended with 31 subtypes. To do this, we first revised Wojood's annotations of GPE, LOC, ORG, and FAC to be compatible with the LDC's ACE guidelines, which yielded 5, 614 changes. Second, all mentions of GPE, LOC, ORG, and FAC (~44K) in Wojood are manually annotated with the LDC's ACE sub-types. We refer to this extended version of Wojood as WojoodF ine. To evaluate our annotations, we measured the inter-annotator agreement (IAA) using both Cohen's Kappa and F1 score, resulting in 0.9861 and 0.9889, respectively. To compute the baselines of WojoodF ine, we fine-tune three pre-trained Arabic BERT encoders in three settings: flat NER, nested NER and nested NER with subtypes and achieved F1 score of 0.920, 0.866, and 0.885, respectively. Our corpus and models are open-source and available at https://sina.birzeit.edu/wojood/

arXiv.org e-Print Archive

Question Answering with Additive Restrictive Training (QuAART):Question Answering for the Rapid Development of New Knowledge Extraction Pipelines

Author: Daniel R.
Groth P.
Harper C.A.
Publication venue
Publication date: 01/01/2022
Field of study

International Migration, Integration and Social Cohesion online publications

UvA-DARE

BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages

Author: Heinzerling Benjamin
Strube Michael
Publication venue
Publication date: 01/01/2017
Field of study

We present BPEmb, a collection of pre-trained subword unit embeddings in 275 languages, based on Byte-Pair Encoding (BPE). In an evaluation using fine-grained entity typing as testbed, BPEmb performs competitively, and for some languages bet- ter than alternative subword approaches, while requiring vastly fewer resources and no tokenization. BPEmb is available at https://github.com/bheinzerling/bpem

arXiv.org e-Print Archive

TUbiblio

New Embedded Representations and Evaluation Protocols for Inferring Transitive Relations

Author: Chakrabarti Soumen
Subramanian Sandeep
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 12/05/2018
Field of study

Beyond word embeddings, continuous representations of knowledge graph (KG) components, such as entities, types and relations, are widely used for entity mention disambiguation, relation inference and deep question answering. Great strides have been made in modeling general, asymmetric or antisymmetric KG relations using Gaussian, holographic, and complex embeddings. None of these directly enforce transitivity inherent in the is-instance-of and is-subtype-of relations. A recent proposal, called order embedding (OE), demands that the vector representing a subtype elementwise dominates the vector representing a supertype. However, the manner in which such constraints are asserted and evaluated have some limitations. In this short research note, we make three contributions specific to representing and inferring transitive relations. First, we propose and justify a significant improvement to the OE loss objective. Second, we propose a new representation of types as hyper-rectangular regions, that generalize and improve on OE. Third, we show that some current protocols to evaluate transitive relation inference can be misleading, and offer a sound alternative. Rather than use black-box deep learning modules off-the-shelf, we develop our training networks using elementary geometric considerations.Comment: Accepted at SIGIR 201

arXiv.org e-Print Archive

Crossref

Discovering Power Laws in Entity Length

Author: Cambria Erik
Rajapakse Jagath C.
Zhong Xiaoshi
Publication venue
Publication date: 02/12/2018
Field of study

This paper presents a discovery that the length of the entities in various datasets follows a family of scale-free power law distributions. The concept of entity here broadly includes the named entity, entity mention, time expression, aspect term, and domain-specific entity that are well investigated in natural language processing and related areas. The entity length denotes the number of words in an entity. The power law distributions in entity length possess the scale-free property and have well-defined means and finite variances. We explain the phenomenon of power laws in entity length by the principle of least effort in communication and the preferential mechanism

arXiv.org e-Print Archive