Sources of Transfer in Multilingual Named Entity Recognition
Named entities are inherently multilingual, and annotations in any given
language may be limited. This motivates us to consider polyglot named-entity
recognition (NER), where one model is trained using annotated data drawn from
more than one language. However, a straightforward implementation of this
simple idea does not always work in practice: naive training of NER models
using annotated data drawn from multiple languages consistently underperforms
models trained on monolingual data alone, despite having access to more
training data. The starting point of this paper is a simple solution to this
problem, in which polyglot models are fine-tuned on monolingual data to
consistently and significantly outperform their monolingual counterparts. To
explain this phenomenon, we explore the sources of multilingual transfer in
polyglot NER models and examine the weight structure of polyglot models
compared to their monolingual counterparts. We find that polyglot models
efficiently share many parameters across languages and that fine-tuning may
utilize a large number of those parameters.
Comment: ACL 202
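The weight-structure analysis the abstract describes can be illustrated with a toy sketch (the function, threshold, and numbers below are invented for illustration and are not the paper's actual procedure): given a polyglot model's parameters and a fine-tuned copy, measure what fraction of parameters moved appreciably during fine-tuning.

```python
# Illustrative sketch (not the paper's code): compare a "polyglot" weight
# vector with its fine-tuned copy and report the fraction of parameters
# that changed by more than a small threshold.
def fraction_updated(polyglot_weights, finetuned_weights, threshold=1e-3):
    """Fraction of parameters that changed by more than `threshold`."""
    assert len(polyglot_weights) == len(finetuned_weights)
    changed = sum(
        1 for p, f in zip(polyglot_weights, finetuned_weights)
        if abs(p - f) > threshold
    )
    return changed / len(polyglot_weights)

if __name__ == "__main__":
    poly = [0.5, -0.2, 0.1, 0.8]
    tuned = [0.5001, -0.25, 0.1, 0.75]   # two weights moved noticeably
    print(fraction_updated(poly, tuned))  # -> 0.5
```

A high fraction here would be consistent with the paper's observation that fine-tuning utilizes a large number of the shared parameters.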
ParsBERT: Transformer-based Model for Persian Language Understanding
The surge of pre-trained language models has begun a new era in the field of
Natural Language Processing (NLP) by allowing us to build powerful language
models. Among these models, Transformer-based models such as BERT have become
increasingly popular due to their state-of-the-art performance. However, these
models are usually focused on English, leaving other languages to multilingual
models with limited resources. This paper proposes a monolingual BERT for the
Persian language (ParsBERT), which shows its state-of-the-art performance
compared to other architectures and multilingual models. Also, since the amount
of data available for NLP tasks in Persian is very restricted, a massive
dataset for different NLP tasks as well as pre-training the model is composed.
ParsBERT obtains higher scores in all datasets, including existing ones as well
as composed ones and improves the state-of-the-art performance by outperforming
both multilingual BERT and other prior works in Sentiment Analysis, Text
Classification, and Named Entity Recognition tasks.
Comment: 10 pages, 5 figures, 7 tables, table 7 corrected and some refs
related to table
Noise-robust Named Entity Understanding for Virtual Assistants
Named Entity Understanding (NEU) plays an essential role in interactions
between users and voice assistants, since successfully identifying entities and
correctly linking them to their standard forms is crucial to understanding the
user's intent. NEU is a challenging task in voice assistants due to the
ambiguous nature of natural language and because noise introduced by speech
transcription and user errors occur frequently in spoken natural language
queries. In this paper, we propose an architecture with novel features that
jointly solves the recognition of named entities (a.k.a. Named Entity
Recognition, or NER) and the resolution to their canonical forms (a.k.a. Entity
Linking, or EL). We show that by combining NER and EL information in a joint
reranking module, our proposed framework improves accuracy in both tasks. This
improved performance and the features that enable it, also lead to better
accuracy in downstream tasks, such as domain classification and semantic
parsing.
Comment: 9 page
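The joint reranking idea can be sketched in miniature (the candidate structure, scores, and weights below are invented for illustration, not the paper's architecture): each hypothesis carries both an NER score and an EL score, and the reranker picks the hypothesis that maximizes a combination of the two rather than trusting either task alone.

```python
# Toy joint reranker (illustrative only): each hypothesis pairs an entity
# span/type (NER) with a candidate canonical entity (EL), and the reranker
# combines both scores instead of deciding each task independently.
def rerank(hypotheses, ner_weight=0.5, el_weight=0.5):
    """Return the hypothesis with the best combined NER + EL score."""
    return max(
        hypotheses,
        key=lambda h: ner_weight * h["ner_score"] + el_weight * h["el_score"],
    )

if __name__ == "__main__":
    candidates = [
        {"span": "apple", "type": "ORG", "entity": "Apple_Inc.",
         "ner_score": 0.6, "el_score": 0.9},
        {"span": "apple", "type": "MISC", "entity": "Apple_(fruit)",
         "ner_score": 0.7, "el_score": 0.3},
    ]
    best = rerank(candidates)
    print(best["entity"])  # -> Apple_Inc.
```

Note how the EL evidence overturns the slightly higher standalone NER score, which is the kind of cross-task correction joint reranking enables.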
Collective Entity Disambiguation with Structured Gradient Tree Boosting
We present a gradient-tree-boosting-based structured learning model for
jointly disambiguating named entities in a document. Gradient tree boosting is
a widely used machine learning algorithm that underlies many top-performing
natural language processing systems. Surprisingly, most works limit the use of
gradient tree boosting as a tool for regular classification or regression
problems, despite the structured nature of language. To the best of our
knowledge, our work is the first one that employs the structured gradient tree
boosting (SGTB) algorithm for collective entity disambiguation. By defining
global features over previous disambiguation decisions and jointly modeling
them with local features, our system is able to produce globally optimized
entity assignments for mentions in a document. Exact inference is prohibitively
expensive for our globally normalized model. To solve this problem, we propose
Bidirectional Beam Search with Gold path (BiBSG), an approximate inference
algorithm that is a variant of the standard beam search algorithm. BiBSG makes
use of global information from both past and future to perform better local
search. Experiments on standard benchmark datasets show that SGTB significantly
improves upon published results. Specifically, SGTB outperforms the previous
state-of-the-art neural system by nearly 1% absolute accuracy on the popular
AIDA-CoNLL dataset.
Comment: Accepted by NAACL 201
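BiBSG builds on standard beam search, which can be sketched as follows (this is plain left-to-right beam search with toy scores; the paper's BiBSG additionally uses gold-path and bidirectional future information, which this sketch does not implement).

```python
# Simplified standard beam search; BiBSG in the paper is a variant of this
# that also exploits global information from past and future decisions.
def beam_search(step_scores, beam_size=2):
    """step_scores: one {candidate: local_score} dict per mention.
    Returns the best (sequence, total_score) found with a fixed beam."""
    beams = [([], 0.0)]
    for scores in step_scores:
        expanded = [
            (seq + [cand], total + s)
            for seq, total in beams
            for cand, s in scores.items()
        ]
        expanded.sort(key=lambda x: x[1], reverse=True)
        beams = expanded[:beam_size]  # keep only the top-k partial paths
    return beams[0]

if __name__ == "__main__":
    steps = [{"A": 1.0, "B": 0.5}, {"X": 0.2, "Y": 0.9}]
    seq, score = beam_search(steps)
    print(seq, score)  # -> ['A', 'Y'] 1.9
```

Because exact inference over all joint entity assignments is prohibitively expensive for a globally normalized model, pruning to a fixed beam like this trades exactness for tractability.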
Machine Learning with World Knowledge: The Position and Survey
Machine learning has become pervasive in multiple domains, impacting a wide
variety of applications, such as knowledge discovery and data mining, natural
language processing, information retrieval, computer vision, social and health
informatics, ubiquitous computing, etc. Two essential problems of machine
learning are how to generate features and how to acquire labels for machines to
learn. In particular, labeling large amounts of data for each domain-specific
problem can be very time consuming and costly. It has become a key obstacle in
making learning protocols realistic in applications. In this paper, we will
discuss how to use the existing general-purpose world knowledge to enhance
machine learning processes, by enriching the features or reducing the labeling
work. We start from the comparison of world knowledge with domain-specific
knowledge, and then introduce three key problems in using world knowledge in
learning processes, i.e., explicit and implicit feature representation,
inference for knowledge linking and disambiguation, and learning with direct or
indirect supervision. Finally we discuss the future directions of this research
topic.
Deep Learning applied to NLP
Convolutional Neural Networks (CNNs) are typically associated with Computer
Vision. CNNs are responsible for major breakthroughs in Image Classification
and are the core of most Computer Vision systems today. More recently, CNNs
have been applied to problems in Natural Language Processing with some
interesting results. In this paper, we will try to explain the basics of CNNs,
their different variations, and how they have been applied to NLP.
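The core CNN-for-NLP operation can be shown with toy numbers (the embeddings and filter below are invented, and real systems use learned weights and many filters): slide a filter over windows of token embeddings, then max-pool over positions to get one feature per filter.

```python
# Minimal illustration of a 1D convolution over text: each window of k
# token-embedding vectors is flattened and dotted with a flat filter,
# then max-pooling keeps the strongest response across positions.
def conv1d_features(embeddings, filt):
    """Dot each window of k token vectors with a flat filter of length k*dim."""
    k = len(filt) // len(embeddings[0])  # the filter spans k tokens
    feats = []
    for i in range(len(embeddings) - k + 1):
        window = [x for tok in embeddings[i:i + k] for x in tok]
        feats.append(sum(w * x for w, x in zip(filt, window)))
    return feats

def max_pool(features):
    return max(features)

if __name__ == "__main__":
    # 4 tokens, embedding dim 2; one filter spanning 2 tokens.
    sent = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    filt = [1.0, 1.0, 1.0, 1.0]
    feats = conv1d_features(sent, filt)
    print(feats, max_pool(feats))  # -> [2.0, 3.0, 2.0] 3.0
```

Max-pooling is what lets the same filter detect an n-gram pattern anywhere in the sentence, which is the property that carried CNNs from vision over to NLP.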
Learning Named Entity Tagger using Domain-Specific Dictionary
Recent advances in deep neural models allow us to build reliable named entity
recognition (NER) systems without handcrafting features. However, such methods
require large amounts of manually-labeled training data. There have been
efforts on replacing human annotations with distant supervision (in conjunction
with external dictionaries), but the generated noisy labels pose significant
challenges on learning effective neural models. Here we propose two neural
models to suit noisy distant supervision from the dictionary. First, under the
traditional sequence labeling framework, we propose a revised fuzzy CRF layer
to handle tokens with multiple possible labels. After identifying the nature of
noisy labels in distant supervision, we go beyond the traditional framework and
propose a novel, more effective neural model AutoNER with a new Tie or Break
scheme. In addition, we discuss how to refine distant supervision for better
NER performance. Extensive experiments on three benchmark datasets demonstrate
that AutoNER achieves the best performance when only using dictionaries with no
additional human effort, and delivers competitive results with state-of-the-art
supervised benchmarks.
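A rough sketch of the Tie-or-Break idea (a simplification for illustration; AutoNER's actual scheme and matching procedure differ): between each pair of adjacent tokens, emit "Tie" if both fall inside the same dictionary-matched entity, else "Break", so the model learns entity boundaries rather than possibly noisy per-token type labels.

```python
# Simplified Tie-or-Break labeling from a dictionary (illustrative only).
def tie_or_break(tokens, dictionary):
    """dictionary: set of entity phrases given as tuples of tokens."""
    # Greedily find dictionary matches, preferring the longest match.
    inside = [None] * len(tokens)  # id of the match covering each token
    match_id, i = 0, 0
    while i < len(tokens):
        for j in range(len(tokens), i, -1):
            if tuple(tokens[i:j]) in dictionary:
                for t in range(i, j):
                    inside[t] = match_id
                match_id += 1
                i = j
                break
        else:
            i += 1
    # Adjacent tokens inside the same match are tied; all else is a break.
    return [
        "Tie" if inside[t] is not None and inside[t] == inside[t + 1] else "Break"
        for t in range(len(tokens) - 1)
    ]

if __name__ == "__main__":
    toks = ["the", "prostaglandin", "synthesis", "inhibitor", "works"]
    dic = {("prostaglandin", "synthesis", "inhibitor")}
    print(tie_or_break(toks, dic))
    # -> ['Break', 'Tie', 'Tie', 'Break']
```

Labeling boundaries instead of types makes the supervision robust to a dictionary that knows an entity's span but not its exact class.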
Detecting and Extracting Events from Text Documents
Events of various kinds are mentioned and discussed in text documents,
whether they are books, news articles, blogs or microblog feeds. The paper
starts by giving an overview of how events are treated in linguistics and
philosophy. We follow this discussion by surveying how events and associated
information are handled computationally. In particular, we look at how
textual documents can be mined to extract events and ancillary information.
These days, it is mostly through the application of various machine learning
techniques. We also discuss applications of event detection and extraction
systems, particularly in summarization, in the medical domain and in the
context of Twitter posts. We end the paper with a discussion of challenges and
future directions.Comment: This is work in progress. Please email [email protected] with any
comments for improvement.
End-to-end named entity extraction from speech
Named entity recognition (NER) is among the SLU tasks that usually extract
semantic information from textual documents. Until now, NER from speech has
been performed through a pipeline process that consists of first applying
automatic speech recognition (ASR) to the audio and then applying NER to the
ASR outputs. Such an approach has several disadvantages (error propagation, a
metric for tuning ASR systems that is sub-optimal with regard to the final
task, a reduced search space at the ASR output level...), and it is known that
more integrated approaches outperform sequential ones when they can be applied.
In this paper, we present a first study of an end-to-end approach that directly
extracts named entities from speech through a single neural architecture. In
this way, joint optimization is possible for both ASR and NER. Experiments are
carried out on easily accessible French data distributed across several
evaluation campaigns. Experimental results show that this end-to-end approach
provides better results (F-measure=0.69 on test data) than a classical pipeline
approach to detect named entity categories (F-measure=0.65).
Comment: Submitted to Interspeech 201
JaTeCS: an open-source JAva TExt Categorization System
JaTeCS is an open source Java library that supports research on automatic
text categorization and other related problems, such as ordinal regression and
quantification, which are of special interest in opinion mining applications.
It covers all the steps of an experimental activity, from reading the corpus to
the evaluation of the experimental results. As JaTeCS is focused on text as the
main input data, it provides the user with many text-dedicated tools, e.g.:
data readers for many formats, including the most commonly used text corpora
and lexical resources, natural language processing tools, multi-language
support, methods for feature selection and weighting, the implementation of
many machine learning algorithms as well as wrappers for well-known external
software (e.g., SVM_light), which enables their full control from code. JaTeCS
supports its own extension by abstracting through interfaces many of the typical
tools and procedures used in text processing tasks. The library also provides a
number of "template" implementations of typical experimental setups (e.g.,
train-test, k-fold validation, grid-search optimization, randomized runs) which
enable fast realization of experiments just by connecting the templates with
data readers, learning algorithms, and evaluation measures.
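The "template" setups mentioned above follow a standard pattern; here is a language-agnostic sketch of one of them, k-fold validation (JaTeCS itself is a Java library, so this Python sketch illustrates only the pattern, not the JaTeCS API).

```python
# Sketch of the k-fold experimental template (not JaTeCS code): partition
# the corpus into k folds and yield each (train, test) split in turn.
def k_fold_splits(items, k):
    """Yield (train, test) partitions for k-fold cross-validation."""
    folds = [items[i::k] for i in range(k)]  # round-robin assignment
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

if __name__ == "__main__":
    docs = list(range(6))
    for train, test in k_fold_splits(docs, 3):
        print(sorted(test), sorted(train))
```

In a library like JaTeCS, such a template is wired to a data reader, a learning algorithm, and an evaluation measure to produce a complete experiment.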