Towards Open Intent Discovery for Conversational Text
Detecting and identifying user intent from text, both written and spoken,
plays an important role in modelling and understanding dialogs. Existing research
on intent discovery models it as a classification task with a predefined set of
known categories. To generalize beyond these preexisting classes, we define a
new task of open intent discovery. We investigate how intent can be
generalized to those not seen during training. To this end, we propose a
two-stage approach to this task - predicting whether an utterance contains an
intent, and then tagging the intent in the input utterance. Our model consists
of a bidirectional LSTM with a CRF on top to capture contextual semantics,
subject to some constraints. Self-attention is used to learn long distance
dependencies. Further, we adapt an adversarial training approach to improve
robustness and performance across domains. We also present a dataset of 25k
real-life utterances that have been labelled via crowdsourcing. Our
experiments across different domains and real-world datasets show the
effectiveness of our approach, with less than 100 annotated examples needed per
unique domain to recognize diverse intents. The approach outperforms
state-of-the-art baselines by 5-15% F1 score points.
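The second stage described above, tagging the intent inside an utterance, is a sequence-labelling step. As a minimal sketch, assuming a BIO tag scheme (the abstract does not name one) and a hypothetical helper `extract_spans`, the output of such a tagger could be turned into intent phrases roughly like this:

```python
def extract_spans(tokens, tags):
    """Collect token spans labelled with B-/I- tags (BIO scheme is an assumption)."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span starts; close any span that is still open.
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag.startswith("I-") and current:
            # Continue the open span.
            current.append(token)
        else:
            # "O" tag (or a stray I- tag): close the open span, if any.
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


# Illustrative utterance and tags, not taken from the paper's dataset.
tokens = "please book a flight to Boston".split()
tags = ["O", "B-INTENT", "I-INTENT", "I-INTENT", "O", "O"]
print(extract_spans(tokens, tags))
```

In a two-stage setup, a function like this would only run on utterances that the first stage has flagged as containing an intent.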
Morphological Embeddings for Named Entity Recognition in Morphologically Rich Languages
In this work, we present new state-of-the-art results of 93.59% and 79.59%
for Turkish and Czech named entity recognition based on the model of (Lample et
al., 2016). We contribute by proposing several schemes for representing the
morphological analysis of a word in the context of named entity recognition. We
show that a concatenation of this representation with the word and character
embeddings improves the performance. The effect of these representation schemes
on the tagging performance is also investigated.
Comment: Working draft
Toward Mention Detection Robustness with Recurrent Neural Networks
One of the key challenges in natural language processing (NLP) is to yield
good performance across application domains and languages. In this work, we
investigate the robustness of the mention detection systems, one of the
fundamental tasks in information extraction, via recurrent neural networks
(RNNs). The advantage of RNNs over the traditional approaches is their capacity
to capture long ranges of context and implicitly adapt the word embeddings,
trained on a large corpus, into a task-specific word representation, but still
preserve the original semantic generalization to be helpful across domains. Our
systematic evaluation for RNN architectures demonstrates that RNNs not only
outperform the best reported systems (up to 9% relative error reduction) in
the general setting but also achieve the state-of-the-art performance in the
cross-domain setting for English. Regarding other languages, RNNs are
significantly better than the traditional methods on the similar task of named
entity recognition for Dutch (up to 22% relative error reduction).
Comment: 13 pages, 11 tables, 3 figures
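Relative error reduction compares error rates rather than raw scores. The sketch below assumes the error is measured as 100 minus an F1 score, which is a common convention but is not stated in the abstract; the function name and the numbers are illustrative only and are not results from the paper.

```python
def relative_error_reduction(baseline_score, new_score, max_score=100.0):
    """Fraction of the baseline's error that the new system removes."""
    baseline_error = max_score - baseline_score
    new_error = max_score - new_score
    return (baseline_error - new_error) / baseline_error


# Illustrative scores: moving from 80.0 to 82.0 removes 2 of 20 error points.
print(round(relative_error_reduction(80.0, 82.0), 3))
```

Read this way, a small absolute gain on an already strong baseline can correspond to a sizeable relative error reduction.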
Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks
Recent papers have shown that neural networks obtain state-of-the-art
performance on several different sequence tagging tasks. One appealing property
of such systems is their generality, as excellent performance can be achieved
with a unified architecture and without task-specific feature engineering.
However, it is unclear if such systems can be used for tasks without large
amounts of training data. In this paper we explore the problem of transfer
learning for neural sequence taggers, where a source task with plentiful
annotations (e.g., POS tagging on Penn Treebank) is used to improve performance
on a target task with fewer available annotations (e.g., POS tagging for
microblogs). We examine the effects of transfer learning for deep hierarchical
recurrent networks across domains, applications, and languages, and show that
significant improvement can often be obtained. These gains lead to
improvements over the current state-of-the-art on several well-studied tasks.
Comment: Accepted as a conference paper at ICLR 2017. This is an extended
version of the original paper (https://arxiv.org/abs/1603.06270). The
original paper proposes a new architecture, while this version focuses on
transfer learning for a general model class.
Named Entity Recognition for the Estonian Language
This thesis studies the problem of named entity recognition (NER) in Estonian texts using machine learning methods. In developing the NER system, two main aspects were addressed: the choice of the name recognition algorithm and the representation of names. To this end, the maximum entropy (MaxEnt) and linear-chain conditional random field (CRF) machine learning methods were compared. The study examined how three kinds of features affect the learning results: 1) local features (information derived from the word), 2) global features (features of all contexts in which the word occurs) and 3) external knowledge (lists of names obtained from the Web). To train and compare the machine learning algorithms, a text corpus of newspaper articles was manually annotated, marking the names of locations, persons, organisations and facility-like objects.
The experiments showed that CRF significantly outperforms MaxEnt in recognising all of the name types considered. The best result, an F1 score of 0.86, was achieved on the annotated corpus with the CRF method, using a combination of all three name representations.
The system's ability to adapt to other text genres was also examined, using the sports domain as an example, and the possibilities for using the system to recognise names in other languages were explored.
In this thesis we study the applicability of recent statistical methods to extraction of named entities from Estonian texts. In particular, we explore two fundamental design challenges: choice of inference algorithm and text representation. We compare two state-of-the-art supervised learning methods, Linear Chain Conditional Random Fields (CRF) and Maximum Entropy Model (MaxEnt). In representing named entities, we consider three sources of information: 1) local features, which are based on the word itself, 2) global features extracted from other occurrences of the same word in the whole document and 3) external knowledge represented by lists of entities extracted from the Web. To train and evaluate our NER systems, we assembled a text corpus of Estonian newspaper articles in which we manually annotated names of locations, persons, organisations and facilities. In the process of comparing several solutions we achieved an F1 score of 0.86 by the CRF system using a combination of local and global features and external knowledge.
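The three sources of information named in the abstract (local, global and external) can be pictured as one feature dictionary per token. The following is only a sketch under stated assumptions: the function `token_features`, the individual feature names and the toy gazetteer are hypothetical and are not taken from the thesis.

```python
def token_features(tokens, i, doc_tokens, gazetteer):
    """Build a feature dict for tokens[i] from three hypothetical feature groups."""
    word = tokens[i]
    # 1) Local features: derived from the word itself.
    features = {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "suffix3": word[-3:],
    }
    # 2) Global feature: is the same word capitalised anywhere in the document?
    features["title_elsewhere"] = any(
        t.istitle() for t in doc_tokens if t.lower() == word.lower()
    )
    # 3) External knowledge: membership in a list of known names.
    features["in_gazetteer"] = word.lower() in gazetteer
    return features


# Toy document and gazetteer, for illustration only.
doc = ["Tallinn", "on", "linn", ".", "tallinn", "on", "suur", "."]
sentence = ["tallinn", "on", "suur", "."]
print(token_features(sentence, 0, doc, {"tallinn", "tartu"}))
```

Dictionaries of this shape are the usual input format for feature-based CRF and MaxEnt toolkits, which is why the same features can be reused when comparing the two learners.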
A Byte-sized Approach to Named Entity Recognition
In biomedical literature, it is common for entity boundaries to not align
with word boundaries. Therefore, effective identification of entity spans
requires approaches capable of considering tokens that are smaller than words.
We introduce a novel, subword approach for named entity recognition (NER) that
uses byte-pair encodings (BPE) in combination with convolutional and recurrent
neural networks to produce byte-level tags of entities. We present experimental
results on several standard biomedical datasets, namely the BioCreative VI
Bio-ID, JNLPBA, and GENETAG datasets. We demonstrate competitive performance
while bypassing the specialized domain expertise needed to create biomedical
text tokenization rules.
Comment: 6 pages, 5 tables, 1 figure
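Byte-pair encoding builds subword units by repeatedly merging the most frequent adjacent pair of symbols. The sketch below shows that merge-learning loop on a toy, character-level vocabulary; it is a generic illustration of BPE, not the paper's pipeline, and the function name `learn_bpe` and the toy words are assumptions.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merges from a list of words (toy sketch)."""
    # Each word starts as a tuple of single characters.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the best pair fused into one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


print(learn_bpe(["abab", "abab", "ab"], 2))
```

Because the learned units can be shorter than a word, tags assigned to them can mark entity boundaries that fall inside a word, which is the situation the abstract describes for biomedical text.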
Practical, Efficient, and Customizable Active Learning for Named Entity Recognition in the Digital Humanities
Scholars in inter-disciplinary fields like the
Digital Humanities are increasingly interested
in semantic annotation of specialized corpora.
Yet, under-resourced languages, imperfect or
noisily structured data, and user-specific classification tasks make it difficult to meet their
needs using off-the-shelf models. Manual annotation of large corpora from scratch, meanwhile, can be prohibitively expensive. Thus,
we propose an active learning solution for
named entity recognition, attempting to maximize a custom model’s improvement per additional unit of manual annotation. Our system
robustly handles any domain or user-defined
label set and requires no external resources,
enabling quality named entity recognition for
Humanities corpora where such resources are
not available. Evaluating on typologically disparate languages and datasets, we reduce required annotation by 20-60% and greatly outperform a competitive active learning baseline.
New York University–Paris Sciences Lettres Global Alliance grant; National Endowment for the Humanities grant, award HAA-256078-17; Computational Approaches to Modeling Language lab at New York University Abu Dhabi
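Active learning chooses which unlabelled sentences to send to the annotator next, so that each unit of manual work improves the model as much as possible. The abstract does not say which selection strategy the system uses; the sketch below assumes plain least-confidence sampling, and the function name and confidence values are hypothetical.

```python
def select_for_annotation(confidences, budget):
    """Return indices of the `budget` sentences the model is least confident about."""
    ranked = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return ranked[:budget]


# Hypothetical model confidences for four unlabelled sentences.
confidences = [0.95, 0.40, 0.72, 0.55]
print(select_for_annotation(confidences, 2))
```

After the selected sentences are annotated, the model would be retrained and the remaining pool re-scored, repeating until the annotation budget is spent.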
Character Feature Engineering for Japanese Word Segmentation
On word segmentation problems, machine learning architecture engineering
often draws attention. The problem representation itself, however, has remained
almost static as either word lattice ranking or character sequence tagging, for
at least two decades. The latter often shows stronger predictive power than
the former on the out-of-vocabulary (OOV) issue. When the issue escalates to
rapid adaptation, which is a common scenario for industrial applications,
active learning of partial annotations or re-training with additional lexical
resources is usually applied, though from a somewhat word-based perspective.
Not only is it difficult for end-users to comply with linguistically consistent
word boundary decisions, but the risk/cost of forking models permanently
with estimated weights is also seldom affordable. To overcome the obstacle, this
work provides an alternative, which uses linguistic intuition about character
compositions, such that a sophisticated feature set and its derived scheme can
enable dynamic lexicon expansion with the model remaining intact. Experimental
results suggest that the proposed solution, with or without external lexemes,
performs competitively in terms of F1 score and OOV recall across various
datasets.
Solving Sinhala Language Arithmetic Problems using Neural Networks
A methodology is presented to solve arithmetic problems in the Sinhala language
using a neural network. The system comprises (a) keyword identification, (b)
question identification, and (c) mathematical operation identification, which are
combined using a neural network. Naive Bayes Classification is used in order to
identify keywords and Conditional Random Field to identify the question and the
operation which should be performed on the identified keywords to achieve the
expected result. "One vs. all Classification" is done using a neural network
for sentences. All functions are combined through the neural network which
builds an equation to solve the problem. The paper compares each methodology in
ARIS and Mahoshadha to the method presented in the paper. Mahoshadha2 learns to
solve arithmetic problems with an accuracy of 76%.
Comment: 34th National Information Technology Conference (NITC 2016)