39,396 research outputs found
Multilingual Language Processing From Bytes
We describe an LSTM-based model which we call Byte-to-Span (BTS) that reads
text as bytes and outputs span annotations of the form [start, length, label]
where start positions, lengths, and labels are separate entries in our
vocabulary. Because we operate directly on unicode bytes rather than
language-specific words or characters, we can analyze text in many languages
with a single model. Due to the small vocabulary size, these multilingual
models are very compact, but produce results similar to or better than the
state-of- the-art in Part-of-Speech tagging and Named Entity Recognition that
use only the provided training datasets (no external data sources). Our models
are learning "from scratch" in that they do not rely on any elements of the
standard pipeline in Natural Language Processing (including tokenization), and
thus can run in standalone fashion on raw text
Optimal Hyperparameters for Deep LSTM-Networks for Sequence Labeling Tasks
Selecting optimal parameters for a neural network architecture can often make
the difference between mediocre and state-of-the-art performance. However,
little is published which parameters and design choices should be evaluated or
selected making the correct hyperparameter optimization often a "black art that
requires expert experiences" (Snoek et al., 2012). In this paper, we evaluate
the importance of different network design choices and hyperparameters for five
common linguistic sequence tagging tasks (POS, Chunking, NER, Entity
Recognition, and Event Detection). We evaluated over 50.000 different setups
and found, that some parameters, like the pre-trained word embeddings or the
last layer of the network, have a large impact on the performance, while other
parameters, for example the number of LSTM layers or the number of recurrent
units, are of minor importance. We give a recommendation on a configuration
that performs well among different tasks.Comment: 34 pages. 9 page version of this paper published at EMNLP 201
An attentive neural architecture for joint segmentation and parsing and its application to real estate ads
In processing human produced text using natural language processing (NLP)
techniques, two fundamental subtasks that arise are (i) segmentation of the
plain text into meaningful subunits (e.g., entities), and (ii) dependency
parsing, to establish relations between subunits. In this paper, we develop a
relatively simple and effective neural joint model that performs both
segmentation and dependency parsing together, instead of one after the other as
in most state-of-the-art works. We will focus in particular on the real estate
ad setting, aiming to convert an ad to a structured description, which we name
property tree, comprising the tasks of (1) identifying important entities of a
property (e.g., rooms) from classifieds and (2) structuring them into a tree
format. In this work, we propose a new joint model that is able to tackle the
two tasks simultaneously and construct the property tree by (i) avoiding the
error propagation that would arise from the subtasks one after the other in a
pipelined fashion, and (ii) exploiting the interactions between the subtasks.
For this purpose, we perform an extensive comparative study of the pipeline
methods and the new proposed joint model, reporting an improvement of over
three percentage points in the overall edge F1 score of the property tree.
Also, we propose attention methods, to encourage our model to focus on salient
tokens during the construction of the property tree. Thus we experimentally
demonstrate the usefulness of attentive neural architectures for the proposed
joint model, showcasing a further improvement of two percentage points in edge
F1 score for our application.Comment: Preprint - Accepted for publication in Expert Systems with
Application
- …