Analysis of Bag-of-n-grams Representation's Properties Based on Textual Reconstruction
Despite its simplicity, bag-of-n-grams sentence representation has been found to excel in some NLP tasks. However, it has not received much attention in recent years and further analysis of its properties is necessary. We propose a framework to investigate the amount and type of information captured in a general-purpose bag-of-n-grams sentence representation. We first use sentence reconstruction as a tool to obtain a bag-of-n-grams representation that contains general information about the sentence. We then run prediction tasks (sentence length, word content, phrase content and word order) using the obtained representation to look into the specific type of information captured in the representation. Our analysis demonstrates that bag-of-n-grams representation does contain sentence-structure-level information. However, incorporating n-grams with higher order n empirically helps little with encoding more information in general, except for phrase content information.
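The sketch below illustrates the kind of bag-of-n-grams sentence representation and word-content probe the abstract refers to; it assumes scikit-learn's CountVectorizer, and the toy sentences and probe setup are illustrative rather than the paper's configuration.

```python
# A minimal sketch of a bag-of-n-grams sentence representation (unigrams + bigrams)
# and a toy "word content" probe. Vocabulary and sentences are illustrative only.
from sklearn.feature_extraction.text import CountVectorizer

sentences = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a bird flew over the mat",
]

# ngram_range=(1, 2) collects both unigrams and bigrams into one sparse count vector.
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(sentences)

# Word-content probing reduces to checking whether a term's count is non-zero.
vocab = vectorizer.vocabulary_
print(X[0, vocab["cat"]] > 0)      # True: "cat" occurs in sentence 0
print(X[2, vocab["the mat"]] > 0)  # True: the bigram "the mat" occurs in sentence 2
```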
Deep Learning for Sentiment Analysis: A Survey
Deep learning has emerged as a powerful machine learning technique that
learns multiple layers of representations or features of the data and produces
state-of-the-art prediction results. Along with the success of deep learning in
many other application domains, deep learning has also been widely used for sentiment analysis in recent years. This paper first gives an overview of deep learning and then provides a comprehensive survey of its current applications in sentiment analysis. Comment: 34 pages, 9 figures, 2 tables.
Analysis Methods in Neural Language Processing: A Survey
The field of natural language processing has seen impressive progress in
recent years, with neural network models replacing many of the traditional
systems. A plethora of new models have been proposed, many of which are thought
to be opaque compared to their feature-rich counterparts. This has led
researchers to analyze, interpret, and evaluate neural networks in novel and
more fine-grained ways. In this survey paper, we review analysis methods in
neural language processing, categorize them according to prominent research
trends, highlight existing limitations, and point to potential directions for
future work. Comment: Version including the supplementary materials (3 tables), also
available at https://boknilev.github.io/nlp-analysis-method
DisSent: Sentence Representation Learning from Explicit Discourse Relations
Learning effective representations of sentences is one of the core missions
of natural language understanding. Existing models either train on a vast
amount of text, or require costly, manually curated sentence relation datasets.
We show that with dependency parsing and rule-based rubrics, we can curate a
high quality sentence relation task by leveraging explicit discourse relations.
We show that our curated dataset provides an excellent signal for learning
vector representations of sentence meaning, representing relations that can
only be determined when the meanings of two sentences are combined. We
demonstrate that the automatically curated corpus allows a bidirectional LSTM
sentence encoder to yield high quality sentence embeddings and can serve as a
supervised fine-tuning dataset for larger models such as BERT. Our fixed
sentence embeddings achieve high performance on a variety of transfer tasks,
including SentEval, and we achieve state-of-the-art results on Penn Discourse
Treebank's implicit relation prediction task. Comment: 13 pages, 4 figures. ACL 2019.
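A simplified illustration of mining (S1, marker, S2) training triples from explicit discourse connectives follows; DisSent relies on dependency parses and rule-based rubrics, whereas this sketch uses only a regex over a hypothetical marker list.

```python
# A simplified sketch of extracting (S1, marker, S2) triples from raw sentences
# using explicit discourse connectives. The marker list and regex splitting are
# illustrative stand-ins for the paper's parse-based extraction rules.
import re

MARKERS = ["because", "but", "although", "so", "when", "while"]
pattern = re.compile(r"\b(" + "|".join(MARKERS) + r")\b", re.IGNORECASE)

def extract_pair(sentence):
    """Split a sentence on the first discourse marker, yielding (S1, marker, S2)."""
    match = pattern.search(sentence)
    if match is None:
        return None
    s1 = sentence[: match.start()].strip(" ,")
    s2 = sentence[match.end():].strip(" ,.")
    if s1 and s2:
        return s1, match.group(1).lower(), s2
    return None

print(extract_pair("She stayed home because it was raining."))
# ('She stayed home', 'because', 'it was raining')
```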
Understanding Cross-Lingual Syntactic Transfer in Multilingual Recurrent Neural Networks
It is now established that modern neural language models can be successfully
trained on multiple languages simultaneously without changes to the underlying
architecture, providing an easy way to adapt a variety of NLP models to
low-resource languages. But what kind of knowledge is really shared among
languages within these models? Does multilingual training mostly lead to an
alignment of the lexical representation spaces or does it also enable the
sharing of purely grammatical knowledge? In this paper we dissect different
forms of cross-lingual transfer and look for its most determining factors,
using a variety of models and probing tasks. We find that exposing our language
models to a related language does not always increase grammatical knowledge in
the target language, and that optimal conditions for lexical-semantic transfer
may not be optimal for syntactic transfer. Comment: v2: Added acknowledgements; 9 pages, single column, with 6 figures.
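A minimal probing-classifier sketch of the kind alluded to above: a linear probe trained on frozen language-model representations to predict a grammatical label. The random features and binary labels below are placeholders for real multilingual LM hidden states and syntactic annotations.

```python
# A minimal linear-probe sketch: fit a logistic-regression classifier on frozen
# representations and read its accuracy as a rough measure of how much of the
# target property is linearly decodable. Features and labels here are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(1000, 768))   # placeholder for frozen LM vectors
labels = rng.integers(0, 2, size=1000)         # placeholder syntactic labels

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, labels, test_size=0.2, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(probe.score(X_test, y_test))
```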
Novel Word Embedding and Translation-based Language Modeling for Extractive Speech Summarization
Word embedding methods revolve around learning continuous distributed vector
representations of words with neural networks, which can capture semantic
and/or syntactic cues, and in turn be used to induce similarity measures among
words, sentences and documents in context. Celebrated methods can be
categorized as prediction-based and count-based methods according to the
training objectives and model architectures. Their pros and cons have been
extensively analyzed and evaluated in recent studies, but there is relatively
less work continuing the line of research to develop an enhanced learning
method that brings together the advantages of the two model families. In
addition, the interpretation of the learned word representations still remains
somewhat opaque. Motivated by the observations and considering the pressing
need, this paper presents a novel method for learning the word representations,
which not only inherits the advantages of classic word embedding methods but
also offers a clearer and more rigorous interpretation of the learned word
representations. Built upon the proposed word embedding method, we further
formulate a translation-based language modeling framework for the extractive
speech summarization task. A series of empirical evaluations demonstrate the
effectiveness of the proposed word representation learning and language
modeling techniques in extractive speech summarization.
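For orientation, here is a generic embedding-based sentence-scoring sketch for extractive summarization; it is not the translation-based language model proposed in the paper, and the tiny embedding table is a placeholder for real pretrained word vectors.

```python
# A generic extractive-scoring sketch: embed each sentence as the mean of its word
# vectors and rank sentences by cosine similarity to the document centroid.
# The 3-d vectors below are toy placeholders for pretrained embeddings.
import numpy as np

word_vectors = {
    "budget": np.array([0.9, 0.1, 0.0]),
    "meeting": np.array([0.8, 0.2, 0.1]),
    "weather": np.array([0.0, 0.9, 0.3]),
    "approved": np.array([0.7, 0.0, 0.2]),
}

def embed(sentence):
    vecs = [word_vectors[w] for w in sentence.split() if w in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

sentences = ["budget approved", "weather", "budget meeting"]
doc_centroid = np.mean([embed(s) for s in sentences], axis=0)

# Higher score = more central to the document, hence a better summary candidate.
ranked = sorted(sentences, key=lambda s: cosine(embed(s), doc_centroid), reverse=True)
print(ranked)
```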
ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission
Clinical notes contain information about patients that goes beyond structured
data like lab values and medications. However, clinical notes have been
underused relative to structured data, because notes are high-dimensional and
sparse. This work develops and evaluates representations of clinical notes
using bidirectional transformers (ClinicalBERT). ClinicalBERT uncovers
high-quality relationships between medical concepts as judged by humans.
ClinicalBERT outperforms baselines on 30-day hospital readmission prediction using both discharge summaries and the first few days of notes in the intensive care unit. Code and model parameters are available. Comment: CHIL 2020 Workshop.
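A hedged sketch of framing 30-day readmission prediction as binary sequence classification over note text with the Hugging Face transformers API; it is not the authors' released ClinicalBERT code, and the generic bert-base-uncased checkpoint stands in for a clinically pretrained one.

```python
# Fine-tuning a BERT encoder for readmission prediction as binary classification.
# Checkpoint, example notes, and labels are illustrative placeholders.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

notes = ["Patient discharged in stable condition.", "Recurrent chest pain, poor follow-up."]
labels = torch.tensor([0, 1])  # 0 = no readmission, 1 = readmitted within 30 days

batch = tokenizer(notes, padding=True, truncation=True, max_length=512, return_tensors="pt")
outputs = model(**batch, labels=labels)

# One gradient step; a real run would chunk long notes and loop over many epochs.
outputs.loss.backward()
torch.optim.AdamW(model.parameters(), lr=2e-5).step()
print(float(outputs.loss))
```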
Supervised Fine Tuning for Word Embedding with Integrated Knowledge
Learning vector representations for words is an important research field that may benefit many natural language processing tasks. Two limitations exist in nearly all available models: the bias caused by the context definition and the lack of knowledge utilization. They are difficult to tackle because these algorithms are essentially unsupervised learning approaches. Inspired by deep learning, the authors propose a supervised framework for learning vector representations of words that provides additional supervised fine-tuning after unsupervised learning. The framework is a knowledge-rich approach and is compatible with any numerical word vector representation. The authors perform both intrinsic evaluations, such as attributional and relational similarity prediction, and extrinsic evaluations, such as sentence completion and sentiment analysis. Experimental results on 6 embeddings and 4 tasks with 10 datasets show that the proposed fine-tuning framework may significantly improve the quality of the vector representation of words.
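The retrofitting-style sketch below conveys the flavour of knowledge-based post-hoc tuning of word vectors: vectors are nudged toward lexicon neighbours while staying close to their pretrained values. It is not the authors' exact algorithm, and the two-dimensional vectors and synonym lists are toy placeholders.

```python
# A retrofitting-style update: each word's vector becomes an average of its
# original pretrained vector and its current lexicon-neighbour vectors.
# Vectors and synonym lists are toy placeholders, not real resources.
import numpy as np

vectors = {
    "happy": np.array([1.0, 0.0]),
    "glad": np.array([0.0, 1.0]),
    "sad": np.array([-1.0, 0.0]),
}
synonyms = {"happy": ["glad"], "glad": ["happy"], "sad": []}

tuned = {w: v.copy() for w, v in vectors.items()}
for _ in range(10):                       # a few coordinate-descent sweeps
    for word, neighbours in synonyms.items():
        if not neighbours:
            continue
        neighbour_mean = np.mean([tuned[n] for n in neighbours], axis=0)
        tuned[word] = (vectors[word] + len(neighbours) * neighbour_mean) / (1 + len(neighbours))

print(tuned["happy"], tuned["glad"])   # pulled toward each other
```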
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
We introduce a new language representation model called BERT, which stands
for Bidirectional Encoder Representations from Transformers. Unlike recent
language representation models, BERT is designed to pre-train deep
bidirectional representations from unlabeled text by jointly conditioning on
both left and right context in all layers. As a result, the pre-trained BERT
model can be fine-tuned with just one additional output layer to create
state-of-the-art models for a wide range of tasks, such as question answering
and language inference, without substantial task-specific architecture
modifications.
BERT is conceptually simple and empirically powerful. It obtains new
state-of-the-art results on eleven natural language processing tasks, including
pushing the GLUE score to 80.5% (7.7 point absolute improvement), MultiNLI accuracy to 86.7% (4.6 point absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
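The masked-language-model objective that lets BERT condition on left and right context jointly can be demonstrated in a few lines with the Hugging Face transformers API; the bert-base-uncased checkpoint and example sentence below are illustrative and unrelated to the paper's own training runs.

```python
# Masked-language-model inference: mask one token and let a pretrained BERT fill it
# in using tokens on both sides of the gap.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

text = "The capital of France is [MASK] ."
inputs = tokenizer(text, return_tensors="pt")
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

with torch.no_grad():
    logits = model(**inputs).logits          # [1, seq_len, vocab_size]

# The prediction for the masked slot is conditioned on both left and right context.
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))        # typically "paris"
```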
Word Usage Similarity Estimation with Sentence Representations and Automatic Substitutes
Usage similarity estimation addresses the semantic proximity of word
instances in different contexts. We apply contextualized (ELMo and BERT) word
and sentence embeddings to this task, and propose supervised models that
leverage these representations for prediction. Our models are further assisted
by lexical substitute annotations automatically assigned to word instances by
context2vec, a neural model that relies on a bidirectional LSTM. We perform an
extensive comparison of existing word and sentence representations on benchmark
datasets addressing both graded and binary similarity. The best-performing models outperform previous methods in both settings. Comment: *SEM 2019.
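A sketch of usage-similarity scoring with contextualized embeddings in the spirit of the abstract: extract the BERT vector of the same target word in two contexts and compare with cosine similarity. Layer choice, pooling, and the bert-base-uncased checkpoint are assumptions, not the paper's tuned setup.

```python
# Compare the contextual embedding of one word in two different sentences.
# Uses the last hidden layer and the first word-piece only, for simplicity.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def word_vector(sentence, word):
    """Return the contextual embedding of `word` (assumed to be a single word-piece)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    idx = tokens.index(word)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # [1, seq_len, 768]
    return hidden[0, idx]

v1 = word_vector("He deposited cash at the bank.", "bank")
v2 = word_vector("They had a picnic on the river bank.", "bank")
print(float(torch.cosine_similarity(v1, v2, dim=0)))
```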