Revisiting the Importance of Encoding Logic Rules in Sentiment Classification
We analyze the performance of different sentiment classification models on
syntactically complex inputs like A-but-B sentences. The first contribution of
this analysis addresses reproducible research: to meaningfully compare
different models, their accuracies must be averaged over far more random seeds
than has traditionally been reported. With proper averaging in place, we
notice that the distillation model described in arXiv:1603.06318v4 [cs.LG],
which incorporates explicit logic rules for sentiment classification, is
ineffective. In contrast, using contextualized ELMo embeddings
(arXiv:1802.05365v2 [cs.CL]) instead of logic rules yields significantly better
performance. Additionally, we provide analysis and visualizations that
demonstrate ELMo's ability to implicitly learn logic rules. Finally, a
crowdsourced analysis reveals how ELMo outperforms baseline models even on
sentences with ambiguous sentiment labels. Comment: EMNLP 2018 Camera Ready.
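As a hedged illustration of the seed-averaging protocol this abstract advocates, the sketch below averages test accuracy over many random seeds and reports the spread; `train_and_evaluate` is a hypothetical stand-in for any sentiment classifier, not the authors' code.

```python
# Minimal sketch of comparing models by accuracy averaged over many random seeds.
# train_and_evaluate is a hypothetical stand-in for a real training/evaluation run.
import random
import statistics

def train_and_evaluate(seed: int) -> float:
    """Train a sentiment classifier with the given seed and return test accuracy (stub)."""
    random.seed(seed)
    return 0.85 + random.gauss(0, 0.01)  # placeholder for an actual training run

def accuracy_over_seeds(num_seeds: int = 100) -> tuple[float, float]:
    """Report mean and standard deviation of test accuracy across seeds."""
    scores = [train_and_evaluate(seed) for seed in range(num_seeds)]
    return statistics.mean(scores), statistics.stdev(scores)

if __name__ == "__main__":
    mean, std = accuracy_over_seeds(100)
    print(f"accuracy over 100 seeds: {mean:.3f} +/- {std:.3f}")
```

Reporting the variance alongside the mean is what lets apparent gains from a single lucky seed be distinguished from genuine improvements.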
Large language models effectively leverage document-level context for literary translation, but critical errors persist
Large language models (LLMs) are competitive with the state of the art on a
wide range of sentence-level translation datasets. However, their ability to
translate paragraphs and documents remains unexplored because evaluation in
these settings is costly and difficult. We show through a rigorous human
evaluation that asking the GPT-3.5 (text-davinci-003) LLM to translate an
entire literary paragraph (e.g., from a novel) at once results in
higher-quality translations than standard sentence-by-sentence translation
across 18 linguistically diverse language pairs (e.g., translating into and out
of Japanese, Polish, and English). Our evaluation, which took approximately 350
hours of effort for annotation and analysis, is conducted by hiring translators
fluent in both the source and target language and asking them to provide both
span-level error annotations and preference judgments of which system's
translations are better. We observe that discourse-level LLM translators commit
fewer mistranslations, grammar errors, and stylistic inconsistencies than
sentence-level approaches. With that said, critical errors still abound,
including occasional content omissions, and a human translator's intervention
remains necessary to ensure that the author's voice remains intact. We publicly
release our dataset and error annotations to spur future research on evaluation
of document-level literary translation. Comment: preprint (31 pages).
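To make the two prompting regimes concrete, here is a minimal sketch contrasting sentence-by-sentence and whole-paragraph translation prompts; `llm_complete`, the prompt wording, and the function names are illustrative assumptions rather than the paper's actual templates or code.

```python
# Illustrative contrast between sentence-level and paragraph-level LLM translation.
# llm_complete is a hypothetical stand-in for a completion call to a model such as
# text-davinci-003; the prompt templates are assumptions, not the paper's.
from typing import Callable, List

def translate_by_sentence(sentences: List[str], tgt_lang: str,
                          llm_complete: Callable[[str], str]) -> str:
    """Baseline: translate each sentence in isolation, discarding document context."""
    outputs = []
    for sent in sentences:
        prompt = f"Translate the following sentence into {tgt_lang}:\n{sent}\nTranslation:"
        outputs.append(llm_complete(prompt).strip())
    return " ".join(outputs)

def translate_by_paragraph(paragraph: str, tgt_lang: str,
                           llm_complete: Callable[[str], str]) -> str:
    """Discourse-level: translate the whole paragraph in one prompt, so the model can
    keep pronouns, register, and terminology consistent across sentences."""
    prompt = f"Translate the following paragraph into {tgt_lang}:\n{paragraph}\nTranslation:"
    return llm_complete(prompt).strip()
```

The only difference between the two regimes is how much source text the model sees per call, which is exactly the variable the paper's human evaluation isolates.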
Generating Question-Answer Hierarchies
The process of knowledge acquisition can be viewed as a question-answer game
between a student and a teacher in which the student typically starts by asking
broad, open-ended questions before drilling down into specifics (Hintikka,
1981; Hakkarainen and Sintonen, 2002). This pedagogical perspective motivates a
new way of representing documents. In this paper, we present SQUASH
(Specificity-controlled Question-Answer Hierarchies), a novel and challenging
text generation task that converts an input document into a hierarchy of
question-answer pairs. Users can click on high-level questions (e.g., "Why did
Frodo leave the Fellowship?") to reveal related but more specific questions
(e.g., "Who did Frodo leave with?"). Using a question taxonomy loosely based on
Lehnert (1978), we classify questions in existing reading comprehension
datasets as either "general" or "specific". We then use these labels as input
to a pipelined system centered around a conditional neural language model. We
extensively evaluate the quality of the generated QA hierarchies through
crowdsourced experiments and report strong empirical results. Comment: ACL camera ready + technical note on pipeline modifications for demo (15 pages).
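For a concrete picture of the output format, here is a minimal, hypothetical sketch of the kind of question-answer hierarchy SQUASH produces: a "general" question with nested "specific" follow-ups. The dataclass and the answer strings are illustrative assumptions, not the authors' implementation (the example questions come from the abstract).

```python
# Hypothetical data structure for a specificity-controlled QA hierarchy; the answer
# strings are invented for illustration (the questions come from the abstract).
from dataclasses import dataclass, field
from typing import List

@dataclass
class QANode:
    question: str
    answer: str
    specificity: str                      # "general" or "specific"
    children: List["QANode"] = field(default_factory=list)

root = QANode(
    question="Why did Frodo leave the Fellowship?",
    answer="He feared the Ring would endanger his companions.",  # illustrative answer
    specificity="general",
    children=[
        QANode(
            question="Who did Frodo leave with?",
            answer="Sam.",                                        # illustrative answer
            specificity="specific",
        )
    ],
)

def show(node: QANode, depth: int = 0) -> None:
    """Print the hierarchy, indenting specific questions beneath general ones."""
    print("  " * depth + f"[{node.specificity}] {node.question} -> {node.answer}")
    for child in node.children:
        show(child, depth + 1)

show(root)
```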
Discourse-Level Language Understanding with Deep Learning
Designing computational models that can understand language at a human level is a foundational goal in the field of natural language processing (NLP). Given a sentence, machines are capable of translating it into many different languages, generating a corresponding syntactic parse tree, marking words that refer to people or places, and much more. These tasks are solved by statistical machine learning algorithms, which leverage patterns in large datasets to build predictive models. Many recent advances in NLP are due to deep learning models (parameterized as neural networks), which bypass user-specified features in favor of building representations of language directly from the text.
Despite many deep learning-fueled advances at the word and sentence level, however, computers still struggle to understand high-level discourse structure in language, or the way in which authors combine and order different units of text (e.g., sentences, paragraphs, chapters) to express a coherent message or narrative. Part of the reason is data-related, as there are no existing datasets for many contextual language-based problems, and some tasks are too complex to be framed as supervised learning problems; for the latter type, we must either resort to unsupervised learning or devise training objectives that simulate the supervised setting. Another reason is architectural: neural networks designed for sentence-level tasks require additional functionality, interpretability, and efficiency to operate at the discourse level. In this thesis, I design deep learning architectures for three NLP tasks that require integrating information across high-level linguistic context: question answering, fictional relationship understanding, and comic book narrative modeling. While these tasks are very different from each other on the surface, I show that similar neural network modules can be used in each case to form contextual representations.