45 research outputs found

    Revisiting the Importance of Encoding Logic Rules in Sentiment Classification

    Full text link
    We analyze the performance of different sentiment classification models on syntactically complex inputs like A-but-B sentences. The first contribution of this analysis addresses reproducible research: to meaningfully compare different models, their accuracies must be averaged over far more random seeds than what has traditionally been reported. With proper averaging in place, we notice that the distillation model described in arXiv:1603.06318v4 [cs.LG], which incorporates explicit logic rules for sentiment classification, is ineffective. In contrast, using contextualized ELMo embeddings (arXiv:1802.05365v2 [cs.CL]) instead of logic rules yields significantly better performance. Additionally, we provide analysis and visualizations that demonstrate ELMo's ability to implicitly learn logic rules. Finally, a crowdsourced analysis reveals how ELMo outperforms baseline models even on sentences with ambiguous sentiment labels.Comment: EMNLP 2018 Camera Read

    Large language models effectively leverage document-level context for literary translation, but critical errors persist

    Full text link
    Large language models (LLMs) are competitive with the state of the art on a wide range of sentence-level translation datasets. However, their ability to translate paragraphs and documents remains unexplored because evaluation in these settings is costly and difficult. We show through a rigorous human evaluation that asking the Gpt-3.5 (text-davinci-003) LLM to translate an entire literary paragraph (e.g., from a novel) at once results in higher-quality translations than standard sentence-by-sentence translation across 18 linguistically-diverse language pairs (e.g., translating into and out of Japanese, Polish, and English). Our evaluation, which took approximately 350 hours of effort for annotation and analysis, is conducted by hiring translators fluent in both the source and target language and asking them to provide both span-level error annotations as well as preference judgments of which system's translations are better. We observe that discourse-level LLM translators commit fewer mistranslations, grammar errors, and stylistic inconsistencies than sentence-level approaches. With that said, critical errors still abound, including occasional content omissions, and a human translator's intervention remains necessary to ensure that the author's voice remains intact. We publicly release our dataset and error annotations to spur future research on evaluation of document-level literary translation.Comment: preprint (31 pages

    Generating Question-Answer Hierarchies

    Full text link
    The process of knowledge acquisition can be viewed as a question-answer game between a student and a teacher in which the student typically starts by asking broad, open-ended questions before drilling down into specifics (Hintikka, 1981; Hakkarainen and Sintonen, 2002). This pedagogical perspective motivates a new way of representing documents. In this paper, we present SQUASH (Specificity-controlled Question-Answer Hierarchies), a novel and challenging text generation task that converts an input document into a hierarchy of question-answer pairs. Users can click on high-level questions (e.g., "Why did Frodo leave the Fellowship?") to reveal related but more specific questions (e.g., "Who did Frodo leave with?"). Using a question taxonomy loosely based on Lehnert (1978), we classify questions in existing reading comprehension datasets as either "general" or "specific". We then use these labels as input to a pipelined system centered around a conditional neural language model. We extensively evaluate the quality of the generated QA hierarchies through crowdsourced experiments and report strong empirical results.Comment: ACL camera ready + technical note on pipeline modifications for demo (15 pages

    Discourse-Level Language Understanding with Deep Learning

    Get PDF
    Designing computational models that can understand language at a human level is a foundational goal in the field of natural language processing (NLP). Given a sentence, machines are capable of translating it into many different languages, generating a corresponding syntactic parse tree, marking words that refer to people or places, and much more. These tasks are solved by statistical machine learning algorithms, which leverage patterns in large datasets to build predictive models. Many recent advances in NLP are due to deep learning models (parameterized as neural networks), which bypass user-specified features in favor of building representations of language directly from the text. Despite many deep learning-fueled advances at the word and sentence level, however, computers still struggle to understand high-level discourse structure in language, or the way in which authors combine and order different units of text (e.g., sentences, paragraphs, chapters) to express a coherent message or narrative. Part of the reason is data-related, as there are no existing datasets for many contextual language-based problems, and some tasks are too complex to be framed as supervised learning problems; for the latter type, we must either resort to unsupervised learning or devise training objectives that simulate the supervised setting. Another reason is architectural: neural networks designed for sentence-level tasks require additional functionality, interpretability, and efficiency to operate at the discourse level. In this thesis, I design deep learning architectures for three NLP tasks that require integrating information across high-level linguistic context: question answering, fictional relationship understanding, and comic book narrative modeling. While these tasks are very different from each other on the surface, I show that similar neural network modules can be used in each case to form contextual representations
    corecore