44 research outputs found

    Chain-of-Questions Training with Latent Answers for Robust Multistep Question Answering

    We train a language model (LM) to robustly answer multistep questions by generating and answering sub-questions. We propose Chain-of-Questions, a framework that trains a model to generate sub-questions and sub-answers one at a time by leveraging human-annotated question decomposition meaning representations (QDMR). The key technical challenge is that QDMR contains sub-questions but not answers to those sub-questions, so we treat sub-answers as latent variables and optimize them with a novel dynamic mixture of Hard-EM and MAPO. Chain-of-Questions greatly outperforms strong neuro-symbolic methods by 9.0 F1 on the DROP contrast set, and outperforms GPT-3.5 by 24.3 F1 on the HOTPOTQA adversarial set, demonstrating the effectiveness and robustness of our framework.
    Comment: 12 pages, 2 figures
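The latent-variable optimization above can be sketched minimally. The scorer, the candidate set, and the mixture schedule below are all illustrative assumptions; the paper's actual MAPO component and its dynamic weighting are not reproduced here.

```python
import random

def choose_latent(candidates, score, buffer=(), hard_em_prob=1.0, rng=random):
    """Dynamic-mixture sketch: with probability hard_em_prob take the
    argmax-scoring candidate (the Hard-EM step); otherwise replay the
    best trajectory from a MAPO-style memory buffer of past sub-answers."""
    if buffer and rng.random() >= hard_em_prob:
        return max(buffer, key=score)
    return max(candidates, key=score)

# Toy usage: score is a made-up likelihood that the final answer comes out
# correct given each candidate sub-answer for one decomposition step.
scores = {"Paris": 0.9, "Lyon": 0.2, "Nice": 0.1}
target = choose_latent(list(scores), scores.get, hard_em_prob=1.0)
```

With `hard_em_prob=1.0` this reduces to pure Hard-EM: the highest-scoring sub-answer is treated as the training target for that step.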

    Generalization Differences between End-to-End and Neuro-Symbolic Vision-Language Reasoning Systems

    For vision-and-language reasoning tasks, both fully connectionist, end-to-end methods and hybrid, neuro-symbolic methods have achieved high in-distribution performance. In which out-of-distribution settings does each paradigm excel? We investigate this question on both single-image and multi-image visual question answering through four types of generalization tests: a novel segment-combine test for multi-image queries, contrast sets, compositional generalization, and cross-benchmark transfer. End-to-end trained vision-and-language systems exhibit sizeable performance drops across all of these tests. Neuro-symbolic methods suffer even more on cross-benchmark transfer from GQA to VQA, but they show smaller accuracy drops on the other generalization tests, and their performance quickly improves with few-shot training. Overall, our results demonstrate the complementary benefits of the two paradigms and emphasize the importance of using a diverse suite of generalization tests to fully characterize model robustness to distribution shift.
    Comment: Accepted by the Findings of EMNLP 202
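The bookkeeping behind such a suite can be sketched as per-test accuracy drops relative to in-distribution accuracy. All numbers below are made up for illustration; they are not the paper's results.

```python
def generalization_drops(in_dist_acc, ood_accs):
    """Map each generalization test to its accuracy drop
    (in-distribution accuracy minus out-of-distribution accuracy)."""
    return {test: round(in_dist_acc - acc, 3) for test, acc in ood_accs.items()}

# Hypothetical end-to-end system: 72% in-distribution, lower elsewhere.
end_to_end = generalization_drops(
    0.72,
    {"contrast_set": 0.55, "segment_combine": 0.50, "cross_benchmark": 0.60},
)
```

Reporting a vector of drops rather than a single OOD number is what lets the two paradigms be compared test by test.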

    Improving Sign Recognition with Phonology

    We use insights from research on American Sign Language (ASL) phonology to train models for isolated sign language recognition (ISLR), a step towards automatic sign language understanding. Our key insight is to explicitly recognize the role of phonology in sign production, achieving more accurate ISLR than existing work that does not consider sign language phonology. We train ISLR models that take in pose estimations of a signer producing a single sign and predict not only the sign but also its phonological characteristics, such as the handshape. These auxiliary predictions lead to a nearly 9% absolute gain in sign recognition accuracy on the WLASL benchmark, with consistent improvements in ISLR regardless of the underlying prediction model architecture. This work has the potential to accelerate linguistic research in the domain of signed languages and reduce communication barriers between deaf and hearing people.
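The auxiliary-prediction setup is a standard multi-task objective. A minimal sketch, assuming the losses are combined by a weighted sum (the feature names and the weight are illustrative assumptions, not values from the paper):

```python
def multitask_loss(sign_loss, phonology_losses, aux_weight=0.1):
    """Total training loss = sign classification loss plus a weighted sum
    of auxiliary phonological prediction losses (e.g. handshape)."""
    return sign_loss + aux_weight * sum(phonology_losses.values())

# Toy usage with made-up per-head loss values for one batch.
total = multitask_loss(
    sign_loss=2.0,
    phonology_losses={"handshape": 1.0, "location": 0.5, "movement": 0.5},
)
```

The sign head remains the primary objective; the phonology heads only nudge the shared representation toward linguistically meaningful features.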

    Do Localization Methods Actually Localize Memorized Data in LLMs?

    Large language models (LLMs) can memorize many pretrained sequences verbatim. This paper studies whether we can locate a small set of neurons in LLMs responsible for memorizing a given sequence. While the concept of localization is often mentioned in prior work, methods for localization have never been systematically and directly evaluated; we address this with two benchmarking approaches. In our INJ Benchmark, we actively inject a piece of new information into a small subset of LLM weights and measure whether localization methods can identify these "ground truth" weights. In the DEL Benchmark, we study localization of pretrained data that LLMs have already memorized; while this setting lacks ground truth, we can still evaluate localization by measuring whether dropping out the located neurons erases a memorized sequence from the model. We evaluate five localization methods on our two benchmarks, and both produce similar rankings. All methods exhibit promising localization ability, pruning-based methods especially, though the neurons they identify are not necessarily specific to a single memorized sequence.
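The DEL-style evaluation can be sketched in miniature: knock out the neurons a localization method flags and measure how much of the memorization disappears. The per-neuron "contribution" dictionary below is a toy stand-in for a real model, not the paper's procedure.

```python
def erasure_score(contributions, located):
    """Fraction of a sequence's total memorization mass removed when the
    located neurons are dropped (a toy proxy for memorization erasure)."""
    total = sum(contributions.values())
    removed = sum(v for name, v in contributions.items() if name in located)
    return removed / total

# Toy usage: a good localization method flags the neurons that carry most
# of the memorization, so dropping them erases most of the sequence.
score = erasure_score({"n1": 0.6, "n2": 0.3, "n3": 0.1}, located={"n1", "n2"})
```

A method that flags random neurons would score near the fraction of neurons dropped, giving a natural baseline.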

    Geolocated Social Media Posts are Happier: Understanding the Characteristics of Check-in Posts on Twitter

    The increasing prevalence of location-sharing features on social media has enabled researchers to ground computational social science research in geolocated data, affording opportunities to study human mobility, the impact of real-world events, and more. This paper analyzes what crucially separates posts with geotags from those without. We find that users who share location are not representative of the social media user population at large, jeopardizing the generalizability of research that uses only geolocated data. We consider three aspects: affect (sentiment and emotions), content (textual and non-textual), and audience engagement. By comparing a dataset of 1.3 million geotagged tweets with a random dataset of the same size, we show that geotagged posts on Twitter exhibit significantly more positivity, are often about joyous and special events such as weddings or graduations, convey more collectivism than individualism, and contain more additional features such as hashtags or objects in images, yet at the same time generate substantially less engagement. These findings suggest significant differences in the messages conveyed in geotagged posts. Our research carries important implications for future work utilizing geolocated social media data.
    Comment: 11 pages, 10 figures, 2 tables
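The core comparison is a paired contrast between the geotagged sample and a size-matched random sample. A minimal sketch for the positivity finding, with made-up sentiment scores in [-1, 1] (the scoring model and values are assumptions, not the paper's data):

```python
from statistics import mean

def positivity_gap(geotagged_scores, random_scores):
    """Difference in mean sentiment between geotagged and random tweets;
    a positive gap means geotagged posts skew happier."""
    return mean(geotagged_scores) - mean(random_scores)

# Toy usage: three scored tweets per sample.
gap = positivity_gap([0.8, 0.6, 0.9], [0.1, 0.2, 0.0])
```

In practice the same template extends to the other aspects (emotion categories, hashtag counts, engagement), each compared across the two size-matched samples.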