
    Evaluating the Representational Hub of Language and Vision Models

    The multimodal models used in the emerging field at the intersection of computational linguistics and computer vision implement the bottom-up processing of the 'Hub and Spoke' architecture, which was proposed in cognitive science to model how the brain processes and combines multi-sensory inputs. In these models, the Hub is implemented as a neural network encoder. We investigate how this encoder is affected by several vision-and-language tasks proposed in the literature: visual question answering, visual reference resolution, and visually grounded dialogue. To measure the quality of the representations learned by the encoder, we use two kinds of analyses. First, we evaluate the encoder, pre-trained on each of the vision-and-language tasks, on an existing diagnostic task designed to assess multimodal semantic understanding. Second, we carry out a battery of analyses aimed at studying how the encoder merges and exploits the two modalities. Comment: Accepted to IWCS 201

    Psycholinguistics meets Continual Learning: Measuring Catastrophic Forgetting in Visual Question Answering

    We study catastrophic forgetting in neural multimodal approaches to Visual Question Answering (VQA). Motivated by evidence from psycholinguistics, we devise a set of linguistically informed VQA tasks that differ in the types of questions involved (Wh-questions and polar questions). We test what impact task difficulty has on continual learning, and whether presenting question types in the order in which children acquire them helps computational models. Our results show that dramatic forgetting is at play and that both task difficulty and task order matter. Two well-known current continual learning methods mitigate the problem only to a limited degree.

    Strict and non-strict negative concord in Hungarian: A unified analysis

    Surányi (2006) observed that Hungarian has a hybrid (strict + non-strict) negative concord system. This paper proposes a uniform analysis of that system within the general framework of Zeijlstra (2004, 2008) and, especially, Chierchia (2013), with the following new ingredients. Sentential negation NEM is the same full negation in the presence of both strict and non-strict concord items. Preverbal SENKI 'n-one' type negative concord items occupy the specifier position of either NEM 'not' or SEM 'nor'. The latter, SEM, spells out IS 'too, even' in the immediate scope of negation; it is a focus-sensitive head on the clausal spine. SEM can be seen as an overt counterpart of the phonetically null head that Chierchia dubs NEG; it is capable of invoking an abstract (disembodied) negation at the edge of its projection.

    A discourse-based approach for Arabic question answering

    Answering complex questions that require explanatory answers involves searching texts for arguments. Because discourse relations play a prominent role in reflecting the text producer's intentions, capturing the underlying structure of a text provides a useful guide for this task. Based on our extensive review, no system for automatic discourse analysis that builds full rhetorical structures for large-scale Arabic texts is currently available. This is due to the high computational complexity of processing the large number of hypothesized relations associated with long texts, so more practical approaches should be investigated. This paper presents a new Arabic Text Parser designed for question answering systems that handle لماذا "why" and كيف "how to" questions. The Text Parser treats the sentence as the basic unit of text and incorporates a set of heuristics to avoid combinatorial explosion. With this approach, the resulting question answering system achieved a significant improvement over the baseline, with a Recall of 68% and an MRR (Mean Reciprocal Rank) of 0.62.
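    The MRR figure reported above is, under its standard definition, the average over all test questions of 1/rank of the first correct answer in the system's ranked output (contributing 0 when no correct answer is returned). The Python sketch below illustrates that standard definition only. The function name and the answer identifiers (`s1`, `s2`, ...) are hypothetical, and the sketch assumes a single gold answer per question, which may differ from the paper's actual evaluation setup.

```python
from typing import Sequence


def mean_reciprocal_rank(ranked_results: Sequence[Sequence[str]],
                         gold: Sequence[str]) -> float:
    """Standard MRR: mean of 1/rank of the first correct answer per question.

    Assumption (hypothetical setup): each question has exactly one gold
    answer, identified by a string such as a sentence ID.
    """
    if len(ranked_results) != len(gold):
        raise ValueError("one gold answer is needed per question")
    if not gold:
        return 0.0
    total = 0.0
    for candidates, answer in zip(ranked_results, gold):
        # Ranks start at 1; only the first correct hit counts.
        for rank, candidate in enumerate(candidates, start=1):
            if candidate == answer:
                total += 1.0 / rank
                break
        # A question whose gold answer is never returned adds 0.
    return total / len(gold)


if __name__ == "__main__":
    # Hypothetical ranked outputs for three questions.
    ranked = [["s3", "s1", "s7"], ["s2", "s5"], ["s9", "s4"]]
    gold = ["s1", "s2", "s8"]
    print(mean_reciprocal_rank(ranked, gold))  # (1/2 + 1/1 + 0) / 3 = 0.5
```

    Because only the first correct answer counts, MRR rewards systems that rank a correct explanation near the top; Recall, by contrast, only checks whether a correct answer is retrieved at all.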