    Learning to Reason: End-to-End Module Networks for Visual Question Answering

    Natural language questions are inherently compositional, and many are most easily answered by reasoning about their decomposition into modular sub-problems. For example, to answer "is there an equal number of balls and boxes?" we can look for balls, look for boxes, count them, and compare the results. The recently proposed Neural Module Network (NMN) architecture implements this approach to question answering by parsing questions into linguistic substructures and assembling question-specific deep networks from smaller modules that each solve one subtask. However, existing NMN implementations rely on brittle off-the-shelf parsers, and are restricted to the module configurations proposed by these parsers rather than learning them from data. In this paper, we propose End-to-End Module Networks (N2NMNs), which learn to reason by directly predicting instance-specific network layouts without the aid of a parser. Our model learns to generate network structures (by imitating expert demonstrations) while simultaneously learning network parameters (using the downstream task loss). Experimental results on the new CLEVR dataset targeted at compositional question answering show that N2NMNs achieve an error reduction of nearly 50% relative to state-of-the-art attentional approaches, while discovering interpretable network architectures specialized for each question.
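The decomposition in the abstract's example ("look for balls, look for boxes, count them, and compare") can be illustrated with a toy sketch. This is not the N2NMN model: the modules here are plain symbolic functions rather than neural networks, the layout is written by hand rather than predicted, and the names `find`, `count`, `equal`, `run_layout`, and the scene format are all assumptions made for illustration.

```python
# Toy, symbolic stand-ins for modules (hypothetical names; the paper's
# modules are neural networks operating on image features).
def find(scene, shape):
    """Return the objects in the scene that have the given shape."""
    return [obj for obj in scene if obj["shape"] == shape]

def count(objects):
    """Return how many objects were found."""
    return len(objects)

def equal(a, b):
    """Compare two counts."""
    return a == b

MODULES = {"count": count, "equal": equal}

def run_layout(layout, scene):
    """Recursively evaluate a layout given as nested tuples (module_name, *args)."""
    name, *args = layout
    if name == "find":
        # Leaf module: its argument is a shape name, not a sub-layout.
        return find(scene, args[0])
    evaluated = [run_layout(arg, scene) for arg in args]
    return MODULES[name](*evaluated)

# Hand-written layout for "is there an equal number of balls and boxes?"
# (an N2NMN would predict a layout like this from the question).
layout = ("equal", ("count", ("find", "ball")), ("count", ("find", "box")))

scene = [{"shape": "ball"}, {"shape": "box"}, {"shape": "ball"}, {"shape": "box"}]
answer = run_layout(layout, scene)
print(answer)
```

Representing the layout as data (nested tuples) rather than as hard-coded calls is what makes it possible, in principle, for a model to output a different structure for each question.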

    FiLM: Visual Reasoning with a General Conditioning Layer

    We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning - answering image-related questions which require a multi-step, high-level process - a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging, new data from few examples or even zero-shot. Comment: AAAI 2018. Code available at http://github.com/ethanjperez/film . Extends arXiv:1707.0301
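The "feature-wise affine transformation" mentioned in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each feature map is scaled and shifted by one pair of values per channel, and the function name `film` and the hand-picked `gamma` and `beta` values are hypothetical. In a FiLM model these values would come from a network that reads the conditioning input (for example, the question).

```python
def film(features, gamma, beta):
    """Feature-wise affine modulation: out[c][i] = gamma[c] * features[c][i] + beta[c].

    features: list of channels, each a list of activations
    gamma, beta: one scale and one shift per channel (assumed to be produced
                 elsewhere from the conditioning information)
    """
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("need exactly one gamma and one beta per feature channel")
    return [
        [g * value + b for value in channel]
        for channel, g, b in zip(features, gamma, beta)
    ]

# Two channels with two activations each; values are illustrative only.
features = [[1.0, 2.0], [3.0, 4.0]]
gamma = [2.0, 0.0]   # the second channel is scaled to zero
beta = [0.5, -1.0]

modulated = film(features, gamma, beta)
print(modulated)  # [[2.5, 4.5], [-1.0, -1.0]]
```

Because the scale and shift are shared across all positions within a channel, the conditioning input can amplify, suppress, or offset whole feature maps at the cost of only two parameters per channel.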

    Microgenesis, immediate experience and visual processes in reading

    The concept of microgenesis refers to the development on a brief present-time scale of a percept, a thought, an object of imagination, or an expression. It defines the occurrence of immediate experience as dynamic unfolding and differentiation in which the ‘germ’ of the final experience is already embodied in the early stages of its development. Immediate experience typically concerns the focal experience of an object that is thematized as a ‘figure’ in the global field of consciousness; this can involve a percept, thought, object of imagination, or expression (verbal and/or gestural). Yet, whatever its modality or content, focal experience is postulated to develop and stabilize through dynamic differentiation and unfolding. Such a microgenetic description of immediate experience substantiates a phenomenological and genetic theory of cognition where any process of perception, thought, expression or imagination is primarily a process of genetic differentiation and development, rather than one of detection (of a stimulus array or information), transformation, and integration (of multiple primitive components) as theories of a cognitivist kind have contended. My purpose in this essay is to provide an overview of the main constructs of microgenetic theory, to outline its potential avenues of future development in the field of cognitive science, and to illustrate an application of the theory to research, using visual processes in reading as an example.

    Dynamic Decomposition of Spatiotemporal Neural Signals

    Neural signals are characterized by rich temporal and spatiotemporal dynamics that reflect the organization of cortical networks. Theoretical research has shown how neural networks can operate at different dynamic ranges that correspond to specific types of information processing. Here we present a data analysis framework that uses a linearized model of these dynamic states in order to decompose the measured neural signal into a series of components that capture both rhythmic and non-rhythmic neural activity. The method is based on stochastic differential equations and Gaussian process regression. Through computer simulations and analysis of magnetoencephalographic data, we demonstrate the efficacy of the method in identifying meaningful modulations of oscillatory signals corrupted by structured temporal and spatiotemporal noise. These results suggest that the method is particularly suitable for the analysis and interpretation of complex temporal and spatiotemporal neural signals.