    Grounding semantics in robots for Visual Question Answering

    In this thesis I describe an operational implementation of an object detection and description system that is incorporated into an end-to-end Visual Question Answering system, and I evaluate it on two visual question answering datasets for compositional language and elementary visual reasoning.

    Multi-agent simulation: new approaches to exploring space-time dynamics in GIS

    As part of the long-term quest to develop more disaggregate, temporally dynamic models of spatial behaviour, micro-simulation has evolved to the point where the actions of many individuals can be computed. These multi-agent systems/simulation (MAS) models are a consequence of much better micro data, more powerful and user-friendly computer environments often based on parallel processing, and the generally recognised need in spatial science for modelling temporal process. In this paper, we develop a series of multi-agent models which operate in cellular space. These demonstrate the well-known principle that local action can give rise to global pattern, but also how such pattern emerges as the consequence of positive feedback and learned behaviour. We first summarise the way cellular representation is important in adding new process functionality to GIS, and the way this is effected through ideas from cellular automata (CA) modelling. We then outline the key ideas of multi-agent simulation, which sets the scene for three applications to problems involving the use of agents to explore geographic space. We first illustrate how agents can be programmed to search route networks, finding shortest routes in ad hoc as well as structured ways equivalent to the operation of the Bellman-Dijkstra algorithm. We then demonstrate how the agent-based approach can be used to simulate the dynamics of water flow, implying that such models can be used to effectively model the evolution of river systems. Finally, we show how agents can detect the geometric properties of space, generating powerful results that are not possible using conventional geometry, and we illustrate these ideas by computing the visual fields or isovists associated with different viewpoints within the Tate Gallery. Our forays into MAS are all based on developing reactive agent models with minimal interaction, and we conclude with suggestions for how these models might incorporate cognition, planning, and stronger positive feedbacks between agents.
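
    The abstract compares agent-based route search to the Bellman-Dijkstra algorithm. For orientation, the following is a minimal sketch of that textbook algorithm on a cellular grid; it is not the paper's agent model. The function name `shortest_path_cost`, the 4-connected neighbourhood, and the convention that each cell stores the cost of entering it (with `None` marking a blocked cell) are all assumptions made for illustration.

```python
import heapq

def shortest_path_cost(grid, start, goal):
    """Dijkstra's algorithm on a 4-connected grid (illustrative assumption).

    grid[r][c] is the non-negative cost of entering cell (r, c);
    None marks a blocked cell. Returns the minimum total cost from
    start to goal, or None if the goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    best = {start: 0}          # cheapest known cost to reach each cell
    frontier = [(0, start)]    # min-heap ordered by cost so far
    while frontier:
        cost, (r, c) = heapq.heappop(frontier)
        if (r, c) == goal:
            return cost
        if cost > best.get((r, c), float("inf")):
            continue           # stale heap entry, a cheaper route was found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] is not None:
                new_cost = cost + grid[nr][nc]
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    heapq.heappush(frontier, (new_cost, (nr, nc)))
    return None

# Hypothetical example: a column of expensive cells forces a detour.
example_grid = [
    [1, 9, 1],
    [1, 9, 1],
    [1, 1, 1],
]
detour_cost = shortest_path_cost(example_grid, (0, 0), (0, 2))
```

    In this sketch the cheapest route goes around the expensive column rather than through it, which is the kind of globally optimal result the abstract says local agent behaviour can reproduce.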

    Motion Invariance in Visual Environments

    The puzzle of computer vision might find new and challenging solutions once we realize that most successful methods work at the image level, which is remarkably more difficult than processing visual streams directly, as happens in nature. In this paper, we claim that processing visual streams leads naturally to the formulation of the motion invariance principle, which enables the construction of a new theory of visual learning based on convolutional features. The theory addresses a number of intriguing questions that arise in natural vision, and offers a well-posed computational scheme for the discovery of convolutional filters over the retina. The filters are driven by the Euler-Lagrange differential equations derived from the principle of least cognitive action, which parallels the laws of mechanics. Unlike traditional convolutional networks, which need massive supervision, the proposed theory offers a truly new scenario in which feature learning takes place by unsupervised processing of video signals. An experimental report of the theory is presented, in which we show that features extracted under motion invariance yield an improvement that can be assessed by measuring information-based indexes. Comment: arXiv admin note: substantial text overlap with arXiv:1801.0711

    Accelerating Cooperative Planning for Automated Vehicles with Learned Heuristics and Monte Carlo Tree Search

    Efficient driving in urban traffic scenarios requires foresight. Observing other traffic participants and inferring their possible next actions, depending on one's own action, is considered cooperative prediction and planning. Humans are well equipped to predict the actions of multiple interacting traffic participants and plan accordingly, without the need to communicate directly with others. Prior work has shown that it is possible to achieve effective cooperative planning without explicit communication. However, the search space for cooperative plans is so large that most of the computational budget is spent exploring unpromising regions that are far from the solution. To accelerate the planning process, we combined learned heuristics with a cooperative planning method to guide the search towards regions with promising actions, yielding better solutions at lower computational cost.