14 research outputs found

Exploiting multimodality and structure in world representations

Author: Cangea Cătălina
Publication venue: University of Cambridge
Publication date: 01/03/2021

An essential aim of artificial intelligence research is to design agents that will eventually cooperate with humans within the real world. To this end, embodied learning is emerging as one of the most important efforts contributed by the machine learning community towards this goal. Recently developing sub-fields concern various aspects of such systems---visual reasoning, language representations, causal mechanisms, robustness to out-of-distribution inputs, to name only a few. In particular, multimodal learning and language grounding are vital to achieving a strong understanding of the real world. Humans build internal representations via interacting with their environment, learning complex associations between visual, auditory and linguistic concepts. Since the world abounds with structure, graph-based encodings are also likely to be incorporated in reasoning and decision-making modules. Furthermore, these relational representations are rather symbolic in nature---providing advantages over other formats, such as raw pixels---and can encode various types of links (temporal, causal, spatial) which can be essential for understanding and acting in the real world. This thesis presents three research works that study and develop likely aspects of future intelligent agents. The first contribution centers on vision-and-language learning, introducing a challenging embodied task that shifts the focus of an existing one to the visual reasoning problem. By extending popular visual question answering (VQA) paradigms, I also designed several models that were evaluated on the novel dataset. This produced initial performance estimates for environment understanding, through the lens of a more challenging VQA downstream task. The second work presents two ways of obtaining hierarchical representations of graph-structured data. 
These methods either scaled to much larger graphs than the ones processed by the best-performing method at the time, or incorporated theoretical properties via the use of topological data analysis algorithms. Both approaches competed with contemporary state-of-the-art graph classification methods, even outside social domains in the second case, where the inductive bias was PageRank-driven. Finally, the third contribution delves further into relational learning, presenting a probabilistic treatment of graph representations in complex settings such as few-shot, multi-task learning and scarce-labelled data regimes. By adding relational inductive biases to neural processes, the resulting framework can model an entire distribution of functions which generate datasets with structure. This yielded significant performance gains, especially in the aforementioned complex scenarios, with semantically-accurate uncertainty estimates that drastically improved over the neural process baseline. This type of framework may eventually contribute to developing lifelong-learning systems, due to its ability to adapt to novel tasks and distributions. The benchmark, methods and frameworks that I have devised during my doctoral studies suggest important future directions for embodied and graph representation learning research. These areas have increasingly proved their relevance to designing intelligent and collaborative agents, which we may interact with in the near future. By addressing several challenges in this problem space, my contributions therefore take a few steps towards building machine learning systems to be deployed in real-life settings.

DREAM CD

Apollo (Cambridge)

Deep Graph Mapper: Seeing Graphs Through the Neural Lens

Author: Cristian Bodnar
Cătălina Cangea
Pietro Liò
Publication venue: Frontiers Media SA
Publication date: 01/06/2021

Graph summarization has received much attention lately, with various works tackling the challenge of defining pooling operators on data regions with arbitrary structures. These contrast with the grid-like regions encountered in image inputs, where techniques such as max-pooling have been enough to show empirical success. In this work, we merge the Mapper algorithm with the expressive power of graph neural networks to produce topologically grounded graph summaries. We demonstrate the suitability of Mapper as a topological framework for graph pooling by proving that Mapper is a generalization of pooling methods based on soft cluster assignments. Building upon this, we show how easy it is to design novel pooling algorithms that obtain results competitive with other state-of-the-art methods. Additionally, we use our method to produce GNN-aided visualisations of attributed complex networks.

Directory of Open Access Journals

Deep Graph Mapper: Seeing Graphs Through the Neural Lens

Author: Bodnar Cristian
Cangea Cătălina
Liò Pietro
Publication venue: Frontiers in Big Data
Publication date: 20/02/2020

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

Apollo (Cambridge)

VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering

Author: Belilovsky Eugene
Cangea Cătălina
Courville Aaron
Liò Pietro
Publication venue: CoRR
Publication date: 01/01/2019

Embodied Question Answering (EQA) is a recently proposed task, where an agent is placed in a rich 3D environment and must act based solely on its egocentric input to answer a given question. The desired outcome is that the agent learns to combine capabilities such as scene understanding, navigation and language understanding in order to perform complex reasoning in the visual world. However, initial advancements combining standard vision and language methods with imitation and reinforcement learning algorithms have shown EQA might be too complex and challenging for these techniques. In order to investigate the feasibility of EQA-type tasks, we build the VideoNavQA dataset that contains pairs of questions and videos generated in the House3D environment. The goal of this dataset is to assess question-answering performance from nearly-ideal navigation paths, while considering a much more complete variety of questions than current instantiations of the EQA task. We investigate several models, adapted from popular VQA methods, on this new benchmark. This establishes an initial understanding of how well VQA-style methods can perform within this novel EQA paradigm.

CC is funded by DREAM CDT and was supported by Mila during the time in Montréal. EB is funded by IVADO. We also thank the University of Cambridge Research Computing Services for providing HPC cluster resources.

arXiv.org e-Print Archive

Apollo (Cambridge)