Independent Prototype Propagation for Zero-Shot Compositionality
Humans are good at compositional zero-shot reasoning; someone who has never
seen a zebra before could nevertheless recognize one when we tell them it looks
like a horse with black and white stripes. Machine learning systems, on the
other hand, usually leverage spurious correlations in the training data, and
while such correlations can help recognize objects in context, they hurt
generalization. To be able to deal with underspecified datasets while still
leveraging contextual clues during classification, we propose ProtoProp, a
novel prototype propagation graph method. First we learn prototypical
representations of objects (e.g., zebra) that are conditionally independent
w.r.t. their attribute labels (e.g., stripes) and vice versa. Next we propagate
the independent prototypes through a compositional graph, to learn
compositional prototypes of novel attribute-object combinations that reflect
the dependencies of the target distribution. The method does not rely on any
external data, such as class hierarchy graphs or pretrained word embeddings. We
evaluate our approach on AO-Clevr, a synthetic and strongly visual dataset
with clean labels, and UT-Zappos, a noisy real-world dataset of fine-grained
shoe types. We show that in the generalized compositional zero-shot setting we
outperform state-of-the-art results, and through ablations we show the
importance of each part of the method and its contribution to the final
results.
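
To make the propagation step concrete, here is a minimal Python sketch (assuming PyTorch; all names are hypothetical and the paper's actual architecture is richer): independently learned attribute and object prototypes are combined along the edges of the compositional graph to produce prototypes for unseen attribute-object pairs, which then act as classifiers.

    import torch
    import torch.nn as nn

    class PrototypePropagation(nn.Module):
        # Compose attribute and object prototypes (assumed to be learned with
        # independence constraints elsewhere) into prototypes for unseen pairs.
        def __init__(self, dim):
            super().__init__()
            self.combine = nn.Sequential(
                nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

        def forward(self, attr_protos, obj_protos, pairs):
            # pairs: list of (attribute_index, object_index) compositions
            a = attr_protos[[p[0] for p in pairs]]          # (P, dim)
            o = obj_protos[[p[1] for p in pairs]]           # (P, dim)
            return self.combine(torch.cat([a, o], dim=-1))  # (P, dim)

    # Classify an image embedding against the composed prototypes.
    def classify(image_feat, comp_protos):
        return (image_feat @ comp_protos.T).argmax(dim=-1)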
Language-Based Augmentation to Address Shortcut Learning in Object Goal Navigation
Deep Reinforcement Learning (DRL) has shown great potential in enabling
robots to find certain objects (e.g., `find a fridge') in environments like
homes or schools. This task is known as Object-Goal Navigation (ObjectNav). DRL
methods are predominantly trained and evaluated using environment simulators.
Although DRL has shown impressive results, the simulators may be biased or
limited. This creates a risk of shortcut learning, i.e., learning a policy
tailored to specific visual details of training environments. We aim to deepen
our understanding of shortcut learning in ObjectNav, its implications and
propose a solution. We design an experiment for inserting a shortcut bias in
the appearance of training environments. As a proof-of-concept, we associate
room types to specific wall colors (e.g., bedrooms with green walls), and
observe poor generalization of a state-of-the-art (SOTA) ObjectNav method to
environments where this is not the case (e.g., bedrooms with blue walls). We
find that shortcut learning is the root cause: the agent learns to navigate to
target objects, by simply searching for the associated wall color of the target
object's room. To solve this, we propose Language-Based (L-B) augmentation. Our
key insight is that we can leverage the multimodal feature space of a
Vision-Language Model (VLM) to augment visual representations directly at the
feature-level, requiring no changes to the simulator, and only an addition of
one layer to the model. Where the SOTA ObjectNav method's success rate drops
69%, our proposal has only a drop of 23%.
Comment: 8 pages, 6 figures, to be published in IEEE IRC 2023.
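
A minimal sketch of the core idea, assuming OpenAI's CLIP package as the VLM (the paper's actual augmentation layer and prompts may differ): visual features are shifted along a direction defined by two text prompts, so at the feature level the agent sees appearance variations that never occur in the simulator.

    import torch
    import clip  # assumes OpenAI's CLIP; any shared image-text space works

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, _ = clip.load("ViT-B/32", device=device)

    def language_based_augment(visual_feat, source_text, target_text, alpha=1.0):
        # Shift a visual feature along a text-defined direction, e.g. from
        # "a room with green walls" to "a room with blue walls".
        with torch.no_grad():
            tokens = clip.tokenize([source_text, target_text]).to(device)
            t = model.encode_text(tokens).float()
            t = t / t.norm(dim=-1, keepdim=True)
        direction = t[1] - t[0]                # edit direction in feature space
        aug = visual_feat + alpha * direction  # augmentation at the feature level
        return aug / aug.norm(dim=-1, keepdim=True)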
Recurrently Predicting Hypergraphs
This work considers predicting the relational structure of a hypergraph for a
given set of vertices, as common for applications in particle physics,
biological systems and other complex combinatorial problems. A problem arises
from the number of possible multi-way relationships, or hyperedges, scaling in
$\mathcal{O}(2^n)$ for a set of $n$ elements. Simply storing an indicator
tensor for all relationships is already intractable for moderately sized $n$,
prompting previous approaches to restrict the number of vertices a hyperedge
connects. Instead, we propose a recurrent hypergraph neural network that
predicts the incidence matrix by iteratively refining an initial guess of the
solution. We leverage the property that most hypergraphs of interest are
sparsely connected and reduce the memory requirement to $\mathcal{O}(nk)$,
where $k$ is the maximum number of positive edges, i.e., edges that actually
exist. In order to counteract the linearly growing memory cost from training a
lengthening sequence of refinement steps, we further propose an algorithm that
applies backpropagation through time on randomly sampled subsequences. We
empirically show that our method can match an increase in the intrinsic
complexity without a performance decrease and demonstrate superior performance
compared to state-of-the-art models.
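
A rough sketch of the training trick, with an assumed GRU-style cell standing in for the paper's refinement update: the refinement runs for a fixed number of steps, but gradients are taken only over a randomly sampled subsequence, keeping the training memory cost constant in the number of refinement steps.

    import torch
    import torch.nn as nn

    class IncidenceRefiner(nn.Module):
        # Sketch: recurrently refine a guess of the n x k incidence matrix,
        # where k bounds the number of positive hyperedges.
        def __init__(self, dim):
            super().__init__()
            self.cell = nn.GRUCell(dim, dim)  # stand-in for the paper's update

        def forward(self, state, inputs):
            return self.cell(inputs, state)

    def train_on_subsequence(model, state, inputs, target, loss_fn,
                             total_steps=16, bptt_len=4):
        # Backpropagation through time on a random subsequence only.
        start = torch.randint(0, total_steps - bptt_len + 1, (1,)).item()
        for t in range(total_steps):
            if t == start:
                state = state.detach()   # cut the graph before the window
            state = model(state, inputs)
            if t == start + bptt_len - 1:
                loss_fn(state, target).backward()  # grads only in the window
                state = state.detach()   # later steps carry no gradient
        return state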
Diffusing More Objects for Semi-Supervised Domain Adaptation with Less Labeling
For object detection, it is possible to view the prediction of bounding boxes
as a reverse diffusion process. Using a diffusion model, the random bounding
boxes are iteratively refined in a denoising step, conditioned on the image. We
propose a stochastic accumulator function that starts each run with random
bounding boxes and combines the slightly different predictions. We empirically
verify that this improves detection performance. The improved detections are
leveraged on unlabelled images as weighted pseudo-labels for semi-supervised
learning. We evaluate the method on a challenging out-of-domain test set. Our
method brings significant improvements and is on par with human-selected
pseudo-labels, while not requiring any human involvement.
Comment: 4 pages, Workshop on Diffusion Models, NeurIPS 2023.
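
A simplified sketch of a stochastic accumulator of this kind (the matching and weighting details here are assumptions, not the paper's exact procedure): boxes from several runs, each started from different random boxes, are merged by score-weighted averaging whenever they overlap sufficiently.

    import numpy as np

    def iou(a, b):
        # a, b: [x1, y1, x2, y2]
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter + 1e-9)

    def accumulate_runs(runs, iou_thr=0.5):
        # Merge detections from multiple diffusion runs: boxes that agree
        # across runs (IoU above threshold) are averaged, weighted by score.
        merged = []
        for boxes, scores in runs:  # each run: (N, 4) boxes, (N,) scores
            for box, score in zip(boxes, scores):
                for m in merged:
                    if iou(m["box"], box) > iou_thr:
                        w = m["weight"]
                        m["box"] = (w * m["box"] + score * np.asarray(box)) / (w + score)
                        m["weight"] += score
                        break
                else:
                    merged.append({"box": np.asarray(box, float),
                                   "weight": float(score)})
        return merged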
Self-Guided Diffusion Models
Diffusion models have demonstrated remarkable progress in image generation
quality, especially when guidance is used to control the generative process.
However, guidance requires a large amount of image-annotation pairs for
training and is thus dependent on their availability, correctness and
unbiasedness. In this paper, we eliminate the need for such annotation by
instead leveraging the flexibility of self-supervision signals to design a
framework for self-guided diffusion models. By leveraging a feature extraction
function and a self-annotation function, our method provides guidance signals
at various image granularities: from the level of holistic images to object
boxes and even segmentation masks. Our experiments on single-label and
multi-label image datasets demonstrate that self-labeled guidance always
outperforms diffusion models without guidance and may even surpass guidance
based on ground-truth labels, especially on unbalanced data. When equipped with
self-supervised box or mask proposals, our method further generates visually
diverse yet semantically consistent images, without the need for any class,
box, or segment label annotation. Self-guided diffusion is simple, flexible and
expected to profit from deployment at scale.
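
A compact sketch of self-guidance at the holistic-image level, under assumed components: pseudo-labels come from clustering self-supervised features (e.g., a DINO-style encoder), and sampling applies classifier-free-style guidance on those labels. The paper's feature-extraction and self-annotation functions are more general than this.

    from sklearn.cluster import KMeans

    def self_annotate(features, n_clusters=100):
        # Self-annotation: cluster self-supervised features into pseudo-labels;
        # no human-provided class, box, or mask annotations are involved.
        return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)

    def guided_eps(model, x_t, t, pseudo_label, null_label, scale=3.0):
        # Classifier-free-style guidance, conditioned on the pseudo-label.
        eps_cond = model(x_t, t, pseudo_label)  # conditional noise estimate
        eps_uncond = model(x_t, t, null_label)  # unconditional estimate
        return eps_uncond + scale * (eps_cond - eps_uncond)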
Incremental concept learning with few training examples and hierarchical classification
Object recognition and localization are important to automatically interpret video and allow better querying
on its content. We propose a method for object localization that learns incrementally and addresses four key
aspects. Firstly, we show that for certain applications, recognition is feasible with only a few training samples.
Secondly, we show that novel objects can be added incrementally without retraining existing objects, which is
important for fast interaction. Thirdly, we show that an unbalanced number of positive training samples leads
to biased classifier scores that can be corrected by modifying weights. Fourthly, we show that the detector
performance can deteriorate due to hard-negative mining for similar or closely related classes (e.g., for Barbie
and dress, because the doll is wearing a dress). This can be solved by our hierarchical classification. We introduce
a new dataset, which we call TOSO, and use it to demonstrate the effectiveness of the proposed method for the
localization and recognition of multiple objects in images.
This research was performed in the GOOSE project, which is jointly funded by the enabling technology program
Adaptive Multi Sensor Networks (AMSN) and the MIST research program of the Dutch Ministry of Defense.
This publication was supported by the research program Making Sense of Big Data (MSoBD).
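
As an illustration of the second and third points (the exact correction used in the paper may differ), a hypothetical per-class weight adjustment that compensates for unbalanced positive training samples, alongside incremental addition of a novel class:

    import numpy as np

    def correct_class_bias(scores, n_pos):
        # Illustrative correction: rescale scores of classes whose values are
        # inflated by many positive training samples. scores is
        # (num_windows, num_classes); n_pos is positives per class.
        n_pos = np.asarray(n_pos, dtype=float)
        weights = n_pos.mean() / n_pos  # fewer positives -> larger weight
        return scores * weights         # rebalanced scores

    def add_class(classifiers, new_classifier):
        # Novel classes are appended without retraining existing ones.
        classifiers.append(new_classifier)
        return classifiers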
Data Augmentations in Deep Weight Spaces
Learning in weight spaces, where neural networks process the weights of other
deep neural networks, has emerged as a promising research direction with
applications in various fields, from analyzing and editing neural fields and
implicit neural representations, to network pruning and quantization. Recent
works designed architectures for effective learning in that space, which takes
into account its unique, permutation-equivariant, structure. Unfortunately, so
far these architectures suffer from severe overfitting and were shown to
benefit from large datasets. This poses a significant challenge because
generating data for this learning setup is laborious and time-consuming since
each data sample is a full set of network weights that has to be trained. In
this paper, we address this difficulty by investigating data augmentations for
weight spaces, a set of techniques that enable generating new data examples on
the fly without having to train additional input weight space elements. We
first review several recently proposed data augmentation schemes and divide
them into categories. We then introduce a novel
augmentation scheme based on the Mixup method. We evaluate the performance of
these techniques on existing benchmarks as well as new benchmarks we generate,
which can be valuable for future studies.
Comment: Accepted to NeurIPS 2023 Workshop on Symmetry and Geometry in Neural Representations.
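
A minimal sketch of a Mixup-style augmentation in weight space, assuming the two input networks have already been permutation-aligned (real schemes must account for the space's permutation symmetry):

    import copy
    import numpy as np

    def weight_space_mixup(state_a, state_b, alpha=0.2):
        # Mixup for weight-space data: convexly combine two (assumed aligned)
        # networks' state dicts with lam ~ Beta(alpha, alpha).
        lam = float(np.random.beta(alpha, alpha))
        mixed = copy.deepcopy(state_a)
        for k in mixed:
            mixed[k] = lam * state_a[k] + (1.0 - lam) * state_b[k]
        return mixed, lam  # lam also mixes the two samples' targets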
Recognition and localization of relevant human behavior in videos (SPIE)
Ground surveillance is normally performed by human assets, since it requires visual intelligence. However, especially for military operations, this can be dangerous and is very resource intensive. Therefore, unmanned autonomous visual-intelligence systems are desired. In this paper, we present an improved system that can recognize actions of a human and interactions between multiple humans. Central to the new system is our agent-based architecture. The system is trained on thousands of videos and evaluated on realistic persistent surveillance data in the DARPA Mind's Eye program, with hours of videos of challenging scenes. The results show that our system is able to track the people, detect and localize events, and discriminate between different behaviors, and it performs 3.4 times better than our previous system.