GraphVid: It Only Takes a Few Nodes to Understand a Video
We propose a concise representation of videos that encodes perceptually
meaningful features into graphs. With this representation, we aim to exploit
the substantial redundancy in videos and save computation. First, we
construct superpixel-based graph representations of videos by treating
superpixels as graph nodes and creating spatial and temporal connections
between adjacent superpixels. Then, we leverage Graph Convolutional Networks
to process this representation and predict the desired output. As a result, we
are able to train models with far fewer parameters, which translates into
shorter training times and reduced computational resource requirements. A comprehensive
experimental study on the publicly available datasets Kinetics-400 and Charades
shows that the proposed method is highly cost-effective and uses limited
commodity hardware during training and inference. It reduces the computational
requirements 10-fold while achieving results that are comparable to
state-of-the-art methods. We believe that the proposed approach is a promising
direction that could open the door to solving video understanding more
efficiently and enable resource-limited users to thrive in this research
field.
Comment: Accepted to ECCV 2022 (Oral)
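The superpixels-as-nodes idea can be sketched in a few lines. This is our own toy illustration, not the paper's pipeline: uniform grid pooling stands in for a real superpixel algorithm such as SLIC, and the graph convolution is a bare NumPy implementation of one GCN layer.

```python
import numpy as np

def grid_superpixels(frame, cell=4):
    # Toy stand-in for SLIC: average-pool each cell x cell patch of the frame
    # into one "superpixel" feature vector.
    h, w, c = frame.shape
    patches = frame[:h - h % cell, :w - w % cell].reshape(
        h // cell, cell, w // cell, cell, c)
    return patches.mean(axis=(1, 3)).reshape(-1, c)   # (num_nodes, c)

def spatial_edges(rows, cols):
    # 4-neighbour spatial connections between adjacent superpixels on the grid.
    A = np.zeros((rows * cols, rows * cols))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                A[i, i + 1] = A[i + 1, i] = 1
            if r + 1 < rows:
                A[i, i + cols] = A[i + cols, i] = 1
    return A

def gcn_layer(A, X, W):
    # One graph-convolution step: add self-loops, symmetrically normalise,
    # then propagate features: ReLU(D^-1/2 (A+I) D^-1/2 X W).
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    A_norm = A_hat / np.sqrt(np.outer(d, d))
    return np.maximum(A_norm @ X @ W, 0.0)

frame = np.random.rand(16, 16, 3)   # one RGB frame
X = grid_superpixels(frame)         # 4x4 grid -> 16 node features
A = spatial_edges(4, 4)
W = np.random.rand(3, 8)            # learnable weights in a real model
H = gcn_layer(A, X, W)              # (16, 8) node embeddings
```

In the full method, temporal edges between superpixels of consecutive frames would be added to `A` in the same way, and several such layers would feed a classification head.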
DUQIM-Net: Probabilistic Object Hierarchy Representation for Multi-View Manipulation
Object manipulation in cluttered scenes is a difficult and important problem
in robotics. To efficiently manipulate objects, it is crucial to understand
their surroundings, especially when multiple objects are stacked on top of one
another, preventing effective grasping. Here we present DUQIM-Net,
a decision-making approach for object manipulation in a setting of stacked
objects. In DUQIM-Net, the hierarchical stacking relationship is assessed using
Adj-Net, a model that leverages existing Transformer Encoder-Decoder object
detectors by adding an adjacency head. The output of this head
probabilistically infers the underlying hierarchical structure of the objects
in the scene. We utilize the properties of the adjacency matrix in DUQIM-Net to
perform decision making and assist with object-grasping tasks. Our experimental
results show that Adj-Net surpasses the state-of-the-art in object-relationship
inference on the Visual Manipulation Relationship Dataset (VMRD), and that
DUQIM-Net outperforms comparable approaches in bin-clearing tasks.
Comment: 8 pages, 6 figures, 3 tables. Accepted to the 2022 IEEE/RSJ
International Conference on Intelligent Robots and Systems (IROS 2022)
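One way such a probabilistic stacking matrix can drive grasp decisions is to repeatedly pick objects that nothing rests on. The matrix values, threshold, and function below are our own invented illustration, not the paper's algorithm.

```python
import numpy as np

# Hypothetical output of an adjacency head: P[i, j] is the predicted
# probability that object i rests directly on top of object j.
P = np.array([
    [0.0, 0.9, 0.1],   # object 0 likely sits on object 1
    [0.0, 0.0, 0.8],   # object 1 likely sits on object 2
    [0.1, 0.0, 0.0],   # object 2 is probably at the bottom
])

def grasp_order(P, thresh=0.5):
    """Return objects in a safe grasping order: an object is picked only
    when no remaining object is (confidently) stacked on top of it."""
    on_top = P > thresh                      # binarise the stacking relation
    order, remaining = [], set(range(len(P)))
    while remaining:
        free = [i for i in remaining
                if not any(on_top[j, i] for j in remaining)]
        order.extend(sorted(free))
        remaining -= set(free)
    return order

print(grasp_order(P))   # topmost object first: [0, 1, 2]
```

Because the relation is probabilistic rather than binary, a real system could also weight candidate grasps by how confident the head is that an object is unobstructed.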
PDExplain: Contextual Modeling of PDEs in the Wild
We propose an explainable method for solving partial differential equations
(PDEs) using a contextual scheme called PDExplain. During the training phase,
our method is fed data collected from an operator-defined family of PDEs,
accompanied by the general form of that family. In the inference phase, a
minimal sample collected from a phenomenon is provided, where the sample is
related to the PDE family but not necessarily to the set of specific PDEs seen
in the training phase. We show how our algorithm can predict the PDE solution
for future timesteps. Moreover, our method provides an explainable form of the
PDE, a trait that can assist in modelling phenomena based on data in the
physical sciences. To verify our method, we conduct extensive experiments,
examining its quality in terms of both prediction error and explainability.
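The core idea of recovering an explainable PDE from a family's general form, then forecasting with it, can be illustrated with a classical toy baseline (our own minimal sketch, not the paper's learned architecture): given the family u_t = a·u_x + b·u_xx, estimate (a, b) from a short sample by least squares on finite differences, then roll the identified PDE forward in time.

```python
import numpy as np

def identify_and_step(u, dx, dt, n_future=5):
    # Fit the family u_t = a*u_x + b*u_xx to observed snapshots u[k] by
    # regressing finite-difference time derivatives on spatial derivatives.
    u_t = (u[1:] - u[:-1]) / dt                       # forward time difference
    u_x = np.gradient(u[:-1], dx, axis=1)             # spatial derivatives
    u_xx = np.gradient(u_x, dx, axis=1)
    A = np.stack([u_x.ravel(), u_xx.ravel()], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, u_t.ravel(), rcond=None)

    # The recovered (a, b) *is* the explainable form of the PDE; use it to
    # forecast future timesteps with explicit Euler steps.
    v = u[-1].copy()
    for _ in range(n_future):
        v_x = np.gradient(v, dx)
        v = v + dt * (a * v_x + b * np.gradient(v_x, dx))
    return (a, b), v

# Synthetic "phenomenon": pure diffusion (a = 0, b = 0.1) of a Gaussian bump.
x = np.linspace(0, 1, 64)
dx, dt = x[1] - x[0], 1e-4
u = [np.exp(-100 * (x - 0.5) ** 2)]
for _ in range(20):
    u.append(u[-1] + dt * 0.1 * np.gradient(np.gradient(u[-1], dx), dx))
u = np.array(u)

(a, b), forecast = identify_and_step(u, dx, dt)   # b recovered near 0.1
```

PDExplain replaces this hand-rolled regression with a learned contextual model, but the output has the same flavour: coefficients of a known family that a domain scientist can read off directly.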