932 research outputs found
Pseudo-labels for Supervised Learning on Dynamic Vision Sensor Data, Applied to Object Detection under Ego-motion
In recent years, dynamic vision sensors (DVS), also known as event-based cameras or neuromorphic sensors, have seen increased use due to several advantages over conventional frame-based cameras. Operating on principles inspired by the retina, their high temporal resolution overcomes motion blur, their high dynamic range copes with extreme illumination conditions, and their low power consumption makes them ideal for embedded systems on platforms such as drones and self-driving cars. However, event-based data sets are scarce, and labels are even rarer for tasks such as object detection. We transferred discriminative knowledge from a state-of-the-art frame-based convolutional neural network (CNN) to the event-based modality via intermediate pseudo-labels, which are used as targets for supervised learning. We show, for the first time, event-based car detection under ego-motion in a real environment at 100 frames per second, with a test average precision of 40.3% relative to our annotated ground truth. The event-based car detector handles motion blur and poor illumination conditions despite not being explicitly trained to do so, and even complements frame-based CNN detectors, suggesting that it has learnt generalized visual representations.
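The pseudo-label step described above can be sketched in a few lines. This is purely illustrative: `teacher_detect` is a stand-in for the pretrained frame-based CNN (not the paper's actual network), and the confidence threshold of 0.5 is an assumed value, chosen only to show how low-confidence teacher detections would be filtered out before supervising the event-based student.

```python
# Illustrative sketch of pseudo-label generation for supervised transfer.
# teacher_detect is a placeholder for a pretrained frame-based detector.

def teacher_detect(frame):
    # Stand-in detector: returns (bounding_box, confidence) pairs.
    # Boxes are (x1, y1, x2, y2) tuples in pixel coordinates.
    return [((10, 10, 50, 50), 0.9), ((60, 20, 80, 40), 0.3)]

def make_pseudo_labels(frames, conf_threshold=0.5):
    """Keep only confident teacher detections as training targets
    for the event-based student detector."""
    labels = []
    for frame in frames:
        boxes = [box for box, conf in teacher_detect(frame)
                 if conf >= conf_threshold]
        labels.append(boxes)
    return labels

labels = make_pseudo_labels([None])  # one dummy frame
```

With the assumed threshold, the 0.9-confidence detection survives as a pseudo-label while the 0.3-confidence one is discarded; the surviving boxes would then serve as ordinary supervised targets.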
Unsupervised domain adaptation and super resolution on drone images for autonomous dry herbage biomass estimation
Herbage mass yield and composition estimation is an important tool for dairy
farmers to ensure an adequate supply of high quality herbage for grazing and
subsequently milk production. By accurately estimating herbage mass and
composition, targeted nitrogen fertiliser application strategies can be
deployed to improve localised regions in a herbage field, effectively reducing
the negative impacts of over-fertilization on biodiversity and the environment.
In this context, deep learning algorithms offer a tempting alternative to the
usual means of sward composition estimation, which involves the destructive
process of cutting a sample from the herbage field and sorting by hand all
plant species in the herbage. The process is labour intensive and time
consuming and so not utilised by farmers. Deep learning has been successfully
applied in this context on images collected by high-resolution cameras on the
ground. Moving the deep learning solution to drone imaging, however, has the
potential to further improve the herbage mass yield and composition estimation
task by extending the ground-level estimation to the large surfaces occupied by
fields/paddocks. Drone images come at the cost of lower-resolution views of the fields, taken from a high altitude, and require further herbage ground-truth collection from the large surfaces they cover. This paper proposes
to transfer knowledge learned on ground-level images to raw drone images in an
unsupervised manner. To do so, we use unpaired image style translation to
enhance the resolution of drone images by a factor of eight and modify them to
appear closer to their ground-level counterparts. We then ...
\url{www.github.com/PaulAlbert31/Clover_SSL}. Comment: 11 pages, 5 figures. Accepted at the Agriculture-Vision CVPR 2022 Workshop.
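The resolution-matching step in the abstract (enhancing drone images by a factor of eight) is performed by a learned unpaired style-translation model; as a minimal stand-in for that model, the sketch below shows only the geometric part, a naive 8x nearest-neighbour upscale on a toy pixel grid. The function name and the list-of-lists image format are assumptions for illustration, not the paper's implementation.

```python
# Naive 8x nearest-neighbour upscale: a geometric placeholder for the
# learned unpaired image style translation described in the abstract.

def upscale8(image):
    """image: list of rows of pixel values; returns a grid 8x larger
    in each dimension, repeating every pixel in an 8x8 block."""
    factor = 8
    out = []
    for row in image:
        # Repeat each pixel horizontally ...
        big_row = [pixel for pixel in row for _ in range(factor)]
        # ... then repeat the enlarged row vertically.
        out.extend([big_row[:] for _ in range(factor)])
    return out

big = upscale8([[1, 2]])  # a 1x2 toy image becomes 8x16
```

A real pipeline would replace this with a trained translation network that also shifts the drone images' appearance toward the ground-level domain, not just their resolution.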
ModSelect: Automatic Modality Selection for Synthetic-to-Real Domain Generalization
Modality selection is an important step when designing multimodal systems, especially in the case of cross-domain activity recognition, as certain modalities are more robust to domain shift than others. However, selecting only the modalities that make a positive contribution requires a systematic approach. We tackle this problem by proposing an unsupervised modality selection method (ModSelect), which does not require any ground-truth labels. We determine the correlation between the predictions of multiple unimodal classifiers and the domain discrepancy between their embeddings. Then, we systematically compute modality selection thresholds, which select only modalities with a high correlation and low domain discrepancy. We show in our experiments that our method ModSelect chooses only modalities with positive contributions and consistently improves the performance on a Synthetic-to-Real domain adaptation benchmark, narrowing the domain gap.
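The thresholding idea in this abstract can be sketched as follows. Everything here is an assumption for illustration: the per-modality scores are made-up numbers (not results from the paper), and using the mean score as the "systematically computed" threshold is one simple choice, not necessarily the paper's rule. A modality is kept only if its prediction correlation is at or above the correlation threshold and its domain discrepancy is at or below the discrepancy threshold.

```python
# Illustrative sketch of threshold-based modality selection:
# keep modalities with high prediction correlation and low
# domain discrepancy, using the mean of each score as the threshold.

def select_modalities(corr, disc):
    """corr, disc: dicts mapping modality name -> score.
    Returns the sorted names of the selected modalities."""
    corr_thresh = sum(corr.values()) / len(corr)  # mean correlation
    disc_thresh = sum(disc.values()) / len(disc)  # mean discrepancy
    return sorted(m for m in corr
                  if corr[m] >= corr_thresh and disc[m] <= disc_thresh)

# Made-up example scores for three hypothetical modalities.
corr = {"rgb": 0.8, "flow": 0.7, "depth": 0.2}
disc = {"rgb": 0.3, "flow": 0.4, "depth": 0.9}
selected = select_modalities(corr, disc)
```

In this toy example, "depth" is dropped because its correlation falls below the mean and its discrepancy rises above it, matching the abstract's criterion of excluding modalities without a positive contribution.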
Audio-Adaptive Activity Recognition Across Video Domains
This paper strives for activity recognition under domain shift, for example
caused by change of scenery or camera viewpoint. The leading approaches reduce
the shift in activity appearance by adversarial training and self-supervised
learning. Different from these vision-focused works we leverage activity sounds
for domain adaptation as they have less variance across domains and can
reliably indicate which activities are not happening. We propose an
audio-adaptive encoder and associated learning methods that discriminatively
adjust the visual feature representation as well as addressing shifts in the
semantic distribution. To further eliminate domain-specific features and
include domain-invariant activity sounds for recognition, an audio-infused
recognizer is proposed, which effectively models the cross-modal interaction
across domains. We also introduce the new task of actor shift, with a
corresponding audio-visual dataset, to challenge our method with situations
where the activity appearance changes dramatically. Experiments on this
dataset, EPIC-Kitchens, and CharadesEgo show the effectiveness of our approach. Comment: Accepted at CVPR 2022.