Inferring object properties from human interaction and transferring them to new motions
Humans regularly interact with their surrounding objects. Such interactions often result in strongly correlated motion between humans and the objects they interact with. We thus ask: “Is it possible to infer object properties from skeletal motion alone, even without seeing the interacting object itself?” In this paper, we present a fine-grained action recognition method that learns to infer such latent object properties from human interaction motion alone. This inference allows us to disentangle the motion from the object property and to transfer object properties to a given motion. We collected a large number of videos and 3D skeletal motions of performing actors using an inertial motion capture device. We analyzed similar actions and learned the subtle differences between them to reveal latent properties of the interacting objects. In particular, we learned to identify the interacting object and to estimate latent properties such as its weight or its spillability. Our results clearly demonstrate that motions and interacting objects are highly correlated and that the objects' latent properties can be inferred from 3D skeleton sequences alone, leading to new synthesis possibilities for motions involving human interaction.
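To make the inference step concrete, below is a minimal sketch (assuming PyTorch; the class name, joint count, and property classes are hypothetical, and this is not the authors' architecture) of a recurrent classifier that maps a 3D skeleton sequence to a discrete object-property label such as a weight class:

# Minimal sketch (not the paper's model): a recurrent classifier that maps a
# 3D skeleton sequence to a latent object-property class (e.g., light vs. heavy).
import torch
import torch.nn as nn

class SkeletonPropertyClassifier(nn.Module):
    def __init__(self, num_joints=25, num_classes=3, hidden=128):
        super().__init__()
        # Each frame is flattened to (num_joints * 3) xyz coordinates.
        self.gru = nn.GRU(input_size=num_joints * 3, hidden_size=hidden,
                          num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, seq):  # seq: (batch, frames, num_joints * 3)
        _, h = self.gru(seq)          # final hidden state summarizes the motion
        return self.head(h[-1])       # logits over object-property classes

# Example: score the weight category of a carried object from 120 frames.
model = SkeletonPropertyClassifier()
logits = model(torch.randn(8, 120, 25 * 3))
print(logits.shape)  # torch.Size([8, 3])

The same kind of sequence encoder could in principle be trained with separate heads for each latent property (weight, spillability, and so on).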
Service-oriented Context-aware Framework
Location- and context-aware services are emerging technologies in mobile and desktop environments; however, most of them are difficult to use and offer limited benefit. Our research focuses on designing and creating a service-oriented framework that supports the development and use of location- and context-aware, client-service type applications. Location information is combined with other contexts, such as the user's history, preferences, and disabilities. The framework also handles the spatial model of the environment (e.g. the map of a room or a building) as a context. The framework is built on a semantic backend in which the ontologies are represented using the OWL description language. The use of ontologies enables the framework to run inference tasks and to adapt easily to new context types. The framework also contains a compatibility layer for positioning devices, which hides the technical differences between positioning technologies and enables the combination of location data from various sources.
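One way to picture the compatibility layer described above is the sketch below (class and function names are hypothetical, not the framework's actual API): a common interface hides each positioning technology, and a simple accuracy-weighted fusion combines readings from several sources into one position context.

# Illustrative sketch only (hypothetical names): a compatibility layer that hides
# the differences between positioning technologies behind one interface, so that
# location data from several sources can be merged.
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Position:
    x: float          # metres in the room/building coordinate system
    y: float
    accuracy: float   # estimated error radius in metres

class PositioningDevice(ABC):
    @abstractmethod
    def read_position(self) -> Position: ...

class WifiPositioner(PositioningDevice):
    def read_position(self) -> Position:
        return Position(x=3.2, y=7.5, accuracy=2.0)   # placeholder reading

class BluetoothBeaconPositioner(PositioningDevice):
    def read_position(self) -> Position:
        return Position(x=3.0, y=7.1, accuracy=0.8)   # placeholder reading

def fuse(devices: list[PositioningDevice]) -> Position:
    """Accuracy-weighted average of all available position readings."""
    readings = [d.read_position() for d in devices]
    weights = [1.0 / r.accuracy for r in readings]
    total = sum(weights)
    return Position(
        x=sum(w * r.x for w, r in zip(weights, readings)) / total,
        y=sum(w * r.y for w, r in zip(weights, readings)) / total,
        accuracy=min(r.accuracy for r in readings),
    )

print(fuse([WifiPositioner(), BluetoothBeaconPositioner()]))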
Deep learning for Internet of Underwater Things and ocean data analytics
The Internet of Underwater Things (IoUT) is an emerging technological ecosystem for connecting objects in maritime and underwater environments. IoUT technologies are powered by very large numbers of deployed sensors and actuators. In this thesis, multiple IoUT sensory data streams are augmented with machine intelligence for forecasting purposes.
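As a toy illustration of the forecasting setting (made-up data and a deliberately simple baseline; this is not the thesis's models), the following fits an autoregressive least-squares predictor over sliding windows of a single simulated sensor stream:

# Minimal sketch (illustrative only): forecast an underwater sensor reading
# (e.g., temperature) from a sliding window of its own recent values,
# using ordinary least squares as a simple baseline.
import numpy as np

def make_windows(series, window=24):
    """Turn a 1-D series into (window -> next value) training pairs."""
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

rng = np.random.default_rng(0)
t = np.arange(2000)
temperature = 14 + 2 * np.sin(2 * np.pi * t / 144) + 0.1 * rng.standard_normal(len(t))

X, y = make_windows(temperature)
coef, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)

# One-step-ahead forecast from the most recent window.
last_window = temperature[-24:]
forecast = np.r_[last_window, 1.0] @ coef
print(f"next reading forecast: {forecast:.2f}")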
The 3rd Anti-UAV Workshop & Challenge: Methods and Results
The 3rd Anti-UAV Workshop & Challenge aims to encourage research in
developing novel and accurate methods for multi-scale object tracking. The
Anti-UAV dataset used for the Anti-UAV Challenge has been publicly released.
There are two main differences between this year's competition and the previous
two. First, we have expanded the existing dataset, and for the first time,
released a training set so that participants can focus on improving their
models. Second, we set up two tracks for the first time, i.e., Anti-UAV
Tracking and Anti-UAV Detection & Tracking. Around 76 participating teams from
around the globe competed in the 3rd Anti-UAV Challenge. In this paper, we provide a
brief summary of the 3rd Anti-UAV Workshop & Challenge, including brief
introductions to the top three methods in each track. The submission
leaderboard will be reopened for researchers who are interested in the
Anti-UAV challenge. The benchmark dataset and other information can be found
at https://anti-uav.github.io/.
Comment: Technical report for the 3rd Anti-UAV Workshop and Challenge. arXiv admin note: text overlap with arXiv:2108.0990
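For readers new to tracking benchmarks, the sketch below shows one simplified way per-frame tracking results are often scored (average overlap when the target is visible, credit for correctly reporting "no target" when it is absent); it is only an illustration, not the challenge's official evaluation protocol:

# Simplified, unofficial illustration of frame-level tracking evaluation.
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)

def iou(a: Box, b: Box) -> float:
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def tracking_score(preds: Sequence[Optional[Box]], gts: Sequence[Optional[Box]]) -> float:
    scores = []
    for p, g in zip(preds, gts):
        if g is None:                      # target absent in this frame
            scores.append(1.0 if p is None else 0.0)
        else:
            scores.append(iou(p, g) if p is not None else 0.0)
    return sum(scores) / len(scores)

print(tracking_score([(10, 10, 20, 20), None], [(12, 12, 20, 20), None]))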
Activity profiling for minimally invasive surgery
Extending Multi-modal Contrastive Representations
Multi-modal contrastive representation (MCR) of more than three modalities is
critical in multi-modal learning. Although recent methods showcase impressive
achievements, the high dependence on large-scale, high-quality paired data and
the expensive training costs limit their further development. Inspired by the
recent C-MCR, this paper proposes Extending Multimodal Contrastive
Representation (Ex-MCR), a training-efficient and paired-data-free method to
flexibly learn a unified contrastive representation space for more than three
modalities by integrating the knowledge of existing MCR spaces. Specifically,
Ex-MCR aligns multiple existing MCRs into the same base MCR, which
effectively preserves the original semantic alignment of the base MCR. In addition,
we comprehensively enhance the entire learning pipeline for aligning MCR spaces
from the perspectives of training data, architecture, and learning objectives.
With the preserved original modality alignment and the enhanced space
alignment, Ex-MCR shows superior representation learning performance and
excellent modality extensibility. To demonstrate the effectiveness of Ex-MCR,
we align the MCR spaces of CLAP (audio-text) and ULIP (3D-vision) into the CLIP
(vision-text) space, leveraging the overlapping text and image modalities,
respectively. Remarkably, without using any paired data, Ex-MCR learns a
3D-image-text-audio unified contrastive representation, and it achieves
state-of-the-art performance on audio-visual, 3D-image, audio-text, visual-text
retrieval, and 3D object classification tasks. More importantly, extensive
qualitative results further demonstrate the emergent semantic alignment between
the extended modalities (e.g., audio and 3D), which highlights the great
potential of modality extensibility.
Comment: Our code is available at https://github.com/MCR-PEFT/Ex-MC
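The core idea of aligning an existing contrastive space to a base space through an overlapping modality can be pictured with a toy sketch (random stand-in embeddings, hypothetical dimensions, and a plain least-squares linear map; the actual Ex-MCR pipeline is more elaborate):

# Conceptual sketch only (not the Ex-MCR implementation): align an existing
# contrastive space (e.g., an audio-text space) to a base space (e.g., a
# vision-text space) through their overlapping text modality.
import numpy as np

rng = np.random.default_rng(0)
d_src, d_base, n_text = 512, 768, 10000

# Stand-ins for embeddings of the same captions in the two spaces.
text_src = rng.standard_normal((n_text, d_src))    # e.g., CLAP text encoder
text_base = rng.standard_normal((n_text, d_base))  # e.g., CLIP text encoder

# Fit W so that text_src @ W approximates text_base (closed-form least squares).
W, *_ = np.linalg.lstsq(text_src, text_base, rcond=None)

# An embedding from the source space (e.g., an audio clip embedded by CLAP)
# can now be projected into the base space and compared against its embeddings.
audio_src = rng.standard_normal((1, d_src))
audio_in_base = audio_src @ W
image_base = rng.standard_normal((1, d_base))      # e.g., a CLIP image embedding

cos = (audio_in_base @ image_base.T) / (
    np.linalg.norm(audio_in_base) * np.linalg.norm(image_base))
print(cos.shape)  # (1, 1) similarity score in the shared base space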