43 research outputs found
Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework
We propose a self-supervised method to learn feature representations from
videos. A standard approach in traditional self-supervised methods uses
positive-negative data pairs to train with a contrastive learning strategy. In
such a case, different modalities of the same video are treated as positives,
while video clips from other videos are treated as negatives. Because the
spatio-temporal information is important for video representation, we extend
the negative samples by introducing intra-negative samples, which are
transformed from the same anchor video by breaking temporal relations in video
clips. With the proposed Inter-Intra Contrastive (IIC) framework, we can train
spatio-temporal convolutional networks to learn video representations. There
are many flexible options in our IIC framework and we conduct experiments by
using several different configurations. Evaluations are conducted on video
retrieval and video recognition tasks using the learned video representation.
Our proposed IIC outperforms current state-of-the-art results by a large
margin, e.g., improvements of 16.7 and 9.5 percentage points in top-1 accuracy
on the UCF101 and HMDB51 datasets for video retrieval, respectively. For video
recognition,
improvements can also be obtained on these two benchmark datasets. Code is
available at
https://github.com/BestJuly/Inter-intra-video-contrastive-learning

Comment: Accepted by ACMMM 2020. Our project page is at
https://bestjuly.github.io/Inter-intra-video-contrastive-learning
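The inter-intra negative construction described in this abstract can be sketched with an InfoNCE-style loss. This is an illustrative simplification, not the authors' released implementation: the embeddings, the temperature value, and the frame-shuffling step used to build intra-negatives are placeholder assumptions.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def shuffle_frames(clip, rng):
    """Build an intra-negative by breaking the temporal order of the
    anchor clip (here: permuting its frame axis)."""
    return clip[rng.permutation(clip.shape[0])]

def iic_loss(anchor_emb, positive_emb, inter_neg_embs, intra_neg_embs,
             temperature=0.07):
    """InfoNCE-style loss whose negative set mixes clips from other videos
    (inter-negatives) with temporally shuffled versions of the anchor
    (intra-negatives). The positive sits at index 0 of the candidate list."""
    a = l2_normalize(anchor_emb)
    cands = l2_normalize(np.vstack([positive_emb[None],
                                    inter_neg_embs, intra_neg_embs]))
    logits = cands @ a / temperature
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])
```

In the paper's setting the positive would be a different modality of the same video and the embeddings would come from a spatio-temporal CNN; here plain vectors stand in for encoder outputs.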
MINOTAUR: Multi-task Video Grounding From Multimodal Queries
Video understanding tasks take many forms, from action detection to visual
query localization and spatio-temporal grounding of sentences. These tasks
differ in the type of inputs (only video, or video-query pair where query is an
image region or sentence) and outputs (temporal segments or spatio-temporal
tubes). However, at their core they require the same fundamental understanding
of the video, i.e., the actors and objects in it, their actions and
interactions. So far these tasks have been tackled in isolation with
individual, highly specialized architectures, which do not exploit the
interplay between tasks. In contrast, in this paper, we present a single,
unified model for tackling query-based video understanding in long-form videos.
In particular, our model can address all three tasks of the Ego4D Episodic
Memory benchmark which entail queries of three different forms: given an
egocentric video and a visual, textual or activity query, the goal is to
determine when and where the answer can be seen within the video. Our model
design is inspired by recent query-based approaches to spatio-temporal
grounding, and contains modality-specific query encoders and task-specific
sliding window inference that allow multi-task training with diverse input
modalities and different structured outputs. We exhaustively analyze
relationships among the tasks and illustrate that cross-task learning leads to
improved performance on each individual task, as well as the ability to
generalize to unseen tasks, such as zero-shot spatial localization of language
queries.
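The task-specific sliding-window inference mentioned above can be illustrated with a small helper that tiles a long video into overlapping windows and merges per-window scores. This is a generic sketch; the window size, stride, tail-clamping policy, and max-merge rule are assumptions, not MINOTAUR's actual settings.

```python
def sliding_windows(num_frames, window, stride):
    """Enumerate (start, end) frame ranges covering a long video with a
    fixed-size window; the final window is shifted back so the tail of
    the video is always covered."""
    starts = list(range(0, max(num_frames - window, 0) + 1, stride))
    if starts and starts[-1] + window < num_frames:
        starts.append(num_frames - window)
    if not starts:                     # video shorter than one window
        starts = [0]
    return [(s, min(s + window, num_frames)) for s in starts]

def merge_scores(num_frames, window_scores):
    """Combine per-window relevance scores into per-frame scores by taking
    the max over all windows covering each frame."""
    frame_scores = [0.0] * num_frames
    for (start, end), score in window_scores:
        for t in range(start, end):
            frame_scores[t] = max(frame_scores[t], score)
    return frame_scores
```

A real model would score each window with a modality-specific query encoder; the point here is only that one covering scheme serves all three query types.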
Beyond Simple Meta-Learning: Multi-Purpose Models for Multi-Domain, Active and Continual Few-Shot Learning
Modern deep learning requires large-scale extensively labelled datasets for
training. Few-shot learning aims to alleviate this issue by learning
effectively from few labelled examples. In previously proposed few-shot visual
classifiers, it is assumed that the feature manifold, where classifier
decisions are made, has uncorrelated feature dimensions and uniform feature
variance. In this work, we focus on addressing the limitations arising from
this assumption by proposing a variance-sensitive class of models that operates
in a low-label regime. The first method, Simple CNAPS, employs a hierarchically
regularized Mahalanobis-distance-based classifier combined with a
state-of-the-art neural adaptive feature extractor to achieve strong
performance on
Meta-Dataset, mini-ImageNet and tiered-ImageNet benchmarks. We further extend
this approach to a transductive learning setting, proposing Transductive CNAPS.
This transductive method combines a soft k-means parameter refinement procedure
with a two-step task encoder to achieve improved test-time classification
accuracy using unlabelled data. Transductive CNAPS achieves state-of-the-art
performance on Meta-Dataset. Finally, we explore the use of our methods (Simple
and Transductive) for "out of the box" continual and active learning. Extensive
experiments on large-scale benchmarks illustrate the robustness and
versatility of this relatively simple class of models. All trained model
checkpoints and the corresponding source code have been made publicly
available.
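The Mahalanobis-distance classifier at the heart of Simple CNAPS can be sketched as follows. This is a simplified illustration: the actual model regularizes each class covariance hierarchically with a blend ratio that depends on the class shot count and operates on features from an adapted neural extractor, whereas here a fixed blend weight `lam` and raw feature vectors are assumed.

```python
import numpy as np

def mahalanobis_classify(queries, support, labels, lam=0.5):
    """Classify query embeddings by Mahalanobis distance to class means,
    with each class covariance regularized toward the task-level
    covariance (a fixed-weight simplification of Simple CNAPS's blend)."""
    classes = np.unique(labels)
    dim = support.shape[1]
    task_cov = np.cov(support.T) + 1e-3 * np.eye(dim)
    dists = []
    for c in classes:
        s = support[labels == c]
        mu = s.mean(axis=0)
        class_cov = np.cov(s.T) if len(s) > 1 else np.zeros((dim, dim))
        cov = lam * class_cov + (1 - lam) * task_cov
        inv = np.linalg.inv(cov + 1e-3 * np.eye(dim))
        diff = queries - mu
        # squared Mahalanobis distance of each query to this class mean
        dists.append(np.einsum('nd,dk,nk->n', diff, inv, diff))
    return classes[np.argmin(np.array(dists), axis=0)]
```

The transductive variant would additionally refine the means and covariances with soft k-means assignments over the unlabelled queries; that refinement loop is omitted here.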
Evaluation of Interactive Rhythm Activities on the Engagement Level of Individuals with Memory Impairments
Alzheimer's dementia can lead to a decreased quality of life in patients through the manifestation of inappropriate behavioral and psychological signs and symptoms. Music therapy has been shown to decrease agitation and disruptive behaviors in patients with dementia, although improvement in overall cognitive function was minimal. However, there is evidence showing an increase in grey matter in those who actively participate in music activities. Our goal in this study is to focus on how participation in rhythm-based activities affects quality of life.
Use of human perivascular stem cells for bone regeneration
Human perivascular stem cells (PSCs) can be isolated in sufficient numbers from multiple tissues for purposes of skeletal tissue engineering [1-3]. PSCs are a FACS-sorted population of 'pericytes' (CD146+CD34-CD45-) and 'adventitial cells' (CD146-CD34+CD45-), each of which we have previously reported to have properties of mesenchymal stem cells (MSCs). PSCs, like MSCs, are able to undergo osteogenic differentiation, as well as secrete pro-osteogenic cytokines [1,2]. In the present protocol, we demonstrate the osteogenicity of PSCs in several animal models, including a muscle pouch implantation in SCID (severe combined immunodeficient) mice, a SCID mouse calvarial defect, and a femoral segmental defect (FSD) in athymic rats. The thigh muscle pouch model is used to assess ectopic bone formation. Calvarial defects are centered on the parietal bone and are typically 4 mm in diameter (critically sized) [8]. FSDs are bicortical and are stabilized with a polyethylene bar and K-wires [4]. The FSD described is also a critical-size defect, which does not significantly heal on its own [4]. In contrast, if stem cells or growth factors are added to the defect site, significant bone regeneration can be appreciated. The overall goal of PSC xenografting is to demonstrate the osteogenic capability of this cell type in both ectopic and orthotopic bone regeneration models.