43 research outputs found

    Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework

    Full text link
    We propose a self-supervised method to learn feature representations from videos. A standard approach in traditional self-supervised methods uses positive-negative data pairs trained with a contrastive learning strategy; in such a case, different modalities of the same video are treated as positives and clips from other videos are treated as negatives. Because spatio-temporal information is important for video representation, we extend the negative samples by introducing intra-negative samples, which are transformed from the same anchor video by breaking the temporal relations within its clips. With the proposed Inter-Intra Contrastive (IIC) framework, we can train spatio-temporal convolutional networks to learn video representations. The IIC framework admits many flexible options, and we conduct experiments with several different configurations. Evaluations are conducted on video retrieval and video recognition tasks using the learned video representations. Our proposed IIC outperforms current state-of-the-art results by a large margin, such as 16.7 and 9.5 percentage-point improvements in top-1 accuracy on the UCF101 and HMDB51 datasets for video retrieval, respectively. For video recognition, improvements are also obtained on these two benchmark datasets. Code is available at https://github.com/BestJuly/Inter-intra-video-contrastive-learning.
    Comment: Accepted by ACMMM 2020. Our project page is at https://bestjuly.github.io/Inter-intra-video-contrastive-learning
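
    As a minimal sketch of the inter-intra idea (assuming a generic InfoNCE-style objective; the function names iic_loss and make_intra_negative and all shapes are illustrative, not the repository's actual API):

        import torch
        import torch.nn.functional as F

        def make_intra_negative(clip):
            # One way to break temporal relations, per the abstract:
            # shuffle the frame order of the anchor clip (T, C, H, W).
            return clip[torch.randperm(clip.shape[0])]

        def iic_loss(anchor, positive, inter_negs, intra_negs, tau=0.07):
            # InfoNCE-style loss: the positive is a different modality of the
            # same video; intra-negatives extend the usual set of negatives
            # drawn from other videos. Inputs are embedding vectors/matrices.
            anchor = F.normalize(anchor, dim=-1)                    # (D,)
            positive = F.normalize(positive, dim=-1)                # (D,)
            negs = F.normalize(torch.cat([inter_negs, intra_negs]), dim=-1)  # (N, D)
            logits = torch.cat([(anchor @ positive).view(1), negs @ anchor]) / tau
            target = torch.zeros(1, dtype=torch.long)               # positive sits at index 0
            return F.cross_entropy(logits.unsqueeze(0), target)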

    MINOTAUR: Multi-task Video Grounding From Multimodal Queries

    Full text link
    Video understanding tasks take many forms, from action detection to visual query localization and spatio-temporal grounding of sentences. These tasks differ in the type of inputs (only video, or a video-query pair where the query is an image region or sentence) and outputs (temporal segments or spatio-temporal tubes). At their core, however, they require the same fundamental understanding of the video, i.e., the actors and objects in it, their actions and interactions. So far these tasks have been tackled in isolation with individual, highly specialized architectures, which do not exploit the interplay between tasks. In contrast, in this paper we present a single, unified model for tackling query-based video understanding in long-form videos. In particular, our model can address all three tasks of the Ego4D Episodic Memory benchmark, which entail queries of three different forms: given an egocentric video and a visual, textual, or activity query, the goal is to determine when and where the answer can be seen within the video. Our model design is inspired by recent query-based approaches to spatio-temporal grounding, and contains modality-specific query encoders and task-specific sliding-window inference that allow multi-task training with diverse input modalities and different structured outputs. We exhaustively analyze relationships among the tasks and illustrate that cross-task learning leads to improved performance on each individual task, as well as the ability to generalize to unseen tasks, such as zero-shot spatial localization of language queries.
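
    A hypothetical skeleton of the unified design sketched above: one shared video backbone, one query encoder per modality, and a shared grounding head. Module names, feature dimensions, the activity-class count, and the simple additive fusion are all placeholders, not MINOTAUR's actual architecture:

        import torch.nn as nn

        class UnifiedGrounder(nn.Module):
            def __init__(self, dim=256, n_activity_classes=100):   # class count is a placeholder
                super().__init__()
                self.video_backbone = nn.Identity()                # stand-in for a video transformer
                self.query_encoders = nn.ModuleDict({
                    "visual": nn.Linear(2048, dim),                # image-region feature -> query token
                    "text": nn.Linear(768, dim),                   # sentence embedding -> query token
                    "activity": nn.Embedding(n_activity_classes, dim),  # category id -> query token
                })
                self.head = nn.Linear(dim, 2)                      # e.g. per-window (start, end) offsets

            def forward(self, video_feats, query, modality):
                q = self.query_encoders[modality](query)           # route by query modality
                v = self.video_backbone(video_feats)               # (T, dim) clip features
                return self.head(v + q)                            # additive fusion as a stand-in for attention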

    Beyond Simple Meta-Learning: Multi-Purpose Models for Multi-Domain, Active and Continual Few-Shot Learning

    Full text link
    Modern deep learning requires large-scale, extensively labelled datasets for training. Few-shot learning aims to alleviate this issue by learning effectively from few labelled examples. Previously proposed few-shot visual classifiers assume that the feature manifold on which classifier decisions are made has uncorrelated feature dimensions and uniform feature variance. In this work, we focus on addressing the limitations arising from this assumption by proposing a variance-sensitive class of models that operates in a low-label regime. The first method, Simple CNAPS, employs a hierarchically regularized Mahalanobis-distance based classifier combined with a state-of-the-art neural adaptive feature extractor to achieve strong performance on the Meta-Dataset, mini-ImageNet, and tiered-ImageNet benchmarks. We further extend this approach to a transductive learning setting, proposing Transductive CNAPS. This transductive method combines a soft k-means parameter refinement procedure with a two-step task encoder to achieve improved test-time classification accuracy using unlabelled data. Transductive CNAPS achieves state-of-the-art performance on Meta-Dataset. Finally, we explore the use of our methods (Simple and Transductive) for "out of the box" continual and active learning. Extensive experiments on large-scale benchmarks illustrate the robustness and versatility of this relatively simple class of models. All trained model checkpoints and corresponding source code have been made publicly available.
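
    A minimal sketch of the variance-sensitive Mahalanobis head, assuming support and query embeddings already produced by the adapted feature extractor; the per-class shrinkage toward task-level statistics mirrors the hierarchical regularization the abstract alludes to, but the exact weights and all names here are illustrative:

        import torch

        def mahalanobis_head(query, support, labels, n_classes, beta=1.0):
            # query: (Q, D), support: (S, D), labels: (S,) integer class ids.
            D = support.shape[1]
            eye = torch.eye(D)
            task_cov = torch.cov(support.T)                        # task-level covariance
            logits = []
            for k in range(n_classes):
                s_k = support[labels == k]
                n_k = s_k.shape[0]
                mu_k = s_k.mean(0)
                cov_k = torch.cov(s_k.T) if n_k > 1 else torch.zeros(D, D)
                lam = n_k / (n_k + 1.0)                            # trust class stats more as shots grow
                Q_k = lam * cov_k + (1.0 - lam) * task_cov + beta * eye
                diff = query - mu_k                                # (Q, D)
                sol = torch.linalg.solve(Q_k, diff.T).T            # applies Q_k^{-1} without explicit inversion
                logits.append(-(diff * sol).sum(-1))               # negative squared Mahalanobis distance
            return torch.stack(logits, dim=1)                      # (Q, n_classes); softmax gives class probabilities

    The transductive variant would then soft-assign query points to classes and re-estimate mu_k and Q_k from those soft assignments over a few k-means-style iterations.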

    Evaluation of Interactive Rhythm Activities on the Engagement Level of Individuals with Memory Impairments

    Get PDF
    Alzheimer's dementia can lead to a decreased quality of life in patients through the manifestation of inappropriate behavioral and psychological signs and symptoms. Music therapy has been shown to decrease agitation and disruptive behaviors in patients with dementia, although improvement in overall cognitive function was minimal. However, there is evidence showing an increase in grey matter in those who actively participate in music activities. Our goal in this study is to focus on how participation in rhythm-based activities affects quality of life.

    Use of human perivascular stem cells for bone regeneration

    Get PDF
    Human perivascular stem cells (PSCs) can be isolated in sufficient numbers from multiple tissues for purposes of skeletal tissue engineering [1-3]. PSCs are a FACS-sorted population of 'pericytes' (CD146+CD34-CD45-) and 'adventitial cells' (CD146-CD34+CD45-), each of which we have previously reported to have properties of mesenchymal stem cells. PSCs, like MSCs, are able to undergo osteogenic differentiation, as well as secrete pro-osteogenic cytokines [1,2]. In the present protocol, we demonstrate the osteogenicity of PSCs in several animal models, including a muscle pouch implantation in SCID (severe combined immunodeficient) mice, a SCID mouse calvarial defect, and a femoral segmental defect (FSD) in athymic rats. The thigh muscle pouch model is used to assess ectopic bone formation. Calvarial defects are centered on the parietal bone and are standardized at 4 mm in diameter (critically sized) [8]. FSDs are bicortical and are stabilized with a polyethylene bar and K-wires [4]. The FSD described is also a critical-size defect, which does not significantly heal on its own [4]. In contrast, if stem cells or growth factors are added to the defect site, significant bone regeneration can be appreciated. The overall goal of PSC xenografting is to demonstrate the osteogenic capability of this cell type in both ectopic and orthotopic bone regeneration models.