RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning
We study unsupervised video representation learning that seeks to learn both
motion and appearance features from unlabeled video only, which can be reused
for downstream tasks such as action recognition. This task, however, is
extremely challenging due to 1) the highly complex spatial-temporal information
in videos; and 2) the lack of labeled data for training. Unlike the
representation learning for static images, it is difficult to construct a
suitable self-supervised task that models both motion and appearance
features well. More recently, several attempts have been made to learn video
representation through video playback speed prediction. However, it is
non-trivial to obtain precise speed labels for the videos. More critically, the
learnt models may tend to focus on motion patterns and thus may not learn
appearance features well. In this paper, we observe that the relative playback
speed is more consistent with motion patterns and thus provides more effective
and stable supervision for representation learning. Therefore, we propose a new
way to perceive the playback speed and exploit the relative speed between two
video clips as labels. In this way, we are able to perceive speed well and
learn better motion features. Moreover, to ensure the learning of appearance
features, we further propose an appearance-focused task, where we enforce the
model to perceive the appearance difference between two video clips. We show
that optimizing the two tasks jointly consistently improves the performance on
two downstream tasks, namely action recognition and video retrieval.
Remarkably, for action recognition on the UCF101 dataset, we achieve 93.7% accuracy
without the use of labeled data for pre-training, which outperforms the
ImageNet supervised pre-trained model. Code and pre-trained models can be found
at https://github.com/PeihaoChen/RSPNet.
Comment: Accepted by AAAI-2021. Code and pre-trained models can be found at
https://github.com/PeihaoChen/RSPNet
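As a rough illustration of the relative-speed idea described in the abstract, here is a minimal sketch of how such a training example could be constructed. This is not the paper's exact formulation: the clip length, speed set, and the three-way label scheme are assumptions made for the example.

```python
import numpy as np

def sample_clip(video, start, num_frames, speed):
    """Take `num_frames` frames starting at `start`, keeping every
    `speed`-th frame, i.e. playing the video back at `speed`x."""
    idx = start + speed * np.arange(num_frames)
    return video[idx]

def relative_speed_pair(video, num_frames=16, speeds=(1, 2, 4), rng=None):
    """Build one training example for a relative-speed pretext task:
    two clips from the same video, labelled by which plays back faster.
    Assumes `video` (an array of frames) is long enough for both clips."""
    rng = rng or np.random.default_rng()
    s1, s2 = (int(s) for s in rng.choice(speeds, size=2))
    c1 = sample_clip(video, int(rng.integers(0, len(video) - num_frames * s1 + 1)),
                     num_frames, s1)
    c2 = sample_clip(video, int(rng.integers(0, len(video) - num_frames * s2 + 1)),
                     num_frames, s2)
    # Three-way relative label: 0 = same speed, 1 = clip 1 faster, 2 = clip 2 faster.
    label = 0 if s1 == s2 else (1 if s1 > s2 else 2)
    return c1, c2, label
```

The key property the abstract exploits is that the label depends only on the speed relationship between the two clips, never on an absolute speed estimate for a single clip.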
A Survey on Metric Learning for Feature Vectors and Structured Data
The need for appropriate ways to measure the distance or similarity between
data is ubiquitous in machine learning, pattern recognition and data mining,
but handcrafting such good metrics for specific problems is generally
difficult. This has led to the emergence of metric learning, which aims at
automatically learning a metric from data and has attracted a lot of interest
in machine learning and related fields for the past ten years. This survey
paper proposes a systematic review of the metric learning literature,
highlighting the pros and cons of each approach. We pay particular attention to
Mahalanobis distance metric learning, a well-studied and successful framework,
but additionally present a wide range of methods that have recently emerged as
powerful alternatives, including nonlinear metric learning, similarity learning
and local metric learning. Recent trends and extensions, such as
semi-supervised metric learning, metric learning for histogram data and the
derivation of generalization guarantees, are also covered. Finally, this survey
addresses metric learning for structured data, in particular edit distance
learning, and attempts to give an overview of the remaining challenges in
metric learning for the years to come.
Comment: Technical report, 59 pages. Changes in v2: fixed typos and improved
presentation. Changes in v3: fixed typos. Changes in v4: fixed typos and new
method.
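To make the survey's central object concrete: a Mahalanobis metric is parameterized by a symmetric positive semi-definite matrix M, and learning M is commonly reparameterized through a factorization M = LᵀL. A minimal sketch (the matrix and vectors below are toy values chosen for illustration):

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance d_M^2(x, y) = (x - y)^T M (x - y).
    M must be symmetric positive semi-definite for d_M to be a valid
    pseudo-metric."""
    diff = x - y
    return float(diff @ M @ diff)

# The factorization M = L^T L keeps M PSD by construction and makes d_M
# equivalent to the Euclidean distance after the linear projection x -> L x.
L = np.array([[1.0, 0.5],
              [0.0, 2.0]])
M = L.T @ L
x, y = np.array([1.0, 2.0]), np.array([2.0, 0.0])
assert np.isclose(mahalanobis_sq(x, y, M), np.sum((L @ x - L @ y) ** 2))
```

This projection view also explains why Mahalanobis metric learning is a *linear* method, and thus why the nonlinear and local alternatives covered in the survey emerged.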
The digital music lab: A big data infrastructure for digital musicology
In musicology and music research generally, the increasing availability of digital music, storage capacity, and computing power enables and requires new, intelligent systems. In the transition from traditional to digital musicology, many techniques and tools have been developed for the analysis of individual pieces of music, but the large-scale music data that are increasingly becoming available require research methods and systems that work at the collection level and at scale. Although many relevant algorithms have been developed during the past 15 years of research in Music Information Retrieval, an integrated system that supports large-scale digital musicology research has so far been lacking. In the Digital Music Lab (DML) project, a collaboration among music librarians, musicologists, computer scientists, and human-computer interface specialists, the DML software system has been developed to fill this gap by providing intelligent large-scale music analysis with a user-friendly interactive interface that supports musicologists in their exploration and enquiry. The DML system empowers musicologists by addressing several challenges: distributed processing of audio and other music data, management of the data analysis process and its results, remote analysis of data under copyright, logical inference on the extracted information and metadata, and visual web-based interfaces for exploring and querying the music collections. The DML system is scalable, builds on Semantic Web technology, and integrates into Linked Data, with the vision of a distributed system that enables music research across archives, libraries, and other providers of music data. A first DML system prototype has been set up in collaboration with the British Library and I Like Music Ltd. This system has been used to analyse a diverse corpus currently comprising 250,000 music tracks. In this article, we describe the DML system requirements, design, architecture, components, and available data sources, explaining their interaction. We report use cases and applications with initial evaluations of the proposed system.
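As a toy illustration of the collection-level view this abstract describes, per-track analysis results can be aggregated into collection-wide statistics. The field names and values below are purely hypothetical; the actual DML system works over Semantic Web representations and distributed audio analysis, not in-memory Python records.

```python
from collections import Counter
from statistics import mean

def collection_summary(tracks):
    """Aggregate per-track analysis results into collection-level
    statistics, the kind of large-scale view a collection-level
    interface would expose. Field names here are made up."""
    return {
        "n_tracks": len(tracks),
        "key_distribution": dict(Counter(t["key"] for t in tracks)),
        "mean_tempo_bpm": mean(t["tempo_bpm"] for t in tracks),
    }

tracks = [
    {"key": "C major", "tempo_bpm": 120.0},
    {"key": "A minor", "tempo_bpm": 96.5},
    {"key": "C major", "tempo_bpm": 132.0},
]
print(collection_summary(tracks))
# {'n_tracks': 3, 'key_distribution': {'C major': 2, 'A minor': 1},
#  'mean_tempo_bpm': 116.16666666666667}
```

The point of such aggregation is that questions are asked of the collection (key and tempo distributions across 250,000 tracks) rather than of any single recording, which is what distinguishes this infrastructure from earlier piece-by-piece analysis tools.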