Time-Contrastive Networks: Self-Supervised Learning from Video
We propose a self-supervised approach for learning representations and
robotic behaviors entirely from unlabeled videos recorded from multiple
viewpoints, and study how this representation can be used in two robotic
imitation settings: imitating object interactions from videos of humans, and
imitating human poses. Imitation of human behavior requires a
viewpoint-invariant representation that captures the relationships between
end-effectors (hands or robot grippers) and the environment, object attributes,
and body pose. We train our representations using a metric learning loss, where
multiple simultaneous viewpoints of the same observation are attracted in the
embedding space, while being repelled from temporal neighbors which are often
visually similar but functionally different. In other words, the model
simultaneously learns to recognize what is common between different-looking
images, and what is different between similar-looking images. This signal
causes our model to discover attributes that do not change across viewpoint,
but do change across time, while ignoring nuisance variables such as
occlusions, motion blur, lighting and background. We demonstrate that this
representation can be used by a robot to directly mimic human poses without an
explicit correspondence, and that it can be used as a reward function within a
reinforcement learning algorithm. While representations are learned from an
unlabeled collection of task-related videos, robot behaviors such as pouring
are learned by watching a single 3rd-person demonstration by a human. Reward
functions obtained by following the human demonstrations under the learned
representation enable efficient reinforcement learning that is practical for
real-world robotic systems. Video results, open-source code and dataset are
available at https://sermanet.github.io/imitat
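The metric learning loss described in this abstract can be illustrated with a small triplet-loss sketch. This is not the authors' released code: the function name, the `margin` and `min_gap` parameters, and the uniform sampling of negatives are all assumptions made for illustration. The anchor and positive are embeddings of the same time step seen from two viewpoints, and the negative is a different time step from the anchor's own viewpoint.

```python
import numpy as np

def time_contrastive_triplet_loss(view1, view2, margin=0.2, min_gap=2, rng=None):
    """Hypothetical sketch of a time-contrastive triplet loss.

    view1, view2: (T, D) arrays of frame embeddings from two synchronized
    viewpoints. margin and min_gap are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    T = view1.shape[0]
    losses = []
    for t in range(T):
        # Negatives come from the same view, at least min_gap frames away in time.
        candidates = [s for s in range(T) if abs(s - t) >= min_gap]
        if not candidates:
            continue
        s = rng.choice(candidates)
        anchor, positive, negative = view1[t], view2[t], view1[s]
        d_pos = np.sum((anchor - positive) ** 2)  # same moment, other viewpoint
        d_neg = np.sum((anchor - negative) ** 2)  # same viewpoint, other moment
        # Hinge: the positive should be closer than the negative by at least margin.
        losses.append(max(0.0, d_pos - d_neg + margin))
    return float(np.mean(losses)) if losses else 0.0
```

The loss is zero when every cross-view pair is already closer than every sampled temporal negative by at least the margin, which is the behavior the abstract describes as attracting simultaneous viewpoints and repelling temporal neighbors.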
Ranking relations using analogies in biological and information networks
Analogical reasoning depends fundamentally on the ability to learn and
generalize about relations between objects. We develop an approach to
relational learning which, given a set of pairs of objects
$\mathbf{S} = \{A^{(1)}:B^{(1)}, A^{(2)}:B^{(2)}, \ldots, A^{(N)}:B^{(N)}\}$,
measures how well other pairs A:B fit in with the set $\mathbf{S}$. Our work
addresses the following question: is the relation between objects A and B
analogous to those relations found in $\mathbf{S}$? Such questions are
particularly relevant in information retrieval, where an investigator might
want to search for analogous pairs of objects that match the query set of
interest. There are many ways in which objects can be related, making the task
of measuring analogies very challenging. Our approach combines a similarity
measure on function spaces with Bayesian analysis to produce a ranking. It
requires data containing features of the objects of interest and a link matrix
specifying which relationships exist; no further attributes of such
relationships are necessary. We illustrate the potential of our method on text
analysis and information networks. An application on discovering functional
interactions between pairs of proteins is discussed in detail, where we show
that our approach can work in practice even if only a small set of protein pairs is
provided.
Comment: Published at http://dx.doi.org/10.1214/09-AOAS321 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
A Comparative Study of Pairwise Learning Methods based on Kernel Ridge Regression
Many machine learning problems can be formulated as predicting labels for a
pair of objects. Problems of that kind are often referred to as pairwise
learning, dyadic prediction or network inference problems. During the last
decade kernel methods have played a dominant role in pairwise learning. They
still achieve state-of-the-art predictive performance, but their theoretical
behavior remains underexplored in the machine learning
literature.
In this work we review and unify existing kernel-based algorithms that are
commonly used in different pairwise learning settings, ranging from matrix
filtering to zero-shot learning. To this end, we focus on closed-form efficient
instantiations of Kronecker kernel ridge regression. We show that independent
task kernel ridge regression, two-step kernel ridge regression and a linear
matrix filter arise naturally as special cases of Kronecker kernel ridge
regression, implying that all these methods implicitly minimize a squared loss.
In addition, we analyze universality, consistency and spectral filtering
properties. Our theoretical results provide valuable insights for assessing the
advantages and limitations of existing pairwise learning methods.
Comment: arXiv admin note: text overlap with arXiv:1606.0427
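As a rough illustration of what a closed-form instantiation of Kronecker kernel ridge regression can look like, the sketch below solves the complete-data case using the eigendecompositions of the two object kernels instead of forming the full Kronecker product. The function names and the regularization parameter `lam` are assumptions for illustration, not notation from the paper.

```python
import numpy as np

def kronecker_krr_fit(K, G, Y, lam):
    """Sketch: solve (G kron K + lam * I) vec(A) = vec(Y) without forming G kron K.

    K: (n, n) symmetric PSD kernel over the first kind of object.
    G: (m, m) symmetric PSD kernel over the second kind of object.
    Y: (n, m) label matrix, one entry per pair. lam: ridge parameter (> 0).
    """
    sig, U = np.linalg.eigh(K)
    s, V = np.linalg.eigh(G)
    # Rotate labels into the joint eigenbasis, shrink by the spectral filter, rotate back.
    Y_tilde = U.T @ Y @ V
    A = U @ (Y_tilde / (np.outer(sig, s) + lam)) @ V.T
    return A

def kronecker_krr_predict(K_new, G_new, A):
    """Predicted labels for pairs; K_new, G_new hold kernel values against training objects."""
    return K_new @ A @ G_new.T
```

The elementwise division by `np.outer(sig, s) + lam` is the spectral filter: each pair of eigenvalues of the two kernels is shrunk exactly as in ordinary ridge regression on the Kronecker product kernel, at the cost of two small eigendecompositions rather than one large linear solve.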
Is what you see what you get? representations, metaphors and tools in mathematics didactics
This paper is exploratory in character. The aim is to investigate ways in which it is possible to use the theoretical concepts of representations, tools and metaphors to try to understand what learners of mathematics ‘see’ during classroom interactions (in their widest sense) and what they might get from such interactions. Through an analysis of a brief classroom episode, the suggestion is made that what learners see may not be the same as what they get. From each of several theoretical perspectives utilised in this paper, what learners ‘get’ appears to be something extra. According to our analysis, this something ‘extra’ is likely to depend on the form of technology being used and the representations and metaphors that are available to both teacher and learner.
Attend and Interact: Higher-Order Object Interactions for Video Understanding
Human actions often involve complex interactions across several inter-related
objects in the scene. However, existing approaches to fine-grained video
understanding or visual relationship detection often rely on single object
representation or pairwise object relationships. Furthermore, learning
interactions across multiple objects in hundreds of frames for video is
computationally infeasible and performance may suffer since a large
combinatorial space has to be modeled. In this paper, we propose to efficiently
learn higher-order interactions between arbitrary subgroups of objects for
fine-grained video understanding. We demonstrate that modeling object
interactions significantly improves accuracy for both action recognition and
video captioning, while using more than 3 times less computation than
traditional pairwise relationships. The proposed method is validated on two
large-scale datasets: Kinetics and ActivityNet Captions. Our SINet and
SINet-Caption achieve state-of-the-art performance on both datasets even
though the videos are sampled at a maximum of 1 FPS. To the best of our
knowledge, this is the first work modeling object interactions on open domain
large-scale video datasets, and we additionally model higher-order object
interactions, which improves performance at low computational cost.
Comment: CVPR 201
Modeling Relational Data via Latent Factor Blockmodel
In this paper we address the problem of modeling relational data, which
appear in many applications such as social network analysis, recommender
systems and bioinformatics. Previous studies either consider latent-feature-based
models but disregard local structure in the network, or focus
exclusively on capturing local structure of objects based on latent blockmodels
without coupling them with latent characteristics of objects. To combine the
benefits of the previous work, we propose a novel model that can simultaneously
incorporate the effect of latent features and covariates if any, as well as the
effect of latent structure that may exist in the data. To achieve this, we
model the relation graph as a function of both latent feature factors and
latent cluster memberships of objects to collectively discover globally
predictive intrinsic properties of objects and capture latent block structure
in the network to improve prediction performance. We also develop an
optimization transfer algorithm based on the generalized EM-style strategy to
learn the latent factors. We demonstrate the efficacy of our proposed model on
link prediction and cluster analysis tasks; extensive experiments
on synthetic data and several real-world datasets suggest that our proposed
LFBM model outperforms other state-of-the-art approaches in the evaluated
tasks.
Comment: 10 pages, 12 figure
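To make the idea of combining latent feature factors with latent cluster memberships concrete, here is one plausible scoring function for link prediction. The abstract does not give the model's functional form, so everything below is a hypothetical illustration: the additive combination, the logistic link, and the names `U`, `Z`, `B` and `bias` are assumptions, not the LFBM model itself.

```python
import numpy as np

def link_probability(U, Z, B, bias=0.0):
    """Hypothetical link score mixing latent factors and block structure.

    U: (n, d) latent feature factors, one row per object.
    Z: (n, k) cluster memberships (one-hot or soft, rows summing to 1).
    B: (k, k) block affinity matrix between clusters.
    Returns an (n, n) matrix of link probabilities.
    """
    # Factor term captures object-level affinity; block term captures cluster-level affinity.
    logits = U @ U.T + Z @ B @ Z.T + bias
    return 1.0 / (1.0 + np.exp(-logits))
```

With the factor term switched off (`U` all zeros), the score reduces to a plain blockmodel in which only cluster co-membership matters; with `B` all zeros it reduces to a latent-factor model.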