Learning multi-view neighborhood preserving projections
We address the problem of metric learning for multi-view data, namely the construction of embedding projections from data in different representations into a shared feature space, such that the Euclidean distance in this space provides a meaningful within-view as well as between-view similarity. Our motivation stems from cross-media retrieval tasks, where the availability of a joint Euclidean distance function is a prerequisite for fast, in particular hashing-based, nearest neighbor queries. We formulate an objective function that expresses the intuitive concept that matching samples are mapped closely together in the output space, whereas non-matching samples are pushed apart, no matter in which view they are available. The resulting optimization problem is not convex, but it can be decomposed explicitly into a convex and a concave part, thereby allowing efficient optimization using the convex-concave procedure. Experiments on an image retrieval task show that nearest-neighbor based cross-view retrieval is indeed possible, and that the proposed technique improves retrieval accuracy over baseline techniques.
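The push/pull intuition described above can be made concrete with a minimal sketch. The function, data, and variable names below are hypothetical illustrations, not the paper's actual formulation: two linear projections map each view into a shared space, matched cross-view pairs contribute their squared distance (pull), and non-matched pairs contribute a hinge penalty when they fall inside a margin (push).

```python
import numpy as np

def multiview_objective(Pa, Pb, Xa, Xb, matches, margin=1.0):
    """Hypothetical push/pull objective for a shared embedding space.

    Projects view-A data Xa with Pa and view-B data Xb with Pb, then pulls
    matched cross-view pairs together (squared distance) and pushes every
    non-matched pair at least `margin` apart via a hinge penalty.
    """
    Za, Zb = Xa @ Pa, Xb @ Pb                    # embeddings in the shared space
    total = 0.0
    for i in range(len(Za)):
        for j in range(len(Zb)):
            d2 = float(np.sum((Za[i] - Zb[j]) ** 2))
            if (i, j) in matches:                # matching pair: attract
                total += d2
            else:                                # non-matching pair: repel
                total += max(0.0, margin - d2)
    return total

# Two toy views that already coincide under identity projections:
I2 = np.eye(2)
Xa = np.array([[0.0, 0.0], [1.0, 0.0]])
Xb = np.array([[0.0, 0.0], [1.0, 0.0]])
good = multiview_objective(I2, I2, Xa, Xb, matches={(0, 0), (1, 1)})  # 0.0
bad = multiview_objective(I2, I2, Xa, Xb, matches={(0, 1), (1, 0)})   # 4.0
```

With the correct correspondence the objective is zero; with a swapped correspondence both the pull and push terms are penalized. The actual method additionally exploits the convex-concave decomposition of such an objective, which this sketch does not show.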
G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory
Recent video grounding works attempt to introduce vanilla contrastive learning into video grounding. However, we claim that this naive solution is suboptimal. Contrastive learning requires two key properties: (1) alignment of the features of similar samples, and (2) uniformity of the induced distribution of the normalized features on the hypersphere. Video grounding poses two annoying issues: (1) the co-existence of some visual entities in both the ground truth and other moments, i.e. semantic overlapping; and (2) only a few moments in the video are annotated, i.e. the sparse annotation dilemma. As a result, vanilla contrastive learning is unable to model the correlations between temporally distant moments and learns inconsistent video representations. Both characteristics make vanilla contrastive learning unsuitable for video grounding. In this paper, we introduce Geodesic and Game Localization (G2L), a semantically aligned and uniform video grounding framework based on geodesic distance and game theory. We quantify the correlations among moments using the geodesic distance, which guides the model to learn correct cross-modal representations. Furthermore, from the novel perspective of game theory, we propose a semantic Shapley interaction based on geodesic distance sampling to learn fine-grained semantic alignment among similar moments. Experiments on three benchmarks demonstrate the effectiveness of our method.
Comment: ICCV202
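As a small illustration of the geodesic idea referenced above (not the paper's implementation): for L2-normalized features on the unit hypersphere, the geodesic distance between two vectors is the arc length, i.e. the arccosine of their cosine similarity. The function and the toy moment features below are hypothetical.

```python
import numpy as np

def hypersphere_geodesic(a, b):
    """Geodesic (arc) distance between two feature vectors after
    L2 normalization onto the unit hypersphere: arccos of cosine similarity."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    cos = np.clip(np.dot(a, b), -1.0, 1.0)   # clip guards against rounding error
    return float(np.arccos(cos))

# Hypothetical moment features: identical features are 0 apart on the sphere,
# orthogonal features are pi/2 apart.
m1 = np.array([1.0, 0.0])
m2 = np.array([0.0, 1.0])
d_same = hypersphere_geodesic(m1, m1)   # 0.0
d_orth = hypersphere_geodesic(m1, m2)   # pi / 2
```

Such a distance can then weight the correlation between moments, so that semantically overlapping but temporally distant moments are not treated as hard negatives.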
Utilizing Graph Structure for Machine Learning
The information age has led to an explosion in the size and availability of data. This data often exhibits graph structure that is either explicitly defined, as in the web of a social network, or implicitly defined and determinable by measuring similarity between objects. Utilizing this graph structure allows for the design of machine learning algorithms that reflect not only the attributes of individual objects but also their relationships to every other object in the domain. This thesis investigates three machine learning problems and proposes novel methods that leverage the graph structure inherent in the tasks. Quantum walk neural networks are classical neural nets that use quantum random walks for classifying and regressing on graphs. Asymmetric directed node embeddings are another neural network architecture, designed to embed the nodes of a directed graph into a vector space. Filtered manifold alignment is a novel two-step approach to domain adaptation.
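The "implicitly defined" graph structure mentioned above is commonly recovered by connecting each object to its nearest neighbors under some similarity measure. The following is a minimal, hypothetical sketch (not taken from the thesis) that builds a symmetric k-nearest-neighbor graph from raw attribute vectors using Euclidean distance.

```python
import numpy as np

def knn_graph(X, k=2):
    """Build an implicit graph from object attributes: connect each object
    to its k nearest neighbors by Euclidean distance, then symmetrize."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    np.fill_diagonal(d, np.inf)              # exclude self-loops
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in np.argsort(d[i])[:k]:       # k closest objects to i
            adj[i, j] = adj[j, i] = True     # undirected edge
    return adj

# Two well-separated clusters: edges stay within each cluster.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
A = knn_graph(X, k=1)
```

On the toy data, each point links only to its cluster-mate, so graph-based algorithms can then propagate information along within-cluster edges.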