Hyperbolic Geometry in Computer Vision: A Novel Framework for Convolutional Neural Networks
Real-world visual data exhibit intrinsic hierarchical structures that can be
represented effectively in hyperbolic spaces. Hyperbolic neural networks (HNNs)
are a promising approach for learning feature representations in such spaces.
However, current methods in computer vision rely on Euclidean backbones and
only project features to the hyperbolic space in the task heads, limiting their
ability to fully leverage the benefits of hyperbolic geometry. To address this,
we present HCNN, the first fully hyperbolic convolutional neural network (CNN)
designed for computer vision tasks. Based on the Lorentz model, we generalize
fundamental components of CNNs and propose novel formulations of the
convolutional layer, batch normalization, and multinomial logistic regression
(MLR). Experimentation on standard vision tasks demonstrates the effectiveness
of our HCNN framework and the Lorentz model in both hybrid and fully hyperbolic
settings. Overall, we aim to pave the way for future research in hyperbolic
computer vision by offering a new paradigm for interpreting and analyzing
visual data. Our code is publicly available at
https://github.com/kschwethelm/HyperbolicCV
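The Lorentz (hyperboloid) model named in the abstract above can be illustrated with a minimal sketch. This is not the HCNN code and it does not reproduce the paper's layers. It only shows the standard Lorentzian inner product, a lift of a Euclidean vector onto the hyperboloid, and the geodesic distance. The function names and the curvature parameter `k` (curvature is -k) are assumptions made for illustration.

```python
import math

def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def lift_to_hyperboloid(v, k=1.0):
    """Prepend the time coordinate so that <x, x>_L = -1/k.

    `v` holds the space coordinates; `k` > 0 is an assumed curvature
    magnitude (the manifold has constant curvature -k).
    """
    x0 = math.sqrt(1.0 / k + sum(a * a for a in v))
    return [x0] + list(v)

def lorentz_distance(x, y, k=1.0):
    """Geodesic distance: arccosh(-k * <x, y>_L) / sqrt(k)."""
    # Clamp to the domain of acosh to absorb floating-point error.
    arg = max(1.0, -k * lorentz_inner(x, y))
    return math.acosh(arg) / math.sqrt(k)

# The origin of the hyperboloid and a point at hyperbolic distance 1 from it.
origin = lift_to_hyperboloid([0.0, 0.0])
point = lift_to_hyperboloid([math.sinh(1.0), 0.0])
dist = lorentz_distance(origin, point)
```

Storing only the space coordinates and recomputing the time coordinate is one common way to keep points on the manifold. Whether HCNN does this is not stated in the abstract.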
Learning Weakly Supervised Audio-Visual Violence Detection in Hyperbolic Space
In recent years, the task of weakly supervised audio-visual violence
detection has gained considerable attention. The goal of this task is to
identify violent segments within multimodal data based on video-level labels.
Despite advances in this field, traditional Euclidean neural networks, which
have been used in prior research, encounter difficulties in capturing highly
discriminative representations due to limitations of the feature space. To
overcome this, we propose HyperVD, a novel framework that learns snippet
embeddings in hyperbolic space to improve model discrimination. Our framework
comprises a detour fusion module for multimodal fusion, effectively alleviating
modality inconsistency between audio and visual signals. Additionally, we
contribute two branches of fully hyperbolic graph convolutional networks that
excavate feature similarities and temporal relationships among snippets in
hyperbolic space. By learning snippet representations in this space, the
framework effectively learns semantic discrepancies between violent and normal
events. Extensive experiments on the XD-Violence benchmark demonstrate that our
method outperforms state-of-the-art methods by a sizable margin.
Comment: 8 pages, 5 figures
Federated Learning with Manifold Regularization and Normalized Update Reaggregation
Federated Learning (FL) is an emerging collaborative machine learning
framework where multiple clients train the global model without sharing their
own datasets. In FL, the model inconsistency caused by the local data
heterogeneity across clients results in the near-orthogonality of client
updates, which leads to the global update norm reduction and slows down the
convergence. Most previous works focus on eliminating the difference of
parameters (or gradients) between the local and global models, which may fail
to reflect the model inconsistency due to the complex structure of the machine
learning model and the Euclidean space's limitation in meaningful geometric
representations. In this paper, we propose FedMRUR by adopting the manifold
model fusion scheme and a new global optimizer to alleviate the negative
impacts. Concretely, FedMRUR adopts a hyperbolic graph manifold regularizer
that enforces the representations of the data in the local and global models
to be close to each other in a low-dimensional subspace. Because the machine learning
model has the graph structure, the distance in hyperbolic space can reflect the
model bias better than the Euclidean distance. In this way, FedMRUR exploits
the manifold structures of the representations to significantly reduce the
model inconsistency. FedMRUR also aggregates the norms of the client updates as the
global update norm, which can appropriately enlarge each client's contribution
to the global update, thereby mitigating the norm reduction introduced by the
near-orthogonality of client updates. Furthermore, we theoretically prove that
our algorithm can achieve a linear speedup property in the non-convex setting
under partial client participation. Experiments demonstrate that FedMRUR can
achieve a new state-of-the-art (SOTA) accuracy with less communication
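The norm reduction described in the abstract above, and the idea of restoring the global update norm from the client update norms, can be sketched with a toy example. This is not FedMRUR's optimizer. The exact re-aggregation rule used here (keep the direction of the averaged update, rescale it to the mean client norm) is an assumption, as are all function names.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(a * a for a in v))

def mean_update(updates):
    """Plain averaging of client updates, coordinate by coordinate."""
    n = len(updates)
    return [sum(col) / n for col in zip(*updates)]

def reaggregate(updates, eps=1e-12):
    """Assumed rule: rescale the averaged update to the mean client norm."""
    avg = mean_update(updates)
    target = sum(norm(u) for u in updates) / len(updates)
    scale = target / (norm(avg) + eps)
    return [scale * a for a in avg]

# Two orthogonal client updates of unit norm.
updates = [[1.0, 0.0], [0.0, 1.0]]
plain = mean_update(updates)     # norm shrinks to sqrt(0.5)
rescaled = reaggregate(updates)  # norm restored to about 1
```

With orthogonal unit-norm updates, plain averaging shrinks the norm by a factor of sqrt(2) for two clients, and the shrinkage grows with the number of clients. That is the slowdown the abstract attributes to near-orthogonality.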
Hyperbolic Image-Text Representations
Visual and linguistic concepts naturally organize themselves in a hierarchy,
where a textual concept "dog" entails all images that contain dogs. Despite
being intuitive, current large-scale vision and language models such as CLIP do
not explicitly capture such hierarchy. We propose MERU, a contrastive model
that yields hyperbolic representations of images and text. Hyperbolic spaces
have suitable geometric properties to embed tree-like data, so MERU can better
capture the underlying hierarchy in image-text data. Our results show that MERU
learns a highly interpretable representation space while being competitive with
CLIP's performance on multi-modal tasks like image classification and
image-text retrieval.
Comment: Technical report
Geometric Interaction Augmented Graph Collaborative Filtering
Graph-based collaborative filtering is capable of capturing the essential and
abundant collaborative signals from high-order interactions, and has thus
received increasing research interest. Conventionally, the embeddings of
users and items are defined in Euclidean space, along with the propagation
on the interaction graphs. Meanwhile, recent works point out that
high-order interactions naturally form tree-like structures, on which
hyperbolic models thrive. However, interaction graphs inherently
exhibit hybrid and nested geometric characteristics, and existing
single-geometry models are inadequate to fully capture such sophisticated
topological patterns. In this paper, we propose to model the user-item
interactions in a hybrid geometric space, in which the merits of Euclidean and
hyperbolic spaces are simultaneously enjoyed to learn expressive
representations. Experimental results on public datasets validate the
effectiveness of our proposal
Inferring Concept Hierarchies from Text Corpora via Hyperbolic Embeddings
We consider the task of inferring is-a relationships from large text corpora.
For this purpose, we propose a new method combining hyperbolic embeddings and
Hearst patterns. This approach allows us to set appropriate constraints for
inferring concept hierarchies from distributional contexts while also being
able to predict missing is-a relationships and to correct wrong extractions.
Moreover -- and in contrast with other methods -- the hierarchical nature of
hyperbolic space allows us to learn highly efficient representations and to
improve the taxonomic consistency of the inferred hierarchies. Experimentally,
we show that our approach achieves state-of-the-art performance on several
commonly-used benchmarks
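The "hierarchical nature of hyperbolic space" mentioned in the abstract above can be made concrete with a small sketch of the Poincaré ball distance. The abstract does not say which hyperbolic model the method uses, so the choice of the Poincaré ball here is an assumption for illustration, and the function name is hypothetical.

```python
import math

def poincare_distance(u, v):
    """Distance in the unit Poincare ball (curvature -1).

    d(u, v) = arccosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    Both points must lie strictly inside the unit ball.
    """
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))

# A point near the center behaves like a general concept (a "root").
root_to_mid = poincare_distance([0.0, 0.0], [0.5, 0.0])

# Two points near the boundary behave like specific "leaf" concepts:
# their Euclidean distance is 1.8, but the hyperbolic distance is much larger.
leaf_to_leaf = poincare_distance([0.9, 0.0], [-0.9, 0.0])
```

Distances grow rapidly toward the boundary, so there is room for exponentially many well-separated leaves. This is why tree-like is-a hierarchies can be embedded in few dimensions.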