Hyperbolic Geometry in Computer Vision: A Novel Framework for Convolutional Neural Networks
Real-world visual data exhibit intrinsic hierarchical structures that can be
represented effectively in hyperbolic spaces. Hyperbolic neural networks (HNNs)
are a promising approach for learning feature representations in such spaces.
However, current methods in computer vision rely on Euclidean backbones and
only project features to the hyperbolic space in the task heads, limiting their
ability to fully leverage the benefits of hyperbolic geometry. To address this,
we present HCNN, the first fully hyperbolic convolutional neural network (CNN)
designed for computer vision tasks. Based on the Lorentz model, we generalize
fundamental components of CNNs and propose novel formulations of the
convolutional layer, batch normalization, and multinomial logistic regression
(MLR). Experimentation on standard vision tasks demonstrates the effectiveness
of our HCNN framework and the Lorentz model in both hybrid and fully hyperbolic
settings. Overall, we aim to pave the way for future research in hyperbolic
computer vision by offering a new paradigm for interpreting and analyzing
visual data. Our code is publicly available at
https://github.com/kschwethelm/HyperbolicCV
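For readers unfamiliar with the Lorentz model mentioned above, the following is a minimal sketch of its basic operations: the Lorentzian inner product, the exponential map at the origin, and the geodesic distance. It is not the HCNN layer formulation from the paper or its repository. The function names are illustrative, and the curvature is fixed at -1 as a simplifying assumption.

```python
import math

def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def exp_map_origin(v):
    """Map a Euclidean vector v (a tangent vector at the origin) onto the
    hyperboloid {x : <x, x>_L = -1, x0 > 0}. Curvature -1 is assumed."""
    norm = math.sqrt(sum(a * a for a in v))
    if norm == 0.0:
        return [1.0] + [0.0] * len(v)
    scale = math.sinh(norm) / norm
    return [math.cosh(norm)] + [scale * a for a in v]

def lorentz_distance(x, y):
    """Geodesic distance between two points on the hyperboloid."""
    # The clamp guards against arguments slightly below 1 due to rounding.
    return math.acosh(max(1.0, -lorentz_inner(x, y)))

origin = exp_map_origin([0.0, 0.0])
p = exp_map_origin([0.3, -0.4])
```

In this sketch, `p` satisfies the hyperboloid constraint up to rounding, and its distance from `origin` equals the Euclidean norm of the input vector, which is the defining property of the exponential map at the origin.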
Attention, Filling in The Gaps for Generalization in Routing Problems
Machine Learning (ML) methods have become a useful tool for tackling vehicle
routing problems, either in combination with popular heuristics or as
standalone models. However, current methods suffer from poor generalization
when tackling problems of different sizes or different distributions. As a
result, ML in vehicle routing has witnessed an expansion phase with new
methodologies being created for particular problem instances that become
infeasible at larger problem sizes.
This paper aims to encourage consolidation of the field by understanding and
improving existing models, namely the attention model by Kool et al. We
identify two discrepancy categories for VRP generalization.
The first is based on the differences that are inherent to the problems
themselves, and the second relates to architectural weaknesses that limit the
model's ability to generalize. Our contribution is threefold. We first
target model discrepancies by adapting the Kool et al. method and its loss
function for Sparse Dynamic Attention based on the alpha-entmax activation. We
then target inherent differences through the use of a mixed instance training
method that has been shown to outperform single instance training in certain
scenarios. Finally, we introduce a framework for inference level data
augmentation that improves performance by leveraging the model's lack of
invariance to rotation and dilation changes.
Comment: Accepted at ECML-PKDD 202
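The idea of inference-level data augmentation can be sketched as follows: present rotated copies of the same instance to a model that is not rotation-invariant, then keep the best resulting solution as measured on the original coordinates. This is only an illustration under stated assumptions, not the paper's framework. `toy_policy` is a hypothetical stand-in for a trained attention model, the instance is a single closed tour rather than a full VRP, and dilation is omitted (it would be applied in the same way as the rotation).

```python
import math

def rotate(points, angle, center=(0.5, 0.5)):
    """Rotate 2D points by `angle` radians around `center`."""
    cx, cy = center
    c, s = math.cos(angle), math.sin(angle)
    return [(cx + c * (x - cx) - s * (y - cy),
             cy + s * (x - cx) + c * (y - cy)) for x, y in points]

def tour_length(points, order):
    """Length of the closed tour visiting `points` in `order`."""
    n = len(order)
    return sum(math.dist(points[order[i]], points[order[(i + 1) % n]])
               for i in range(n))

def toy_policy(points):
    """Hypothetical stand-in for a trained model: visit nodes by increasing x.
    Its output changes when the instance is rotated."""
    return sorted(range(len(points)), key=lambda i: points[i][0])

def augmented_inference(points, policy, num_rotations=8):
    """Query the policy on rotated copies and keep the shortest tour."""
    best_order, best_len = None, math.inf
    for k in range(num_rotations):
        angle = 2 * math.pi * k / num_rotations
        order = policy(rotate(points, angle))
        # The cost is always measured on the original, unrotated instance.
        length = tour_length(points, order)
        if length < best_len:
            best_order, best_len = order, length
    return best_order, best_len

points = [(0.1, 0.2), (0.9, 0.8), (0.4, 0.9), (0.7, 0.1), (0.2, 0.6)]
best_order, best_len = augmented_inference(points, toy_policy)
```

Because the unrotated instance is among the candidates, the selected tour is never longer than the one the policy returns without augmentation.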