Towards Data-centric Graph Machine Learning: Review and Outlook
Data-centric AI, with its primary focus on the collection, management, and
utilization of data to drive AI models and applications, has attracted
increasing attention in recent years. In this article, we conduct an in-depth
and comprehensive review, offering a forward-looking outlook on the current
efforts in data-centric AI pertaining to graph data, the fundamental data
structure for representing and capturing intricate dependencies among massive
and diverse real-life entities. We introduce a systematic framework,
Data-centric Graph Machine Learning (DC-GML), that encompasses all stages of
the graph data lifecycle, including graph data collection, exploration,
improvement, exploitation, and maintenance. A thorough taxonomy of each stage
is presented to answer three critical graph-centric questions: (1) how to
enhance graph data availability and quality; (2) how to learn from graph data
with limited availability and low quality; (3) how to build graph MLOps systems
from the graph data-centric view. Lastly, we pinpoint the future prospects of
the DC-GML domain, providing insights to navigate its advancements and
applications.
Comment: 42 pages, 9 figures
Engineering flexible machine learning systems by traversing functionally-invariant paths
Transformers have emerged as the state-of-the-art neural network architecture
for natural language processing and computer vision. In the foundation model
paradigm, large transformer models (BERT, GPT3/4, Bloom, ViT) are pre-trained
on self-supervised tasks such as word or image masking, and then adapted
through fine-tuning for downstream user applications including instruction
following and question answering. While many approaches have been developed for
model fine-tuning, including low-rank weight update strategies (e.g., LoRA),
the underlying mathematical principles that enable network adaptation without
knowledge loss remain poorly understood. Here, we introduce a differential
geometry framework, functionally invariant paths (FIP), that provides flexible
and continuous adaptation of neural networks for a range of machine learning
goals and network sparsification objectives. We conceptualize the weight space
of a neural network as a curved Riemannian manifold equipped with a metric
tensor whose spectrum defines low-rank subspaces in weight space that
accommodate network adaptation without loss of prior knowledge. We formalize
adaptation as movement along a geodesic path in weight space while searching
for networks that accommodate secondary objectives. With modest computational
resources, the FIP algorithm achieves performance comparable to the state of
the art on continual learning and sparsification tasks for language models
(BERT), vision transformers (ViT, DeiT), and CNNs. Broadly, we
conceptualize a neural network as a mathematical object that can be iteratively
transformed into distinct configurations by the path-sampling algorithm to
define a sub-manifold of weight space that can be harnessed to achieve user
goals.
Comment: 22 pages
Semi-Supervised Learning of Cartesian Factors
The existence of place cells (PCs), grid cells (GCs), border cells (BCs), and head direction cells (HCs), as well as the dependencies between them, have been enigmatic. We make an effort to explain their nature by introducing the concept of Cartesian Factors. These factors have specific properties: (i) they assume and complement each other, like direction and position, and (ii) they have localized discrete representations with predictive attractors enabling implicit metric-like computations. In our model, HCs form the distributed and local representation of direction. Predictive attractor dynamics on that network form the Cartesian Factor "direction." We embed these HCs and idiothetic visual information into a semi-supervised sparse autoencoding comparator structure that compresses its inputs and learns PCs, the distributed, local, and direction-independent (allothetic) representation of the Cartesian Factor of global space. We use a supervised, information-compressing predictive algorithm and form direction-sensitive (oriented) GCs from the learned PCs by means of an attractor-like algorithm. Since the algorithm can continue the grid structure beyond the region of the PCs, i.e., beyond its learning domain, the GCs and the PCs together form our metric-like Cartesian Factors of space. We also stipulate that the same algorithm can produce BCs. Our algorithm applies (a) a bag representation that models the "what system" and (b) magnitude-ordered place cell activities that model either the integrate-and-fire mechanism, or theta phase precession, or both. We relate the components of the algorithm to the entorhinal-hippocampal complex and to its working. The algorithm requires both spatial and lifetime sparsification that may gain support from the two-stage memory formation of this complex.
Unsupervised Embedding Quality Evaluation
Unsupervised learning has recently gained significant popularity,
especially with deep learning-based approaches. Despite numerous successes and
approaching supervised-level performance on a variety of academic benchmarks,
it is still hard to train and evaluate self-supervised learning (SSL) models in practice due to the
unsupervised nature of the problem. Even with networks trained in a supervised
fashion, it is often unclear whether they will perform well when transferred to
another domain.
Past works are generally limited to assessing the amount of information
contained in embeddings, which is most relevant for self-supervised learning of
deep neural networks. This work chooses to follow a different approach: can we
quantify how easy it is to linearly separate the data in a stable way? We
survey the literature and uncover three methods that could potentially be used
for evaluating the quality of representations. We also introduce one novel method
based on recent advances in understanding the high-dimensional geometric
structure of self-supervised learning.
We conduct extensive experiments and study the properties of these metrics
and those introduced in previous work. Our results suggest that while there
is no free lunch, there are metrics that can robustly estimate embedding
quality in an unsupervised way.
Comment: As appeared at the 2nd Annual Workshop on Topology, Algebra, and
Geometry in Machine Learning (TAG-ML) at the 40th International Conference on
Machine Learning (ICML), Honolulu, Hawaii, USA. 2023
Unsupervised Visual and Textual Information Fusion in Multimedia Retrieval - A Graph-based Point of View
Multimedia collections are growing in size and diversity more than ever.
Effective multimedia retrieval systems are thus critical to access these
datasets from the end-user perspective and in a scalable way. We are interested
in repositories of image/text multimedia objects and we study multimodal
information fusion techniques in the context of content-based multimedia
information retrieval. We focus on graph-based methods, which have proven to
provide state-of-the-art performance. We particularly examine two such
methods: cross-media similarities and random-walk-based scores. From a
theoretical viewpoint, we propose a unifying graph-based framework that
encompasses the two aforementioned approaches. Our proposal allows us to
highlight the core features one should consider when using a graph-based
technique for the combination of visual and textual information. We compare
cross-media and random-walk-based results using three different real-world
datasets. From a practical standpoint, our extended empirical analysis allows us
to provide insights and guidelines about the use of graph-based methods for
multimodal information fusion in content-based multimedia information
retrieval.
Comment: An extended version of the paper: Visual and Textual Information
Fusion in Multimedia Retrieval using Semantic Filtering and Graph based
Methods, by J. Ah-Pine, G. Csurka and S. Clinchant, submitted to ACM
Transactions on Information Systems
Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age
Simultaneous Localization and Mapping (SLAM) consists of the concurrent
construction of a model of the environment (the map), and the estimation of the
state of the robot moving within it. The SLAM community has made astonishing
progress over the last 30 years, enabling large-scale real-world applications,
and witnessing a steady transition of this technology to industry. We survey
the current state of SLAM. We start by presenting what is now the de facto
standard formulation for SLAM. We then review related work, covering a broad
set of topics including robustness and scalability in long-term mapping, metric
and semantic representations for mapping, theoretical performance guarantees,
active SLAM and exploration, and other new frontiers. This paper simultaneously
serves as a position paper and a tutorial for SLAM users. By
looking at the published research with a critical eye, we delineate open
challenges and new research issues that still deserve careful scientific
investigation. The paper also contains the authors' take on two questions that
often animate discussions during robotics conferences: Do robots need SLAM? and
Is SLAM solved?