Kripke Models for Classical Logic
We introduce a notion of Kripke model for classical logic for which we
constructively prove soundness and cut-free completeness. We discuss the
novelty of the notion and its potential applications.
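Since the abstract does not spell out the semantics, the following is a minimal sketch of a propositional Kripke forcing relation in Python, using the usual intuitionistic-style clauses; it is illustrative background only, not the authors' construction for classical logic, and all names in it are hypothetical.

```python
# A minimal sketch of a propositional Kripke forcing relation.
# Illustrative only; not the authors' construction for classical logic.
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Optional, Union

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Imp:
    left: "Formula"
    right: "Formula"

Bot = None  # falsum, encoded as None for brevity
Formula = Optional[Union[Var, Imp]]

def forces(accessible: Callable[[str], Iterable[str]],
           val: Callable[[str], FrozenSet[str]],
           w: str, phi: Formula) -> bool:
    """Atoms are read off the valuation, falsum is never forced, and an
    implication must hold at every world accessible from w."""
    if phi is Bot:
        return False
    if isinstance(phi, Var):
        return phi.name in val(w)
    return all(not forces(accessible, val, v, phi.left)
               or forces(accessible, val, v, phi.right)
               for v in accessible(w))

# Example: p -> falsum fails at w0 because p becomes true at w1.
acc = {"w0": ["w0", "w1"], "w1": ["w1"]}.__getitem__
val = {"w0": frozenset(), "w1": frozenset({"p"})}.__getitem__
print(forces(acc, val, "w0", Imp(Var("p"), Bot)))  # False
```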
The First-Order Hypothetical Logic of Proofs
The Propositional Logic of Proofs (LP) is a modal logic in which the modality □A is revisited as [[t]]A, where t is an expression that bears witness to the validity of A. It enjoys arithmetical soundness and completeness, can realize all S4 theorems, and is capable of reflecting its own proofs (⊢A implies ⊢[[t]]A, for some t). A presentation of first-order LP, FOLP, has recently been proposed which enjoys arithmetical soundness and has an exact provability semantics. A key notion in this presentation is how free variables are dealt with in a formula of the form [[t]]A(i). We revisit this notion in the setting of a Natural Deduction presentation and propose a Curry–Howard correspondence for FOLP. A term assignment is provided and a proof of strong normalization is given.
Affiliations: Steren, Gabriela (Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Departamento de Computación, Argentina); Bonelli, Eduardo Augusto (Universidad Nacional de Quilmes, Departamento de Ciencia y Tecnología, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina).
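For readers new to LP, its standard axiom schemes (due to Artemov, and assumed as background here rather than restated in the abstract) take the following form in the [[t]]A notation:

```latex
% Standard LP axiom schemes (Artemov); background assumed by the abstract.
\begin{align*}
  [\![t]\!]A &\rightarrow A
      && \text{(reflection)}\\
  [\![s]\!](A \rightarrow B) &\rightarrow ([\![t]\!]A \rightarrow [\![s \cdot t]\!]B)
      && \text{(application)}\\
  [\![t]\!]A &\rightarrow [\![\mathord{!}t]\!][\![t]\!]A
      && \text{(proof checker)}\\
  [\![s]\!]A &\rightarrow [\![s+t]\!]A, \qquad [\![t]\!]A \rightarrow [\![s+t]\!]A
      && \text{(sum)}
\end{align*}
```

Realization (⊢A implies ⊢[[t]]A for some t) then works by replacing each occurrence of □ in an S4 derivation with a proof term built from these operations.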
HOL(y)Hammer: Online ATP Service for HOL Light
HOL(y)Hammer is an online AI/ATP service for formal (computer-understandable)
mathematics encoded in the HOL Light system. The service allows its users to
upload and automatically process an arbitrary formal development (project)
based on HOL Light, and to attack arbitrary conjectures that use the concepts
defined in some of the uploaded projects. For that, the service uses several
automated reasoning systems combined with several premise selection methods
trained on all the project proofs. The projects that are readily available on
the server for such query answering include the recent versions of the
Flyspeck, Multivariate Analysis and Complex Analysis libraries. The service
runs on a 48-CPU server and currently employs, in parallel for each task, 7 AI/ATP
combinations and 4 decision procedures that contribute to its overall
performance. The system is also available for local installation by interested
users, who can customize it for their own proof development. An Emacs interface
allowing parallel asynchronous queries to the service is also provided. The
overall structure of the service is outlined, problems that arise and their
solutions are discussed, and an initial account of using the system is given.
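The abstract leaves the premise selection methods unspecified; as one concrete illustration, a distance-weighted k-nearest-neighbour selector over feature vectors of previously proved facts, a method family commonly used in hammer systems, could look like the sketch below. All names and signatures are illustrative, not HOL(y)Hammer's actual API.

```python
# Sketch of distance-weighted k-NN premise selection, a method family
# commonly used in hammer systems. Not HOL(y)Hammer's actual code or API.
import numpy as np

def knn_premise_selection(goal_vec, fact_vecs, fact_deps, k=40, n_premises=128):
    """goal_vec: feature vector of the conjecture (e.g. symbol occurrences).
    fact_vecs: (n_facts, n_features) matrix for previously proved facts.
    fact_deps: fact_deps[i] lists the premises used in the proof of fact i.
    Returns candidate premises ranked by accumulated relevance."""
    # Cosine similarity of the goal to every known fact.
    norms = np.linalg.norm(fact_vecs, axis=1) * np.linalg.norm(goal_vec) + 1e-12
    sims = (fact_vecs @ goal_vec) / norms
    neighbours = np.argsort(-sims)[:k]
    # Each neighbour votes for the premises its own proof used,
    # weighted by its similarity to the goal.
    scores: dict = {}
    for i in neighbours:
        for premise in fact_deps[i]:
            scores[premise] = scores.get(premise, 0.0) + float(sims[i])
    return sorted(scores, key=scores.get, reverse=True)[:n_premises]
```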
Time-dependent current density functional theory on a lattice
A rigorous formulation of time-dependent current density functional theory
(TDCDFT) on a lattice is presented. The density-to-potential mapping and the
$V$-representability problems are reduced to a solution of a certain
nonlinear lattice Schrödinger equation, to which the standard existence and
uniqueness results for nonlinear differential equations are applicable. For two
versions of the lattice TDCDFT we prove that any continuous-in-time current
density is locally $V$-representable (both interacting and
noninteracting), provided in the initial state the local kinetic energy is
nonzero everywhere. In most cases of physical interest the $V$-representability should also hold globally in time. These results put the
application of TDCDFT to any lattice model on a firm ground, and open a way for
studying exact properties of exchange-correlation potentials.
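To make the lattice setting concrete, the sketch below evolves a one-dimensional tight-binding wavefunction under a time-dependent on-site potential and evaluates the bond current. This is generic textbook physics for illustration, not the paper's construction; all parameter names are hypothetical.

```python
# Sketch: a 1D lattice Schroedinger equation i d(psi)/dt = H(t) psi with a
# time-dependent on-site potential, plus the resulting bond current.
# Generic tight-binding physics for illustration; not the paper's construction.
import numpy as np
from scipy.linalg import expm

n_sites, hop, dt, n_steps = 8, 1.0, 0.01, 200

def hamiltonian(v_t):
    """Tight-binding Hamiltonian: hopping `hop` plus on-site potential v_t."""
    H = np.diag(v_t).astype(complex)
    for j in range(n_sites - 1):
        H[j, j + 1] = H[j + 1, j] = -hop
    return H

def bond_current(psi):
    """Discrete current on each bond, J_j ~ 2*hop*Im(conj(psi_j)*psi_{j+1})
    (up to sign convention); this is the quantity the mapping acts on."""
    return 2.0 * hop * np.imag(np.conj(psi[:-1]) * psi[1:])

psi = np.zeros(n_sites, dtype=complex)
psi[n_sites // 2] = 1.0  # particle initially localized at the center
for step in range(n_steps):
    v_t = 0.5 * np.sin(0.3 * step * dt) * np.arange(n_sites)  # driving potential
    psi = expm(-1j * dt * hamiltonian(v_t)) @ psi              # one time step
print("bond currents:", bond_current(psi).round(4))
```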
What shapes the loss landscape of self-supervised learning?
Prevention of complete and dimensional collapse of representations has
recently become a design principle for self-supervised learning (SSL). However,
questions remain in our theoretical understanding: When do those collapses
occur? What are the mechanisms and causes? We answer these questions by
deriving and thoroughly analyzing an analytically tractable theory of SSL loss
landscapes. In this theory, we identify the causes of the dimensional collapse
and study the effect of normalization and bias. Finally, we leverage the
interpretability afforded by the analytical theory to understand how
dimensional collapse can be beneficial and what affects the robustness of SSL
against data imbalance.
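Dimensional collapse is commonly diagnosed via the eigenspectrum of the embedding covariance; the sketch below, a standard diagnostic rather than the paper's analytical theory, counts how many directions carry non-negligible variance.

```python
# Sketch: detecting dimensional collapse via the covariance eigenspectrum
# of learned embeddings. A standard diagnostic, not the paper's theory.
import numpy as np

def effective_rank(embeddings, tol=1e-3):
    """embeddings: (n_samples, dim) array of SSL representations.
    Counts covariance eigenvalues above `tol` times the largest one;
    a count far below `dim` signals dimensional collapse, while near-zero
    variance in every direction signals complete collapse."""
    z = embeddings - embeddings.mean(axis=0)
    eig = np.linalg.eigvalsh(z.T @ z / len(z))[::-1]  # descending order
    return int(np.sum(eig > tol * eig[0]))

# Example: 512-dim embeddings that actually live in a 10-dim subspace.
rng = np.random.default_rng(0)
z = rng.normal(size=(1000, 10)) @ rng.normal(size=(10, 512))
print(effective_rank(z))  # ~10, i.e. collapsed relative to dim = 512
```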
A modern look at the relationship between sharpness and generalization
Sharpness of minima is a promising quantity that can correlate with
generalization in deep networks and, when optimized during training, can
improve generalization. However, standard sharpness is not invariant under
reparametrizations of neural networks, and, to fix this,
reparametrization-invariant sharpness definitions have been proposed, most
prominently adaptive sharpness (Kwon et al., 2021). But does it really capture
generalization in modern practical settings? We comprehensively explore this
question in a detailed study of various definitions of adaptive sharpness in
settings ranging from training from scratch on ImageNet and CIFAR-10 to
fine-tuning CLIP on ImageNet and BERT on MNLI. We focus mostly on transformers,
for which little is known in terms of sharpness despite their widespread use.
Overall, we observe that sharpness does not correlate well with generalization
but rather with some training parameters like the learning rate that can be
positively or negatively correlated with generalization depending on the setup.
Interestingly, in multiple cases, we observe a consistent negative correlation
of sharpness with out-of-distribution error, implying that sharper minima can
generalize better. Finally, we illustrate on a simple model that the right
sharpness measure is highly data-dependent, and that this dependence is not yet
well understood for realistic data distributions. The code of our experiments is
available at https://github.com/tml-epfl/sharpness-vs-generalization.
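Adaptive sharpness, as used here, rescales the perturbation ball elementwise by the weight magnitudes so that the measure survives neuron-wise reparametrizations. A rough one-step estimator in the spirit of Kwon et al. (2021) could look like the following sketch; it is illustrative, not the paper's exact evaluation protocol.

```python
# Sketch: a crude one-step estimate of adaptive worst-case sharpness in
# the spirit of Kwon et al. (2021). Illustrative only; not the paper's
# exact evaluation protocol.
import torch

def adaptive_sharpness(model, loss_fn, x, y, rho=0.05):
    """Perturb each weight w within the elementwise-scaled L-inf ball
    |delta| <= rho * |w| along the loss-ascent direction, then report
    the loss increase. The |w| scaling is what makes the measure
    invariant to neuron-wise reparametrizations."""
    loss0 = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss0, list(model.parameters()))
    with torch.no_grad():
        deltas = []
        for p, g in zip(model.parameters(), grads):
            d = rho * p.abs() * g.sign()  # worst case of the linearized loss
            p.add_(d)
            deltas.append(d)
        loss1 = loss_fn(model(x), y)
        for p, d in zip(model.parameters(), deltas):
            p.sub_(d)  # restore the original weights
    return (loss1 - loss0).item()
```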
SNEkhorn: Dimension Reduction with Symmetric Entropic Affinities
Many approaches in machine learning rely on a weighted graph to encode the
similarities between samples in a dataset. Entropic affinities (EAs), which are
notably used in the popular Dimensionality Reduction (DR) algorithm t-SNE, are
particular instances of such graphs. To ensure robustness to heterogeneous
sampling densities, EAs assign a kernel bandwidth parameter to every sample in
such a way that the entropy of each row in the affinity matrix is kept constant
at a specific value, whose exponential is known as perplexity. EAs are
inherently asymmetric and row-wise stochastic, but they are used in DR
approaches after undergoing heuristic symmetrization methods that violate both
the row-wise constant entropy and stochasticity properties. In this work, we
uncover a novel characterization of EA as an optimal transport problem,
allowing a natural symmetrization that can be computed efficiently using dual
ascent. The corresponding novel affinity matrix derives advantages from
symmetric doubly stochastic normalization in terms of clustering performance,
while also effectively controlling the entropy of each row thus making it
particularly robust to varying noise levels. Following, we present a new DR
algorithm, SNEkhorn, that leverages this new affinity matrix. We show its clear
superiority to state-of-the-art approaches with several indicators on both
synthetic and real-world datasets
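For intuition on the symmetrization issue, the sketch below uses Sinkhorn-style scaling to turn a Gaussian kernel into a symmetric doubly stochastic affinity matrix. It illustrates the general mechanism only; SNEkhorn itself solves a symmetric entropic-affinity problem with per-row entropy constraints by dual ascent, which this sketch does not reproduce.

```python
# Sketch: a symmetric doubly stochastic affinity matrix via Sinkhorn-style
# scaling of a Gaussian kernel. Mechanism illustration only; SNEkhorn itself
# solves a symmetric entropic-affinity problem by dual ascent.
import numpy as np

def symmetric_doubly_stochastic(X, sigma=1.0, n_iter=500):
    """Return P = diag(u) K diag(u) with (approximately) unit row and column
    sums, where K is a Gaussian kernel on the rows of X. Symmetry of K lets
    one scaling vector u play both Sinkhorn roles."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    u = np.ones(len(X))
    for _ in range(n_iter):
        u = np.sqrt(u / (K @ u))  # damped fixed point of u_i * (K u)_i = 1
    return u[:, None] * K * u[None, :]

P = symmetric_doubly_stochastic(np.random.default_rng(0).normal(size=(50, 3)))
print(P.sum(axis=1)[:5])  # each row sums to ~1; P is symmetric by construction
```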