498 research outputs found
LIPIcs, Volume 251, ITCS 2023, Complete Volume
Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss
We consider a deep matrix factorization model of covariance matrices trained
with the Bures-Wasserstein distance. While recent works have made important
advances in the study of the optimization problem for overparametrized low-rank
matrix approximation, much emphasis has been placed on discriminative settings
and the square loss. In contrast, our model considers another interesting type
of loss and connects with the generative setting. We characterize the critical
points and minimizers of the Bures-Wasserstein distance over the space of
rank-bounded matrices. For low-rank matrices the Hessian of this loss can
theoretically blow up, which makes it challenging to analyze the convergence of
optimization methods. We establish convergence results for gradient flow using a
smooth perturbative version of the loss, and for finite step size gradient
descent under certain assumptions on the initial weights.
Comment: 35 pages, 1 figure, accepted at ICML 202
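For reference, the squared Bures-Wasserstein distance between positive semidefinite covariance matrices is tr(Σ₁) + tr(Σ₂) − 2 tr((Σ₁^{1/2} Σ₂ Σ₁^{1/2})^{1/2}). A minimal NumPy sketch of this standard definition (our illustration, not the paper's code):

```python
import numpy as np

def psd_sqrt(m):
    """Matrix square root of a symmetric positive semidefinite matrix."""
    w, v = np.linalg.eigh(m)
    return v @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ v.T

def bures_wasserstein_sq(s1, s2):
    """Squared Bures-Wasserstein distance between covariance matrices."""
    r1 = psd_sqrt(s1)
    cross = psd_sqrt(r1 @ s2 @ r1)
    return float(np.trace(s1) + np.trace(s2) - 2.0 * np.trace(cross))

# For commuting covariances the distance reduces to the Frobenius distance
# between the square roots: here (2 - 1)^2 = 1.
d2 = bures_wasserstein_sq(np.diag([1.0, 4.0]), np.eye(2))
```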
Translating Islamic Law: the postcolonial quest for minority representation
This research sets out to investigate how culture-specific or signature concepts are rendered in English-language discourse on Islamic, or 'shariʿa', law, which has Arabic roots. A large body of literature has investigated Islamic law from a technical perspective. However, from the perspective of linguistics and translation studies, little attention has been paid to the lexicon that makes up this specialised discourse. Much of the commentary has so far been prescriptive, with limited empirical evidence. This thesis aims to bridge this gap by exploring how 'culturalese' (i.e., ostensive cultural discourse) travels through language, as evidenced in the self-built Islamic Law Corpus (ILC), a 9-million-word monolingual English corpus covering diverse genres on Islamic finance and family law.
Using a mixed-methods design, the study first quantifies the different linguistic strategies used to render shariʿa-based concepts in English, in order to explore 'translation' norms based on linguistic frequency in the corpus. This quantitative analysis employs two models: profile-based correspondence analysis, which considers the probability of lexical variation in expressing a conceptual category, and logistic regression (using MATLAB), which measures the influence of the explanatory variables 'genre', 'legal function' and 'subject field' on the choice between an Arabic loanword and an endogenous English lexeme, i.e., a close English equivalent. The findings are then interpreted qualitatively in the light of postcolonial translation agendas, which aim to preserve intangible cultural heritage and promote the representation of minoritised groups.
The research finds that the English-language discourse on Islamic law is characterised by linguistic borrowing and glossing, implying an ideologically driven variety of English that can be usefully labelled as a kind of 'Islamgish' (blending 'Islamic' and 'English') aimed at retaining symbols of linguistic hybridity. The regression analysis confirms the influence of the above-mentioned contextual factors on the use of an Arabic loanword versus English alternatives.
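The regression step can be illustrated with a small synthetic sketch (invented data and factor levels, in Python rather than the thesis's MATLAB): a logistic model for the binary choice between an Arabic loanword (1) and a close English equivalent (0), driven by contextual factors standing in for 'genre' and 'subject field'.

```python
import numpy as np

# Synthetic illustration only: two binary explanatory factors with positive
# (made-up) effects on the probability of choosing the loanword.
rng = np.random.default_rng(0)
n = 400
genre = rng.integers(0, 2, n).astype(float)
field = rng.integers(0, 2, n).astype(float)
true_logits = -0.5 + 1.2 * genre + 0.8 * field
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logits))).astype(float)

X = np.column_stack([np.ones(n), genre, field])
w = np.zeros(3)
for _ in range(5000):  # plain gradient ascent on the average log-likelihood
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w += 0.5 * X.T @ (y - p) / n
# The fitted coefficients for genre and field recover the positive effects.
```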
Linear CNNs Discover the Statistical Structure of the Dataset Using Only the Most Dominant Frequencies
We here present a stepping stone towards a deeper understanding of
convolutional neural networks (CNNs) in the form of a theory of learning in
linear CNNs. Through analyzing the gradient descent equations, we discover that
the evolution of the network during training is determined by the interplay
between the dataset structure and the convolutional network structure. We show
that linear CNNs discover the statistical structure of the dataset with
non-linear, ordered, stage-like transitions, and that the speed of discovery
changes depending on the relationship between the dataset and the convolutional
network structure. Moreover, we find that this interplay lies at the heart of
what we call the "dominant frequency bias", where linear CNNs arrive at these
discoveries using only the dominant frequencies of the different structural
parts present in the dataset. We furthermore provide experiments that show how
our theory relates to deep, non-linear CNNs used in practice. Our findings shed
new light on the inner working of CNNs, and can help explain their shortcut
learning and their tendency to rely on texture instead of shape.
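The notion of a dominant frequency can be illustrated on a toy one-dimensional dataset (our example, unrelated to the paper's experiments): when samples share one strong Fourier component, it appears as the peak of the average power spectrum.

```python
import numpy as np

# 100 samples, each a frequency-5 sinusoid with a random phase plus weak noise.
rng = np.random.default_rng(1)
t = np.arange(64)
data = np.stack([
    np.sin(2 * np.pi * 5 * t / 64 + rng.uniform(0, 2 * np.pi))
    + 0.1 * rng.standard_normal(64)
    for _ in range(100)
])
power = np.abs(np.fft.rfft(data, axis=1)) ** 2
dominant = int(np.argmax(power.mean(axis=0)[1:])) + 1  # skip the DC bin
# The shared structure shows up as a single dominant frequency bin (5).
```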
Ample groupoid homology and étale correspondences
We show that étale correspondences between ample groupoids induce
homomorphisms of homology groups. To complement this we explore the module
categories of ample groupoids. We construct an induction-restriction adjunction
for subgroupoids, which generates a procedure for building resolutions of
arbitrary groupoid modules. These resolutions can be used to work with the Tor
picture of groupoid homology, enabling explicit descriptions of the maps in
homology induced by étale correspondences.
Comment: 16 pages
LIPIcs, Volume 261, ICALP 2023, Complete Volume
Nash's bargaining problem and the scale-invariant Hirsch citation index
A number of citation indices have been proposed for measuring and ranking the
research publication records of scholars. Some of the best known indices, such
as those proposed by Hirsch and Woeginger, are designed to reward most highly
those records that strike some balance between productivity (number of papers
published), and impact (frequency with which those papers are cited). A large
number of rarely cited publications will not score well, nor will a very small
number of heavily cited papers. We discuss three new citation indices, one of
which was independently proposed in \cite{FHLB}. Each rests on the notion of
scale invariance, fundamental to John Nash's solution of the two-person
bargaining problem. Our main focus is on one of these -- a scale invariant
version of the Hirsch index. We argue that it has advantages over the original;
it produces fairer rankings within subdisciplines, is more decisive
(discriminates more finely, yielding fewer ties) and more dynamic (growing over
time via more frequent, smaller increments), and exhibits enhanced centrality
and tail balancedness. Simulations suggest that scale invariance improves
robustness under Poisson noise, with increased decisiveness having no cost in
terms of the number of "accidental" reversals, wherein random irregularities
cause researcher A to receive a lower index value than B, although A's
productivity and impact are both slightly higher than B's. Moreover, we provide
an axiomatic characterization of the scale invariant Hirsch index, via axioms
that bear a close relationship, in discrete analogue, to those used by Nash in
\cite{Nas50}. This argues for the mathematical naturality of the new index.
An earlier version was presented at the 5th World Congress of the Game Theory
Society, Maastricht, Netherlands, in 2016.
Comment: 44 pages, 8 figures
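For context, the classic Hirsch index on which the scale-invariant variant builds can be computed in a few lines (the standard definition, not the paper's new index):

```python
def hirsch_index(citations):
    """Classic h-index: the largest h such that at least h papers
    each have at least h citations."""
    cites = sorted(citations, reverse=True)
    h = 0
    while h < len(cites) and cites[h] >= h + 1:
        h += 1
    return h

# A balanced record scores higher than a record with one heavily cited paper:
balanced = hirsch_index([10, 8, 5, 4, 3])   # -> 4
skewed = hirsch_index([100, 1, 1, 1, 1])    # -> 1
```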
Inductive Bias in Machine Learning
Inductive bias describes the preference for solutions that a machine learning algorithm holds before seeing any data.
It is a necessary ingredient for the goal of machine learning, which is to generalize from a set of examples to unseen data points.
Yet, the inductive bias of learning algorithms is often not specified explicitly in practice, which prevents a theoretical understanding and undermines trust in machine learning.
This issue is most prominently visible in the contemporary case of deep learning, which is widely successful in applications but relies on many poorly understood techniques and heuristics.
This thesis aims to uncover the hidden inductive biases of machine learning algorithms.
In the first part of the thesis, we uncover the implicit inductive bias of NetGAN, a complex graph generative model with seemingly no prior preferences.
We find that the root of its generalization properties does not lie in the GAN architecture but in an inconspicuous low-rank approximation.
We then use this insight to strip NetGAN of all unnecessary parts, including the GAN, and obtain a highly simplified reformulation.
Next, we present a generic algorithm that reverse-engineers hidden inductive bias in approximate Bayesian inference.
While the inductive bias is completely described by the prior distribution in full Bayesian inference, real-world applications often resort to approximate techniques that can make uncontrollable errors.
By reframing the problem in terms of incompatible conditional distributions, we arrive at a generic algorithm based on pseudo-Gibbs sampling that attributes the change in inductive bias to a change in the prior distribution.
The last part of the thesis concerns a common inductive bias in causal learning, the assumption of independent causal mechanisms. Under this assumption, we consider estimators for confounding strength, which governs the generalization from the observational distribution to the underlying causal model. We show that an existing estimator is generally inconsistent and propose a consistent estimator based on tools from random matrix theory.
Generalization on the Unseen, Logic Reasoning and Degree Curriculum
This paper considers the learning of logical (Boolean) functions with focus
on the generalization on the unseen (GOTU) setting, a strong case of
out-of-distribution generalization. This is motivated by the fact that the rich
combinatorial nature of data in certain reasoning tasks (e.g.,
arithmetic/logic) makes representative data sampling challenging, and learning
successfully under GOTU gives a first vignette of an 'extrapolating' or
'reasoning' learner. We then study how different network architectures trained
by (S)GD perform under GOTU and provide both theoretical and experimental
evidence that for a class of network models including instances of
Transformers, random features models, and diagonal linear networks, a
min-degree-interpolator is learned on the unseen. We also provide evidence that
other instances with larger learning rates or mean-field networks reach leaky
min-degree solutions. These findings lead to two implications: (1) we provide
an explanation to the length generalization problem (e.g., Anil et al. 2022);
(2) we introduce a curriculum learning algorithm called Degree-Curriculum that
learns monomials more efficiently by incrementing supports.
Comment: To appear in ICML 202
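The degree measure behind min-degree interpolation can be made concrete: a function f: {-1,1}^n → R has a unique multilinear Fourier expansion f(x) = Σ_S f̂(S) Π_{i∈S} x_i, and its degree is the largest |S| with a nonzero coefficient. A brute-force sketch (our illustration, not the paper's algorithm):

```python
import itertools
import numpy as np

def fourier_coefficients(f, n):
    """Fourier coefficients of f over {-1,1}^n: fhat(S) = E[f(x) * chi_S(x)]."""
    points = list(itertools.product([-1, 1], repeat=n))
    coeffs = {}
    for r in range(n + 1):
        for S in itertools.combinations(range(n), r):
            chi = [np.prod([x[i] for i in S]) for x in points]
            coeffs[S] = sum(f(x) * c for x, c in zip(points, chi)) / len(points)
    return coeffs

def degree(f, n):
    """Largest monomial support with a nonzero Fourier coefficient."""
    return max((len(S) for S, v in fourier_coefficients(f, n).items()
                if abs(v) > 1e-9), default=0)

# Parity of 3 bits has full degree 3; a single coordinate has degree 1.
deg_parity = degree(lambda x: x[0] * x[1] * x[2], 3)
deg_dict = degree(lambda x: x[0], 3)
```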
- …