
    LIPIcs, Volume 251, ITCS 2023, Complete Volume


    Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss

    We consider a deep matrix factorization model of covariance matrices trained with the Bures-Wasserstein distance. While recent works have made important advances in the study of the optimization problem for overparametrized low-rank matrix approximation, much emphasis has been placed on discriminative settings and the square loss. In contrast, our model considers another interesting type of loss and connects with the generative setting. We characterize the critical points and minimizers of the Bures-Wasserstein distance over the space of rank-bounded matrices. For low-rank matrices the Hessian of this loss can theoretically blow up, which creates challenges for analyzing the convergence of optimization methods. We establish convergence results for gradient flow using a smooth perturbative version of the loss, and convergence results for finite-step-size gradient descent under certain assumptions on the initial weights. Comment: 35 pages, 1 figure, accepted at ICML 2023.
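
A minimal sketch of the loss in question, assuming the standard closed form of the squared Bures-Wasserstein distance between PSD matrices; the factorization depth, dimensions, and initialization below are illustrative choices, not the paper's experimental setup:

```python
# Sketch: squared Bures-Wasserstein distance between PSD covariance matrices,
# evaluated on a deep matrix factorization model. Not the paper's code.
import numpy as np
from scipy.linalg import sqrtm

def bures_wasserstein_sq(sigma1, sigma2):
    """BW^2(S1, S2) = tr(S1) + tr(S2) - 2 tr((S1^{1/2} S2 S1^{1/2})^{1/2})."""
    root1 = sqrtm(sigma1)
    cross = sqrtm(root1 @ sigma2 @ root1)
    return np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(cross).real

# Deep factorization of a covariance: with W = W_L ... W_1, the model
# covariance is W W^T, trained to match a target covariance under the BW loss.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
target = A @ A.T                                        # a PSD target covariance
Ws = [0.5 * rng.normal(size=(5, 5)) for _ in range(3)]  # L = 3 factors (illustrative)
W = Ws[2] @ Ws[1] @ Ws[0]
print(f"BW^2 loss at initialization: {bures_wasserstein_sq(W @ W.T, target):.4f}")
```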

    Translating Islamic Law: the postcolonial quest for minority representation

    This research sets out to investigate how culture-specific or signature concepts are rendered in English-language discourse on Islamic, or ‘shariÊża’ law, which has Arabic roots. A large body of literature has investigated Islamic law from a technical perspective. However, from the perspective of linguistics and translation studies, little attention has been paid to the lexicon that makes up this specialised discourse. Much of the commentary has so far been prescriptive, with limited empirical evidence. This thesis aims to bridge this gap by exploring how ‘culturalese’ (i.e., ostensive cultural discourse) travels through language, as evidenced in the self-built Islamic Law Corpus (ILC), a 9-million-word monolingual English corpus, covering diverse genres on Islamic finance and family law. Using a mixed methods design, the study first quantifies the different linguistic strategies used to render shariÊża-based concepts in English, in order to explore ‘translation’ norms based on linguistic frequency in the corpus. This quantitative analysis employs two models: profile-based correspondence analysis, which considers the probability of lexical variation in expressing a conceptual category, and logistic regression (using MATLAB programming software), which measures the influence of the explanatory variables ‘genre’, ‘legal function’ and ‘subject field’ on the choice between an Arabic loanword and an endogenous English lexeme, i.e., a close English equivalent. The findings are then interpreted qualitatively in the light of postcolonial translation agendas, which aim to preserve intangible cultural heritage and promote the representation of minoritised groups. The research finds that the English-language discourse on Islamic law is characterised by linguistic borrowing and glossing, implying an ideologically driven variety of English that can be usefully labelled as a kind of ‘Islamgish’ (blending ‘Islamic’ and ‘English’) aimed at retaining symbols of linguistic hybridity. The regression analysis confirms the influence of the above-mentioned contextual factors on the use of an Arabic loanword versus English alternatives
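
The regression step lends itself to a short sketch. The ILC corpus is not available here, so the observations below are randomly generated stand-ins, and the toy effect of genre is our own assumption; the thesis reports using MATLAB, whereas this hypothetical reformulation uses Python's statsmodels:

```python
# Hypothetical sketch: binary logistic regression of loanword choice on the
# categorical predictors 'genre', 'legal_function' and 'subject_field'.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200
genre = rng.choice(["academic", "legal", "media"], size=n)
legal_function = rng.choice(["finance", "family"], size=n)
subject_field = rng.choice(["contract", "marriage", "inheritance"], size=n)
# Toy outcome: 1 = Arabic loanword, 0 = close English equivalent. Loanwords
# are made more likely in the legal genre purely for illustration.
p = 0.3 + 0.3 * (genre == "legal")
loanword = rng.binomial(1, p)

df = pd.DataFrame({"loanword": loanword, "genre": genre,
                   "legal_function": legal_function,
                   "subject_field": subject_field})
model = smf.logit("loanword ~ C(genre) + C(legal_function) + C(subject_field)",
                  data=df)
print(model.fit(disp=False).summary())
```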

    Linear CNNs Discover the Statistical Structure of the Dataset Using Only the Most Dominant Frequencies

    We present a stepping stone towards a deeper understanding of convolutional neural networks (CNNs) in the form of a theory of learning in linear CNNs. By analyzing the gradient descent equations, we discover that the evolution of the network during training is determined by the interplay between the dataset structure and the convolutional network structure. We show that linear CNNs discover the statistical structure of the dataset with non-linear, ordered, stage-like transitions, and that the speed of discovery changes depending on the relationship between the dataset and the convolutional network structure. Moreover, we find that this interplay lies at the heart of what we call the "dominant frequency bias", where linear CNNs arrive at these discoveries using only the dominant frequencies of the different structural parts present in the dataset. We furthermore provide experiments that show how our theory relates to the deep, non-linear CNNs used in practice. Our findings shed new light on the inner workings of CNNs, and can help explain their shortcut learning and their tendency to rely on texture instead of shape.
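
The staged, frequency-wise discovery can be made concrete in a toy setting. The sketch below, built on our own assumptions rather than the paper's code, trains a single linear circular-convolution layer by gradient descent on data whose power sits in three frequencies, and prints what fraction of each target frequency has been learned; stronger frequencies are discovered first, in ordered stages:

```python
# Toy sketch: one linear circular-convolution layer trained by gradient
# descent. In the Fourier basis the dynamics decouple per frequency, so
# frequencies with more data power are learned first.
import numpy as np

rng = np.random.default_rng(0)
n, steps, lr = 32, 5000, 0.005
spectrum = np.zeros(n)
for f, a in [(1, 3.0), (3, 1.5), (9, 0.3)]:   # three frequencies, unequal power
    spectrum[f] = spectrum[-f] = a            # symmetric, so signals stay real
noise = rng.normal(size=(512, n))
X = np.real(np.fft.ifft(np.fft.fft(noise, axis=1) * spectrum, axis=1))
k_true = rng.normal(size=n)                   # target convolution kernel
y = np.real(np.fft.ifft(np.fft.fft(X, axis=1) * np.fft.fft(k_true), axis=1))

k = np.zeros(n)                               # learned kernel
target_mag = np.abs(np.fft.fft(k_true))[[1, 3, 9]]
for t in range(1, steps + 1):
    pred = np.real(np.fft.ifft(np.fft.fft(X, axis=1) * np.fft.fft(k), axis=1))
    # Gradient of the mean squared error w.r.t. the kernel is the circular
    # cross-correlation of the inputs with the error signal.
    err_hat = np.fft.fft(2 * (pred - y), axis=1)
    grad = np.real(np.fft.ifft(np.conj(np.fft.fft(X, axis=1)) * err_hat,
                               axis=1)).mean(axis=0)
    k -= lr * grad
    if t in (50, 500, 5000):
        frac = np.abs(np.fft.fft(k))[[1, 3, 9]] / target_mag
        print(t, np.round(frac, 2))   # fraction learned at frequencies 1, 3, 9
```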

    Ample groupoid homology and Ă©tale correspondences

    We show that Ă©tale correspondences between ample groupoids induce homomorphisms of homology groups. To complement this we explore the module categories of ample groupoids. We construct an induction-restriction adjunction for subgroupoids, which generates a procedure for building resolutions of arbitrary groupoid modules. These resolutions can be used to work with the Tor picture of groupoid homology, enabling explicit descriptions of the maps in homology induced by Ă©tale correspondences. Comment: 16 pages.
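
As a schematic illustration of the shape of such an adjunction (the general induction-restriction form; the paper's precise hypotheses on the ample groupoid G and its subgroupoid H are not reproduced here):

```latex
% Induction is left adjoint to restriction: for an H-module M and a G-module N,
\[
  \operatorname{Hom}_{G}\bigl(\operatorname{Ind}_{H}^{G} M,\; N\bigr)
  \;\cong\;
  \operatorname{Hom}_{H}\bigl(M,\; \operatorname{Res}_{H}^{G} N\bigr).
\]
% Applying induction to resolutions over H is what generates the resolutions
% of arbitrary G-modules mentioned in the abstract.
```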

    LIPIcs, Volume 261, ICALP 2023, Complete Volume


    Nash's bargaining problem and the scale-invariant Hirsch citation index

    A number of citation indices have been proposed for measuring and ranking the research publication records of scholars. Some of the best-known indices, such as those proposed by Hirsch and Woeginger, are designed to reward most highly those records that strike some balance between productivity (number of papers published) and impact (frequency with which those papers are cited). A large number of rarely cited publications will not score well, nor will a very small number of heavily cited papers. We discuss three new citation indices, one of which was independently proposed in [FHLB]. Each rests on the notion of scale invariance, fundamental to John Nash's solution of the two-person bargaining problem. Our main focus is on one of these: a scale-invariant version of the Hirsch index. We argue that it has advantages over the original; it produces fairer rankings within subdisciplines, is more decisive (discriminates more finely, yielding fewer ties), more dynamic (growing over time via more frequent, smaller increments), and exhibits enhanced centrality and tail balancedness. Simulations suggest that scale invariance improves robustness under Poisson noise, with increased decisiveness having no cost in terms of the number of "accidental" reversals, wherein random irregularities cause researcher A to receive a lower index value than B although A's productivity and impact are both slightly higher than B's. Moreover, we provide an axiomatic characterization of the scale-invariant Hirsch index, via axioms that bear a close relationship, in discrete analogue, to those used by Nash in [Nas50]. This argues for the mathematical naturality of the new index. An earlier version was presented at the 5th World Congress of the Game Theory Society, Maastricht, Netherlands, in 2016. Comment: 44 pages, 8 figures.
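
To make the motivation concrete, here is a minimal sketch of the classical Hirsch index together with a check that it is not scale invariant: uniformly rescaling every citation count (say, moving to a field with tenfold citation rates) changes the index. The paper's scale-invariant replacement itself is not reproduced here.

```python
# Classical Hirsch index, plus a demonstration of its scale dependence.
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    cites = sorted(citations, reverse=True)
    h = 0
    while h < len(cites) and cites[h] >= h + 1:
        h += 1
    return h

record = [25, 8, 5, 3, 3, 1, 0]
print(h_index(record))                    # 3
print(h_index([10 * c for c in record]))  # 6: the same relative record scores
# differently after a uniform tenfold rescaling, so the index is not
# scale invariant.
```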

    Inductive Bias in Machine Learning

    Inductive bias describes the preference for solutions that a machine learning algorithm holds before seeing any data. It is a necessary ingredient for the goal of machine learning, which is to generalize from a set of examples to unseen data points. Yet, the inductive bias of learning algorithms is often not specified explicitly in practice, which prevents a theoretical understanding and undermines trust in machine learning. This issue is most prominently visible in the contemporary case of deep learning, which is widely successful in applications but relies on many poorly understood techniques and heuristics. This thesis aims to uncover the hidden inductive biases of machine learning algorithms. In the first part of the thesis, we uncover the implicit inductive bias of NetGAN, a complex graph generative model with seemingly no prior preferences. We find that the root of its generalization properties does not lie in the GAN architecture but in an inconspicuous low-rank approximation. We then use this insight to strip NetGAN of all unnecessary parts, including the GAN, and obtain a highly simplified reformulation. Next, we present a generic algorithm that reverse-engineers hidden inductive bias in approximate Bayesian inference. While the inductive bias is completely described by the prior distribution in full Bayesian inference, real-world applications often resort to approximate techniques that can make uncontrollable errors. By reframing the problem in terms of incompatible conditional distributions, we arrive at a generic algorithm based on pseudo-Gibbs sampling that attributes the change in inductive bias to a change in the prior distribution. The last part of the thesis concerns a common inductive bias in causal learning, the assumption of independent causal mechanisms. Under this assumption, we consider estimators for confounding strength, which governs the generalization ability from the observational distribution to the underlying causal model. We show that an existing estimator is generally inconsistent and propose a consistent estimator based on tools from random matrix theory.
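
A hedged sketch of the pseudo-Gibbs idea in its simplest form: two hand-picked Gaussian conditionals that are deliberately incompatible (our toy assumption, not the thesis's models) are sampled alternately, and the stationary marginal that emerges plays the role of the changed, effective prior described above:

```python
# Toy pseudo-Gibbs sampler: alternate between two conditionals that do not
# come from any single joint distribution, then inspect the stationary
# marginal of z (the "effective prior").
import numpy as np

rng = np.random.default_rng(0)

def sample_x_given_z(z):
    # "decoder": x | z ~ N(0.9 z, 0.5^2), a toy assumption
    return 0.9 * z + 0.5 * rng.normal()

def sample_z_given_x(x):
    # "encoder": z | x ~ N(0.7 x, 0.4^2), deliberately incompatible
    # with the decoder above
    return 0.7 * x + 0.4 * rng.normal()

z, zs = 0.0, []
for t in range(20000):
    x = sample_x_given_z(z)
    z = sample_z_given_x(x)
    if t > 1000:                    # discard burn-in
        zs.append(z)
print(f"effective prior on z: mean={np.mean(zs):.3f}, std={np.std(zs):.3f}")
# Analytically, z_{t+1} = 0.63 z_t + noise, so the std converges to about 0.68.
```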

    Generalization on the Unseen, Logic Reasoning and Degree Curriculum

    This paper considers the learning of logical (Boolean) functions with a focus on the generalization-on-the-unseen (GOTU) setting, a strong case of out-of-distribution generalization. This is motivated by the fact that the rich combinatorial nature of data in certain reasoning tasks (e.g., arithmetic/logic) makes representative data sampling challenging, and learning successfully under GOTU gives a first vignette of an 'extrapolating' or 'reasoning' learner. We then study how different network architectures trained by (S)GD perform under GOTU and provide both theoretical and experimental evidence that, for a class of network models including instances of Transformers, random features models, and diagonal linear networks, a min-degree interpolator is learned on the unseen. We also provide evidence that other instances, with larger learning rates or mean-field networks, reach leaky min-degree solutions. These findings lead to two implications: (1) we provide an explanation for the length generalization problem (e.g., Anil et al. 2022); (2) we introduce a curriculum learning algorithm called Degree-Curriculum that learns monomials more efficiently by incrementing supports. Comment: To appear in ICML 2023.
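
A toy numerical illustration of the min-degree interpolation phenomenon, under our own simplifications rather than the paper's setup: the target f(x) = x0*x1 is observed only on the half-cube where x0 = +1, where it coincides with the degree-1 function x1; a degree-weighted least-norm fit therefore interpolates the seen data with the low-degree solution and misses f on the unseen half:

```python
# Min-degree interpolation on the Boolean cube {-1,+1}^n (toy sketch).
import itertools
import numpy as np

n = 4
cube = np.array(list(itertools.product([-1, 1], repeat=n)))
subsets = [S for k in range(n + 1) for S in itertools.combinations(range(n), k)]
# Character matrix: chi[x, S] = prod_{i in S} x_i (the Boolean Fourier basis).
chi = np.array([[np.prod(x[list(S)]) for S in subsets] for x in cube])

target = cube[:, 0] * cube[:, 1]          # f(x) = x0 * x1, a degree-2 monomial
seen = cube[:, 0] == 1                    # GOTU: hold out the half-cube x0 = -1

# Degree-weighted least-norm interpolation: shrinking high-degree columns
# makes the minimum-norm solution put its mass on low-degree characters.
scale = np.array([0.1 ** len(S) for S in subsets])
b, *_ = np.linalg.lstsq(chi[seen] * scale, target[seen], rcond=None)
coef = scale * b                          # Fourier coefficients of the fit

for S, c in zip(subsets, coef):
    if abs(c) > 1e-3:
        print(S, round(c, 3))             # ~0.99 on {1}, ~0.01 on {0,1}
print("max error on the unseen half:",
      np.abs(chi[~seen] @ coef - target[~seen]).max())   # ~2: f is missed there
```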
    • 

    corecore