144 research outputs found

    Image zooming based on sampling theorems

    Get PDF
    In this paper we introduce two digital zoom methods based on sampling theory and we study their mathematical foundation. The first one (usually known by the names of "sinc interpolation", "zero-padding" and "Fourier zoom") is commonly used by the image processing community
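
    A minimal sketch of the zero-padding ("Fourier zoom") idea named above, assuming a square grayscale NumPy image; this is a generic illustration of sinc interpolation, not the paper's implementation:

        import numpy as np

        def fourier_zoom(img, factor=2):
            """Upsample an image by zero-padding its centered 2-D DFT.
            Generic sketch of sinc interpolation / Fourier zoom, not the
            paper's code; boundary and Nyquist-bin subtleties are ignored."""
            h, w = img.shape
            H, W = h * factor, w * factor
            spectrum = np.fft.fftshift(np.fft.fft2(img))    # centered spectrum
            padded = np.zeros((H, W), dtype=complex)
            top, left = (H - h) // 2, (W - w) // 2
            padded[top:top + h, left:left + w] = spectrum   # embed in larger zero-filled grid
            zoomed = np.fft.ifft2(np.fft.ifftshift(padded))
            return np.real(zoomed) * factor ** 2            # rescale to preserve intensity

        # usage: fourier_zoom(np.random.rand(64, 64), factor=2) has shape (128, 128)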

    SDSS-IV MaNGA IFS Galaxy Survey—Survey Design, Execution, and Initial Data Quality

    Get PDF
    The MaNGA Survey (Mapping Nearby Galaxies at Apache Point Observatory) is one of three core programs in the Sloan Digital Sky Survey IV. It is obtaining integral field spectroscopy for 10,000 nearby galaxies at a spectral resolution of R ~ 2000 from 3622 to 10354 Å. The design of the survey is driven by a set of science requirements on the precision of estimates of the following properties: star formation rate surface density, gas metallicity, stellar population age, metallicity, and abundance ratio, and their gradients; stellar and gas kinematics; and enclosed gravitational mass as a function of radius. We describe how these science requirements set the depth of the observations and dictate sample selection. The majority of targeted galaxies are selected to ensure uniform spatial coverage in units of effective radius (Re) while maximizing spatial resolution. About two-thirds of the sample is covered out to 1.5Re (Primary sample), and one-third of the sample is covered to 2.5Re (Secondary sample). We describe the survey execution with details that would be useful in the design of similar future surveys. We also present statistics on the achieved data quality, specifically the point-spread function, sampling uniformity, spectral resolution, sky subtraction, and flux calibration. For our Primary sample, the median r-band signal-to-noise ratio is ~70 per 1.4 Å pixel for spectra stacked between 1Re and 1.5Re. Measurements of various galaxy properties from the first-year data show that we are meeting or exceeding the defined requirements for the majority of our science goals
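
    For orientation (our arithmetic, not a statement from the abstract), the quoted resolution translates into a wavelength resolution element through the standard definition $R = \lambda / \Delta\lambda$; near the middle of the MaNGA wavelength range this gives

        \Delta\lambda = \frac{\lambda}{R} \approx \frac{7000\,\text{Å}}{2000} \approx 3.5\,\text{Å},

    i.e. about 2.5 of the 1.4 Å pixels per resolution element.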

    Auto-Encoders, Distributed Training and Information Representation in Deep Neural Networks

    Get PDF
    The goal of this thesis is to present a body of work that serves as my modest contribution to humanity's quest to understand intelligence and to implement intelligent systems. This is a thesis by articles, containing five articles, not all of equal impact, but all representing a very meaningful personal endeavor. The articles are presented in chronological order, and they cluster around two general topics: representation learning and optimization. Articles from chapters 3, 5, and 9 are in the former category, whereas articles from chapters 7 and 11 are in the latter.

    In the first article, we start with the idea of manifold learning through training a denoising auto-encoder to locally reconstruct data after perturbations. We establish a connection between contractive auto-encoders and denoising auto-encoders. More importantly, we prove mathematically a very interesting property of the optimal solution to denoising auto-encoders with additive Gaussian noise: they learn exactly the score of the probability density function of the training distribution. We present a collection of ways in which this result allows us to turn an auto-encoder into a generative model, and we provide experiments aimed at local manifold learning.

    In the second article, we continue with the idea of building a generative model by learning conditional distributions. We do this in a more general setting and focus on the properties of the Markov chain obtained by Gibbs sampling. With a small modification in the construction of the Markov chain, we obtain the more general "Generative Stochastic Networks", which can be stacked into a structure that more accurately represents the different levels of abstraction of the data being modeled. We present experiments involving the generation of MNIST digits and image inpainting.

    In the third article, we present a novel idea for distributed optimization. Our proposal uses a collection of worker nodes to compute the importance weights used by one master node to perform importance sampling over the training data. This paradigm has much in common with curriculum learning, in which the order of training examples is taken to have a significant impact on training performance. We compare the potential reduction in variance of the gradient estimates with the reduction observed in practice.

    In the fourth article, we return to the concept of representation learning and ask whether there is a measurable quantity in a neural network layer that corresponds intuitively to its "information content". This is particularly interesting because there is a kind of paradox in deterministic neural networks: deeper layers encode better representations of the input signal, yet in terms of entropy they carry no more information than the raw inputs. By training a linear classifier on every layer of a network whose parameters are frozen, we measure the linear separability of the representations at every layer. We call these classifiers "linear classifier probes" and show how they can be used to better understand the dynamics of training a neural network. We present experiments with large models (Inception v3 and ResNet-50) and uncover a surprising property: linear separability increases monotonically with layer depth.

    In the fifth article, we revisit optimization and study the curvature of the loss function. We look at the dominant eigenvalues and eigenvectors of the Hessian matrix and explore the gains to be made by moving the model parameters along those directions with an optimal step size. We are mainly interested in the potential gains along directions of negative curvature, because these are ignored by the popular convex optimization methods used by the deep learning community. Because anything involving the Hessian matrix is computationally expensive, we restrict ourselves to a small model on MNIST. We find that large gains can be made in directions of negative curvature, and that the optimal step sizes involved are larger than the current literature would recommend
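
    As a concrete illustration of the "linear classifier probes" described above, the sketch below fits a linear classifier on the frozen activations of one layer and reports held-out accuracy; the data-handling details are ours, not the thesis code:

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        def probe_layer(activations, labels):
            """Linear classifier probe on one layer's frozen activations
            (shape: n_examples x n_features). Generic sketch, not the
            thesis implementation."""
            X_tr, X_te, y_tr, y_te = train_test_split(
                activations, labels, test_size=0.2, random_state=0)
            clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
            return clf.score(X_te, y_te)    # proxy for linear separability

        # comparing probe_layer(...) across layers gives the per-layer measure
        # whose monotone increase with depth the thesis reports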

    Random ergodic theorems with universally representative sequences

    Get PDF
    When elements of a measure-preserving action of $\mathbb{R}^d$ or $\mathbb{Z}^d$ are selected in a random way, according to a stationary stochastic process, a.e. convergence of the averages of an $L^p$ function along the resulting orbits may almost surely hold, in every system; in such a case we call the sampling scheme universally representative. We show that i.i.d. integer-valued sampling schemes are universally representative (with $p > 1$) if and only if they have nonzero mean, and we discuss a variety of other sampling schemes which have or lack this property
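
    To fix notation (our gloss of the setup, not the paper's exact statement), write the sampling scheme as an integer-valued stationary process $(X_k)$ with partial sums $S_n = X_1 + \dots + X_n$; for a measure-preserving transformation $T$ the averages in question take the form

        A_N f(x) = \frac{1}{N} \sum_{n=1}^{N} f\left(T^{S_n} x\right),

    and the scheme is universally representative when these averages converge a.e. in every system and for every $f \in L^p$. The abstract's result says that, for i.i.d. integer-valued schemes and $p > 1$, this holds exactly when $\mathbb{E}[X_1] \neq 0$.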

    Lifting Weak Supervision To Structured Prediction

    Full text link
    Weak supervision (WS) is a rich set of techniques that produce pseudolabels by aggregating easily obtained but potentially noisy label estimates from a variety of sources. WS is theoretically well understood for binary classification, where simple approaches enable consistent estimation of pseudolabel noise rates. Using this result, it has been shown that downstream models trained on the pseudolabels have generalization guarantees nearly identical to those trained on clean labels. While this is exciting, users often wish to use WS for structured prediction, where the output space consists of more than a binary or multi-class label set: e.g. rankings, graphs, manifolds, and more. Do the favorable theoretical properties of WS for binary classification lift to this setting? We answer this question in the affirmative for a wide range of scenarios. For labels taking values in a finite metric space, we introduce techniques new to weak supervision based on pseudo-Euclidean embeddings and tensor decompositions, providing a nearly-consistent noise rate estimator. For labels in constant-curvature Riemannian manifolds, we introduce new invariants that also yield consistent noise rate estimation. In both cases, when using the resulting pseudolabels in concert with a flexible downstream model, we obtain generalization guarantees nearly identical to those for models trained on clean data. Several of our results, which can be viewed as robustness guarantees in structured prediction with noisy labels, may be of independent interest. Empirical evaluation validates our claims and shows the merits of the proposed method
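
    One ingredient named above, the pseudo-Euclidean embedding of a finite metric space, can be sketched as a classical-MDS-style construction with an indefinite Gram matrix (a generic construction, not the paper's estimator):

        import numpy as np

        def pseudo_euclidean_embedding(D, tol=1e-9):
            """Embed a finite metric space (distance matrix D) into a
            pseudo-Euclidean space: returns coordinates X and a signature s
            in {+1, -1} with d(i, j)^2 ≈ sum_k s[k] * (X[i, k] - X[j, k])**2.
            Generic sketch, not the paper's exact procedure."""
            n = D.shape[0]
            J = np.eye(n) - np.ones((n, n)) / n     # centering matrix
            B = -0.5 * J @ (D ** 2) @ J             # possibly indefinite Gram matrix
            w, V = np.linalg.eigh(B)
            keep = np.abs(w) > tol                  # drop (near-)zero eigenvalues
            w, V = w[keep], V[:, keep]
            X = V * np.sqrt(np.abs(w))              # coordinates
            s = np.sign(w)                          # +1: Euclidean part, -1: negative part
            return X, s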

    Toeplitz Low-Rank Approximation with Sublinear Query Complexity

    Full text link
    We present a sublinear query algorithm for outputting a near-optimal low-rank approximation to any positive semidefinite Toeplitz matrix $T \in \mathbb{R}^{d \times d}$. In particular, for any integer rank $k \leq d$ and $\epsilon, \delta > 0$, our algorithm makes $\tilde{O}(k^2 \cdot \log(1/\delta) \cdot \text{poly}(1/\epsilon))$ queries to the entries of $T$ and outputs a rank $\tilde{O}(k \cdot \log(1/\delta)/\epsilon)$ matrix $\tilde{T} \in \mathbb{R}^{d \times d}$ such that $\|T - \tilde{T}\|_F \leq (1+\epsilon) \cdot \|T - T_k\|_F + \delta \|T\|_F$. Here, $\|\cdot\|_F$ is the Frobenius norm and $T_k$ is the optimal rank-$k$ approximation to $T$, given by projection onto its top $k$ eigenvectors; $\tilde{O}(\cdot)$ hides $\text{polylog}(d)$ factors. Our algorithm is \emph{structure-preserving}, in that the approximation $\tilde{T}$ is also Toeplitz. A key technical contribution is a proof that any positive semidefinite Toeplitz matrix in fact has a near-optimal low-rank approximation which is itself Toeplitz. Surprisingly, this basic existence result was not previously known. Building on this result, along with the well-established off-grid Fourier structure of Toeplitz matrices [Cybenko'82], we show that a Toeplitz $\tilde{T}$ with near-optimal error can be recovered with a small number of random queries via a leverage-score-based off-grid sparse Fourier sampling scheme. Comment: Accepted in SODA 202
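
    For reference, the baseline $T_k$ in the error bound is the usual projection onto the top $k$ eigenvectors, computable at full (non-sublinear) cost as in the sketch below; this illustrates the target of the guarantee, not the sublinear-query algorithm:

        import numpy as np
        from scipy.linalg import toeplitz

        def optimal_rank_k(T, k):
            """Optimal rank-k approximation T_k of a symmetric PSD matrix T,
            obtained by projection onto its top-k eigenvectors. This is the
            baseline in the error bound, not the sublinear-query algorithm."""
            w, V = np.linalg.eigh(T)            # eigenvalues in ascending order
            idx = np.argsort(w)[::-1][:k]       # indices of the top-k eigenvalues
            return (V[:, idx] * w[idx]) @ V[:, idx].T

        # example: a PSD Toeplitz matrix built from a geometrically decaying first column
        c = np.array([2.0, 1.0, 0.5, 0.25])
        T = toeplitz(c)
        T2 = optimal_rank_k(T, 2)
        err = np.linalg.norm(T - T2, 'fro')     # ||T - T_k||_F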

    Federated Hypergradient Descent

    Full text link
    In this work, we explore combining automatic hyperparameter tuning and optimization for federated learning (FL) in an online, one-shot procedure. We apply a principled approach to adapting the client learning rate, the number of local steps, and the batch size. In our federated learning applications, our primary motivations are minimizing the communication budget as well as the local computational resources used in the training pipeline. Conventionally, hyperparameter tuning methods involve at least some degree of trial and error, which is known to be sample inefficient. To address these motivations, we propose FATHOM (Federated AuTomatic Hyperparameter OptiMization), a one-shot online procedure. We investigate the challenges of, and solutions for, deriving analytical gradients with respect to the hyperparameters of interest. Our approach is inspired by the fact that, with the exception of local data, we have full knowledge of all components involved in the training process, and our algorithm exploits this fact to great effect. We show that FATHOM is more communication efficient than Federated Averaging (FedAvg) with optimized, statically valued hyperparameters, and is also more computationally efficient overall. As a communication-efficient, one-shot online procedure, FATHOM removes the bottleneck of costly communication and limited local computation by eliminating a potentially wasteful tuning process and by optimizing the hyperparameters adaptively throughout training, without trial and error. We demonstrate our numerical results through extensive empirical experiments with the Federated EMNIST-62 (FEMNIST) and Federated Stack Overflow (FSO) datasets, using FedJAX as our baseline framework
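
    The idea of taking analytical gradients with respect to a hyperparameter can be illustrated in a centralized, non-federated toy form (this is classical hypergradient descent on the learning rate, not the FATHOM procedure itself):

        import numpy as np

        def hypergradient_sgd(grad_fn, w, eta=0.01, beta=1e-4, steps=100):
            """Toy hypergradient descent: adapt the learning rate eta online
            by descending d(loss)/d(eta) ≈ -g_t · g_{t-1}. A generic sketch of
            hypergradient adaptation, not the federated FATHOM algorithm."""
            g_prev = np.zeros_like(w)
            for _ in range(steps):
                g = grad_fn(w)
                eta += beta * np.dot(g, g_prev)   # hypergradient step on eta
                w = w - eta * g                   # ordinary SGD step with adapted eta
                g_prev = g
            return w, eta

        # usage on a simple quadratic, starting from the origin:
        # w, eta = hypergradient_sgd(lambda w: 2 * (w - 3.0), np.zeros(3))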

    Identifying Spurious Biases Early in Training through the Lens of Simplicity Bias

    Full text link
    Neural networks trained with (stochastic) gradient descent have an inductive bias towards learning simpler solutions. This makes them highly prone to learning simple spurious features that are strongly correlated with a label, instead of the predictive but more complex core features. In this work, we show that, interestingly, the simplicity bias of gradient descent can be leveraged to identify spurious correlations early in training. First, we prove, for a two-layer neural network, that groups of examples with high spurious correlation are separable based on the model's output in the initial training iterations. We further show that if spurious features have a small enough noise-to-signal ratio, the network's output on the majority of examples in a class will be almost exclusively determined by the spurious features and will be nearly invariant to the core feature. Finally, we propose SPARE, which separates large groups with spurious correlations early in training and uses importance sampling to alleviate the spurious correlation by balancing the group sizes. We show that SPARE achieves up to 5.6% higher worst-group accuracy than state-of-the-art methods, while being up to 12x faster. We also show the applicability of SPARE to discover and mitigate spurious correlations in Restricted ImageNet
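
    The balancing step described above can be sketched generically as follows: infer candidate groups by clustering the network's early-training outputs within a class, then sample with weights inversely proportional to group size. Cluster count and helper names are illustrative, not the exact SPARE procedure:

        import numpy as np
        from sklearn.cluster import KMeans

        def balanced_sampling_weights(class_outputs, n_groups=2):
            """Cluster one class's early-training model outputs into candidate
            groups and return per-example sampling weights inversely
            proportional to group size. Generic sketch of importance sampling
            for group balancing, not the exact SPARE algorithm."""
            groups = KMeans(n_clusters=n_groups, n_init=10).fit_predict(class_outputs)
            counts = np.bincount(groups, minlength=n_groups)
            weights = 1.0 / counts[groups]          # smaller groups get larger weight
            return weights / weights.sum()          # normalize to a distribution

        # usage: pass the (n_examples, n_classes) output logits of one class's
        # examples after a few epochs, then draw minibatch indices with
        # np.random.choice(n_examples, size=batch_size, p=weights).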

    Checking Trustworthiness of Probabilistic Computations in a Typed Natural Deduction System

    Full text link
    In this paper we present the probabilistic typed natural deduction calculus TPTND, designed to reason about and derive trustworthiness properties of probabilistic computational processes, like those underlying current AI applications. Derivability in TPTND is interpreted as the process of extracting $n$ samples of possibly complex outputs with a certain frequency from a given categorical distribution. We formalize trust for such outputs as a form of hypothesis testing on the distance between that frequency and the intended probability. The main advantage of the calculus is that it renders this notion of trustworthiness checkable. We present a computational semantics for the terms over which we reason and then the semantics of TPTND, where logical operators as well as a Trust operator are defined through introduction and elimination rules. We illustrate structural and metatheoretical properties, with particular focus on the ability to establish under which term evolutions and applications of logical rules the notion of trustworthiness is preserved
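
    The trust judgment described above, whether the observed frequency is close enough to the intended probability given $n$ samples, can be illustrated with an elementary two-sided binomial test; the threshold and function names are ours, not the TPTND rules:

        from scipy.stats import binomtest

        def trust_output(successes, n, intended_p, alpha=0.05):
            """Accept the computation as 'trustworthy' if a two-sided binomial
            test does not reject H0: the true output probability equals
            intended_p. An elementary illustration of the hypothesis-testing
            reading of trust, not the TPTND calculus itself."""
            result = binomtest(successes, n, intended_p, alternative='two-sided')
            return result.pvalue >= alpha

        # usage: trust_output(successes=47, n=100, intended_p=0.5) -> True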

    Algorithmic Fairness in Business Analytics: Directions for Research and Practice

    Full text link
    The extensive adoption of business analytics (BA) has brought financial gains and increased efficiencies. However, these advances have simultaneously drawn attention to rising legal and ethical challenges when BA informs decisions with fairness implications. As a response to these concerns, the emerging study of algorithmic fairness deals with algorithmic outputs that may result in disparate outcomes or other forms of injustice for subgroups of the population, especially those who have been historically marginalized. Fairness is relevant on the basis of legal compliance, social responsibility, and utility; if not adequately and systematically addressed, unfair BA systems may lead to societal harms and may also threaten an organization's own survival, its competitiveness, and its overall performance. This paper offers a forward-looking, BA-focused review of algorithmic fairness. We first review the state-of-the-art research on sources and measures of bias, as well as bias mitigation algorithms. We then provide a detailed discussion of the utility-fairness relationship, emphasizing that the frequent assumption of a trade-off between these two constructs is often mistaken or short-sighted. Finally, we chart a path forward by identifying opportunities for business scholars to address impactful, open challenges that are key to the effective and responsible deployment of BA
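
    As one concrete example of the "measures of bias" surveyed above (our illustration, not a metric the paper singles out), the demographic parity difference compares positive-prediction rates across subgroups:

        import numpy as np

        def demographic_parity_difference(y_pred, group):
            """Absolute difference in positive-prediction rates between two
            subgroups (group coded 0/1). A standard illustrative fairness
            metric, not one specifically endorsed by the paper."""
            y_pred, group = np.asarray(y_pred), np.asarray(group)
            rate_a = y_pred[group == 0].mean()
            rate_b = y_pred[group == 1].mean()
            return abs(rate_a - rate_b)

        # usage: demographic_parity_difference([1, 0, 1, 1], [0, 0, 1, 1]) -> 0.5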