17 research outputs found

    Managing AI Risks in an Era of Rapid Progress

    In this short consensus paper, we outline risks from upcoming, advanced AI systems. We examine large-scale social harms and malicious uses, as well as an irreversible loss of human control over autonomous AI systems. In light of rapid and continuing AI progress, we propose urgent priorities for AI R&D and governance.

    Generalizing in the Real World with Representation Learning

    Machine learning (ML) formalizes the problem of getting computers to learn from experience as optimization of performance, according to some metric(s), on a set of data examples. This contrasts with requiring behaviour specified in advance (e.g. by hard-coded rules). Formalizing the problem this way has enabled great progress in many applications with large real-world impact, including translation, speech recognition, self-driving cars, and drug discovery. But practical instantiations of this formalism make many assumptions, for example that data are i.i.d. (independent and identically distributed), whose soundness is seldom investigated. And in making great progress in such a short time, the field has developed many norms and ad hoc standards, focused on a relatively small range of problem settings. As applications of ML, particularly in artificial intelligence (AI) systems, become more pervasive in the real world, we need to critically examine these assumptions, norms, and problem settings, as well as the methods that have become de facto standards. There is much we still do not understand about how and why deep networks trained with stochastic gradient descent generalize as well as they do, why they fail when they do, and how they will perform on out-of-distribution data. In this thesis I cover some of my work towards better understanding deep network generalization, identify several ways assumptions and problem settings fail to generalize to the real world, and propose ways to address those failures in practice.
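
    The i.i.d. assumption discussed above is easy to see failing in a toy setting. The sketch below is illustrative only; the data, model, and amount of shift are invented for this example, not taken from the thesis. It trains a classifier on one distribution and evaluates it both on matched data and under a covariate shift, where accuracy drops:

        # Minimal sketch (assumed toy data, not from the thesis): how the
        # i.i.d. assumption can fail under covariate shift.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)

        def sample(n, shift=0.0):
            # Two Gaussian classes; `shift` moves the test distribution away
            # from the training one, violating "identically distributed".
            y = rng.integers(0, 2, size=n)
            x = rng.normal(loc=y[:, None] * 2.0 + shift, scale=1.0, size=(n, 2))
            return x, y

        x_train, y_train = sample(2000)
        x_iid, y_iid = sample(2000)             # same distribution as training
        x_ood, y_ood = sample(2000, shift=1.5)  # covariate-shifted test set

        clf = LogisticRegression().fit(x_train, y_train)
        print("i.i.d. test accuracy  :", clf.score(x_iid, y_iid))
        print("shifted test accuracy :", clf.score(x_ood, y_ood))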

    Extreme weather : a large-scale climate dataset for semi-supervised detection, localization and understanding of extreme weather events

    The detection and identification of extreme weather events in large-scale climate simulations is an important problem for risk management, informing governmental policy decisions and advancing our basic understanding of the climate system. Recent work has shown that fully supervised convolutional neural networks (CNNs) can yield acceptable accuracy for classifying well-known types of extreme weather events when large amounts of labeled data are available. However, many different types of spatially localized climate patterns are of interest, including hurricanes, extra-tropical cyclones, weather fronts, and blocking events, among others. Existing labeled data for these patterns can be incomplete in various ways, such as covering only certain years or geographic areas and having false negatives. This type of climate data therefore poses a number of interesting machine learning challenges. We present a multichannel spatiotemporal CNN architecture for semi-supervised bounding box prediction and exploratory data analysis. We demonstrate that our approach is able to leverage temporal information and unlabeled data to improve the localization of extreme weather events. Further, we explore the representations learned by our model in order to better understand this important data. We present a dataset, ExtremeWeather, to encourage machine learning research in this area and to help facilitate further work in understanding and mitigating the effects of climate change. The dataset is available at extremeweatherdataset.github.io and the code is available at https://github.com/eracah/hur-detect
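
    As a rough illustration of what a multichannel spatiotemporal detector looks like in code, here is a hedged sketch; the layer sizes, grid dimensions, and YOLO-style per-cell output head are assumptions chosen for this example, not the paper's actual architecture (see https://github.com/eracah/hur-detect for the real implementation):

        # Illustrative sketch (assumed shapes and layers, not the paper's model):
        # a spatiotemporal CNN mapping stacked climate variables over several
        # time steps to per-grid-cell bounding-box predictions.
        import torch
        import torch.nn as nn

        class SpatioTemporalDetector(nn.Module):
            def __init__(self, n_channels=16, n_classes=4):
                super().__init__()
                # 3D convolutions mix the temporal axis with the spatial axes.
                self.encoder = nn.Sequential(
                    nn.Conv3d(n_channels, 32, kernel_size=3, padding=1),
                    nn.ReLU(),
                    nn.MaxPool3d((1, 2, 2)),   # pool space, keep time
                    nn.Conv3d(32, 64, kernel_size=3, padding=1),
                    nn.ReLU(),
                    nn.MaxPool3d((2, 2, 2)),   # pool space and time
                )
                # Per grid cell: 4 box coordinates + objectness + class scores.
                self.head = nn.Conv2d(64, 5 + n_classes, kernel_size=1)

            def forward(self, x):      # x: (batch, channels, time, height, width)
                h = self.encoder(x)
                h = h.mean(dim=2)      # collapse the remaining temporal axis
                return self.head(h)    # (batch, 5 + n_classes, height/4, width/4)

        # Example: 8 time steps of 16 atmospheric variables on a 96 x 144 grid.
        out = SpatioTemporalDetector()(torch.randn(2, 16, 8, 96, 144))
        print(out.shape)  # torch.Size([2, 9, 24, 36])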

    A closer look at memorization in deep networks

    We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. real data. We also demonstrate that with appropriately tuned explicit regularization (e.g., dropout) we can degrade DNN training performance on noise datasets without compromising generalization on real data. Our analysis suggests that notions of effective capacity which are dataset-independent are unlikely to explain the generalization performance of deep networks trained with gradient-based methods, because the training data itself plays an important role in determining the degree of memorization. Comment: Appears in Proceedings of the 34th International Conference on Machine Learning (ICML 2017). Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, and David Krueger contributed equally to this work.
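
    The paper's core comparison can be reproduced qualitatively in a few lines. The toy data and small network below are invented for illustration and are not the paper's experiments:

        # Qualitative sketch (toy data, not the paper's setup): train the same
        # small network on real labels and on shuffled (noise) labels and watch
        # how quickly the loss falls for each.
        import torch
        import torch.nn as nn

        torch.manual_seed(0)
        x = torch.randn(512, 20)
        w_true = torch.randn(20)
        y_real = (x @ w_true > 0).long()               # a learnable simple pattern
        y_noise = y_real[torch.randperm(len(y_real))]  # shuffled labels -> noise

        def train(y, epochs=200):
            torch.manual_seed(1)  # identical initialization for both runs
            net = nn.Sequential(nn.Linear(20, 128), nn.ReLU(), nn.Linear(128, 2))
            opt = torch.optim.Adam(net.parameters(), lr=1e-3)
            loss_fn = nn.CrossEntropyLoss()
            checkpoints = []
            for epoch in range(epochs):
                opt.zero_grad()
                loss = loss_fn(net(x), y)
                loss.backward()
                opt.step()
                if epoch % 50 == 0:
                    checkpoints.append(round(loss.item(), 3))
            return checkpoints

        # Loss on real labels falls much faster than on noise labels at matched
        # capacity, consistent with networks learning simple patterns first.
        print("loss curve, real labels :", train(y_real))
        print("loss curve, noise labels:", train(y_noise))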