
    Breaking the Curse of Dimensionality in Deep Neural Networks by Learning Invariant Representations

    Artificial intelligence, particularly the subfield of machine learning, has seen a paradigm shift towards data-driven models that learn from and adapt to data. This has resulted in unprecedented advancements in various domains such as natural language processing and computer vision, largely attributed to deep learning, a special class of machine learning models. Deep learning arguably surpasses traditional approaches by learning the relevant features from raw data through a series of computational layers. This thesis explores the theoretical foundations of deep learning by studying the relationship between the architecture of these models and the inherent structures found within the data they process. In particular, we ask: What drives the efficacy of deep learning algorithms and allows them to beat the so-called curse of dimensionality, i.e. the difficulty of learning generic functions in high dimensions, where the number of data points needed grows exponentially with the dimensionality? Is it their ability to learn relevant representations of the data by exploiting its structure? How do different architectures exploit different data structures? In order to address these questions, we push forward the idea that the structure of the data can be effectively characterized by its invariances, i.e. aspects that are irrelevant for the task at hand. Our methodology takes an empirical approach to deep learning, combining experimental studies with physics-inspired toy models. These simplified models allow us to investigate and interpret the complex behaviors we observe in deep learning systems, offering insights into their inner workings, with the far-reaching goal of bridging the gap between theory and practice. Comment: PhD Thesis @ EPF
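    As a purely illustrative aside (not taken from the thesis), the curse of dimensionality the abstract refers to can be made concrete with a short numerical sketch: with a fixed budget of random training points in the unit cube, the distance from a fresh query point to its nearest neighbour grows quickly with the dimension d, so covering the space to a fixed resolution requires a number of points that grows roughly exponentially in d.

```python
# Minimal illustration of the curse of dimensionality (not from the thesis):
# with a fixed budget of n random points in [0, 1]^d, the typical distance
# from a query point to its nearest neighbour grows quickly with d, so
# sampling the space to a fixed resolution needs exponentially many points.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # fixed sample budget

for d in (1, 2, 5, 10, 20, 50):
    data = rng.random((n, d))      # n training points in the unit cube
    query = rng.random((1, d))     # a single held-out query point
    dists = np.linalg.norm(data - query, axis=1)
    print(f"d={d:3d}  nearest-neighbour distance = {dists.min():.3f}")
```

    Read the other way, a target function that is invariant to most of those directions effectively lives in a much lower-dimensional space, which is one way of understanding the role the abstract assigns to invariant representations.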

    The End of Bordenkircher: Extending the Logic of Apprendi to Plea Bargaining

    Although the Supreme Court approved of the use of charging threats nearly thirty years ago in Bordenkircher v. Hayes, a more recent line of cases has subtly undermined key premises of that landmark decision. In order to induce guilty pleas, prosecutors might use any of a number of different tactics. A prosecutor might, for instance, charge aggressively in the first instance and then promise to drop the most serious charges in return for a guilty plea to a lesser offense. Bordenkircher v. Hayes addressed the mirror-image of this tactic: the prosecutor filed relatively minor charges at first, but then threatened to pursue more serious charges if the defendant did not plead guilty. The Supreme Court approved of such charging threats based on two considerations: the efficiency benefits of resolving cases by plea instead of jury trial, and the possibility that prosecutors would evade a ban on threats by charging more aggressively in the first instance. The Court’s reasoning, however, is inconsistent with Apprendi v. New Jersey and its progeny. Apprendi rejected the use of both efficiency considerations and evasion concerns as grounds for impairing access to juries. Apprendi instead emphasized a need for robust checks and balances within the criminal justice system. Because the Apprendi line of cases addressed sentencing procedures, not plea bargaining, their relevance to Bordenkircher v. Hayes has thus far escaped notice. The Article argues, however, that the Court should now overturn Bordenkircher v. Hayes in light of the values it embraced in Apprendi. The Article also proposes a new test for evaluating the constitutionality of charging threats.

    APPLICATIONS OF MACHINE LEARNING IN MICROBIAL FORENSICS

    Microbial ecosystems are complex, with hundreds of members interacting with each other and the environment. The intricate and hidden behaviors underlying these interactions make research questions challenging, but they can be better understood through machine learning. However, most machine learning used in microbiome work is a black-box form of investigation, where accurate predictions can be made, but the inner logic driving those predictions is hidden behind nontransparent layers of complexity. Accordingly, the goal of this dissertation is to provide an interpretable and in-depth machine learning approach to investigate microbial biogeography and to use micro-organisms as novel tools to detect geospatial location and object provenance (previously known origin). These contributions culminate in a framework that allows the extraction of interpretable metrics and actionable insights from microbiome-based machine learning models. The first part of this work provides an overview of machine learning in the context of microbial ecology, human microbiome studies and environmental monitoring – outlining common practice and shortcomings. The second part presents a field study demonstrating how machine learning can be used to characterize patterns in microbial biogeography globally – using microbes from ports located around the world. The third part studies the persistence and stability of natural microbial communities from the environment that have colonized objects (vessels) and stay attached as they travel through the water. Finally, the last part of this dissertation provides a robust framework for investigating the microbiome. This framework provides a reasonable understanding of the data being used in microbiome-based machine learning and allows researchers to better understand and interpret results. Together, these experiments build an understanding of how to carry an in silico design that characterizes candidate microbial biomarkers from real-world settings through to a rapid, field-deployable diagnostic assay. The work presented here provides evidence for the use of microbial forensics as a toolkit to expand our basic understanding of microbial biogeography, microbial community stability and persistence in complex systems, and the ability of machine learning to be applied to downstream molecular detection platforms for rapid and accurate detection.
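    As a rough, hypothetical illustration of the kind of interpretable microbiome-based model the abstract describes (a sketch on synthetic data, not the dissertation's actual pipeline or dataset), one can fit a random forest to a taxa-abundance table labelled with port of origin and rank taxa by feature importance as candidate biomarkers; every name and number below is made up for the example.

```python
# Hypothetical sketch: interpretable classification of sample origin from a
# microbial abundance table, with feature importances as candidate biomarkers.
# Synthetic data only; not the dissertation's actual dataset or pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n_samples, n_taxa = 200, 50
X = rng.poisson(lam=5.0, size=(n_samples, n_taxa)).astype(float)  # abundance counts
ports = rng.integers(0, 3, size=n_samples)                        # 3 ports of origin

# Make a handful of taxa genuinely informative about the port.
for taxon in (0, 1, 2):
    X[:, taxon] += 10 * (ports == taxon)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, ports)

# Rank taxa by importance: the top entries are the candidate biomarkers.
top = np.argsort(model.feature_importances_)[::-1][:5]
for taxon in top:
    print(f"taxon_{taxon:02d}  importance={model.feature_importances_[taxon]:.3f}")
```

    The point of the sketch is that the ranked taxa, rather than the raw predictions, are the interpretable output a forensic workflow would carry forward to a targeted detection assay.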

    A Bayesian framework for concept learning

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 1999. Includes bibliographical references (p. 297-314). This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Human concept learning presents a version of the classic problem of induction, which is made particularly difficult by the combination of two requirements: the need to learn from a rich (i.e. nested and overlapping) vocabulary of possible concepts and the need to be able to generalize concepts reasonably from only a few positive examples. I begin this thesis by considering a simple number concept game as a concrete illustration of this ability. On this task, human learners can with reasonable confidence lock in on one out of a billion billion billion logically possible concepts, after seeing only four positive examples of the concept, and can generalize informatively after seeing just a single example. Neither of the two classic approaches to inductive inference (hypothesis testing in a constrained space of possible rules, and computing similarity to the observed examples) can provide a complete picture of how people generalize concepts in even this simple setting. This thesis proposes a new computational framework for understanding how people learn concepts from examples, based on the principles of Bayesian inference. By imposing the constraints of a probabilistic model of the learning situation, the Bayesian learner can draw out much more information about a concept's extension from a given set of observed examples than either rule-based or similarity-based approaches do, and can use this information in a rational way to infer the probability that any new object is also an instance of the concept. There are three components of the Bayesian framework: a prior probability distribution over a hypothesis space of possible concepts; a likelihood function, which scores each hypothesis according to its probability of generating the observed examples; and the principle of hypothesis averaging, under which the learner computes the probability of generalizing a concept to new objects by averaging the predictions of all hypotheses weighted by their posterior probability (proportional to the product of their priors and likelihoods). The likelihood, under the assumption of randomly sampled positive examples, embodies the size principle for scoring hypotheses: smaller consistent hypotheses are more likely than larger hypotheses, and they become exponentially more likely as the number of observed examples increases. The principle of hypothesis averaging allows the Bayesian framework to accommodate both rule-like and similarity-like generalization behavior, depending on how peaked the posterior probability is. Together, the size principle plus hypothesis averaging predict a convergence from similarity-like generalization (due to a broad posterior distribution) after very few examples are observed to rule-like generalization (due to a sharply peaked posterior distribution) after sufficiently many examples have been observed. The main contributions of this thesis are as follows. First and foremost, I show how it is possible for people to learn and generalize concepts from just one or a few positive examples (Chapter 2).
Building on that understanding, I then present a series of case studies of simple concept learning situations where the Bayesian framework yields both qualitative and quantitative insights into the real behavior of human learners (Chapters 3-5). These cases each focus on a different learning domain. Chapter 3 looks at generalization in continuous feature spaces, a typical representation of objects in psychology and machine learning with the virtues of being analytically tractable and empirically accessible, but the downside of being highly abstract and artificial. Chapter 4 moves to the more natural domain of learning words for categories of objects and shows the relevance of the same phenomena and explanatory principles introduced in the more abstract setting of Chapters 1-3 for real-world learning tasks like this one. In each of these domains, both similarity-like and rule-like generalization emerge as special cases of the Bayesian framework in the limits of very few or very many examples, respectively. However, the transition from similarity to rules occurs much faster in the word learning domain than in the continuous feature space domain. I propose a Bayesian explanation of this difference in learning curves that places crucial importance on the density or sparsity of overlapping hypotheses in the learner's hypothesis space. To test this proposal, a third case study (Chapter 5) returns to the domain of number concepts, in which human learners possess a more complex body of prior knowledge that leads to a hypothesis space with both sparse and densely overlapping components. Here, the Bayesian theory predicts and human learners produce either rule-based or similarity-based generalization from a few examples, depending on the precise examples observed. I also discuss how several classic reasoning heuristics may be used to approximate the much more elaborate computations of Bayesian inference that this domain requires. In each of these case studies, I confront some of the classic questions of concept learning and induction: Is the acquisition of concepts driven mainly by pre-existing knowledge or the statistical force of our observations? Is generalization based primarily on abstract rules or similarity to exemplars? I argue that in almost all instances, the only reasonable answer to such questions is: Both. More importantly, I show how the Bayesian framework allows us to answer much more penetrating versions of these questions: How does prior knowledge interact with the observed examples to guide generalization? Why does generalization appear rule-based in some cases and similarity-based in others? Finally, Chapter 6 summarizes the major contributions in more detail and discusses how this work fits into the larger picture of contemporary research on human learning, thinking, and reasoning. by Joshua B. Tenenbaum. Ph.D.
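    The core computation the abstract describes (a size-principle likelihood combined with hypothesis averaging) is compact enough to sketch for a toy version of the number game. The hypothesis space and uniform prior below are illustrative assumptions for the example, not the thesis's actual model.

```python
# Toy sketch of the Bayesian number game (hypothesis space and prior are
# illustrative assumptions, not the thesis's actual model).
# Size-principle likelihood: P(X | h) = (1 / |h|)^n if h contains every
# example, else 0. Generalization averages over hypotheses weighted by
# their posterior probability (hypothesis averaging).
import numpy as np

UNIVERSE = range(1, 101)
hypotheses = {
    "even":            {x for x in UNIVERSE if x % 2 == 0},
    "odd":             {x for x in UNIVERSE if x % 2 == 1},
    "multiples_of_10": {x for x in UNIVERSE if x % 10 == 0},
    "powers_of_2":     {x for x in UNIVERSE if (x & (x - 1)) == 0},
    "between_15_25":   set(range(15, 26)),
}
prior = {name: 1.0 / len(hypotheses) for name in hypotheses}  # uniform prior (assumption)

def posterior(examples):
    """Posterior over hypotheses given positive examples, via the size principle."""
    scores = {}
    for name, h in hypotheses.items():
        consistent = all(x in h for x in examples)
        scores[name] = prior[name] * (1.0 / len(h)) ** len(examples) if consistent else 0.0
    z = sum(scores.values())
    return {name: s / z for name, s in scores.items()}

def p_in_concept(y, examples):
    """Probability that y belongs to the concept, by hypothesis averaging."""
    post = posterior(examples)
    return sum(p for name, p in post.items() if y in hypotheses[name])

examples = [16, 8, 2, 64]
print(posterior(examples))          # mass concentrates on "powers_of_2"
print(p_in_concept(32, examples))   # high: 32 is in the dominant hypothesis
print(p_in_concept(10, examples))   # low: only reachable via the large "even" hypothesis
```

    Run on the examples 16, 8, 2 and 64, nearly all posterior mass lands on the powers-of-two hypothesis, because the size principle penalizes the much larger even-numbers hypothesis exponentially in the number of examples; that shift from broad, similarity-like generalization to sharp, rule-like generalization is exactly the transition the abstract describes.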