
    PAC-learning geometrical figures


    The Informational Complexity of Learning from Examples

    This thesis attempts to quantify the amount of information needed to learn certain tasks. The tasks chosen vary from learning functions in a Sobolev space using radial basis function networks to learning grammars in the principles and parameters framework of modern linguistic theory. These problems are analyzed from the perspective of computational learning theory, and certain unifying perspectives emerge.

    Neural networks and quantum many-body physics: exploring reciprocal benefits.

    One of the main reasons why the physics of quantum many-body systems is hard lies in the curse of dimensionality: the number of states of such systems increases exponentially with the number of degrees of freedom involved. As a result, computations for realistic systems become intractable, and even numerical methods are limited to comparably small system sizes. Many efforts in modern physics research are therefore concerned with finding efficient representations of quantum states and clever approximation schemes that would allow them to characterize physical systems of interest. Meanwhile, Deep Learning (DL) has solved many non-scientific problems that have been inaccessible to conventional methods for a similar reason. The concept underlying DL is to extract knowledge from data by identifying patterns and regularities. The remarkable success of DL has excited many physicists about the prospect of leveraging its power to solve intractable problems in physics. At the same time, DL turned out to be an interesting complex many-body problem in itself. In contrast to its widespread empirical applications, the theoretical foundation of DL is strongly underdeveloped. In particular, as long as its decision-making process and result interpretability remain opaque, DL cannot claim the status of a scientific tool. In this thesis, I explore the interface between DL and quantum many-body physics, and investigate DL both as a tool and as a subject of study. The first project presented here is a theory-based study of a fundamental open question about the role of width and the number of parameters in deep neural networks. In this work, we consider a DL setup for the image recognition task on standard benchmarking datasets. We combine controlled experiments with a theoretical analysis, including analytical calculations for a toy model. The other three works focus on the application of Restricted Boltzmann Machines (RBMs) as generative models for the task of wavefunction reconstruction from measurement data on a quantum many-body system. First, we implement this approach as a software package, making it available as a tool for experimentalists. Following the idea that physics problems can be used to characterize DL tools, we then use our extensive knowledge of this setup to conduct a systematic study of how the RBM complexity scales with the complexity of the physical system. Finally, in a follow-up study we focus on the effects of parameter pruning techniques on the RBM and its scaling behavior.
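    As a rough illustration of the RBM-based reconstruction setting described above (not the thesis's actual software package), the sketch below trains a tiny binary RBM with one step of contrastive divergence on measurement bitstrings and, assuming a positive wavefunction, reads amplitudes off as square roots of the learned probabilities. Network sizes and the toy data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyRBM:
    """Minimal binary RBM trained with CD-1; illustrative only."""
    def __init__(self, n_visible, n_hidden, lr=0.05):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)   # visible bias
        self.c = np.zeros(n_hidden)    # hidden bias
        self.lr = lr

    def _sample_h(self, v):
        p = sigmoid(v @ self.W + self.c)
        return p, (rng.random(p.shape) < p).astype(float)

    def _sample_v(self, h):
        p = sigmoid(h @ self.W.T + self.b)
        return p, (rng.random(p.shape) < p).astype(float)

    def cd1_step(self, v0):
        # One step of contrastive divergence on a batch of visible configurations.
        ph0, h0 = self._sample_h(v0)
        pv1, v1 = self._sample_v(h0)
        ph1, _ = self._sample_h(pv1)
        self.W += self.lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
        self.b += self.lr * (v0 - pv1).mean(axis=0)
        self.c += self.lr * (ph0 - ph1).mean(axis=0)

    def unnormalized_prob(self, v):
        # p(v) is proportional to exp(b.v) * prod_j (1 + exp(c_j + (vW)_j))
        return np.exp(v @ self.b) * np.prod(1.0 + np.exp(v @ self.W + self.c), axis=-1)

# Placeholder "measurement" bitstrings standing in for real data on a 3-qubit system.
data = rng.integers(0, 2, size=(2000, 3)).astype(float)
rbm = TinyRBM(n_visible=3, n_hidden=6)
for epoch in range(200):
    rbm.cd1_step(data)

# For a positive wavefunction, amplitudes are square roots of the learned probabilities.
basis = np.array([[int(b) for b in np.binary_repr(i, 3)] for i in range(8)], dtype=float)
p = rbm.unnormalized_prob(basis)
amplitudes = np.sqrt(p / p.sum())
print(amplitudes)
```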

    A Primer on Bayesian Neural Networks: Review and Debates

    Neural networks have achieved remarkable performance across various problem domains, but their widespread applicability is hindered by inherent limitations such as overconfidence in predictions, lack of interpretability, and vulnerability to adversarial attacks. To address these challenges, Bayesian neural networks (BNNs) have emerged as a compelling extension of conventional neural networks, integrating uncertainty estimation into their predictive capabilities. This comprehensive primer presents a systematic introduction to the fundamental concepts of neural networks and Bayesian inference, elucidating their synergistic integration for the development of BNNs. The target audience comprises statisticians with a potential background in Bayesian methods but lacking deep learning expertise, as well as machine learners proficient in deep neural networks but with limited exposure to Bayesian statistics. We provide an overview of commonly employed priors, examining their impact on model behavior and performance. Additionally, we delve into the practical considerations associated with training and inference in BNNs. Furthermore, we explore advanced topics within the realm of BNN research, acknowledging the existence of ongoing debates and controversies. By offering insights into cutting-edge developments, this primer not only equips researchers and practitioners with a solid foundation in BNNs, but also illuminates the potential applications of this dynamic field. As a valuable resource, it fosters an understanding of BNNs and their promising prospects, facilitating further advancements in the pursuit of knowledge and innovation. (65 pages)
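    For context, the Bayesian treatment of network weights that such a primer builds on can be summarized by two standard relations; these are textbook formulas reproduced here for reference, not taken from the primer itself.

```latex
% Posterior over network weights \theta given data \mathcal{D} = \{(x_i, y_i)\} and prior p(\theta):
p(\theta \mid \mathcal{D}) \;=\; \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}
                                       {\int p(\mathcal{D} \mid \theta')\, p(\theta')\, \mathrm{d}\theta'}

% Predictions marginalize over this posterior, which is where the uncertainty estimates come from:
p(y^\ast \mid x^\ast, \mathcal{D}) \;=\; \int p(y^\ast \mid x^\ast, \theta)\, p(\theta \mid \mathcal{D})\, \mathrm{d}\theta
```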

    Security Evaluation of Support Vector Machines in Adversarial Environments

    Support Vector Machines (SVMs) are among the most popular classification techniques adopted in security applications like malware detection, intrusion detection, and spam filtering. However, if SVMs are to be incorporated in real-world security systems, they must be able to cope with attack patterns that can either mislead the learning algorithm (poisoning), evade detection (evasion), or gain information about their internal parameters (privacy breaches). The main contributions of this chapter are twofold. First, we introduce a formal general framework for the empirical evaluation of the security of machine-learning systems. Second, according to our framework, we demonstrate the feasibility of evasion, poisoning and privacy attacks against SVMs in real-world security problems. For each attack technique, we evaluate its impact and discuss whether (and how) it can be countered through an adversary-aware design of SVMs. Our experiments are easily reproducible thanks to open-source code that we have made available, together with all the employed datasets, on a public repository. (47 pages, 9 figures; chapter accepted into the book 'Support Vector Machine Applications')
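    As a minimal sketch of the evasion setting (not the chapter's own open-source code), the example below trains an RBF-kernel SVM with scikit-learn and moves a single point across the decision boundary by descending the gradient of the decision function; the toy dataset and all parameter values are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Train a standard RBF-kernel SVM on a toy two-class problem.
X, y = make_classification(n_samples=400, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, random_state=0)
clf = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X, y)

def decision_gradient(clf, x):
    """Gradient of the RBF-SVM decision function g(x) = sum_i a_i K(x_i, x) + b."""
    sv = clf.support_vectors_          # support vectors x_i
    alpha = clf.dual_coef_.ravel()     # signed dual coefficients a_i = y_i * alpha_i
    gamma = clf.gamma                  # numeric because gamma=1.0 was set explicitly above
    diff = x - sv                      # shape (n_sv, n_features)
    k = np.exp(-gamma * np.sum(diff**2, axis=1))
    # d/dx K(x_i, x) = -2 * gamma * (x - x_i) * K(x_i, x)
    return (-2.0 * gamma) * (alpha * k) @ diff

# Gradient-descent evasion: start from a point of the positive ("malicious") class and
# follow -grad g until it is classified as negative.
x_adv = X[y == 1][0].astype(float).copy()
step = 0.05
for _ in range(200):
    if clf.decision_function(x_adv[None, :])[0] < 0:
        break
    x_adv -= step * decision_gradient(clf, x_adv)

print("original score:", clf.decision_function(X[y == 1][:1])[0])
print("evading  score:", clf.decision_function(x_adv[None, :])[0])
```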

    Machine learning on a budget

    Thesis (Ph.D.)--Boston University. In a typical discriminative learning setting, a set of labeled training examples is given, and the goal is to learn a decision rule that accurately classifies (or labels) unseen test examples. Much of machine learning research has focused on improving accuracy, but more recently the costs of learning and decision making have become more important. Such costs arise both during training and testing. Labeling data for training is often an expensive process. During testing, acquiring or processing measurements for every decision is also costly. This work deals with two problems: how to reduce the amount of labeled data during training, and how to minimize measurement costs in making decisions during testing, while maintaining system accuracy.

    The first part falls into an area known as active learning. It deals with the problem of selecting a small subset of examples to label, from a pool of unlabeled data, for training a good classifier. This problem is relevant in many applications where a large collection of unlabeled data is readily available but labeling an instance requires an expensive expert (e.g., a radiologist annotating a medical image). We study active learning in the boosting framework. We develop a practical algorithm that labels examples so as to maximally reduce the space of feasible classifiers. We show that, under certain assumptions, our strategy achieves the generalization error performance of a system trained on the entire data set while only selecting logarithmically many samples to label.

    In the second part, we study sequential classifiers under budget constraints. In many systems, such as medical diagnosis and homeland security, sensors have varying acquisition costs, and these costs account for delay, throughput or monetary value. While some decisions require all measurements, it is often unnecessary to use every modality to classify every example. So the problem is to learn a system that, for every decision, sequentially selects sensors to meet a measurement budget while minimizing classification error. Initially, we study the case where the sensor order in which measurements are acquired is given. For every instance, our system has to decide whether to seek more measurements from the next sensor or to terminate by classifying based on the available information. We use Bayesian analysis of this problem to construct a novel multi-stage empirical risk objective and directly learn sequential decision functions from training data. We provide practical algorithms for binary and multi-class settings and derive generalization error guarantees. We compare our approach to alternative strategies on real-world data.

    In the last section, we explore a decision system where the order of sensors is no longer fixed. We investigate how to combine ideas from reinforcement and imitation learning with empirical risk minimization to learn a dynamic sensor selection policy.
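    To make the fixed-sensor-order setting concrete, the sketch below is a deliberately simple confidence-threshold baseline, not the thesis's multi-stage empirical risk method: one classifier per cumulative sensor set, stopping as soon as the current prediction is confident enough. The three "sensors", the threshold, and the toy data are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy setting: 6 features arrive from 3 "sensors" of 2 features each, in a fixed order.
X, y = make_classification(n_samples=2000, n_features=6, n_informative=6,
                           n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

sensor_slices = [slice(0, 2), slice(0, 4), slice(0, 6)]   # cumulative feature sets
stage_models = [LogisticRegression(max_iter=1000).fit(X_tr[:, s], y_tr)
                for s in sensor_slices]

def sequential_predict(x, threshold=0.9):
    """Return (label, number_of_sensors_used) for a single example x."""
    for k, (s, model) in enumerate(zip(sensor_slices, stage_models), start=1):
        proba = model.predict_proba(x[s][None, :])[0]
        # Stop early if confident, or if all sensors have been used.
        if proba.max() >= threshold or k == len(stage_models):
            return int(np.argmax(proba)), k

preds, used = zip(*(sequential_predict(x) for x in X_te))
print("accuracy:", np.mean(np.array(preds) == y_te))
print("average sensors used:", np.mean(used))
```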

    Catching words in a stream of speech: computational simulations of segmenting transcribed child-directed speech

    The segmentation of continuous speech into lexical units is one of the first skills a child must learn during language acquisition. This thesis investigates segmentation by means of computational modeling and computational simulations. Segmentation is harder than it may seem at first glance. Children must find words in a continuous stream of speech without having any knowledge of words. Fortunately, experimental studies show that children and adults use a number of cues from the input, as well as simple strategies that exploit these cues, to segment speech. Even more interestingly, some of these cues are language-independent, which allows a language learner to segment continuous input before knowing a single word. The models proposed in this thesis differ from models in the literature in two important respects. First, they use local strategies (as opposed to global optimization) that exploit cues children are known to use, namely predictability statistics, phonotactics and lexical stress. Second, these cues are combined by means of an explicit cue-combination model that can easily be extended with additional cues.
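    To illustrate just one of the cues mentioned above (predictability statistics), the sketch below segments an artificial syllable stream at local minima of the transitional probability between adjacent syllables; it is not the thesis's cue-combination model, and the toy "words" are invented.

```python
import random
from collections import Counter

# Transitional probability TP(a -> b) = count(a, b) / count(a) between adjacent syllables;
# word boundaries are hypothesized at local minima of TP along the stream.
random.seed(0)
words = ["tu pi ro", "go la bu", "bi da ku"]          # made-up trisyllabic words
stream = [syl for _ in range(300) for syl in random.choice(words).split()]

pair_counts = Counter(zip(stream, stream[1:]))
syl_counts = Counter(stream[:-1])

def tp(a, b):
    return pair_counts[(a, b)] / syl_counts[a]

tps = [tp(a, b) for a, b in zip(stream, stream[1:])]
boundaries = [i + 1 for i in range(1, len(tps) - 1)
              if tps[i] < tps[i - 1] and tps[i] < tps[i + 1]]

# Reassemble the first few recovered "words".
segments, start = [], 0
for b in boundaries[:9]:
    segments.append("".join(stream[start:b]))
    start = b
print(segments)   # e.g. ['tupiro', 'golabu', 'bidaku', ...]
```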

    Information Bottleneck

    The celebrated information bottleneck (IB) principle of Tishby et al. has recently enjoyed renewed attention due to its application in the area of deep learning. This collection investigates the IB principle in this new context. The individual chapters in this collection:
    • provide novel insights into the functional properties of the IB;
    • discuss the IB principle (and its derivatives) as an objective for training multi-layer machine learning structures such as neural networks and decision trees; and
    • offer a new perspective on neural network learning via the lens of the IB framework.
    Our collection thus contributes to a better understanding of the IB principle specifically for deep learning and, more generally, of information-theoretic cost functions in machine learning. This paves the way toward explainable artificial intelligence.
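    For reference, the IB objective that the collection refers to is usually written in the following Lagrangian form (the standard formulation, included here for context rather than quoted from the collection):

```latex
% Information Bottleneck Lagrangian (Tishby et al.): find a representation T of the input X that is
% maximally compressed while staying informative about the relevance variable Y; the trade-off is
% controlled by \beta \ge 0, and T depends on Y only through X (Markov chain Y <-> X <-> T).
\min_{p(t \mid x)} \; \mathcal{L}_{\mathrm{IB}} \;=\; I(X;T) \;-\; \beta\, I(T;Y)
```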

    Reconstruction and Parameter Estimation of Dynamical Systems using Neural Networks

    Dynamical systems can be loosely regarded as systems whose dynamics is entirely determined by an evolution function and an initial condition, being therefore completely deterministic and a priori predictable. Nevertheless, their phenomenology is surprisingly rich, including intriguing phenomena such as chaotic dynamics, fractal dimensions and entropy production. In Climate Science, for example, the emergence of chaos currently prevents meteorological forecasts from extending beyond roughly fourteen days into the future; building predictive systems that overcome this limitation, at least partially, is therefore of extreme importance, since we live in a fast-changing climate, as proven by recent not-so-extreme-anymore climate phenomena. At the same time, Machine Learning techniques have been applied to practically every field of human knowledge since approximately ten years ago, when essentially two factors contributed to the so-called rebirth of Deep Learning: the availability of larger datasets, putting us in the era of Big Data, and the improvement of computational power. However, the possibility of applying Neural Networks to chaotic systems has been widely debated, since these models are very data-hungry and thus rely on the availability of large datasets, whereas Climate data are often rare and sparse. Moreover, chaotic dynamics should not rely much on past statistics, which these models are built on. In this thesis, we explore the possibility of studying dynamical systems, seen as simple proxies of Climate models, using Neural Networks, possibly adding prior knowledge of the underlying physical processes in the spirit of Physics-Informed Neural Networks, aiming at the reconstruction of the Weather (short-term dynamics) and Climate (long-term dynamics) of these dynamical systems as well as the estimation of unknown parameters from data.
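    As a minimal sketch of the physics-informed idea referenced above (not the thesis's actual models), the example below fits a small network to noisy observations of a toy ODE dx/dt = -k x while jointly estimating the unknown parameter k from a physics-residual term; the ODE, network size, and optimizer settings are all assumptions.

```python
import torch

torch.manual_seed(0)

# Toy dynamical system: dx/dt = -k * x with true k = 1.5; the network learns x(t) from a few
# noisy observations while an ODE-residual loss lets us estimate k jointly.
k_true = 1.5
t_data = torch.linspace(0.0, 2.0, 20).unsqueeze(1)
x_data = torch.exp(-k_true * t_data) + 0.01 * torch.randn_like(t_data)

net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
k_hat = torch.nn.Parameter(torch.tensor(0.5))   # unknown physical parameter, learned from data
opt = torch.optim.Adam(list(net.parameters()) + [k_hat], lr=1e-3)

t_col = torch.linspace(0.0, 2.0, 100).unsqueeze(1).requires_grad_(True)  # collocation points

for step in range(5000):
    opt.zero_grad()
    # Data-fit term: match the observed trajectory.
    loss_data = torch.mean((net(t_data) - x_data) ** 2)
    # Physics term: penalize the residual of dx/dt + k * x = 0 at the collocation points.
    x_col = net(t_col)
    dxdt = torch.autograd.grad(x_col, t_col, grad_outputs=torch.ones_like(x_col),
                               create_graph=True)[0]
    loss_phys = torch.mean((dxdt + k_hat * x_col) ** 2)
    loss = loss_data + loss_phys
    loss.backward()
    opt.step()

print("estimated k:", float(k_hat))   # should approach the true value 1.5
```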