Cryptographic approaches to security and optimization in machine learning
Modern machine learning techniques have achieved surprisingly good standard test accuracy, yet classical machine learning theory has been unable to explain the underlying reason for this success. The phenomenon of adversarial examples further complicates our understanding of what it means to have good generalization ability. Classifiers that generalize well to the test set are easily fooled by imperceptible image modifications, which can often be computed without knowledge of the classifier itself. The adversarial error of a classifier measures its error when an algorithm is allowed to modify each test data point before it is given as input to the classifier. Follow-up work has shown that a tradeoff exists between optimizing for standard generalization error and for adversarial error. This calls into question whether standard generalization error is the correct metric to measure.
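As a minimal illustration of the adversarial-example phenomenon described above (not taken from the thesis), consider a linear classifier, for which the worst-case l-infinity perturbation has a closed form, the same one the fast gradient sign method uses; the weights and input below are made up for the example.

```python
import numpy as np

# Hypothetical linear classifier: predict +1 if w.x > 0, else -1.
w = np.array([2.0, -1.0])
x = np.array([0.1, 0.0])          # correctly classified as +1 (w.x = 0.2)

def predict(p):
    return 1 if w @ p > 0 else -1

# For a linear model, the worst-case l-infinity perturbation of size eps
# is exactly eps * sign(w), pushed against the true label (+1 here).
eps = 0.2
x_adv = x - eps * np.sign(w)

print(predict(x), predict(x_adv))  # 1 -1: a tiny shift flips the prediction
```

The perturbation has l-infinity norm only 0.2, yet it flips the label; for deep networks the same attack is only approximate but empirically very effective.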
We try to understand the generalization capability of modern machine learning techniques through the lens of adversarial examples. To reconcile the apparent tradeoff between the two competing notions of error, we create new security definitions and classifier constructions which allow us to prove an upper bound on the adversarial error that decreases as standard test error decreases. We introduce a cryptographic proof technique by defining a security assumption in a simpler attack setting and proving a security reduction from a restricted black-box attack problem to this security assumption. We then investigate the double descent curve in the interpolation regime, where test error can continue to decrease even after training error has reached zero, to give a natural explanation for the observed tradeoff between adversarial error and standard generalization error.
The second part of our work investigates further this notion of a black-box model by looking at the separation between being able to evaluate a function and being able to actually understand it. This is formalized through the notion of function obfuscation in cryptography. Given some concrete implementation of a function, the implementation is considered obfuscated if a user cannot produce the function output on a test input without querying the implementation itself. This means that a user cannot actually learn or understand the function even though all of the implementation details are presented in the clear. As expected, this is a very strong requirement that cannot be met for all functions one might be interested in. In our work we make progress on providing obfuscation schemes for simple, explicit function classes.
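A classic toy example of an obfuscatable class (standard in the literature, not specific to this thesis) is the point function: a password check can be shipped as a hash, so the published code reveals nothing useful about the accepted input beyond the ability to query it, assuming the hash is hard to invert. The secret below is made up.

```python
import hashlib

# "Obfuscated" point function: accepts exactly one secret input.
# Publishing STORED lets anyone evaluate the check, but learning the
# accepted input from it amounts to inverting SHA-256.
STORED = hashlib.sha256(b"hunter2").hexdigest()

def check(guess: str) -> bool:
    return hashlib.sha256(guess.encode()).hexdigest() == STORED

print(check("hunter2"), check("password"))  # True False
```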
The last part of our work investigates non-statistical biases and algorithms for nonconvex optimization problems. We show that the continuous-time limit of stochastic gradient descent does not converge directly to the local optimum, but rather has a bias term which grows with the step size. We also construct novel, non-statistical algorithms for two parametric learning problems by employing lattice basis reduction techniques from cryptography.
Linear algebraic techniques in theoretical computer science and population genetics
Thesis (Ph. D.)--Massachusetts Institute of Technology, Department of Mathematics, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (pages 149-155).
In this thesis, we present several algorithmic results for problems in spectral graph theory and computational biology. The first part concerns the problem of spectral sparsification. It is known that every dense graph can be approximated in a strong sense by a sparse subgraph, known as a spectral sparsifier of the graph. Furthermore, researchers have recently developed efficient algorithms for computing such approximations. We show how to make these algorithms faster, and also give a substantial improvement in space efficiency. Since sparsification is an important first step in speeding up approximation algorithms for many graph problems, our results have numerous applications. In the second part of the thesis, we consider the problem of inferring human population history from genetic data. We give an efficient and principled algorithm for using single nucleotide polymorphism (SNP) data to infer admixture history of various populations, and apply it to show that Europeans have evidence of mixture with ancient Siberians. Finally, we turn to the problem of RNA secondary structure design. In this problem, we want to find RNA sequences that fold to a given secondary structure. We propose a novel global sampling approach, based on the recently developed RNAmutants algorithm, and show that it has numerous desirable properties when compared to existing solutions. Our method can prove useful for developing the next generation of RNA design algorithms.
by Alex Levin. Ph.D.
Optimization Foundations of Reinforcement Learning
Reinforcement learning (RL) has attracted rapidly increasing interest in the machine learning and artificial intelligence communities in the past decade. With tremendous success already demonstrated for game AI, RL offers great potential for applications in more complex, real-world domains, for example in robotics, autonomous driving and even drug discovery. Although researchers have devoted a great deal of engineering effort to deploying RL methods at scale, many state-of-the-art RL techniques still seem mysterious, with limited theoretical guarantees on their behaviour in practice.
In this thesis, we focus on understanding convergence guarantees for two key ideas in reinforcement learning, namely temporal difference learning and policy gradient methods, from an optimization perspective. In Chapter 2, we provide a simple and explicit finite-time analysis of temporal difference (TD) learning with linear function approximation. Except for a few key insights, our analysis mirrors standard techniques for analyzing stochastic gradient descent algorithms, and therefore inherits the simplicity and elegance of that literature. Our convergence results extend seamlessly to the study of TD learning with eligibility traces, known as TD(λ), and to Q-learning for a class of high-dimensional optimal stopping problems.
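A minimal sketch of the TD(0) update analysed in Chapter 2, run on a made-up three-state deterministic chain with one-hot features (so linear function approximation reduces to the tabular case); this illustrates only the update rule, not the chapter's actual analysis or experiments.

```python
import numpy as np

# Deterministic chain: 0 -> 1 -> 2 (terminal), reward 1 per step.
gamma, alpha = 0.9, 0.1
phi = np.eye(3)            # one-hot features; V(s) = w . phi(s)
w = np.zeros(3)

for _ in range(2000):      # episodes
    for s, s_next, r in [(0, 1, 1.0), (1, 2, 1.0)]:
        v_next = 0.0 if s_next == 2 else w @ phi[s_next]
        td_error = r + gamma * v_next - w @ phi[s]
        w += alpha * td_error * phi[s]   # semi-gradient TD(0) update

print(w[:2])   # approaches the true values V(0) = 1.9, V(1) = 1.0
```

With one-hot features the TD fixed point coincides with the true value function, which is why the estimates converge to 1.9 and 1.0 here.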
In Chapter 3, we turn our attention to policy gradient methods and present a simple and general understanding of their global convergence properties. The main challenge here is that even for simple control problems, policy gradient algorithms face non-convex optimization problems and are widely understood to converge only to a stationary point of the objective. We identify structural properties -- shared by finite MDPs and several classic control problems -- which guarantee that despite non-convexity, any stationary point of the policy gradient objective is globally optimal. In the final chapter, we extend our analysis for finite MDPs to show linear convergence guarantees for many popular variants of policy gradient methods like projected policy gradient, Frank-Wolfe, mirror descent and natural policy gradients.
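The "stationary points are globally optimal" phenomenon can be sketched on the smallest possible example (a two-action problem with a softmax policy and exact gradients, chosen for this illustration and not drawn from the thesis): the objective is non-concave in the policy parameters, yet gradient ascent drives the policy to the globally optimal action.

```python
import numpy as np

# Two-action problem with known rewards; softmax policy pi(a) ∝ exp(theta_a).
r = np.array([1.0, 0.0])
theta = np.zeros(2)

def pi(theta):
    e = np.exp(theta - theta.max())
    return e / e.sum()

for _ in range(500):
    p = pi(theta)
    J = p @ r                 # expected reward
    grad = p * (r - J)        # exact policy gradient for the softmax policy
    theta += 1.0 * grad       # gradient ascent

print(pi(theta)[0])  # probability of the better action approaches 1
```

The gradient vanishes as the policy becomes deterministic, so convergence is slow near the optimum, which is one reason the linear-rate results for the variants listed above are of interest.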
Motion-capture-based hand gesture recognition for computing and control
This dissertation focuses on the study and development of algorithms that enable the analysis and recognition of hand gestures in a motion capture environment. Central to this work is the study of unlabeled point sets in a more abstract sense. Evaluations of proposed methods focus on examining their generalization to users not encountered during system training.
In an initial exploratory study, we compare various classification algorithms based upon multiple interpretations and feature transformations of point sets, including those based upon aggregate features (e.g. mean) and a pseudo-rasterization of the capture space. We find aggregate feature classifiers to be balanced across multiple users but relatively limited in maximum achievable accuracy. Certain classifiers based upon the pseudo-rasterization performed best among tested classification algorithms. We follow this study with targeted examinations of certain subproblems.
For the first subproblem, we introduce the a fortiori expectation-maximization (AFEM) algorithm for computing the parameters of a distribution from which unlabeled, correlated point sets are presumed to be generated. Each unlabeled point is assumed to correspond to a target with independent probability of appearance but correlated positions. We propose replacing the expectation phase of the algorithm with a Kalman filter modified within a Bayesian framework to account for the unknown point labels which manifest as uncertain measurement matrices. We also propose a mechanism to reorder the measurements in order to improve parameter estimates. In addition, we use a state-of-the-art Markov chain Monte Carlo sampler to efficiently sample measurement matrices. In the process, we indirectly propose a constrained k-means clustering algorithm. Simulations verify the utility of AFEM against a traditional expectation-maximization algorithm in a variety of scenarios.
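The AFEM algorithm above replaces the expectation phase with a modified Kalman filter; as background only (this is the standard scalar update, not the dissertation's modified filter, and the noise values are made up), the basic measurement update looks like:

```python
# Scalar Kalman filter measurement update: fuse a prior estimate with
# one noisy measurement. Illustrative numbers, not from the dissertation.
def kalman_update(x_prior, p_prior, z, r_meas):
    k = p_prior / (p_prior + r_meas)     # Kalman gain
    x_post = x_prior + k * (z - x_prior) # pull estimate toward measurement
    p_post = (1 - k) * p_prior           # posterior variance shrinks
    return x_post, p_post

x, p = kalman_update(x_prior=0.0, p_prior=1.0, z=2.0, r_meas=1.0)
print(x, p)  # 1.0 0.5: equal confidence in prior and measurement splits the difference
```

The dissertation's contribution is handling the case where the measurement matrix itself is uncertain because point labels are unknown, which this plain update does not capture.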
In the second subproblem, we consider the application of positive definite kernels and the earth mover's distance (EMD) to our work. Positive definite kernels are an important tool in machine learning that enable efficient solutions to otherwise difficult or intractable problems by implicitly linearizing the problem geometry. We develop a set-theoretic interpretation of EMD and propose earth mover's intersection (EMI), a positive definite analog to EMD. We offer proof of EMI's positive definiteness and provide necessary and sufficient conditions for EMD to be conditionally negative definite, including approximations that guarantee negative definiteness. In particular, we show that EMD is related to various min-like kernels. We also present a positive definite preserving transformation that can be applied to any kernel and can be used to derive positive definite EMD-based kernels, and we show that the Jaccard index is simply the result of this transformation applied to set intersection. Finally, we evaluate kernels based on EMI and the proposed transformation versus EMD in various computer vision tasks and show that EMD is generally inferior even with indefinite kernel techniques.
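The claim that the Jaccard index arises from a definiteness-preserving transformation can be checked directly: applying K'(A,B) = K(A,B) / (K(A,A) + K(B,B) - K(A,B)) to the set-intersection kernel K(A,B) = |A ∩ B| gives |A ∩ B| / |A ∪ B|. A quick verification with made-up sets:

```python
def jaccard_from_intersection(a: set, b: set) -> float:
    # Transform K/(K(A,A) + K(B,B) - K) applied to K(A,B) = |A ∩ B|;
    # note K(A,A) = |A| and K(B,B) = |B|, so the denominator is |A ∪ B|.
    k_ab = len(a & b)
    return k_ab / (len(a) + len(b) - k_ab)

a, b = {1, 2, 3}, {2, 3, 4}
print(jaccard_from_intersection(a, b), len(a & b) / len(a | b))  # 0.5 0.5
```

By inclusion-exclusion |A| + |B| - |A ∩ B| = |A ∪ B|, so the two expressions agree for all finite sets.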
Finally, we apply deep learning to our problem. We propose neural network architectures for hand posture and gesture recognition from unlabeled marker sets in a coordinate system local to the hand. As a means of ensuring data integrity, we also propose an extended Kalman filter for tracking the rigid pattern of markers on which the local coordinate system is based. We consider fixed- and variable-size architectures including convolutional and recurrent neural networks that accept unlabeled marker input. We also consider a data-driven approach to labeling markers with a neural network and a collection of Kalman filters. Experimental evaluations with posture and gesture datasets show promising results for the proposed architectures with unlabeled markers, which outperform the alternative data-driven labeling method.
Computational analysis of real-time convex optimization for control systems
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2000. Includes bibliographical references (p. 177-189).
Computational analysis is fundamental for certification of all real-time control software. Nevertheless, analysis of on-line optimization for control has received little attention to date. On-line software must pass rigorous standards in reliability, requiring that any embedded optimization algorithm possess predictable behavior and bounded run-time guarantees. This thesis examines the problem of certifying control systems which utilize real-time optimization. A general convex programming framework is used, to which primal-dual path-following algorithms are applied. The set of all optimization problem instances which may arise in an on-line procedure is characterized as a compact parametric set of convex programming problems. A method is given for checking the feasibility and well-posedness of this compact set of problems, providing certification that every problem instance has a solution and can be solved in finite time. The thesis then proposes several algorithm initialization methods, considering the fixed and time-varying constraint cases separately. Computational bounds are provided for both cases. In the event that the computational requirements cannot be met, several alternatives to on-line optimization are suggested. Of course, these alternatives must provide feasible solutions with minimal real-time computational overhead. Beyond this requirement, these methods approximate the optimal solution as well as possible. The methods explored include robust table look-up, functional approximation of the solution set, and ellipsoidal approximation of the constraint set. The final part of this thesis examines the coupled behavior of a receding horizon control scheme for constrained linear systems and real-time optimization.
The driving requirement is to maintain closed-loop stability, feasibility and well-posedness of the optimal control problem, and bounded iterations for the optimization algorithm. A detailed analysis provides sufficient conditions for meeting these requirements. A realistic example of a small autonomous air vehicle is furnished, showing how a receding horizon control law using real-time optimization can be certified.
by Lawrence Kent McGovern. Ph.D.
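The certification concern above (an embedded solver whose run time must be bounded in advance) can be sketched with a box-constrained quadratic program solved by projected gradient descent under a hard iteration budget; the problem data and the solver choice are made up for illustration and are simpler than the thesis's path-following algorithms.

```python
import numpy as np

# min 0.5 x'Qx + c'x  subject to  lo <= x <= hi,
# solved with a fixed, certifiable iteration budget.
Q = np.diag([2.0, 2.0])
c = np.array([-2.0, -2.0])       # unconstrained optimum at (1, 1)
lo, hi = np.zeros(2), np.array([0.7, 2.0])

x = np.zeros(2)
step = 1.0 / 2.0                 # 1/L with L = largest eigenvalue of Q
for _ in range(100):             # hard bound on solver iterations
    x = np.clip(x - step * (Q @ x + c), lo, hi)

print(x)   # constrained optimum (0.7, 1.0): first coordinate hits its bound
```

Because the step size is tied to the problem's curvature, the iteration count needed for a given accuracy can be bounded offline, which is the kind of guarantee a certification process requires.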
Discriminative learning for structured outputs and environments
Machine learning methods have had considerable success across a wide range of applications. Much of this success is due to the flexibility of learning algorithms and their ability to tailor themselves to the requirements of the particular problem. In this thesis we examine methods that seek to exploit the underlying structure of a problem and make the best possible use of the available data. We explore the structural nature of two different problems, binary classification under the uncertainty of input relationships, and multi-label output learning of Markov networks with unknown graph structures. From the input perspective, we focus on binary classification and the problems associated with learning from limited amounts of data. In particular we pay attention to moment based methods and investigate how to deal with the uncertainty surrounding the estimate of moments using either small or noisy training samples. We present a worst-case analysis and show how the high probability bounds on the deviation of the true moments from their empirical counterparts can be used to generate a regularisation scheme that takes into consideration the relative amount of information that is available for each class. This results in a binary classification algorithm that directly minimises the worst case future misclassification rate, whilst taking into consideration the possible errors in the moment estimates. This algorithm was shown to outperform a number of traditional approaches across a range of benchmark datasets, doing particularly well when training was limited to small amounts of data. This supports the idea that we can leverage the class specific regularisation scheme and take advantage of the uncertainty of the datasets when creating a predictor. 
Further encouragement for this approach was provided during the high-noise experiments, predicting the directional movement of popular currency pairs, where moment based methods outperformed those using the peripheral points of the class-conditional distributions. From the output perspective, we focus on the problem of multi-label output learning over Markov networks and present a novel large margin learning method that leverages the correlation between output labels. Our approach is agnostic to the output graph structure and it simultaneously learns the intrinsic structure of the outputs, whilst finding a large margin separator. Based upon the observation that the score function over the complete output graph is given by the expectation of the score function over all spanning trees, we formulate the problem as an L1-norm multiple kernel learning problem where each spanning tree over the complete output graph gives rise to a particular instance of a kernel. We show that this approach is comparable to state-of-the-art approaches on a number of benchmark multi-label learning problems. Furthermore, we show how this method can be applied to the problem of predicting the joint movement of a group of stocks, where we not only infer the directional movement of individual stocks but also uncover insights on the input-dependent relationships between them.
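The worst-case moment idea from the first part can be sketched with a Chebyshev-style separation score: for a direction w and class moments (mu, Sigma), the quantity kappa(w) below controls the worst-case misclassification rate over all distributions matching those moments, so larger kappa is better. This toy criterion and the data are illustrative assumptions, not the thesis's actual algorithm, which additionally regularises for uncertainty in the moment estimates.

```python
import numpy as np

# Chebyshev-style worst-case separation score for a direction w:
# larger kappa means a smaller worst-case error over all distributions
# with the given class means and covariances.
def kappa(w, mu0, mu1, S0, S1):
    gap = abs(w @ (mu1 - mu0))
    return gap / (np.sqrt(w @ S0 @ w) + np.sqrt(w @ S1 @ w))

mu0, mu1 = np.array([0.0, 0.0]), np.array([2.0, 0.0])
S0 = S1 = np.eye(2)

candidates = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
scores = [kappa(w, mu0, mu1, S0, S1) for w in candidates]
print(scores)  # the direction along the mean gap wins; the orthogonal one scores 0
```

A full method maximises kappa over all directions rather than a candidate list, and inflates the covariance terms to account for estimation error from small or noisy samples.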
LIPIcs, Volume 261, ICALP 2023, Complete Volume