34 research outputs found

    Information Theory and Machine Learning

    Get PDF
    The recent successes of machine learning, especially regarding systems based on deep neural networks, have encouraged further research activities and raised a new set of challenges in understanding and designing complex machine learning algorithms. New applications require learning algorithms to be distributed, have transferable learning results, use computation resources efficiently, convergence quickly on online settings, have performance guarantees, satisfy fairness or privacy constraints, incorporate domain knowledge on model structures, etc. A new wave of developments in statistical learning theory and information theory has set out to address these challenges. This Special Issue, "Machine Learning and Information Theory", aims to collect recent results in this direction reflecting a diverse spectrum of visions and efforts to extend conventional theories and develop analysis tools for these complex machine learning systems

    Structure in Machine Learning: Graphical Models and Monte Carlo Methods

    Get PDF
    This thesis is concerned with two main areas: approximate inference in discrete graphical models, and random embeddings for dimensionality reduction and approximate inference in kernel methods. Approximate inference is a fundamental problem in machine learning and statistics, with strong connections to other domains such as theoretical computer science. At the same time, there has often been a gap between the success of many algorithms in this area in practice, and what can be explained by theory; thus, an important research effort is to bridge this gap. Random embeddings for dimensionality reduction and approximate inference have led to great improvements in scalability of a wide variety of methods in machine learning. In recent years, there has been much work on how the stochasticity introduced by these approaches can be better controlled, and what further computational improvements can be made. In the first part of this thesis, we study approximate inference algorithms for discrete graphical models. Firstly, we consider linear programming methods for approximate MAP inference, and develop our understanding of conditions for exactness of these approximations. Such guarantees of exactness are typically based on either structural restrictions on the underlying graph corresponding to the model (such as low treewidth), or restrictions on the types of potential functions that may be present in the model (such as log-supermodularity). We contribute two new classes of exactness guarantees: the first of these takes the form of particular hybrid restrictions on a combination of graph structure and potential types, whilst the second is given by excluding particular substructures from the underlying graph, via graph minor theory. We also study a particular family of transformation methods of graphical models, uprooting and rerooting, and their effect on approximate MAP and marginal inference methods. We prove new theoretical results on the behaviour of particular approximate inference methods under these transformations, in particular showing that the triplet relaxation of the marginal polytope is unique in being universally rooted. We also introduce a heuristic which quickly picks a rerooting, and demonstrate benefits empirically on models over several graph topologies. In the second part of this thesis, we study Monte Carlo methods for both linear dimensionality reduction and approximate inference in kernel machines. We prove the statistical benefit of coupling Monte Carlo samples to be almost-surely orthogonal in a variety of contexts, and study fast approximate methods of inducing this coupling. A surprising result is that these approximate methods can simultaneously offer improved statistical benefits, time complexity, and space complexity over i.i.d. Monte Carlo samples. We evaluate our methods on a variety of datasets, directly studying their effects on approximate kernel evaluation, as well as on downstream tasks such as Gaussian process regression.EPSR
    corecore