18,935 research outputs found

    Learning non-Markovian Decision-Making from State-only Sequences

    Full text link
    Conventional imitation learning assumes access to the actions of demonstrators, but these motor signals are often unobservable in naturalistic settings. Additionally, sequential decision-making behaviors in these settings can deviate from the assumptions of a standard Markov Decision Process (MDP). To address these challenges, we explore deep generative modeling of state-only sequences with a non-Markov Decision Process (nMDP), where the policy is an energy-based prior in the latent space of the state transition generator. We develop maximum likelihood estimation to achieve model-based imitation, which involves short-run MCMC sampling from the prior and importance sampling for the posterior. The learned model enables decision-making as inference: model-free policy execution is equivalent to prior sampling, while model-based planning is posterior sampling initialized from the policy. We demonstrate the efficacy of the proposed method in a prototypical path-planning task with non-Markovian constraints and show that the learned model exhibits strong performance in challenging domains from the MuJoCo suite.
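    As a concrete illustration of the prior-sampling step, here is a minimal sketch of short-run Langevin MCMC from an energy-based latent prior, assuming a PyTorch energy network; the step count, step size, and energy_fn interface are illustrative, not the paper's settings.

    import torch

    def short_run_langevin(energy_fn, z0, n_steps=20, step_size=0.1):
        # Approximate sampling from p(z) proportional to exp(-E(z))
        # via a short Langevin chain started at z0.
        z = z0.detach().requires_grad_(True)
        for _ in range(n_steps):
            grad = torch.autograd.grad(energy_fn(z).sum(), z)[0]
            z = z - 0.5 * step_size**2 * grad + step_size * torch.randn_like(z)
            z = z.detach().requires_grad_(True)  # restart the graph each step
        return z.detach()

    In the paper's terms, such a prior sample plays the role of model-free policy execution, while planning would refine it further by posterior sampling.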

    Machine learning in solar physics

    Full text link
    The application of machine learning in solar physics has the potential to greatly enhance our understanding of the complex processes that take place in the atmosphere of the Sun. By using techniques such as deep learning, we are now in a position to analyze large amounts of data from solar observations and identify patterns and trends that may not have been apparent using traditional methods. This can help us improve our understanding of explosive events like solar flares, which can have a strong effect on the Earth's environment. Predicting such hazardous events is crucial for our technological society. Machine learning can also improve our understanding of the inner workings of the Sun itself by allowing us to go deeper into the data and to propose more complex models to explain them. Additionally, the use of machine learning can help to automate the analysis of solar data, reducing the need for manual labor and increasing the efficiency of research in this field. Comment: 100 pages, 13 figures, 286 references, accepted for publication as a Living Review in Solar Physics (LRSP).

    Metric perturbations of Kerr spacetime in Lorenz gauge: Circular equatorial orbits

    Full text link
    We construct the metric perturbation in Lorenz gauge for a compact body on a circular equatorial orbit of a rotating black hole (Kerr) spacetime, using a newly developed method of separation of variables. The metric perturbation is formed from a linear sum of differential operators acting on Teukolsky mode functions, and certain auxiliary scalars, which are solutions to ordinary differential equations in the frequency domain. For radiative modes, the solution is uniquely determined by the s = ±2 Weyl scalars, the s = 0 trace, and s = 0, 1 gauge scalars whose amplitudes are determined by imposing continuity conditions on the metric perturbation at the orbital radius. The static (zero-frequency) part of the metric perturbation, which is handled separately, also includes mass and angular momentum completion pieces. The metric perturbation is validated against the independent results of a 2+1D time-domain code; we demonstrate agreement at the expected level in all components, and the absence of gauge discontinuities. In principle, the new method can be used to determine the Lorenz-gauge metric perturbation at sufficiently high precision to enable accurate second-order self-force calculations on Kerr spacetime in the future. We conclude with a discussion of extensions of the method to eccentric and non-equatorial orbits. Comment: 88 pages, 14 figures.
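    For orientation, the Lorenz gauge condition in question is the standard one on the trace-reversed perturbation; a sketch in common conventions (signs and normalizations vary between references):

    % Trace-reversed metric perturbation and the Lorenz gauge condition:
    \bar{h}_{\mu\nu} = h_{\mu\nu} - \tfrac{1}{2} g_{\mu\nu}\, g^{\alpha\beta} h_{\alpha\beta},
    \qquad
    \nabla^{\mu} \bar{h}_{\mu\nu} = 0,
    % in which the linearized Einstein equation on the vacuum Kerr background
    % takes, schematically, the wave-equation form
    \Box \bar{h}_{\mu\nu} + 2 R^{\alpha}{}_{\mu}{}^{\beta}{}_{\nu}\, \bar{h}_{\alpha\beta} = -16\pi\, T_{\mu\nu}.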

    Implicit Loss of Surjectivity and Facial Reduction: Theory and Applications

    Get PDF
    Facial reduction, pioneered by Borwein and Wolkowicz, is a preprocessing method that is commonly used to obtain strict feasibility in the reformulated, reduced constraint system. The importance of strict feasibility is often addressed in the context of convergence results for interior point methods. Beyond the theoretical properties that facial reduction conveys, we show that facial reduction, not limited to interior point methods, leads to strong numerical performance in different classes of algorithms. In this thesis we study various consequences and the broad applicability of facial reduction. The thesis is organized in two parts. In the first part, we show the instabilities that accompany the absence of strict feasibility through the lens of facially reduced systems. In particular, we exploit the implicit redundancies, revealed by each nontrivial facial reduction step, that result in the implicit loss of surjectivity. This leads to the two-step facial reduction and two novel related notions of singularity. For the area of semidefinite programming, we use these singularities to strengthen a known bound on the solution rank, the Barvinok-Pataki bound. For the area of linear programming, we reveal degeneracies caused by the implicit redundancies. Furthermore, we propose a preprocessing tool that uses the simplex method. In the second part of this thesis, we continue with semidefinite programs that do not have strictly feasible points. We focus on the doubly-nonnegative relaxation of the binary quadratic program and a semidefinite program with a nonlinear objective function. We work closely with two classes of algorithms, the splitting method and the Gauss-Newton interior point method. We elaborate on the advantages of building models via facial reduction. Moreover, we develop algorithms for real-world problems including the quadratic assignment problem, the protein side-chain positioning problem, and the key rate computation for quantum key distribution. Facial reduction continues to play an important role in providing robust reformulated models, in both theoretical and practical respects, resulting in successful numerical performance.
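    To make the mechanism concrete, here is a one-step sketch of facial reduction for a feasibility system in standard SDP form, in the usual Borwein-Wolkowicz setup (notation assumed):

    % Feasible set: F = { X \succeq 0 : \mathcal{A}(X) = b }.
    % If strict feasibility fails, there exists an exposing vector y with
    \mathcal{A}^{*}(y) \succeq 0, \qquad \mathcal{A}^{*}(y) \neq 0, \qquad \langle b, y \rangle = 0,
    % so every feasible X satisfies
    % <A*(y), X> = <y, A(X)> = <y, b> = 0.
    % With V a basis for the null space of A*(y), the system reduces to the
    % smaller face: X = V R V^T with R \succeq 0, on which strict feasibility
    % can be sought again (hence the possibility of multiple reduction steps).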

    The Inductive Bias of Flatness Regularization for Deep Matrix Factorization

    Full text link
    Recent works on over-parameterized neural networks have shown that the stochasticity in optimizers has the implicit regularization effect of minimizing the sharpness of the loss function (in particular, the trace of its Hessian) over the family of zero-loss solutions. More explicit forms of flatness regularization also empirically improve generalization performance. However, it remains unclear why and when flatness regularization leads to better generalization. This work takes a first step toward understanding the inductive bias of the minimum-trace-of-Hessian solutions in an important setting: learning deep linear networks from linear measurements, also known as deep matrix factorization. We show that for all depths greater than one, with the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of the Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters (i.e., the product of all layer matrices), which in turn leads to better generalization. We empirically verify our theoretical findings on synthetic datasets.
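    Schematically, the stated equivalence reads as follows (notation assumed; the precise approximation and constants are quantified in the paper):

    % Depth-L linear network with end-to-end matrix M = W_L W_{L-1} \cdots W_1:
    \min_{\substack{W_1,\dots,W_L \\ \text{zero training loss}}}
      \operatorname{tr}\!\big(\nabla^{2}\mathcal{L}(W_1,\dots,W_L)\big)
      \;\approx\;
    \min_{\substack{M \\ \text{zero training loss}}} \; c\,\lVert M \rVert_{S_1},
    % where ||M||_{S_1} denotes the Schatten 1-norm (the sum of the singular
    % values of M) and c depends on depth and problem scaling.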

    A study of BPS and near-BPS black holes via AdS/CFT

    Get PDF
    In the settings of various AdS/CFT dual pairs, we use results from supersymmetric localization to gain insights into the physics of asymptotically-AdS, BPS black holes in 5 dimensions, and near-BPS black holes in 4 dimensions. We begin with BPS black holes embedded in the known examples of AdS5/CFT4 dualities. Using the Bethe Ansatz formulation, we compute the superconformal index at large N with arbitrary chemical potentials for all charges and angular momenta, for general N = 1 four-dimensional conformal theories with a holographic dual. We conjecture, and provide some evidence, that a particular universal contribution to the sum over Bethe vacua dominates the index at large N. For N = 4 SYM, this contribution correctly leads to the entropy of BPS Kerr-Newman black holes in AdS5 × S^5 for arbitrary values of the conserved charges, thus completing the microscopic counting of their microstates. We also consider theories dual to AdS5 × SE5, where SE5 is a Sasaki-Einstein manifold. We first check our results against the so-called universal black hole. We then explicitly construct the near-horizon geometry of BPS Kerr-Newman black holes in AdS5 × T^{1,1}, charged under the baryonic symmetry of the conifold theory and with equal angular momenta. We compute the entropy of these black holes using the attractor mechanism and find complete agreement with field theory predictions. Next, we consider the 3d Chern-Simons matter theory that is holographically dual to massive Type IIA string theory on AdS4 × S^6. By Kaluza-Klein reducing on S^2 with a background that is dual to the asymptotics of static dyonic BPS black holes in AdS4, we construct an N = 2 supersymmetric gauged quantum mechanics whose ground-state degeneracy reproduces the entropy of BPS black holes. We expect its low-lying spectrum to contain information about near-extremal horizons. Interestingly, the model has a large number of statistically-distributed couplings, reminiscent of SYK models.
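    Schematically, the microstate counting described here proceeds by extremizing a Legendre transform of the logarithm of the index; a sketch with conventions assumed (the constraint among the chemical potentials is suppressed):

    % Superconformal index with potentials \Delta_a, \omega_i conjugate to
    % charges Q_a and angular momenta J_i:
    \mathcal{I}(\Delta, \omega) = \operatorname{Tr}_{\text{BPS}}\,
      e^{-\sum_a \Delta_a Q_a - \sum_i \omega_i J_i},
    \qquad
    S(Q, J) = \operatorname*{ext}_{\Delta, \omega}
      \Big[ \log \mathcal{I}(\Delta, \omega)
        + \sum_a \Delta_a Q_a + \sum_i \omega_i J_i \Big].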

    Modular lifelong machine learning

    Get PDF
    Deep learning has drastically improved the state-of-the-art in many important fields, including computer vision and natural language processing (LeCun et al., 2015). However, it is expensive to train a deep neural network on a machine learning problem. The overall training cost further increases when one wants to solve additional problems. Lifelong machine learning (LML) develops algorithms that aim to efficiently learn to solve a sequence of problems, which become available one at a time. New problems are solved with fewer resources by transferring previously learned knowledge. At the same time, an LML algorithm needs to retain good performance on all encountered problems, thus avoiding catastrophic forgetting. Current approaches do not possess all the desired properties of an LML algorithm. First, they primarily focus on preventing catastrophic forgetting (Diaz-Rodriguez et al., 2018; Delange et al., 2021). As a result, they neglect some knowledge-transfer properties. Furthermore, they assume that all problems in a sequence share the same input space. Finally, scaling these methods to a large sequence of problems remains a challenge. Modular approaches to deep learning decompose a deep neural network into sub-networks, referred to as modules. Each module can then be trained to perform an atomic transformation, specialised in processing a distinct subset of inputs. This modular approach to storing knowledge makes it easy to reuse only the subset of modules which are useful for the task at hand. This thesis introduces a line of research which demonstrates the merits of a modular approach to lifelong machine learning, and its ability to address the aforementioned shortcomings of other methods. Compared to previous work, we show that a modular approach can be used to achieve more LML properties than previously demonstrated. Furthermore, we develop tools which allow modular LML algorithms to scale in order to retain said properties on longer sequences of problems. First, we introduce HOUDINI, a neurosymbolic framework for modular LML. HOUDINI represents modular deep neural networks as functional programs and accumulates a library of pre-trained modules over a sequence of problems. Given a new problem, we use program synthesis to select a suitable neural architecture, as well as a high-performing combination of pre-trained and new modules (see the sketch after this abstract). We show that our approach has most of the properties desired of an LML algorithm. Notably, it can perform forward transfer, avoid negative transfer and prevent catastrophic forgetting, even across problems with disparate input domains and problems which require different neural architectures. Second, we produce a modular LML algorithm which retains the properties of HOUDINI but can also scale to longer sequences of problems. To this end, we fix the choice of a neural architecture and introduce a probabilistic search framework, PICLE, for searching through different module combinations. To apply PICLE, we introduce two probabilistic models over neural modules which allow us to efficiently identify promising module combinations. Third, we phrase the search over module combinations in modular LML as black-box optimisation, which allows one to make use of methods from the setting of hyperparameter optimisation (HPO). We then develop a new HPO method which marries a multi-fidelity approach with model-based optimisation. We demonstrate that this leads to improvements in anytime performance in the HPO setting and discuss how this can in turn be used to augment modular LML methods. Overall, this thesis identifies a number of important LML properties, which have not all been attained in past methods, and presents an LML algorithm which can achieve all of them, apart from backward transfer.
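    As referenced above, a deliberately naive sketch of the library-plus-search idea; the `evaluate` callback, module types, and exhaustive enumeration are placeholders (HOUDINI selects combinations via program synthesis, and PICLE via probabilistic search, rather than brute force):

    import itertools

    def search_module_combinations(library, new_modules, evaluate, max_depth=3):
        # Enumerate pipelines built from pretrained library modules and
        # freshly initialized ones; return the best-scoring combination
        # on the new problem.
        best_score, best_pipeline = float("-inf"), None
        candidates = library + new_modules
        for depth in range(1, max_depth + 1):
            for pipeline in itertools.product(candidates, repeat=depth):
                score = evaluate(pipeline)  # e.g. validation accuracy after fine-tuning
                if score > best_score:
                    best_score, best_pipeline = score, pipeline
        return best_pipeline, best_score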

    Equivariance with Learned Canonicalization Functions

    Full text link
    Symmetry-based neural networks often constrain the architecture in order to achieve invariance or equivariance to a group of transformations. In this paper, we propose an alternative that avoids this architectural constraint by learning to produce canonical representations of the data. These canonicalization functions can readily be plugged into non-equivariant backbone architectures. We offer explicit ways to implement them for some groups of interest. We show that this approach enjoys universality while providing interpretable insights. Our main hypothesis, supported by our empirical results, is that learning a small neural network to perform canonicalization is better than using predefined heuristics. Our experiments show that learning the canonicalization function is competitive with existing techniques for learning equivariant functions across many tasks, including image classification, N-body dynamics prediction, point cloud classification and part segmentation, while being faster across the board. Comment: 21 pages, 5 figures.
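    To illustrate the plug-in pattern described here, a minimal sketch for planar rotations; the canonicalizer and backbone are assumed user-supplied networks, and the paper's actual constructions for each group differ in detail.

    import torch
    import torch.nn as nn

    class CanonicalizedModel(nn.Module):
        # A small canonicalization network predicts a rotation angle; the
        # input is rotated by the inverse of that angle into a canonical
        # pose before the (non-equivariant) backbone runs.
        def __init__(self, canonicalizer: nn.Module, backbone: nn.Module):
            super().__init__()
            self.canonicalizer = canonicalizer  # (batch, n, 2) -> (batch, 1)
            self.backbone = backbone

        def forward(self, points):  # points: (batch, n, 2)
            theta = self.canonicalizer(points).squeeze(-1)  # (batch,)
            c, s = torch.cos(-theta), torch.sin(-theta)
            # R(-theta) per batch element: (batch, 2, 2).
            rot = torch.stack([torch.stack([c, -s], -1),
                               torch.stack([s, c], -1)], -2)
            # Row-vector convention: rotate each point by -theta.
            canonical = points @ rot.transpose(-1, -2)
            return self.backbone(canonical)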

    Equivalences between 2D dilaton gravities, their asymptotic symmetries, and their holographic duals

    Full text link
    Dilaton gravities in two dimensions can be formulated as particular Poisson sigma models. Target space diffeomorphisms map different models to each other and establish a one-to-one correspondence between their classical solutions. We obtain a general form of such diffeomorphisms in Lorentzian and Euclidean signatures and use them to extend known holographic results, including the Schwarzian action on the asymptotic boundary, from JT gravity to a large class of dilaton gravity models. Comment: 53 pages, 4 figures.
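    For context, the general Poisson sigma model action underlying this formulation, in one common convention (signs and normalization factors vary between references):

    % Target coordinates X^I and worldsheet 1-forms A_I on a 2D manifold M:
    S_{\mathrm{PSM}} = \int_{\mathcal{M}} \Big( A_I \wedge \mathrm{d}X^{I}
      + \tfrac{1}{2}\, P^{IJ}(X)\, A_I \wedge A_J \Big),
    % with P^{IJ} a Poisson bivector on the target space; a target space
    % diffeomorphism X^I \mapsto \tilde{X}^I(X) transports P^{IJ} and maps
    % one dilaton gravity model into another.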

    Search-time Efficient Device Constraints-Aware Neural Architecture Search

    Full text link
    Edge computing aims to enable edge devices, such as IoT devices, to process data locally instead of relying on the cloud. However, deep learning techniques like computer vision and natural language processing can be computationally expensive and memory-intensive. Creating manual architectures specialized for each device is infeasible due to their varying memory and computational constraints. To address these concerns, we automate the construction of task-specific deep learning architectures optimized for device constraints through Neural Architecture Search (NAS). We present DCA-NAS, a principled method for fast neural network architecture search that incorporates edge-device constraints such as model size and floating-point operations. It incorporates weight sharing and channel bottleneck techniques to speed up the search. Our experiments show that DCA-NAS outperforms manually designed architectures of similar size and is comparable to popular mobile architectures on various image classification datasets, including CIFAR-10, CIFAR-100, and ImageNet-1k. Experiments with the DARTS and NAS-Bench-201 search spaces show the generalization capability of DCA-NAS. On further evaluation on Hardware-NAS-Bench, device-specific architectures with low inference latency and state-of-the-art performance were discovered. Comment: Accepted to the 10th International Conference on Pattern Recognition and Machine Intelligence (PReMI) 202
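    The constraint-aware part of the search reduces, at its core, to filtering candidates by the device budget; a minimal sketch with hypothetical helper callbacks (count_params, count_flops, evaluate are assumptions, and DCA-NAS additionally uses weight sharing and channel bottlenecks to make evaluation cheap):

    def constrained_search(candidates, evaluate, count_params, count_flops,
                           max_params, max_flops):
        # Discard architectures that violate the device's parameter or FLOP
        # budget, then keep the best performer among the feasible ones.
        feasible = [a for a in candidates
                    if count_params(a) <= max_params and count_flops(a) <= max_flops]
        return max(feasible, key=evaluate) if feasible else None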