    Sparse Probit Linear Mixed Model

    Linear Mixed Models (LMMs) are important tools in statistical genetics. When used for feature selection, they make it possible to find a sparse set of genetic traits that best predict a continuous phenotype of interest, while simultaneously correcting for confounding factors such as age, ethnicity and population structure. Formulated as models for linear regression, LMMs have been restricted to continuous phenotypes. We introduce the Sparse Probit Linear Mixed Model (Probit-LMM), which generalizes the LMM modeling paradigm to binary phenotypes. As a technical challenge, the model no longer possesses a closed-form likelihood function. In this paper, we present a scalable approximate inference algorithm that lets us fit the model to high-dimensional data sets. We show on three real-world examples from different domains that, in the setting of binary labels, our algorithm leads to better prediction accuracies and also selects features that show less correlation with the confounding factors. Comment: Published version, 21 pages, 6 figures

    Exploring helical dynamos with machine learning

    We use ensemble machine learning algorithms to study the evolution of magnetic fields in helically forced magnetohydrodynamic (MHD) turbulence. We perform direct numerical simulations of helically forced turbulence using the mean-field formalism, with the electromotive force (EMF) modeled both as a linear and as a non-linear function of the mean magnetic field and current density. The form of the EMF is determined using regularized linear regression and random forests. We also compare various analytical models to the data using Bayesian inference with Markov Chain Monte Carlo (MCMC) sampling. Our results demonstrate that linear regression is largely successful at predicting the EMF and that the use of more sophisticated algorithms (random forests, MCMC) does not lead to significant improvement in the fits. We conclude that the data we are looking at are effectively low dimensional and essentially linear. Finally, to encourage further exploration by the community, we provide all of our simulation data and analysis scripts as open source IPython notebooks. Comment: accepted by A&A, 11 pages, 6 figures, 3 tables, data + IPython notebooks: https://github.com/fnauman/ML_alpha
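The regularized linear fit mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common mean-field closure EMF ≈ alpha * B - eta * J and a ridge (L2) penalty; the function name `fit_emf_ridge`, the penalty strength `lam`, and the synthetic data are all hypothetical and are not taken from the authors' notebooks.

```python
import numpy as np

def fit_emf_ridge(B, J, emf, lam=1e-3):
    """Ridge fit of emf ~ alpha * B - eta * J; returns (alpha, eta).

    Assumption: a two-parameter linear closure with an L2 penalty `lam`.
    """
    # Design matrix: one column per coefficient (the sign of eta is folded into -J).
    X = np.column_stack([B, -J])
    # Closed-form ridge solution: (X^T X + lam * I)^{-1} X^T y.
    A = X.T @ X + lam * np.eye(X.shape[1])
    coef = np.linalg.solve(A, X.T @ emf)
    return float(coef[0]), float(coef[1])

# Synthetic stand-in data (not simulation output): true alpha = 0.7, eta = 0.2.
rng = np.random.default_rng(0)
B = rng.normal(size=500)
J = rng.normal(size=500)
emf = 0.7 * B - 0.2 * J + 0.01 * rng.normal(size=500)

alpha, eta = fit_emf_ridge(B, J, emf)
```

On data that really are close to linear, as the abstract concludes, the recovered `alpha` and `eta` should sit near the generating values, and a more flexible model would have little left to explain.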

    Determination of the top quark mass from leptonic observables

    We present a procedure for the determination of the mass of the top quark at the LHC based on leptonic observables in dilepton $t\bar{t}$ events. Our approach utilises the shapes of kinematic distributions through their few lowest Mellin moments; it is notable for its minimal sensitivity to the modelling of long-distance effects, for not requiring the reconstruction of top quarks, and for having a competitive precision, with theory errors on the extracted top mass of the order of 0.8 GeV. A novel aspect of our work is the study of theoretical biases that might dramatically influence the determination of the top mass, and which are potentially relevant to all template-based methods. We propose a comprehensive strategy that helps minimise the impact of such biases and leads to a reliable top mass extraction at hadron colliders. Comment: 29 pages, 3 figures
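As a rough illustration of what "lowest Mellin moments" of a kinematic distribution means, the sketch below estimates them from a sample of observable values, assuming the i-th moment of a normalised distribution is simply the sample mean of the observable raised to the power i. The function name `mellin_moments` and the toy input are hypothetical; this is not the authors' analysis code.

```python
import numpy as np

def mellin_moments(values, max_order=3):
    """Estimate the first `max_order` Mellin moments of a normalised distribution.

    Assumption: moment i is the event-sample average of values**i.
    """
    values = np.asarray(values, dtype=float)
    return [float(np.mean(values ** i)) for i in range(1, max_order + 1)]

# Toy observable values (illustrative only, not physics data).
moments = mellin_moments([1.0, 2.0, 3.0], max_order=2)
```

In a procedure of the kind the abstract describes, such moments would be compared with predictions computed for different top mass hypotheses; that comparison step is not shown here.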