Sparse Probit Linear Mixed Model

Author: Cunningham John P.
Kloft Marius
Lippert Christoph
Mandt Stephan
Nakajima Shinichi
Wenzel Florian
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/07/2017

Linear Mixed Models (LMMs) are important tools in statistical genetics. When used for feature selection, they make it possible to find a sparse set of genetic traits that best predict a continuous phenotype of interest, while simultaneously correcting for various confounding factors such as age, ethnicity and population structure. Formulated as models for linear regression, LMMs have been restricted to continuous phenotypes. We introduce the Sparse Probit Linear Mixed Model (Probit-LMM), where we generalize the LMM modeling paradigm to binary phenotypes. As a technical challenge, the model no longer possesses a closed-form likelihood function. In this paper, we present a scalable approximate inference algorithm that lets us fit the model to high-dimensional data sets. We show on three real-world examples from different domains that in the setup of binary labels, our algorithm leads to better prediction accuracies and also selects features which show less correlation with the confounding factors.

Comment: Published version, 21 pages, 6 figures
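To make the setting concrete, the following is a minimal sketch of how binary labels could arise from a probit-style model with sparse weights and confounder-correlated noise. It is an illustration under stated assumptions, not the authors' implementation: the function name `simulate_probit_lmm`, the side covariates `Z`, the kernel `K = Z Zᵀ / q`, and the noise covariance `I + lam * K` are all hypothetical choices made here.

```python
import numpy as np


def simulate_probit_lmm(n=200, d=50, n_active=5, lam=1.0, seed=0):
    """Draw synthetic data from an assumed probit model with correlated noise."""
    rng = np.random.default_rng(seed)

    # Feature matrix and a sparse weight vector (only n_active nonzero entries).
    X = rng.standard_normal((n, d))
    w = np.zeros(d)
    w[:n_active] = rng.standard_normal(n_active)

    # Hypothetical confounders Z induce a sample-by-sample similarity kernel K.
    Z = rng.standard_normal((n, 3))
    K = Z @ Z.T / Z.shape[1]

    # Noise is correlated across samples through K (assumed form: I + lam * K).
    noise = rng.multivariate_normal(np.zeros(n), np.eye(n) + lam * K)

    # Binary phenotype: the sign of the latent linear response.
    latent = X @ w + noise
    y = np.where(latent > 0, 1, -1)
    return X, y, w, K


X, y, w, K = simulate_probit_lmm()
```

Because the labels depend on the latent response only through its sign, the likelihood involves a Gaussian orthant probability rather than a Gaussian density, which is one way to see why a closed form is lost.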

arXiv.org e-Print Archive

MDC Repository

Exploring helical dynamos with machine learning

Author: Nauman Farrukh
Nättilä Joonas
Publication venue: 'EDP Sciences'
Publication date: 01/01/2019

We use ensemble machine learning algorithms to study the evolution of magnetic fields in magnetohydrodynamic (MHD) turbulence that is helically forced. We perform direct numerical simulations of helically forced turbulence using mean field formalism, with electromotive force (EMF) modeled both as a linear and non-linear function of the mean magnetic field and current density. The form of the EMF is determined using regularized linear regression and random forests. We also compare various analytical models to the data using Bayesian inference with Markov Chain Monte Carlo (MCMC) sampling. Our results demonstrate that linear regression is largely successful at predicting the EMF and that the use of more sophisticated algorithms (random forests, MCMC) does not lead to significant improvement in the fits. We conclude that the data we examine are effectively low dimensional and essentially linear. Finally, to encourage further exploration by the community, we provide all of our simulation data and analysis scripts as open source IPython notebooks.

Comment: accepted by A&A, 11 pages, 6 figures, 3 tables, data + IPython notebooks: https://github.com/fnauman/ML_alpha
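As a rough illustration of fitting the EMF with regularized linear regression, the sketch below recovers the coefficients of an assumed linear closure, EMF = alpha * B - eta * J, from synthetic data using closed-form ridge regression. The data, the coefficient values, and the helper `ridge_fit` are assumptions introduced here for illustration; they are not taken from the authors' notebooks.

```python
import numpy as np


def ridge_fit(features, target, penalty=1e-3):
    """Closed-form ridge regression: solve (X^T X + penalty * I) w = X^T y."""
    n_features = features.shape[1]
    gram = features.T @ features + penalty * np.eye(n_features)
    return np.linalg.solve(gram, features.T @ target)


rng = np.random.default_rng(0)
n_samples = 500

# Synthetic stand-ins for the mean magnetic field B and current density J.
B = rng.standard_normal(n_samples)
J = rng.standard_normal(n_samples)

# Assumed linear closure with small additive noise (values are illustrative).
alpha_true, eta_true = 0.8, 0.3
emf = alpha_true * B - eta_true * J + 0.01 * rng.standard_normal(n_samples)

# Regress the EMF on [B, J]; the second coefficient estimates -eta.
features = np.column_stack([B, J])
alpha_hat, minus_eta_hat = ridge_fit(features, emf)
```

If the underlying relation really is close to linear, as the abstract concludes for its data, a fit of this kind leaves little residual structure for a more flexible model to capture.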

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

Chalmers Research

Deposit insurance systems and bank risk

Author: Davis EP
Obasi U
Publication venue: Brunel University
Publication date: 01/01/2009

The link from deposit insurance to bank risk taking has been widely analysed, but has been the subject of relatively little empirical work. This work contributes to the existing literature by exploring microeconomic aspects of the deposit insurance–bank risk relationship. It employs four of the five IMF core financial soundness indicators, using data from financial statements for 914 banks in 64 countries. It also disaggregates deposit insurance by individual design features. Results, generated using GMM, suggest that deposit insurance mainly affects bank risk through its relationship with profitability and asset quality. An optimal deposit insurance system might be unfunded, have voluntary membership, no cover for foreign currency deposits and no coinsurance, and be administered by a private sector manager, with the insurance cost borne fully by the private sector.

Brunel University Research Archive

Determination of the top quark mass from leptonic observables

Author: Frixione Stefano
Mitov Alexander
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

We present a procedure for the determination of the mass of the top quark at the LHC based on leptonic observables in dilepton $t\bar{t}$ events. Our approach utilises the shapes of kinematic distributions through their few lowest Mellin moments; it is notable for its minimal sensitivity to the modelling of long-distance effects, for not requiring the reconstruction of top quarks, and for having a competitive precision, with theory errors on the extracted top mass of the order of 0.8 GeV. A novel aspect of our work is the study of theoretical biases that might influence in a dramatic way the determination of the top mass, and which are potentially relevant to all template-based methods. We propose a comprehensive strategy that helps minimise the impact of such biases, and leads to a reliable top mass extraction at hadron colliders.

Comment: 29 pages, 3 figures
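For readers unfamiliar with the term, the Mellin moments of a kinematic distribution can be written as follows. This is the standard definition of normalised moments, stated here as an assumption about the notation; the symbol $O$ stands for a generic leptonic observable and $\sigma$ for the cross section within whatever cuts are applied.

```latex
\mu_O^{(i)} = \frac{1}{\sigma} \int \mathrm{d}O \; O^{i} \, \frac{\mathrm{d}\sigma}{\mathrm{d}O},
\qquad i = 0, 1, 2, \ldots
```

With this normalisation $\mu_O^{(0)} = 1$ and $\mu_O^{(1)}$ is the mean of the distribution, so the lowest few moments summarise its shape in a small set of numbers.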

arXiv.org e-Print Archive

Springer

Springer - Publisher Connector

CERN Document Server