Sparse Probit Linear Mixed Model
Linear Mixed Models (LMMs) are important tools in statistical genetics. When
used for feature selection, they allow one to find a sparse set of genetic traits
that best predict a continuous phenotype of interest, while simultaneously
correcting for various confounding factors such as age, ethnicity and
population structure. Formulated as models for linear regression, LMMs have
been restricted to continuous phenotypes. We introduce the Sparse Probit Linear
Mixed Model (Probit-LMM), where we generalize the LMM modeling paradigm to
binary phenotypes. A technical challenge is that the model no longer has a
closed-form likelihood function. In this paper, we present a scalable
approximate inference algorithm that lets us fit the model to high-dimensional
data sets. We show on three real-world examples from different domains that, in
the setting of binary labels, our algorithm achieves better prediction accuracy
and also selects features that are less correlated with the confounding
factors. Comment: Published version, 21 pages, 6 figures
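The following minimal Python sketch makes the binary-phenotype setting concrete under simplifying assumptions. The confounder is modeled as a random intercept shared within hypothetical groups, standing in for something like population structure, rather than the general covariance an LMM may use. All function names are illustrative. The sketch reproduces neither the authors' exact model nor their approximate inference algorithm. It only shows the probit generative structure and why the likelihood stops being closed-form.

```python
import math
import random


def std_normal_cdf(z):
    # Phi(z): the probit inverse link, written via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate_probit_lmm(X, w, groups, sigma_u, seed=0):
    """Toy probit mixed model: y_i = 1[x_i . w + u_g(i) + eps_i > 0].

    ASSUMPTION: u_g ~ N(0, sigma_u^2) is a random intercept shared by all
    samples in group g (a stand-in confounder), and eps_i ~ N(0, 1) fixes
    the probit noise scale. A sparse w means few features affect y.
    """
    rng = random.Random(seed)
    # One random intercept per group; sorting keeps the draw order reproducible
    u = {g: rng.gauss(0.0, sigma_u) for g in sorted(set(groups))}
    y = []
    for x_i, g in zip(X, groups):
        latent = sum(a * b for a, b in zip(x_i, w)) + u[g] + rng.gauss(0.0, 1.0)
        y.append(1 if latent > 0.0 else 0)
    return y


def marginal_prob(x, w, sigma_u):
    # For ONE sample, u + eps ~ N(0, 1 + sigma_u^2), so
    # P(y = 1 | x) = Phi(x . w / sqrt(1 + sigma_u^2)).
    # Samples sharing a group are coupled through u_g, and their joint
    # likelihood is an integral over u_g with no closed form in general.
    xw = sum(a * b for a, b in zip(x, w))
    return std_normal_cdf(xw / math.sqrt(1.0 + sigma_u ** 2))


if __name__ == "__main__":
    # Each sample in its own group, so the marginal formula applies directly
    n = 20000
    X = [[1.0, 0.0]] * n
    w = [0.5, 0.0]  # sparse weights: only the first feature matters
    y = simulate_probit_lmm(X, w, list(range(n)), sigma_u=1.0, seed=1)
    print(sum(y) / n, marginal_prob([1.0, 0.0], w, 1.0))
```

When every sample has its own group, the empirical frequency of ones should approach the single-sample marginal probability. When groups are shared, that shortcut no longer gives the joint likelihood.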
Exploring helical dynamos with machine learning
We use ensemble machine learning algorithms to study the evolution of
magnetic fields in magnetohydrodynamic (MHD) turbulence that is helically
forced. We perform direct numerical simulations of helically forced turbulence
and analyse them using the mean-field formalism, with the electromotive force
(EMF) modeled as both a linear and a non-linear function of the mean magnetic
field and current density.
The form of the EMF is determined using regularized linear regression and
random forests. We also compare various analytical models to the data using
Bayesian inference with Markov Chain Monte Carlo (MCMC) sampling. Our results
demonstrate that linear regression is largely successful at predicting the EMF
and that the use of more sophisticated algorithms (random forests, MCMC) does not
lead to significant improvement in the fits. We conclude that the data we are
looking at are effectively low-dimensional and essentially linear. Finally, to
encourage further exploration by the community, we provide all of our
simulation data and analysis scripts as open source IPython notebooks. Comment: accepted by A&A, 11 pages, 6 figures, 3 tables, data + IPython
notebooks: https://github.com/fnauman/ML_alpha
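The regression step can be illustrated with a small Python sketch. It assumes a scalar, single-component version of the standard linear mean-field closure, EMF = alpha * B - eta * J, with unit constants absorbed into eta. Ridge (L2) regression is used here as one example of regularized linear regression; the authors may use a different penalty. The synthetic data and all function names are hypothetical, and the sketch is unrelated to the notebooks in the linked repository.

```python
import random


def ridge_fit_two_features(f1, f2, target, lam):
    """Ridge regression target ~ c1*f1 + c2*f2 with no intercept.

    Solves the 2x2 system (A^T A + lam*I) c = A^T t by Cramer's rule,
    where the columns of A are f1 and f2.
    """
    s11 = sum(a * a for a in f1) + lam
    s22 = sum(b * b for b in f2) + lam
    s12 = sum(a * b for a, b in zip(f1, f2))
    r1 = sum(a * t for a, t in zip(f1, target))
    r2 = sum(b * t for b, t in zip(f2, target))
    det = s11 * s22 - s12 * s12
    c1 = (r1 * s22 - s12 * r2) / det
    c2 = (s11 * r2 - s12 * r1) / det
    return c1, c2


def fit_alpha_eta(B, J, emf, lam=1.0):
    # ASSUMED closure EMF = alpha * B - eta * J, so the J coefficient is -eta
    alpha, minus_eta = ridge_fit_two_features(B, J, emf, lam)
    return alpha, -minus_eta


if __name__ == "__main__":
    # Hypothetical synthetic "mean fields" with known alpha = 0.3, eta = 0.05
    rng = random.Random(42)
    B = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    J = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    emf = [0.3 * b - 0.05 * j + rng.gauss(0.0, 0.01) for b, j in zip(B, J)]
    print(fit_alpha_eta(B, J, emf))
```

With many samples, the penalty lam only slightly shrinks the coefficients. Its role is to stabilize the fit when the regressors are nearly collinear or when more candidate terms (for example, non-linear functions of B) are added.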
Deposit insurance systems and bank risk
The link from deposit insurance to bank risk taking has been widely analysed in theory, but has been the subject of relatively little empirical work. This work contributes to the existing literature by exploring microeconomic aspects of the deposit insurance–bank risk relationship. It employs four of the five IMF core financial soundness indicators, using financial-statement data for 914 banks in 64 countries. It also disaggregates deposit insurance into its individual design features. Results, generated using the generalized method of moments (GMM), suggest that deposit insurance affects bank risk mainly through its relationship with profitability and asset quality. An optimal deposit insurance system might have features such as voluntary membership, no cover for foreign-currency deposits, no coinsurance, an unfunded scheme, and administration by a private-sector manager, with the insurance cost borne fully by the private sector.
Determination of the top quark mass from leptonic observables
We present a procedure for the determination of the mass of the top quark at
the LHC based on leptonic observables in dilepton events. Our
approach utilises the shapes of kinematic distributions through their lowest
few Mellin moments; it is notable for its minimal sensitivity to the
modelling of long-distance effects, for not requiring the reconstruction of top
quarks, and for having a competitive precision, with theory errors on the
extracted top mass of the order of 0.8 GeV. A novel aspect of our work is the
study of theoretical biases that might dramatically influence the
determination of the top mass, and which are potentially relevant to all
template-based methods. We propose a comprehensive strategy that helps minimise
the impact of such biases and leads to a reliable top mass extraction at
hadron colliders. Comment: 29 pages, 3 figures
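The moment-based extraction idea can be illustrated with a toy Python sketch. The abstract does not list which leptonic distributions are used, so the observable here comes from a hypothetical generator. The N-th moment is assumed to be the sample mean of O^N, and the calibration is assumed to be linear in the top mass over a narrow template range. The sketch includes no theory uncertainties and does not model the biases the paper studies. It only illustrates the calibrate-then-invert logic shared by template-style methods.

```python
import random


def mellin_moment(samples, n):
    # Sample estimate of the n-th moment <O^n> of an observable O
    return sum(o ** n for o in samples) / len(samples)


def fit_line(xs, ys):
    # Ordinary least squares for y = a + b * x; returns (a, b)
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def toy_observable(mt, n_events, seed):
    # HYPOTHETICAL toy generator: O = 0.25 * mt * r with r ~ Exp(1),
    # standing in for a leptonic distribution from a real simulation
    rng = random.Random(seed)
    return [0.25 * mt * rng.expovariate(1.0) for _ in range(n_events)]


def extract_mass(measured, template_masses, n, n_events=200_000):
    # Templates share one seed (common random numbers) to reduce slope noise
    moments = [mellin_moment(toy_observable(m, n_events, seed=7), n)
               for m in template_masses]
    a, b = fit_line(template_masses, moments)
    # Invert the ASSUMED linear calibration <O^n> = a + b * mt
    return (mellin_moment(measured, n) - a) / b


if __name__ == "__main__":
    data = toy_observable(173.0, 200_000, seed=11)  # pseudo-data at mt = 173
    for n in (1, 2):
        print(n, extract_mass(data, [170.0, 172.5, 175.0], n))
```

In this toy, the second moment grows like mt squared, so the linear calibration is only a local approximation around the template masses. Comparing the masses extracted from different moments is one simple consistency check.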