Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping
We consider the problem of estimating a sparse multi-response regression
function, with an application to expression quantitative trait locus (eQTL)
mapping, where the goal is to discover genetic variations that influence
gene-expression levels. In particular, we investigate a shrinkage technique
capable of capturing a given hierarchical structure over the responses, such as
a hierarchical clustering tree with leaf nodes for responses and internal nodes
for clusters of related responses at multiple granularity, and we seek to
leverage this structure to recover covariates relevant to each
hierarchically-defined cluster of responses. We propose a tree-guided group
lasso, or tree lasso, for estimating such structured sparsity under
multi-response regression by employing a novel penalty function constructed
from the tree. We describe a systematic weighting scheme for the overlapping
groups in the tree-penalty such that each regression coefficient is penalized
in a balanced manner despite the inhomogeneous multiplicity of group
memberships of the regression coefficients due to overlaps among groups. For
efficient optimization, we employ a smoothing proximal gradient method that was
originally developed for a general class of structured-sparsity-inducing
penalties. Using simulated and yeast data sets, we demonstrate that our method
shows superior performance in terms of both prediction error and recovery of
true sparsity patterns, compared to other methods for learning a
multivariate-response regression.
Comment: Published at http://dx.doi.org/10.1214/12-AOAS549 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
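The abstract above describes a penalty built from overlapping groups in a tree, weighted so that every coefficient is penalized in a balanced way. A minimal Python sketch of that idea follows. The abstract does not give the weighting formula, so the scheme here is an assumption chosen for illustration: each internal node v carries a hypothetical mixing value s[v] in [0, 1], and the weights along any root-to-leaf path sum to 1. The names `node_weights`, `leaves_under`, and `tree_penalty` are likewise hypothetical.

```python
import math

def node_weights(children, s, root):
    """Assign a weight to every tree node (illustrative scheme, an assumption).

    An internal node v gets (1 - s[v]) times the product of s[a] over its
    ancestors a; a leaf gets the product of s[a] over its ancestors. The
    weights along any root-to-leaf path then telescope to exactly 1.
    """
    weights = {}

    def visit(node, carried):
        kids = children.get(node, [])
        if not kids:                      # leaf: a single response
            weights[node] = carried
            return
        weights[node] = carried * (1.0 - s[node])
        for kid in kids:
            visit(kid, carried * s[node])

    visit(root, 1.0)
    return weights

def leaves_under(children, node):
    """Return the response indices (leaves) in the subtree rooted at node."""
    kids = children.get(node, [])
    if not kids:
        return [node]
    found = []
    for kid in kids:
        found.extend(leaves_under(children, kid))
    return found

def tree_penalty(beta_row, children, s, root):
    """Sum over nodes v of w_v * ||beta_row[G_v]||_2 for one covariate."""
    total = 0.0
    for node, w in node_weights(children, s, root).items():
        group = leaves_under(children, node)
        total += w * math.sqrt(sum(beta_row[k] ** 2 for k in group))
    return total

# Three responses (0, 1, 2); responses 0 and 1 form the cluster "a".
children = {"root": ["a", 2], "a": [0, 1]}
s = {"root": 0.5, "a": 0.5}
print(tree_penalty([3.0, 4.0, 0.0], children, s, "root"))
```

In the full multi-response setting this per-covariate penalty would be summed over all covariates and added to the squared-error loss; the optimization itself (the smoothing proximal gradient method mentioned in the abstract) is not sketched here.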
Training deep neural density estimators to identify mechanistic models of neural dynamics
Mechanistic modeling in neuroscience aims to explain observed phenomena in terms of underlying causes. However, determining which model parameters agree with complex and stochastic neural data presents a significant challenge. We address this challenge with a machine learning tool that uses deep neural density estimators, trained using model simulations, to carry out Bayesian inference and retrieve the full space of parameters compatible with raw data or selected data features. Our method is scalable in parameters and data features, and can rapidly analyze new data after initial training. We demonstrate the power and flexibility of our approach on receptive fields, ion channels, and Hodgkin-Huxley models. We also characterize the space of circuit configurations giving rise to rhythmic activity in the crustacean stomatogastric ganglion, and use these results to derive hypotheses for underlying compensation mechanisms. Our approach will help close the gap between data-driven and theory-driven models of neural dynamics.
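To make "retrieving the space of parameters compatible with data" concrete, here is a minimal Python sketch of simulation-based inference. It deliberately swaps in a much simpler technique, rejection sampling (approximate Bayesian computation), in place of the neural density estimators the abstract describes. The toy Gaussian simulator, the uniform prior, the tolerance `eps`, and all function names are assumptions made for illustration.

```python
import random

def simulate(theta, n_obs, rng):
    """Toy stochastic simulator (an assumption): mean of n_obs draws from N(theta, 1)."""
    return sum(rng.gauss(theta, 1.0) for _ in range(n_obs)) / n_obs

def rejection_abc(observed, n_draws=5000, n_obs=20, eps=0.2, seed=0):
    """Keep prior draws whose simulated summary lands within eps of the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)            # draw a parameter from the prior
        summary = simulate(theta, n_obs, rng)      # run the model once
        if abs(summary - observed) < eps:          # compatible with the data?
            accepted.append(theta)
    return accepted

samples = rejection_abc(observed=2.0)
print(len(samples), sum(samples) / len(samples))
```

Unlike this sketch, which must rerun every simulation for each new observation, the approach in the abstract trains an estimator once and can then analyze new data rapidly.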
Genetic Algorithm and Multi-objective Function Optimization with the Jumping Gene (Transposon) Adaptation - A Primer
I am going to deliver a lecture on "Genetic Algorithm and Multi-objective Optimization with the Jumping Gene (Transposon) Adaptation - A Primer". Optimization techniques have long been applied to problems of industrial importance. Several excellent texts [1-5] describe the various methods with examples. These usually involve a single objective function and constraints. Most real-world engineering problems, however, require the simultaneous optimization of several objectives (multi-objective optimization) that cannot be compared easily with each other (are non-commensurate), and so cannot be combined into a single, meaningful scalar objective function. An example is maximizing the desired product while minimizing the production of an undesirable side product. A very popular and robust technique for solving optimization problems with a single objective function is the genetic algorithm (GA), also referred to as the simple GA (SGA). This is a search technique developed by Holland. It mimics the process of natural selection and natural genetics. The Darwinian principle of "survival of the fittest" is used to obtain the optimal solution. This technique is better than calculus-based methods (both direct and indirect), which generally find a local optimum and may miss the global optimum. It does not need derivatives either. A recent adaptation of GA (the non-dominated sorting genetic algorithm with elitism and the jumping gene operator, NSGA II-JG [4]) has been developed to solve multi-objective function optimization problems. In this paper we describe GA and its adaptations in a manner suited to a beginner.
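For a beginner, the single-objective GA described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the lecture's own formulation: it uses binary chromosomes, tournament selection, one-point crossover, bit-flip mutation, and carries the best individual forward (elitism). All parameter values and the name `simple_ga` are hypothetical, and the jumping gene operator and non-dominated sorting are not included.

```python
import random

def simple_ga(fitness, n_bits=16, pop_size=30, generations=40,
              p_cross=0.9, p_mut=0.02, seed=0):
    """Maximize fitness over bit lists; no derivatives are needed."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament(population):
        # "Survival of the fittest": the better of two random individuals wins.
        a, b = rng.choice(population), rng.choice(population)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]                       # elitism: keep the best so far
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            if rng.random() < p_cross:             # one-point crossover
                cut = rng.randint(1, n_bits - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each gene.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best

# Toy objective (an assumption): count of 1-bits, maximized by all ones.
solution = simple_ga(sum)
print(solution, sum(solution))
```

Because the search relies only on fitness comparisons, the same loop works for objectives that are discontinuous or have many local optima, which is the advantage over calculus-based methods noted above.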
High Dimensional Classification with combined Adaptive Sparse PLS and Logistic Regression
Motivation: The high dimensionality of genomic data calls for the development
of specific classification methodologies, especially to prevent over-optimistic
predictions. This challenge can be tackled by compression and variable
selection, which combined constitute a powerful framework for classification,
as well as data visualization and interpretation. However, currently proposed
combinations lead to unstable and non-convergent methods due to inappropriate
computational frameworks. We hereby propose a stable and convergent approach
for classification in high dimension based on sparse Partial Least Squares
(sparse PLS). Results: We start by proposing a new solution for the sparse PLS
problem that is based on proximal operators for the case of univariate
responses. Then we develop an adaptive version of the sparse PLS for
classification, which combines iterative optimization of logistic regression
and sparse PLS to ensure convergence and stability. Our results are confirmed
on synthetic and experimental data. In particular we show how crucial
convergence and stability can be when cross-validation is involved for
calibration purposes. Using gene expression data we explore the prediction of
breast cancer relapse. We also propose a multicategory version of our method
for the prediction of cell types based on single-cell expression data.
Availability: Our approach is implemented in the plsgenomics R-package.
Comment: 9 pages, 3 figures, 4 tables + Supplementary Materials 8 pages, 3
figures, 10 tables
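The abstract above mentions a sparse PLS solution based on proximal operators for a univariate response. As a hedged illustration, the Python sketch below shows the proximal operator of the L1 norm (soft-thresholding) applied to the covariance vector X^T y to obtain a sparse, unit-norm direction. That this matches the authors' exact solution is an assumption; the function names are hypothetical and are not the plsgenomics API. X and y are assumed to be centered.

```python
import math

def soft_threshold(z, lam):
    """Proximal operator of lam * ||.||_1: shrink each entry toward zero by lam."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in z]

def sparse_direction(X, y, lam):
    """Soft-threshold c = X^T y and rescale to unit length (illustrative)."""
    n, p = len(X), len(X[0])
    c = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    w = soft_threshold(c, lam)
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w] if norm > 0 else w

X = [[1.0, 0.5], [2.0, -0.25]]
y = [1.0, 1.0]
print(sparse_direction(X, y, lam=1.0))   # second variable is dropped
```

Variables whose covariance with the response falls below `lam` in absolute value receive a zero weight, which is how compression and variable selection are combined in a single step.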