Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping
We consider the problem of estimating a sparse multi-response regression
function, with an application to expression quantitative trait locus (eQTL)
mapping, where the goal is to discover genetic variations that influence
gene-expression levels. In particular, we investigate a shrinkage technique
capable of capturing a given hierarchical structure over the responses, such as
a hierarchical clustering tree with leaf nodes for responses and internal nodes
for clusters of related responses at multiple granularities, and we seek to
leverage this structure to recover covariates relevant to each
hierarchically-defined cluster of responses. We propose a tree-guided group
lasso, or tree lasso, for estimating such structured sparsity under
multi-response regression by employing a novel penalty function constructed
from the tree. We describe a systematic weighting scheme for the overlapping
groups in the tree penalty so that each regression coefficient is penalized in
a balanced manner, even though overlaps among groups cause different
coefficients to belong to different numbers of groups. For
efficient optimization, we employ a smoothing proximal gradient method that was
originally developed for a general class of structured-sparsity-inducing
penalties. Using simulated and yeast data sets, we demonstrate that our method
outperforms other methods for multivariate-response regression in terms of both
prediction error and recovery of the true sparsity pattern.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org), http://dx.doi.org/10.1214/12-AOAS549
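The tree penalty can be illustrated with a minimal Python sketch, assuming the usual tree-lasso form: each tree node v defines a group G_v of responses (the leaves below it), and the penalty sums, over covariates j and nodes v, w_v times the L2 norm of covariate j's coefficients restricted to G_v. The node weights w_v, the toy three-response tree, and the function names are hypothetical; the paper's balanced weighting scheme is not reproduced here. The second function sketches the Nesterov-type smoothing that smoothing proximal gradient methods apply to overlapping group penalties: each block's optimal dual variable is a projection onto the unit L2 ball, which makes the smoothed penalty differentiable.

```python
import math
from typing import List, Sequence, Tuple

# A group is (indices of the responses under one tree node, weight w_v of that node).
# Weights are supplied by the caller; this sketch does NOT implement the paper's
# balanced weighting scheme.
Group = Tuple[Sequence[int], float]
Matrix = List[List[float]]  # B[k][j]: coefficient of covariate j for response k


def tree_lasso_penalty(B: Matrix, groups: List[Group]) -> float:
    """Sum over covariates j and tree nodes v of w_v * ||beta_j^{G_v}||_2."""
    total = 0.0
    for j in range(len(B[0])):
        for members, weight in groups:
            # L2 norm of covariate j's coefficients across the responses in node v
            total += weight * math.sqrt(sum(B[k][j] ** 2 for k in members))
    return total


def smoothed_penalty_grad(B: Matrix, groups: List[Group], mu: float) -> Matrix:
    """Gradient of the Nesterov-smoothed penalty, smoothing parameter mu > 0.

    For each (covariate, node) block, the optimal dual variable is the projection
    of w_v * beta_j^{G_v} / mu onto the unit L2 ball; the gradient adds
    w_v times that dual block back onto the block's coefficients.
    """
    K, J = len(B), len(B[0])
    grad = [[0.0] * J for _ in range(K)]
    for j in range(J):
        for members, weight in groups:
            block = [weight * B[k][j] / mu for k in members]
            norm = math.sqrt(sum(x * x for x in block))
            scale = 1.0 / norm if norm > 1.0 else 1.0  # projection onto unit ball
            for k, x in zip(members, block):
                grad[k][j] += weight * x * scale
    return grad


# Hypothetical tree over 3 responses: leaves {0}, {1}, {2}, internal node {0, 1}, root.
groups: List[Group] = [([0], 1.0), ([1], 1.0), ([2], 1.0), ([0, 1], 1.0), ([0, 1, 2], 1.0)]
B = [[1.0, 0.0],
     [1.0, 0.0],
     [0.0, 2.0]]
print(tree_lasso_penalty(B, groups))  # 6 + 2*sqrt(2), about 8.8284
```

As mu shrinks, the smoothed gradient approaches a subgradient of the exact penalty. In a full smoothing proximal gradient solver it would be added to the gradient of the squared-error loss inside an accelerated gradient loop; that outer loop and its step-size rule are omitted from this sketch.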
A survey of popular R packages for cluster analysis
Cluster analysis is a set of statistical methods for discovering new group/class structure when exploring datasets. This article reviews the following popular libraries/commands in the R software language for applying different types of cluster analysis: from the stats library, the kmeans and hclust functions; the mclust library; the poLCA library; and the clustMD library. The packages/functions cover a variety of cluster analysis methods for continuous data, categorical data, or a mix of the two. The contrasting methods in the different packages are briefly introduced, and basic usage of the functions is discussed. The different methods are then compared, contrasted, and illustrated on example data. In the discussion, links are given to information on other available libraries for different clustering methods and on extensions beyond basic clustering methods. The code for the worked examples in Section 2 is available at http://www.stats.gla.ac.uk/~nd29c/Software/ClusterReviewCode.
Appointment scheduling model in healthcare using clustering algorithms
In this study, we propose a scheduling procedure that combines machine
learning and mathematical programming. Outpatients who request appointments
at healthcare facilities have different priorities. Determining the priority
of outpatients and allocating capacity according to priority classes are
important considerations in outpatient scheduling. Scheduling an incoming
patient proceeds in two stages. In the first stage, we applied and compared
different clustering methods, such as k-means clustering and agglomerative
hierarchical clustering, to classify outpatients into priority classes, and we
suggested the best way to cluster the outpatients. In the second stage, we
modeled the scheduling problem as a Markov decision process (MDP) that aims to
reduce the waiting time of higher-priority outpatients. Because of the curse of
dimensionality, we used a fluid approximation method to estimate the optimal
solution of the MDP. We applied our methodology to a dataset from Shaheed
Rajaei Medical and Research Center in Iran and showed how our models
prioritize and schedule outpatients.
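The first stage, clustering outpatients into priority classes, can be illustrated with a minimal k-means sketch in plain Python. It assumes each outpatient is described by numeric features scaled so that higher values mean more urgent; the features, the function names, and the rule that ranks clusters by the sum of their centroid coordinates are hypothetical and are not taken from the study. The agglomerative hierarchical alternative and the MDP scheduling stage are not shown.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    centroids: List[Point] = [points[0]]
    while len(centroids) < k:
        # Add the point farthest from its nearest existing centroid.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def priority_classes(features: Sequence[Point], k: int) -> List[int]:
    """Map each outpatient to a priority class 1..k (1 = most urgent).

    Hypothetical rule: clusters are ranked by the sum of their centroid's
    coordinates, assuming every feature is scaled so that higher means more urgent.
    """
    labels, centroids = kmeans(features, k)
    order = sorted(range(k), key=lambda c: sum(centroids[c]), reverse=True)
    rank = {c: r + 1 for r, c in enumerate(order)}
    return [rank[lab] for lab in labels]


# Hypothetical (severity, urgency) scores in [0, 1] for six outpatients.
patients = [(0.9, 0.8), (0.85, 0.9), (0.5, 0.5), (0.45, 0.55), (0.1, 0.2), (0.15, 0.1)]
print(priority_classes(patients, 3))  # [1, 1, 2, 2, 3, 3]
```

Farthest-point initialization keeps this sketch deterministic; randomized seeding with several restarts is the more common choice in practice, since Lloyd's iterations only reach a local optimum.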