Efficient regularized isotonic regression with application to gene--gene interaction search
Isotonic regression is a nonparametric approach for fitting monotonic models
to data that has been widely studied from both theoretical and practical
perspectives. However, this approach encounters computational and statistical
overfitting issues in higher dimensions. To address both concerns, we present
an algorithm, which we term Isotonic Recursive Partitioning (IRP), for isotonic
regression based on recursively partitioning the covariate space through
solution of progressively smaller "best cut" subproblems. This creates a
regularized sequence of isotonic models of increasing model complexity that
converges to the global isotonic regression solution. The models along the
sequence are often more accurate than the unregularized isotonic regression
model because of the complexity control they offer. We quantify this complexity
control through estimation of degrees of freedom along the path. Success of the
regularized models in prediction and IRP's favorable computational properties
are demonstrated through a series of simulated and real data experiments. We
discuss application of IRP to the problem of searching for gene--gene
interactions and epistasis, and demonstrate it on data from genome-wide
association studies of three common diseases.
Comment: Published at http://dx.doi.org/10.1214/11-AOAS504 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
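To make the notion of an isotonic fit concrete, here is a minimal sketch of the classical one-dimensional pool-adjacent-violators procedure. It is not the IRP algorithm described in the abstract, which partitions a multi-dimensional covariate space; the function name `pava` and the unweighted least-squares setting are assumptions chosen for illustration.

```python
def pava(y):
    """Least-squares non-decreasing fit to the sequence y (1-D, unit weights)."""
    blocks = []  # each block is [sum of values, number of values]
    for v in y:
        blocks.append([float(v), 1])
        # Pool the last two blocks while their means violate monotonicity.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    # Expand each block back to one fitted value per observation.
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit
```

Each pooled block is a region where the fit is constant, so the number of blocks gives a rough sense of model complexity in this one-dimensional setting.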
Post-processing partitions to identify domains of modularity optimization
We introduce the Convex Hull of Admissible Modularity Partitions (CHAMP)
algorithm to prune and prioritize different network community structures
identified across multiple runs of possibly various computational heuristics.
Given a set of partitions, CHAMP identifies the domain of modularity
optimization for each partition (i.e., the parameter-space domain where it
has the largest modularity relative to the input set), discarding partitions
with empty domains to obtain the subset of partitions that are "admissible"
candidate community structures that remain potentially optimal over indicated
parameter domains. Importantly, CHAMP can be used for multi-dimensional
parameter spaces, such as those for multilayer networks where one includes a
resolution parameter and interlayer coupling. Using the results from CHAMP, a
user can more appropriately select robust community structures by observing the
sizes of domains of optimization and the pairwise comparisons between
partitions in the admissible subset. We demonstrate the utility of CHAMP with
several example networks. In these examples, CHAMP focuses attention on
pruned subsets of admissible partitions that are 20 to 1785 times smaller than
the sets of unique partitions obtained by community detection heuristics that
were input into CHAMP.
Comment: http://www.mdpi.com/1999-4893/10/3/9
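The pruning idea can be sketched for a single resolution parameter. The sketch assumes that each partition's modularity is linear in the resolution parameter, Q_i(g) = a_i - g * p_i, so that a partition's domain of optimization is the set of g where its line lies on the upper envelope. The function name `admissible_domains`, the coefficient pairs, and the brute-force breakpoint search are illustrative assumptions, not the CHAMP implementation, which handles multi-dimensional parameter spaces through convex hull computations.

```python
def admissible_domains(coeffs, g_min, g_max):
    """coeffs: list of (a, p) with Q_i(g) = a - g * p.
    Returns {partition index: [(lo, hi), ...]} for partitions that are
    optimal somewhere in [g_min, g_max]; others are omitted (pruned)."""
    # Candidate breakpoints: the endpoints plus every pairwise intersection.
    points = {g_min, g_max}
    for i, (a1, p1) in enumerate(coeffs):
        for a2, p2 in coeffs[i + 1:]:
            if p1 != p2:
                g = (a1 - a2) / (p1 - p2)
                if g_min < g < g_max:
                    points.add(g)
    grid = sorted(points)
    domains = {}
    # Between consecutive breakpoints the ordering of the lines is fixed,
    # so evaluating at the midpoint identifies the optimal partition.
    for lo, hi in zip(grid, grid[1:]):
        mid = 0.5 * (lo + hi)
        best = max(range(len(coeffs)),
                   key=lambda i: coeffs[i][0] - mid * coeffs[i][1])
        segs = domains.setdefault(best, [])
        if segs and segs[-1][1] == lo:
            segs[-1] = (segs[-1][0], hi)  # extend the adjacent segment
        else:
            segs.append((lo, hi))
    return domains
```

With the made-up coefficients `[(1.0, 1.0), (0.5, 0.0), (0.6, 0.5)]` on the range 0.0 to 2.0, the third partition is never optimal and is dropped, while the widths of the remaining domains indicate how robust each surviving partition is.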
Nonparametric Hierarchical Clustering of Functional Data
In this paper, we deal with the problem of curve clustering. We propose a
nonparametric method which partitions the curves into clusters and discretizes
the dimensions of the curve points into intervals. The cross-product of these
partitions forms a data-grid which is obtained using a Bayesian model selection
approach while making no assumptions regarding the curves. Finally, a
post-processing technique, aiming at reducing the number of clusters in order
to improve the interpretability of the clustering, is proposed. It consists of
optimally merging the clusters step by step, which corresponds to an
agglomerative hierarchical classification whose dissimilarity measure is the
variation of the criterion. Interestingly, this measure is none other than the
sum of the Kullback-Leibler divergences between the cluster distributions before
and after the merges. The practical interest of the approach for functional
data exploratory analysis is presented and compared with an alternative
approach on an artificial and a real-world data set.
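One plausible reading of the merge dissimilarity can be sketched as follows. The sketch assumes each cluster is summarized by counts of curve points per grid cell, and it scores a merge by the count-weighted sum of Kullback-Leibler divergences from each cluster's distribution to the merged distribution. The function names and the weighting are assumptions made for illustration; the abstract does not give the exact form of the criterion.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def merge_cost(counts_a, counts_b):
    """Hypothetical dissimilarity between two clusters given per-cell counts."""
    n_a, n_b = sum(counts_a), sum(counts_b)
    p_a = [c / n_a for c in counts_a]
    p_b = [c / n_b for c in counts_b]
    # Distribution of the merged cluster (pooled counts, renormalized).
    merged = [(x + y) / (n_a + n_b) for x, y in zip(counts_a, counts_b)]
    return n_a * kl(p_a, merged) + n_b * kl(p_b, merged)
```

An agglomerative pass would repeatedly merge the pair of clusters with the smallest cost. Identical distributions cost nothing to merge, and fully disjoint ones cost the most.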