Efficient regularized isotonic regression with application to gene--gene interaction search
Isotonic regression is a nonparametric approach for fitting monotonic models
to data that has been widely studied from both theoretical and practical
perspectives. However, this approach encounters computational and statistical
overfitting issues in higher dimensions. To address both concerns, we present
an algorithm, which we term Isotonic Recursive Partitioning (IRP), for isotonic
regression based on recursively partitioning the covariate space through
solution of progressively smaller "best cut" subproblems. This creates a
regularized sequence of isotonic models of increasing model complexity that
converges to the global isotonic regression solution. The models along the
sequence are often more accurate than the unregularized isotonic regression
model because of the complexity control they offer. We quantify this complexity
control through estimation of degrees of freedom along the path. Success of the
regularized models in prediction and IRP's favorable computational properties
are demonstrated through a series of simulated and real data experiments. We
discuss application of IRP to the problem of searching for gene--gene
interactions and epistasis, and demonstrate it on data from genome-wide
association studies of three common diseases.
Comment: Published at http://dx.doi.org/10.1214/11-AOAS504 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
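For readers unfamiliar with the underlying fitting problem, the following is a minimal sketch of one-dimensional isotonic regression using the classical pool-adjacent-violators approach. It is not the IRP algorithm described in the abstract, which targets higher-dimensional covariate spaces; the function name `isotonic_fit` and its optional weight argument are illustrative assumptions, not taken from the paper.

```python
def isotonic_fit(y, w=None):
    """Least-squares non-decreasing fit to y (pool-adjacent-violators).

    Illustrative 1-D sketch only; this is NOT the paper's IRP algorithm.
    `w` holds optional positive observation weights (assumed, hypothetical).
    """
    if w is None:
        w = [1.0] * len(y)

    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while adjacent blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])

    # Expand each block back to one fitted value per observation.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


if __name__ == "__main__":
    print(isotonic_fit([1, 3, 2, 4]))
```

In this sketch the violating pair (3, 2) is pooled into a single block with the pair's average, which is the kind of unregularized fit the abstract's sequence of models is said to converge to.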
BSP-fields: An Exact Representation of Polygonal Objects by Differentiable Scalar Fields Based on Binary Space Partitioning
The problem considered in this work is to find a dimension-independent algorithm for generating signed scalar fields that exactly represent polygonal objects and satisfy the following requirements: the defining real function takes the value zero exactly at the boundary of the polygonal object; no extra zero-value isosurfaces are generated; and the function is C1-continuous over the entire domain. The proposed algorithms are based on binary space partitioning (BSP) of the object by the planes passing through its polygonal faces, and are independent of the object's genus, the number of disjoint components, and holes in the initial polygonal mesh. Several extensions to the basic algorithm are proposed to satisfy the selected optimization criteria. The generated BSP-fields allow techniques of function-based modeling to be applied to existing legacy objects from CAD and computer animation, which is illustrated by several examples.
Algorithmic and Statistical Perspectives on Large-Scale Data Analysis
In recent years, ideas from statistics and scientific computing have begun to
interact in increasingly sophisticated and fruitful ways with ideas from
computer science and the theory of algorithms to aid in the development of
improved worst-case algorithms that are useful for large-scale scientific and
Internet data analysis problems. In this chapter, I will describe two recent
examples---one having to do with selecting good columns or features from a (DNA
Single Nucleotide Polymorphism) data matrix, and the other having to do with
selecting good clusters or communities from a data graph (representing a social
or information network)---that drew on ideas from both areas and that may serve
as a model for exploiting complementary algorithmic and statistical
perspectives in order to solve applied large-scale data analysis problems.
Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors,
"Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201