Modeling association between DNA copy number and gene expression with constrained piecewise linear regression splines
DNA copy number and mRNA expression are widely used data types in cancer
studies; combined, they provide more insight than either does separately. Whereas in
existing literature the form of the relationship between these two types of
markers is fixed a priori, in this paper we model their association. We employ
piecewise linear regression splines (PLRS), which combine good interpretation
with sufficient flexibility to identify any plausible type of relationship. The
specification of the model leads to estimation and model selection in a
constrained, nonstandard setting. We provide methodology for testing the effect
of DNA on mRNA and choosing the appropriate model. Furthermore, we present a
novel approach to obtain reliable confidence bands for constrained PLRS, which
incorporates model uncertainty. The procedures are applied to colorectal and
breast cancer data. Common assumptions are found to be potentially misleading
for biologically relevant genes. More flexible models may bring more insight into
the interaction between the two markers.
Comment: Published at http://dx.doi.org/10.1214/12-AOAS605 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
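To make the idea of a piecewise linear regression spline concrete, here is a minimal sketch in plain Python. It fits an intercept, a linear term and one hinge term (x - k)_+ per knot by ordinary least squares. The function names, the toy data and the knot location are assumptions made for illustration; the constraints, testing and model selection described in the abstract are not implemented here.

```python
def hinge_basis(x, knots):
    """One design-matrix row: intercept, x, and a hinge (x - k)_+ per knot."""
    return [1.0, x] + [max(x - k, 0.0) for k in knots]


def solve_linear_system(a, b):
    """Solve a @ coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def fit_plrs(xs, ys, knots):
    """Unconstrained least squares fit via the normal equations X'X b = X'y."""
    rows = [hinge_basis(x, knots) for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Synthetic toy data (not from the paper): flat up to x = 0, then slope 2.
xs = [i / 4 - 2 for i in range(17)]
ys = [1.0 + 2.0 * max(x, 0.0) for x in xs]
coef = fit_plrs(xs, ys, knots=[0.0])  # [intercept, slope, change in slope at knot]
```

Each hinge coefficient is the change in slope at its knot, which is what keeps the fitted model easy to interpret. A constrained fit in the spirit of the abstract would additionally restrict the signs of these coefficients.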
Slope heuristics and V-Fold model selection in heteroscedastic regression using strongly localized bases
We investigate the optimality for model selection of the so-called slope
heuristics, $V$-fold cross-validation and $V$-fold penalization in a
heteroscedastic regression context with random design. We consider a new class
of linear models that we call strongly localized bases and that generalize
histograms, piecewise polynomials and compactly supported wavelets. We derive
sharp oracle inequalities that prove the asymptotic optimality of the slope
heuristics (when the optimal penalty shape is known) and of $V$-fold
penalization. Furthermore, $V$-fold cross-validation seems to be suboptimal for
a fixed value of $V$, since it asymptotically recovers the oracle learned from a
sample size equal to a fraction $1-V^{-1}$ of the original amount of data. Our results are
based on genuine concentration inequalities for the true and empirical excess
risks that are of independent interest. Our experiments show the good
behavior of the slope heuristics for the selection of linear wavelet models.
Furthermore, $V$-fold cross-validation and $V$-fold penalization have
comparable efficiency.
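As a rough illustration of V-fold cross-validation for model selection, the sketch below chooses the number of bins of a histogram (piecewise-constant) regression estimator on synthetic heteroscedastic data. The data-generating function, the candidate bin counts and all function names are assumptions for this example; it does not implement the slope heuristics or V-fold penalization.

```python
import math
import random


def fit_histogram(xs, ys, n_bins):
    """Least squares fit of a piecewise-constant model on [0, 1): bin means."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)
    # Empty bins fall back to the overall mean (an arbitrary choice for this sketch).
    return [s / c if c else overall for s, c in zip(sums, counts)]


def predict(means, x):
    n_bins = len(means)
    return means[min(int(x * n_bins), n_bins - 1)]


def v_fold_cv_risk(xs, ys, n_bins, v):
    """Average squared prediction error over v held-out folds."""
    n = len(xs)
    total = 0.0
    for fold in range(v):
        test_idx = [i for i in range(n) if i % v == fold]
        train_idx = [i for i in range(n) if i % v != fold]
        means = fit_histogram([xs[i] for i in train_idx],
                              [ys[i] for i in train_idx], n_bins)
        total += sum((ys[i] - predict(means, xs[i])) ** 2 for i in test_idx)
    return total / n


# Synthetic data: random design, noise level growing with x (heteroscedastic).
rng = random.Random(0)
n, v = 200, 5
xs = [rng.random() for _ in range(n)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.1 + 0.3 * x) for x in xs]

candidates = [1, 2, 4, 8, 16, 32]
risks = {m: v_fold_cv_risk(xs, ys, m, v) for m in candidates}
best = min(risks, key=risks.get)  # bin count with the smallest CV risk
```

Each fit here uses only a fraction 1 - 1/V of the sample (160 of 200 points for V = 5), so the criterion evaluates estimators trained on a smaller sample than the full data set.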
Moving sum procedure for change point detection under piecewise linearity
We propose a computationally and statistically efficient procedure for
segmenting univariate data under piecewise linearity. The proposed moving sum
(MOSUM) methodology detects multiple change points where the underlying signal
undergoes discontinuous jumps and/or slope changes. Theoretically, it controls
the family-wise error rate at a given significance level asymptotically and
achieves consistency in multiple change point detection, as well as matching
the minimax optimal rate of estimation when the signal is piecewise linear and
continuous, all under weak assumptions permitting serial dependence and
heavy-tailedness. Computationally, the complexity of the MOSUM procedure is
$O(n)$ which, combined with its good performance on simulated datasets, makes
it highly attractive in comparison with the existing methods. We further
demonstrate its good performance on a real data example on rolling
element-bearing prognostics
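The moving sum idea can be sketched in a much simpler setting than the one treated in the abstract. The example below computes the classical MOSUM statistic for a shift in the mean, comparing the sums over two adjacent windows of length G; it is a simplified stand-in, not the piecewise-linear procedure of the paper. The function name, the bandwidth and the noise-free toy signal are assumptions for illustration. Cumulative sums keep the cost linear in the number of observations.

```python
from itertools import accumulate


def mosum_statistic(x, bandwidth):
    """Scaled |sum of next G values - sum of previous G values| at each k."""
    n = len(x)
    g = bandwidth
    csum = [0.0] + list(accumulate(x))  # csum[k] = x[0] + ... + x[k-1]
    stats = [0.0] * n                   # positions too close to the ends stay 0
    for k in range(g, n - g + 1):
        right = csum[k + g] - csum[k]   # window x[k : k+g]
        left = csum[k] - csum[k - g]    # window x[k-g : k]
        stats[k] = abs(right - left) / (2 * g) ** 0.5
    return stats


# Noise-free toy signal with a single jump between indices 99 and 100.
signal = [0.0] * 100 + [1.0] * 100
stats = mosum_statistic(signal, bandwidth=20)
change_point = max(range(len(stats)), key=stats.__getitem__)  # index 100
```

In practice the statistic would be compared with a critical value and local maxima above it would be reported as change point estimates; handling slope changes requires comparing local linear fits rather than local sums.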
The MINI mixed finite element for the Stokes problem: An experimental investigation
Super-convergence of order 1.5 in pressure and velocity has been
experimentally investigated for the two-dimensional Stokes problem discretised
with the MINI mixed finite element. Even though the classic mixed finite
element theory for the MINI element guarantees linear convergence for the total
error, recent theoretical results indicate that super-convergence of order 1.5
in pressure and of the linear part of the computed velocity to the piecewise
linear nodal interpolation of the exact velocity is in fact possible with
structured, three-directional triangular meshes. The numerical experiments
presented here suggest a more general validity of super-convergence of order
1.5, possibly to automatically generated and unstructured triangulations. In
addition, the approximating properties of the complete computed velocity have
been compared with the approximating properties of the piecewise-linear part of
the computed velocity, finding that the former is generally closer to the exact
velocity, whereas the latter conserves mass better
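An experimental convergence order such as the 1.5 discussed above is usually estimated from errors measured on two meshes, assuming the error behaves like C * h^p for mesh size h. The sketch below applies that formula to synthetic numbers generated to have order 1.5; the values are purely illustrative, are not results from the paper, and the function name is an assumption.

```python
import math


def observed_order(h_coarse, e_coarse, h_fine, e_fine):
    """Estimate p in e ~ C * h**p from errors on two mesh sizes."""
    return math.log(e_coarse / e_fine) / math.log(h_coarse / h_fine)


# Synthetic errors constructed as 3 * h**1.5 (illustration only).
h1, h2 = 0.1, 0.05
e1, e2 = 3.0 * h1 ** 1.5, 3.0 * h2 ** 1.5
p = observed_order(h1, e1, h2, e2)
```

With measured errors, the same estimate would be computed for each pair of successive mesh refinements and checked for stability as h decreases.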
Longitudinal analysis on AQI in 3 main economic zones of China
In modern China, air pollution has become a major environmental problem. Over the last 2 years, air pollution, as measured by PM 2.5 (particulate matter), has been getting worse. My report carries out a longitudinal data analysis of the air quality index (AQI) in 3 main economic zones in China. Longitudinal data, or repeated measures data, can be viewed as multilevel data with repeated measurements nested within individuals. I arrive at some conclusions about why the 3 areas have different AQI, attributing the differences mainly to factors like population, GDP, temperature and humidity, and to other factors such as whether the area is inland or by the sea. The residual variance is partitioned into a between-zone component (the variance of the zone-level residuals) and a within-zone component (the variance of the city-level residuals). The zone residuals represent unobserved zone characteristics that affect AQI. In this report, the model building mainly follows the bottom-up sequence described by West et al. (2007) and the reference by Singer and Willett (2003), which covers the non-linear cases. This report also compares the quartic curve model with the piecewise growth model on this data. The final model I reached is a piecewise model with time-level and zone-level predictors and with temperature-by-time interactions.