Penalized Orthogonal-Components Regression for Large p Small n Data
We propose a penalized orthogonal-components regression (POCRE) for large p
small n data. Orthogonal components are sequentially constructed to maximize,
upon standardization, their correlation to the response residuals. A new
penalization framework, implemented via empirical Bayes thresholding, is
presented to effectively identify sparse predictors of each component. POCRE is
computationally efficient owing to its sequential construction of leading
sparse principal components. In addition, such construction offers other
properties such as grouping highly correlated predictors and allowing for
collinear or nearly collinear predictors. With multivariate responses, POCRE
can construct common components and thus build up latent-variable models for
large p small n data. Comment: 12 pages
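The sequential construction described above can be illustrated with a minimal Python sketch. It is not the authors' POCRE implementation. The empirical Bayes thresholding is replaced here by plain soft-thresholding with a fixed relative threshold `lam`, and the names `soft_threshold` and `sparse_orthogonal_components` are hypothetical. Each step forms loadings from the association of the standardized predictors with the current response residual, sparsifies them, builds a unit-norm component, and then deflates both the residual and the predictors so that later components stay orthogonal to earlier ones.

```python
import numpy as np


def soft_threshold(v, lam):
    """Shrink entries toward zero; entries with |v| <= lam become exactly 0."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def sparse_orthogonal_components(X, y, n_components=3, lam=0.5):
    """Hypothetical sketch: soft-thresholding stands in for empirical Bayes thresholding."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xk = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize predictors
    r = y - y.mean()                           # centred response residual
    loadings, comps = [], []
    for _ in range(n_components):
        w = Xk.T @ r                           # association of each predictor with the residual
        scale = np.max(np.abs(w))
        if scale == 0.0:
            break
        w = soft_threshold(w / scale, lam)     # sparse loading; the largest entry always survives
        w /= np.linalg.norm(w)
        t = Xk @ w
        t /= np.linalg.norm(t)                 # unit-norm component
        r = r - (t @ r) * t                    # remove the part of the residual explained by t
        Xk = Xk - np.outer(t, t @ Xk)          # deflate so later components are orthogonal to t
        loadings.append(w)
        comps.append(t)
    T = np.column_stack(comps) if comps else np.empty((len(y), 0))
    return np.array(loadings), T, r
```

Deflating the predictor matrix after each step, rather than re-orthogonalizing at the end, is what keeps the components mutually orthogonal in this sketch; a data-adaptive threshold would take the place of the fixed `lam`.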
On methods for correcting for the look-elsewhere effect in searches for new physics
The search for new significant peaks in an energy spectrum often involves a
statistical multiple hypothesis testing problem. Separate tests of hypothesis
are conducted at different locations producing an ensemble of local p-values,
the smallest of which is reported as evidence for the new resonance.
Unfortunately, controlling the false detection rate (type I error rate) of such
procedures may lead to excessively stringent acceptance criteria. In the recent
physics literature, two promising statistical tools have been proposed to
overcome these limitations. In 2005, a method to "find needles in haystacks"
was introduced by Pilla et al. [1], and a second method was later proposed by
Gross and Vitells [2] in the context of the "look elsewhere effect" and trial
factors. We show that, for relatively small sample sizes, the former leads to
an artificial inflation of statistical power that stems from an increase in the
false detection rate, whereas the two methods exhibit similar performance for
large sample sizes. We apply the methods to realistic simulations of the Fermi
Large Area Telescope data, in particular the search for dark matter
annihilation lines. Further, we discuss the counter-intuitive scenario where the
look-elsewhere corrections are more conservative than much more computationally
efficient corrections for multiple hypothesis testing. Finally, we provide
general guidelines for navigating the tradeoffs between statistical and
computational efficiency when selecting a statistical procedure for signal
detection.
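The corrections discussed above can be compared on a single number, the smallest local p-value. The sketch below is an illustration under stated assumptions, not the procedure used in the paper. `bonferroni` and `sidak` are the standard multiple-testing corrections for m tests (Šidák is exact only when the tests are independent). `gross_vitells` is the upcrossing approximation of Gross and Vitells for a test statistic that is chi-square with one degree of freedom under the null, where `n_up_c0` is the mean number of upcrossings of a low reference level `c0`, typically estimated from a small number of background-only simulations. The method of Pilla et al. is not shown, and all function names are hypothetical.

```python
import math


def bonferroni(p_min, m):
    """Upper bound on the global p-value of the smallest of m local p-values."""
    return min(1.0, m * p_min)


def sidak(p_min, m):
    """Global p-value of the smallest of m independent local p-values."""
    if p_min >= 1.0:
        return 1.0
    # 1 - (1 - p_min)**m, computed stably for very small p_min
    return -math.expm1(m * math.log1p(-p_min))


def gross_vitells(c, c0, n_up_c0):
    """Upcrossing approximation for a chi-square (1 dof) test statistic.

    c        -- observed maximum of the test statistic
    c0       -- low reference level at which upcrossings were counted
    n_up_c0  -- mean number of upcrossings of c0 (assumed estimated by simulation)
    """
    p_local = math.erfc(math.sqrt(c / 2.0))  # P(chi2_1 > c)
    return p_local + n_up_c0 * math.exp(-(c - c0) / 2.0)
```

For example, a local p-value of 0.001 found among 50 independent search locations gives a Bonferroni global p-value of 0.05, while Šidák gives a slightly smaller value of about 0.049.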
Detecting event-related recurrences by symbolic analysis: Applications to human language processing
Quasistationarity is ubiquitous in complex dynamical systems. In brain
dynamics there is ample evidence that event-related potentials reflect such
quasistationary states. In order to detect them from time series, several
segmentation techniques have been proposed. In this study we elaborate on a recent
approach for detecting quasistationary states as recurrence domains by means of
recurrence analysis and subsequent symbolisation methods. As a result,
recurrence domains are obtained as partition cells that can be further aligned
and unified for different realisations. We address two pertinent problems of
contemporary recurrence analysis and present possible solutions for them. Comment: 24 pages, 6 figures. Draft version to appear in Proc Royal Soc
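A recurrence plot marks pairs of time points whose states lie within a distance `eps` of each other. The sketch below builds that matrix and then assigns a symbol to each time point by merging recurrent points into connected components with a union-find structure, so that each component plays the role of a recurrence domain. This is a simplified stand-in for the symbolisation step, assumed for illustration only; `recurrence_matrix`, `symbolize` and the threshold `eps` are hypothetical names, and the paper's actual procedure may differ.

```python
import numpy as np


def recurrence_matrix(x, eps):
    """R[i, j] is True when states i and j are closer than eps (Euclidean distance)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a scalar series as one-dimensional states
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return d < eps


def symbolize(R):
    """Label time points so that recurrent points (transitively) share a symbol."""
    n = R.shape[0]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i):
            if R[i, j]:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # keep the earliest time index as the root of each domain
                    parent[max(ri, rj)] = min(ri, rj)

    labels = {}
    # relabel roots as 0, 1, 2, ... in order of first appearance in time
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]
```

Because merging is transitive, a slowly drifting trajectory can chain distinct states into a single symbol in this sketch, so the choice of `eps` matters.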
Detection of Epigenomic Network Community Oncomarkers
In this paper we propose a network methodology to infer prognostic cancer
biomarkers based on the epigenetic pattern of DNA methylation. Epigenetic
processes such as DNA methylation reflect environmental risk factors, and are
increasingly recognised for their fundamental role in diseases such as cancer.
DNA methylation is a gene-regulatory pattern, and hence provides a means by
which to assess genomic regulatory interactions. Network models are a natural
way to represent and analyse groups of such interactions. The utility of
network models also increases as the quantity of data and number of variables
increase, making them increasingly relevant to large-scale genomic studies. We
propose methodology to infer prognostic genomic networks from a DNA
methylation-based measure of genomic interaction and association. We then show
how to identify prognostic biomarkers from such networks, which we term
"network community oncomarkers". We illustrate the power of our proposed
methodology in the context of a large, publicly available breast cancer dataset.
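The general pipeline of inferring a network from pairwise association and then grouping its nodes can be sketched as follows. This is a generic illustration, not the authors' methodology. Absolute Pearson correlation between methylation profiles stands in for their DNA methylation-based measure of interaction, a fixed `threshold` defines the edges, and connected components stand in for proper community detection. The input `M` is assumed to be a samples by loci matrix (for example, methylation beta values), and all function names are hypothetical.

```python
import numpy as np


def correlation_network(M, threshold):
    """Adjacency matrix linking loci whose profiles have |correlation| >= threshold."""
    C = np.corrcoef(M, rowvar=False)  # columns are loci, rows are samples
    A = np.abs(C) >= threshold
    np.fill_diagonal(A, False)        # no self-loops
    return A


def communities(A):
    """Label nodes by connected component (a crude stand-in for community detection)."""
    n = A.shape[0]
    label = [-1] * n
    current = 0
    for s in range(n):
        if label[s] != -1:
            continue
        label[s] = current
        stack = [s]
        while stack:  # depth-first traversal of the component containing s
            u = stack.pop()
            for v in np.flatnonzero(A[u]):
                if label[v] == -1:
                    label[v] = current
                    stack.append(v)
        current += 1
    return label
```

In a prognostic setting, each resulting group of loci would still have to be related to patient outcome; that step is not sketched here.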
Learning Algebraic Varieties from Samples
We seek to determine a real algebraic variety from a fixed finite subset of
points. Existing methods are studied and new methods are developed. Our focus
lies on aspects of topology and algebraic geometry, such as dimension and
defining polynomials. All algorithms are tested on a range of datasets and made
available in a Julia package.
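One standard way to recover defining polynomials from samples is to evaluate all monomials up to a chosen degree at the sample points and compute the null space of the resulting multivariate Vandermonde matrix; each null vector holds the coefficients of a polynomial that vanishes on every sample. The Python sketch below, using NumPy rather than the Julia package described, illustrates that idea under the assumption of exact (noise-free) samples; `vanishing_polynomials` and the relative tolerance `tol` are hypothetical.

```python
import itertools

import numpy as np


def monomial_exponents(n_vars, degree):
    """Exponent tuples of all monomials in n_vars variables of total degree <= degree."""
    return [e for d in range(degree + 1)
            for e in itertools.product(range(d + 1), repeat=n_vars)
            if sum(e) == d]


def vandermonde(points, exps):
    """Rows are sample points, columns are monomials evaluated at those points."""
    P = np.asarray(points, dtype=float)
    return np.column_stack([np.prod(P ** np.array(e), axis=1) for e in exps])


def vanishing_polynomials(points, degree, tol=1e-8):
    """Coefficient vectors (rows) of polynomials vanishing on all points."""
    points = np.asarray(points, dtype=float)
    exps = monomial_exponents(points.shape[1], degree)
    V = vandermonde(points, exps)
    _, s, Vt = np.linalg.svd(V)           # full_matrices=True, so Vt is square
    rank = int(np.sum(s > tol * s[0]))    # numerical rank relative to the largest singular value
    return exps, Vt[rank:]                # remaining right singular vectors span the null space
```

For points sampled from the unit circle and degree 2, the null space is one-dimensional and, after scaling, recovers the polynomial 1 - x² - y². With noisy samples the tolerance must be loosened, and the numerical rank becomes a modelling choice.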