Mining Pure, Strict Epistatic Interactions from High-Dimensional Datasets: Ameliorating the Curse of Dimensionality
Background: Interaction between loci that affects a phenotype is called epistasis. Epistasis is strict if no proper subset of the interacting loci exhibits a marginal effect. For many diseases, unknown epistatic interactions likely affect disease susceptibility. A central difficulty in mining epistatic interactions from high-dimensional datasets is the curse of dimensionality: there are too many combinations of SNPs to perform an exhaustive search. A method that could locate strict epistasis without an exhaustive search can be considered the brass ring of methods for analyzing high-dimensional datasets.

Methodology/Findings: A SNP pattern is a Bayesian network representing SNP-disease relationships. The Bayesian score for a SNP pattern is the probability of the data given the pattern, and it has been used to learn SNP patterns. We identified a bound for the score of a SNP pattern: it provides an upper limit on the Bayesian score of any pattern that could be obtained by expanding a given pattern. We hypothesized that the bound might let the data indicate whether expanding a 1-SNP pattern is promising even when there are no marginal effects. We tested the bound using simulated datasets and semi-synthetic high-dimensional datasets obtained from GWAS datasets, and found that it dramatically reduced the search time for strict epistasis. Using an Alzheimer's dataset, we showed that an interaction involving the APOE gene can be discovered from its score alone because APOE has a large marginal effect, but that the bound is most effective at discovering interactions without marginal effects.

Conclusions/Significance: We conclude that the bound appears to ameliorate the curse of dimensionality in high-dimensional datasets. This is a very consequential result and could be pivotal in our efforts to reveal the dark matter of genetic disease risk from high-dimensional datasets. © 2012 Jiang, Neapolitan
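To make the scoring idea concrete, here is a minimal Python sketch of a Bayesian score for a SNP pattern, under stated assumptions: the SNPs in the pattern are the parents of the disease node, only the disease node is scored, and the prior is the K2 score's uniform Dirichlet prior (Cooper and Herskovits). The abstract does not say which prior the authors use, the bound itself is not reproduced, and `log_k2_score` and its parameters are hypothetical names. The toy data encode a pure XOR interaction, so neither SNP alone has a marginal effect.

```python
from collections import Counter
from math import lgamma
from typing import Sequence


def log_k2_score(
    genotypes: Sequence[Sequence[int]],
    disease: Sequence[int],
    snps: Sequence[int],
    n_disease_states: int = 2,
) -> float:
    """Hypothetical helper: log P(disease column | pattern), with the SNPs in
    `snps` as parents of the disease node and a K2 (uniform Dirichlet) prior."""
    config_totals: Counter = Counter()  # subjects per joint genotype of the pattern
    cell_counts: Counter = Counter()    # subjects per (joint genotype, disease state)
    for row, status in zip(genotypes, disease):
        config = tuple(row[s] for s in snps)
        config_totals[config] += 1
        cell_counts[(config, status)] += 1

    r = n_disease_states
    score = 0.0
    # K2: prod_j (r-1)!/(N_j+r-1)! * prod_k N_jk!, in log space via lgamma(n+1) = log n!.
    # Unobserved parent configurations contribute a factor of 1 and are skipped.
    for n_j in config_totals.values():
        score += lgamma(r) - lgamma(n_j + r)
    for n_jk in cell_counts.values():
        score += lgamma(n_jk + 1)
    return score


# Toy strict-epistasis data: disease = SNP0 XOR SNP1, each genotype pair seen twice.
genotypes = [[a, b] for a in (0, 1) for b in (0, 1) for _ in range(2)]
disease = [a ^ b for a, b in genotypes]

empty = log_k2_score(genotypes, disease, [])
single = log_k2_score(genotypes, disease, [0])
pair = log_k2_score(genotypes, disease, [0, 1])
print(f"no SNP: {empty:.3f}  SNP0 only: {single:.3f}  SNP0+SNP1: {pair:.3f}")
```

On this toy data the 1-SNP pattern scores below the empty pattern (log 1/900 versus log 1/630) even though the 2-SNP pattern scores far higher (log 1/81). A search that expands only promising-looking 1-SNP patterns would therefore never reach the interaction, which is the situation an upper bound on the score of expanded patterns is meant to address.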
Increment entropy as a measure of complexity for time series
Entropy is a common index for quantifying the complexity of time series in
a variety of fields. Here, we introduce increment entropy, a complexity measure
for time series in which each increment is mapped to a two-letter word: one
letter encodes its direction and the other its magnitude. The Shannon entropy
of these words is termed increment entropy (IncrEn). Simulations on synthetic
data and tests on epileptic EEG signals demonstrate its ability to detect
abrupt changes, whether energetic (e.g., spikes or bursts) or structural. The
computation of IncrEn makes no assumptions about the time series, so it can be
applied to arbitrary real-world data.
Comment: 12 pages, 7 figures, 2 tables
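A minimal Python sketch of the one-word-per-increment construction described above, under assumptions: direction is the sign of the increment, magnitude is quantized relative to the increments' standard deviation using a hypothetical `resolution` parameter and capped at that value, and entropy is measured in bits. The published IncrEn may group several consecutive increments into each word and normalize the result differently; this sketch does not attempt to reproduce those details.

```python
import math
from collections import Counter
from statistics import pstdev
from typing import Sequence


def increment_entropy(x: Sequence[float], resolution: int = 4) -> float:
    """Shannon entropy (bits) of (direction, magnitude) words, one word per increment.

    Assumed quantization: magnitude = min(resolution, floor(|v| * resolution / sd)),
    where sd is the population standard deviation of the increments."""
    increments = [b - a for a, b in zip(x, x[1:])]
    if not increments:
        raise ValueError("need at least two samples")
    sd = pstdev(increments)
    words = []
    for v in increments:
        direction = (v > 0) - (v < 0)  # -1 down, 0 flat, +1 up
        magnitude = 0 if sd == 0 else min(resolution, math.floor(abs(v) * resolution / sd))
        words.append((direction, magnitude))
    n = len(words)
    return -sum((c / n) * math.log2(c / n) for c in Counter(words).values())


print(increment_entropy([0.0, 1.0, 0.0, 1.0, 0.0]))  # 1.0: two equally frequent words
```

A constant or steadily rising series produces a single repeated word and therefore zero entropy, while an abrupt change introduces new words and raises the entropy of any window that contains it.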
Testing Serial Independence of Object-Valued Time Series
We propose a novel method for testing serial independence of object-valued
time series in metric spaces, which are more general than Euclidean or Hilbert
spaces. The proposed method is fully nonparametric, free of tuning parameters,
and can capture all nonlinear pairwise dependence. The key concept used in this
paper is distance covariance in metric spaces, which we extend to auto-distance
covariance for object-valued time series. Furthermore, we propose a
generalized spectral density function to account for pairwise dependence at all
lags and construct a Cramér-von Mises type test statistic. New theoretical
arguments are developed to establish the asymptotic behavior of the test
statistic. A wild bootstrap is also introduced to obtain critical values from
the non-pivotal limiting null distribution. Extensive numerical simulations and
two real-data applications illustrate the effectiveness and versatility of the
proposed method.
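The sample distance covariance has a standard V-statistic form built from double-centered distance matrices, and computing it needs nothing beyond a distance function, which is what makes a metric-space version possible (it characterizes independence for metrics of strong negative type, per Lyons, 2013). The Python sketch below computes it between X_t and X_{t+lag} for a series of arbitrary objects and a user-supplied metric. `auto_distance_covariance` and `dist` are hypothetical names, and the generalized spectral density, the Cramér-von Mises statistic, and the wild bootstrap from the abstract are not implemented.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def _double_centered(points: Sequence[T], dist: Callable[[T, T], float]) -> List[List[float]]:
    """Distance matrix with row means, column means, and grand mean removed."""
    n = len(points)
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    # A metric is symmetric, so column means equal row means.
    means = [sum(row) / n for row in d]
    grand = sum(means) / n
    return [[d[i][j] - means[i] - means[j] + grand for j in range(n)] for i in range(n)]


def auto_distance_covariance(series: Sequence[T], lag: int, dist: Callable[[T, T], float]) -> float:
    """Squared sample distance covariance (V-statistic) between X_t and X_{t+lag}."""
    if not 0 < lag < len(series):
        raise ValueError("lag must satisfy 0 < lag < len(series)")
    a = _double_centered(series[:-lag], dist)
    b = _double_centered(series[lag:], dist)
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2


# Usage with scalar "objects" and the absolute-difference metric:
alternating = [0.0, 1.0] * 4 + [0.0]
print(auto_distance_covariance(alternating, 1, lambda u, v: abs(u - v)))  # 0.25
```

For genuinely non-Euclidean objects, only `dist` changes; for example, `math.dist` works for points given as coordinate tuples. A test of serial independence would aggregate such quantities across lags, which is the role the abstract assigns to the generalized spectral density.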
Two-Sample and Change-Point Inference for Non-Euclidean Valued Time Series
Data objects taking values in a general metric space have become increasingly
common in modern data analysis. In this paper, we study two important
statistical inference problems, namely, two-sample testing and change-point
detection, for such non-Euclidean data under temporal dependence. Typical
examples of non-Euclidean valued time series include yearly mortality
distributions, time-varying networks, and covariance matrix time series. To
accommodate unknown temporal dependence, we extend the self-normalization (SN)
technique (Shao, 2010) to inference for non-Euclidean time series, which is
substantially different from the existing SN-based inference for functional
time series that reside in Hilbert space (Zhang et al., 2011). Theoretically,
we propose new regularity conditions that may be easier to check than those
in the recent literature, and derive the limiting distributions of the proposed
test statistics under both the null hypothesis and local alternatives. For the
change-point detection problem, we also establish the consistency of the change-point location
estimator, and combine our proposed change-point test with wild binary
segmentation to perform multiple change-point estimation. Numerical simulations
demonstrate the effectiveness and robustness of our proposed tests compared
with existing methods in the literature. Finally, we apply our tests to
two-sample inference in mortality data and change-point detection in
cryptocurrency data.
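The abstract pairs a change-point test with wild binary segmentation (WBS; Fryzlewicz, 2014). The Python sketch below shows only the WBS search loop, under loud assumptions: the scan statistic is the classical CUSUM for a change in the mean of scalar data, standing in for the authors' self-normalized statistic for non-Euclidean data, and `threshold`, `n_intervals`, and `min_len` are hypothetical tuning values rather than the paper's calibration.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# A scan statistic maps (data, start, end) to (max contrast, best split index).
ScanStat = Callable[[Sequence[float], int, int], Tuple[float, int]]


def cusum_stat(x: Sequence[float], s: int, e: int) -> Tuple[float, int]:
    """Max |CUSUM| over splits of x[s:e]; a stand-in for the paper's SN statistic."""
    seg = x[s:e]
    n = len(seg)
    total = sum(seg)
    best_val, best_split = -1.0, s
    left = 0.0
    for k in range(1, n):
        left += seg[k - 1]
        # Scaled difference between the means of seg[:k] and seg[k:]
        val = math.sqrt(k * (n - k) / n) * abs(left / k - (total - left) / (n - k))
        if val > best_val:
            best_val, best_split = val, s + k
    return best_val, best_split


def wild_binary_segmentation(
    x: Sequence[float],
    threshold: float,
    n_intervals: int = 200,
    min_len: int = 4,
    stat: ScanStat = cusum_stat,
    seed: int = 0,
) -> List[int]:
    rng = random.Random(seed)
    n = len(x)
    # Draw random sub-intervals [a, b) once, up front, and reuse them at every level.
    intervals = []
    for _ in range(n_intervals):
        a, b = sorted(rng.sample(range(n + 1), 2))
        if b - a >= min_len:
            intervals.append((a, b))

    change_points: List[int] = []

    def recurse(s: int, e: int) -> None:
        if e - s < min_len:
            return
        # Random intervals lying inside [s, e), plus [s, e) itself
        candidates = [(a, b) for a, b in intervals if s <= a and b <= e] + [(s, e)]
        best_val, best_split = max(stat(x, a, b) for a, b in candidates)
        if best_val > threshold:
            change_points.append(best_split)
            recurse(s, best_split)
            recurse(best_split, e)

    recurse(0, n)
    return sorted(change_points)


print(wild_binary_segmentation([0.0] * 15 + [5.0] * 15 + [0.0] * 15, threshold=3.0))
```

Scanning many random sub-intervals, not just the current segment, gives WBS a chance to examine intervals that contain a single change point, where a contrast statistic is most informative. Swapping `cusum_stat` for a self-normalized statistic defined on metric-space data is where the paper's contribution would enter.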
Testing the martingale difference hypothesis in high dimension
In this paper, we consider testing the martingale difference hypothesis for
high-dimensional time series. Our test is built on the sum of squares of the
element-wise max-norm of the proposed matrix-valued nonlinear dependence
measure at different lags. To conduct the inference, we approximate the null
distribution of our test statistic by Gaussian approximation and provide a
simulation-based approach to generate critical values. The asymptotic behavior
of the test statistic under the alternative is also studied. Our approach is
nonparametric, as the null hypothesis only assumes that the time series is a
martingale difference sequence, without specifying any parametric form for its
conditional moments. Owing to the Gaussian approximation, our test is
robust to cross-series dependence of unknown magnitude. To the best of our
knowledge, this is the first valid test for the martingale difference
hypothesis that not only allows for large dimension but also captures nonlinear
serial dependence. The practical usefulness of our test is illustrated via
simulation and a real-data analysis. The test is implemented in a user-friendly
R function.
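A rough Python sketch of the statistic's shape, under assumptions: the nonlinear dependence measure at lag j is taken to be the sample cross-moment matrix between X_t and a hypothetical feature map `phi` applied to X_{t-j} (here each coordinate and its square), and the sum of squared max-norms over lags is scaled by n. The paper's actual matrix-valued measure, its Gaussian-approximation critical values, and its R function are not reproduced; `mdh_max_norm_statistic` and `default_features` are hypothetical names.

```python
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]


def default_features(y: Vector) -> List[float]:
    """Hypothetical feature map: each coordinate and its square."""
    return list(y) + [v * v for v in y]


def mdh_max_norm_statistic(
    x: Sequence[Vector],
    max_lag: int,
    phi: Callable[[Vector], List[float]] = default_features,
) -> float:
    """n * sum_{j=1..max_lag} (max_{a,b} |Gamma_j[a][b]|)^2, where Gamma_j[a][b] is
    the sample mean of x_t[a] * phi(x_{t-j})[b]. Under the martingale difference
    hypothesis each Gamma_j has population value zero (given finite moments)."""
    n = len(x)
    if not 0 < max_lag < n:
        raise ValueError("need 0 < max_lag < len(x)")
    total = 0.0
    for j in range(1, max_lag + 1):
        feats = [phi(x[t - j]) for t in range(j, n)]  # features of the lagged values
        current = x[j:]                               # values aligned with those lags
        m = len(current)
        max_abs = 0.0
        for a in range(len(x[0])):
            for b in range(len(feats[0])):
                gamma = sum(cur[a] * f[b] for cur, f in zip(current, feats)) / m
                max_abs = max(max_abs, abs(gamma))
        total += max_abs ** 2  # squared element-wise max-norm at lag j
    return n * total


# Usage: i.i.d. noise versus a series whose first coordinate is autoregressive.
rng = random.Random(1)
noise = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(300)]
ar = [[0.0, 0.0]]
for _ in range(299):
    prev = ar[-1]
    ar.append([0.8 * prev[0] + rng.gauss(0, 1), rng.gauss(0, 1)])
print(mdh_max_norm_statistic(noise, 3), mdh_max_norm_statistic(ar, 3))
```

Taking the max-norm over all (coordinate, feature) pairs keeps the statistic sensitive to a single strong violation even when the dimension is large; deciding whether a value is large enough to reject still requires critical values such as the simulation-based Gaussian approximation described in the abstract.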