Nearest Neighbor Methods for Testing Reflexivity and Species-Correspondence
Nearest neighbor (NN) methods are employed for drawing inferences about
spatial patterns of points from two or more classes. We consider Pielou's test
of niche specificity which is defined using a contingency table based on the NN
relationships between the data points. We demonstrate that Pielou's contingency
table for niche specificity is actually more appropriate for testing
reflexivity in the NN structure; hence, we henceforth call this table the NN reflexivity
contingency table (NN-RCT). We also derive an asymptotic
approximation for the distribution of the entries of the NN-RCT and consider
variants of Fisher's exact test on it. Moreover, we introduce a new test of
class- or species-correspondence inspired by spatial niche/habitat specificity
and the associated contingency table called species-correspondence contingency
table (SCCT). We also determine the appropriate null hypotheses and the
underlying conditions required for these tests. We investigate the finite
sample performance of the tests in terms of empirical size and power through
extensive Monte Carlo simulations, and we illustrate the methods on a
real-life ecological data set.
Comment: 23 pages, 1 figure
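To make the idea of NN reflexivity concrete, here is a minimal sketch, assuming a simplified 2x2 layout: each base point is classified by whether its NN pair is reflexive (the two points are each other's nearest neighbor) and whether the two points belong to the same class. This is an illustration only; the paper's exact NN-RCT and SCCT constructions may differ, and the function names `nearest_neighbors` and `reflexivity_table` are hypothetical.

```python
import math
from collections import Counter


def nearest_neighbors(points):
    """Index of each point's nearest neighbor (brute force, Euclidean; ties go to the lowest index)."""
    nn = []
    for i, p in enumerate(points):
        best_j, best_d = -1, math.inf
        for j, q in enumerate(points):
            if i != j:
                d = math.dist(p, q)
                if d < best_d:
                    best_j, best_d = j, d
        nn.append(best_j)
    return nn


def reflexivity_table(points, labels):
    """Hypothetical 2x2 table with one count per base point:
    (reflexive / non-reflexive NN pair) x (same / different class)."""
    nn = nearest_neighbors(points)
    table = Counter()
    for i, j in enumerate(nn):
        reflexive = nn[j] == i  # i and j are mutual nearest neighbors
        same = labels[i] == labels[j]
        key = ("reflexive" if reflexive else "non-reflexive",
               "same" if same else "different")
        table[key] += 1
    return table


# Toy example: two mutual-NN pairs plus one isolated point whose NN is not reflexive.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (10.0, 10.0)]
labels = ["a", "a", "b", "a", "b"]
table = reflexivity_table(points, labels)
print(dict(table))
```

Because counts are per base point, each reflexive pair contributes two entries; a test would then compare such counts with their null distribution.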
Quantifying dependencies for sensitivity analysis with multivariate input sample data
We present a novel method for quantifying dependencies in multivariate
datasets, based on estimating the Rényi entropy by minimum spanning trees
(MSTs). The length of the MSTs can be used to order pairs of variables from
strongly to weakly dependent, making it a useful tool for sensitivity analysis
with dependent input variables. It is well-suited for cases where the input
distribution is unknown and only a sample of the inputs is available. We
introduce an estimator to quantify dependency based on the MST length, and
investigate its properties with several numerical examples. To reduce the
computational cost of constructing the exact MST for large datasets, we explore
methods to compute approximations to the exact MST, and find the multilevel
approach introduced recently by Zhong et al. (2015) to be the most accurate. We
apply our proposed method to an artificial test case based on the Ishigami
function, as well as to a real-world test case involving sediment transport in
the North Sea. The results are consistent with prior knowledge and heuristic
understanding, as well as with variance-based analysis using Sobol indices in
the cases where these indices can be computed.
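The intuition that a shorter MST signals stronger dependence can be sketched as follows, assuming each variable is first rank-transformed to (0, 1] so that only the dependence structure matters (the rank transform is our assumption). Strongly dependent pairs collapse toward a curve and give a short tree; independent pairs fill the square and give a long one. The sketch computes the exact MST with Prim's algorithm; it does not reproduce the paper's Rényi-entropy normalization or the multilevel approximation of Zhong et al. (2015).

```python
import math
import random


def ranks(values):
    """Scaled ranks rank/n in (0, 1]: a rank (copula-style) transform."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank / len(values)
    return r


def mst_length(points):
    """Total Euclidean length of the exact MST, using O(n^2) Prim's algorithm."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each vertex to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total


def pair_mst_length(x, y):
    """MST length of the rank-transformed pair (x, y); shorter means more dependent."""
    return mst_length(list(zip(ranks(x), ranks(y))))


rng = random.Random(0)
n = 200
x = [rng.random() for _ in range(n)]
dep = pair_mst_length(x, [xi ** 3 for xi in x])               # monotone: ranks coincide
ind = pair_mst_length(x, [rng.random() for _ in range(n)])    # independent
print(dep, ind)
```

Sorting pairs of variables by this length gives the strong-to-weak ordering described above.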
Testing serial independence using the sample distribution function
This paper presents and discusses a nonparametric test for detecting serial dependence. We consider a Cramér-von Mises statistic based on the difference between the joint sample distribution and the product of the marginals. Exact critical values can be approximated from the asymptotic null distribution or by resampling, that is, by randomly permuting the original series. The resampling approximation is more accurate, and the corresponding test enjoys, like other bootstrap-based procedures, excellent level accuracy, with a level error of order T^(-3/2). A Monte Carlo experiment illustrates the test's performance with small and moderate sample sizes. The paper also includes an application that tests the random walk hypothesis for the exchange rate returns of several currencies.
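A minimal sketch of the permutation version, assuming lag-1 pairs (X_t, X_{t+1}) and a statistic that sums the squared gap between the joint empirical CDF and the product of the marginal empirical CDFs over the sample pairs. The paper's exact weighting and scaling may differ; a constant scale factor would not change the permutation p-value. Function names are hypothetical.

```python
import random


def cvm_serial_statistic(series, lag=1):
    """Cramér-von Mises-type distance between the joint empirical CDF of
    (X_t, X_{t+lag}) and the product of its marginals, evaluated at the sample pairs."""
    x = series[:-lag]
    y = series[lag:]
    n = len(x)
    stat = 0.0
    for xi, yi in zip(x, y):
        joint = sum(1 for a, b in zip(x, y) if a <= xi and b <= yi) / n
        fx = sum(1 for a in x if a <= xi) / n
        fy = sum(1 for b in y if b <= yi) / n
        stat += (joint - fx * fy) ** 2
    return stat


def permutation_pvalue(series, n_perm=199, lag=1, seed=0):
    """Reference the observed statistic to statistics of randomly permuted series,
    which are serially independent under the null."""
    rng = random.Random(seed)
    observed = cvm_serial_statistic(series, lag)
    exceed = 0
    for _ in range(n_perm):
        shuffled = list(series)
        rng.shuffle(shuffled)
        if cvm_serial_statistic(shuffled, lag) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


trend = [float(t) for t in range(50)]                    # strongly serially dependent
noise = [random.Random(1).gauss(0.0, 1.0) for _ in range(40)]
p_trend = permutation_pvalue(trend)
p_noise = permutation_pvalue(noise)
print(p_trend, p_noise)
```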
Testing Downside Risk Efficiency Under Market Distress
In moments of distress, downside risk measures such as Lower Partial Moments (LPM) are more appropriate than the standard variance for characterizing risk. The goal of this paper is to study how to compare portfolios in these situations. To do so, we show the close connection between mean-risk efficiency sets and stochastic dominance during episodes of market distress, and we use the latter property to propose a hypothesis test that discriminates between portfolios across risk aversion levels. Our novel family of test statistics for stochastic dominance under distress allows testing dominance orders higher than zero, accommodates general forms of dependence between portfolios, and can be extended to residuals of regression models. These results are illustrated in an empirical application to US stock data. We show that mean-variance strategies are stochastically dominated by mean-risk efficient sets in episodes of financial distress.
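The link between LPMs and stochastic dominance can be illustrated with a small sketch. The empirical LPM of order s at threshold τ is the sample mean of max(τ - r, 0)^s, and (s+1)-th order stochastic dominance holds when one portfolio's LPM of order s is no larger at every threshold. Restricting the thresholds to a lower-tail grid is our stand-in for "under distress"; this is only a sample inequality check, not the paper's test statistic or critical values, and all names are hypothetical.

```python
def lpm(returns, threshold, order):
    """Empirical lower partial moment: mean of max(threshold - r, 0) ** order.
    Order 0 is taken as the shortfall probability P(r < threshold)."""
    if order == 0:
        return sum(1 for r in returns if r < threshold) / len(returns)
    return sum(max(threshold - r, 0.0) ** order for r in returns) / len(returns)


def dominates_in_tail(a, b, order, thresholds):
    """Sample check: a's LPM never exceeds b's on the threshold grid,
    and is strictly smaller somewhere."""
    la = [lpm(a, t, order) for t in thresholds]
    lb = [lpm(b, t, order) for t in thresholds]
    return all(x <= y for x, y in zip(la, lb)) and any(x < y for x, y in zip(la, lb))


a = [-0.01, 0.0, 0.02, 0.03]      # hypothetical portfolio returns
b = [-0.03, -0.01, 0.02, 0.05]
distress_grid = [-0.03, -0.02, -0.01, 0.0]
print(dominates_in_tail(a, b, 1, distress_grid))  # order-1 LPM ~ second-order dominance
```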
Random-set methods identify distinct aspects of the enrichment signal in gene-set analysis
A prespecified set of genes may be enriched, to varying degrees, for genes
that have altered expression levels relative to two or more states of a cell.
Knowing the enrichment of gene sets defined by functional categories, such as
gene ontology (GO) annotations, is valuable for analyzing the biological
signals in microarray expression data. A common approach to measuring
enrichment is by cross-classifying genes according to membership in a
functional category and membership on a selected list of significantly altered
genes. A small Fisher's exact test p-value in this table, for example,
indicates enrichment. Other category analysis methods retain the
quantitative gene-level scores and measure significance by referring a
category-level statistic to a permutation distribution associated with the
original differential expression problem. We describe a class of random-set
scoring methods that measure distinct components of the enrichment signal. The
class includes Fisher's test based on selected genes and also tests that
average gene-level evidence across the category. Averaging and selection
methods are compared empirically using Affymetrix data on expression in
nasopharyngeal cancer tissue, and theoretically using a location model of
differential expression. We find that each method has a domain of superiority
in the state space of enrichment problems, and that both methods have benefits
in practice. Our analysis also addresses two problems related to
multiple-category inference, namely, that equally enriched categories are not
detected with equal probability if they are of different sizes, and also that
there is dependence among category statistics owing to shared genes. Random-set
enrichment calculations do not require Monte Carlo for implementation. They are
made available in the R package allez.
Comment: Published at http://dx.doi.org/10.1214/07-AOAS104 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
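Both kinds of random-set calculation can be done in closed form, which is why no Monte Carlo is needed. The sketch below shows a one-sided Fisher's exact test (the hypergeometric tail of the overlap between a category and the selected list) and an averaging statistic that standardizes the category's mean score by the exact mean and variance of the average of a random gene set of the same size drawn without replacement. This is a minimal illustration, not the allez package's interface; the function names and toy inputs are hypothetical.

```python
import math


def fisher_enrichment_pvalue(n_genes, category_size, n_selected, overlap):
    """One-sided Fisher's exact test for the 2x2 (in category x selected) table:
    P(X >= overlap) with X hypergeometric."""
    total = math.comb(n_genes, n_selected)
    upper = min(category_size, n_selected)
    tail = sum(
        math.comb(category_size, k) * math.comb(n_genes - category_size, n_selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def random_set_z(scores, category):
    """Standardized category mean versus a random set of the same size
    drawn without replacement from all gene-level scores."""
    n = len(scores)
    m = len(category)
    mu = sum(scores) / n
    sigma2 = sum((s - mu) ** 2 for s in scores) / n   # population variance over all genes
    var_mean = sigma2 / m * (n - m) / (n - 1)         # finite-population correction
    cat_mean = sum(scores[i] for i in category) / m
    return (cat_mean - mu) / math.sqrt(var_mean)


# 20 genes, a category of 5, a selected list of 4, and 3 genes in common.
print(fisher_enrichment_pvalue(20, 5, 4, 3))
scores = [float(s) for s in range(1, 11)]  # toy gene-level scores
z = random_set_z(scores, [7, 8, 9])        # category = the three highest-scoring genes
print(z)
```

The selection-based test only uses whether each gene made the list, while the averaging statistic uses every gene's score, which matches the distinction the abstract draws between the two approaches.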
Identifying statistical dependence in genomic sequences via mutual information estimates
Questions of understanding and quantifying the representation and amount of
information in organisms have become a central part of biological research, as
they potentially hold the key to fundamental advances. In this paper, we
demonstrate the use of information-theoretic tools for the task of identifying
segments of biomolecules (DNA or RNA) that are statistically correlated. We
develop a precise and reliable methodology, based on the notion of mutual
information, for finding and extracting statistical as well as structural
dependencies. A simple threshold function is defined, and its use in
quantifying the level of significance of dependencies between biological
segments is explored. These tools are used in two specific applications. First,
we identify correlations between different parts of the maize
zmSRp32 gene. There, we find significant dependencies between the 5'
untranslated region in zmSRp32 and its alternatively spliced exons. This
observation may indicate the presence of as-yet unknown alternative splicing
mechanisms or structural scaffolds. Second, using data from the FBI's Combined
DNA Index System (CODIS), we demonstrate that our approach is particularly well
suited for the problem of discovering short tandem repeats, an application of
importance in genetic profiling.
Comment: Preliminary version. Final version in EURASIP Journal on
Bioinformatics and Systems Biology. See http://www.hindawi.com/journals/bsb
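As a rough illustration of the core quantity, here is a plug-in estimate of mutual information (in bits) between two aligned symbol sequences, assuming segments are compared position by position. The plug-in estimator is biased upward for short segments, which is one reason a significance threshold is needed; the paper's own threshold function is not reproduced here, and the function name is hypothetical.

```python
import math
from collections import Counter


def mutual_information(seq_x, seq_y):
    """Plug-in mutual information, in bits, between two aligned symbol sequences."""
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned (same length)")
    n = len(seq_x)
    joint = Counter(zip(seq_x, seq_y))
    px = Counter(seq_x)
    py = Counter(seq_y)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2(p(a,b) / (p(a) p(b))), written with raw counts
        mi += c / n * math.log2(c * n / (px[a] * py[b]))
    return mi


s = "ACGT" * 25
print(mutual_information(s, s))          # identical segments: MI equals the entropy, 2 bits
print(mutual_information(s, "A" * 100))  # a constant segment carries no information
```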