Search CORE


Nearest Neighbor Methods for Testing Reflexivity and Species-Correspondence

Author: Ceyhan Elvan
Publication date: 14/05/2014

Nearest neighbor (NN) methods are employed for drawing inferences about spatial patterns of points from two or more classes. We consider Pielou's test of niche specificity, which is defined using a contingency table based on the NN relationships between the data points. We demonstrate that Pielou's contingency table for niche specificity is actually more appropriate for testing reflexivity in NN structure, so we henceforth call it the NN reflexivity contingency table (NN-RCT). We also derive an asymptotic approximation for the distribution of the entries of the NN-RCT and consider variants of Fisher's exact test on it. Moreover, we introduce a new test of class- or species-correspondence, inspired by spatial niche/habitat specificity, together with its associated contingency table, the species-correspondence contingency table (SCCT). We also determine the appropriate null hypotheses and the underlying conditions under which these tests are appropriate. We investigate the finite-sample performance of the tests in terms of empirical size and power through extensive Monte Carlo simulations, and illustrate the methods on a real-life ecological data set. Comment: 23 pages, 1 figure
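To make "reflexivity in NN structure" concrete, the sketch below finds each point's nearest neighbor by brute force and flags the relation as reflexive when the two points are each other's nearest neighbors. Cross-classifying base points by class and by reflexivity, as the hypothetical helper `reflexivity_table` does, is an illustrative layout assumed only for this example; it is not a reproduction of Pielou's table or of the paper's NN-RCT, and it ignores ties and edge effects.

```python
import math
from collections import Counter


def nearest_neighbors(points):
    """Return the index of each point's nearest neighbor (brute force, Euclidean)."""
    n = len(points)
    nn = []
    for i in range(n):
        best_j, best_d = -1, math.inf
        for j in range(n):
            if j != i:
                d = math.dist(points[i], points[j])
                if d < best_d:
                    best_j, best_d = j, d
        nn.append(best_j)
    return nn


def reflexivity_table(points, labels):
    """Hypothetical layout: count base points by (class, reflexive / non-reflexive NN relation)."""
    nn = nearest_neighbors(points)
    table = Counter()
    for i, j in enumerate(nn):
        # the relation i -> j is reflexive when j's nearest neighbor is i again
        reflexive = nn[j] == i
        table[(labels[i], "reflexive" if reflexive else "non-reflexive")] += 1
    return nn, table
```

For the five points `[(0, 0), (0, 1), (5, 5), (5, 6.5), (9, 9)]` with labels `X, X, Y, Y, X`, the last point's nearest neighbor is `(5, 6.5)`, whose own nearest neighbor is `(5, 5)`, so it is the only non-reflexive base point.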

arXiv.org e-Print Archive

CiteSeerX

Quantifying dependencies for sensitivity analysis with multivariate input sample data

Authors: Crommelin D. T.
Eggels A. W.
Publication venue: MDPI AG
Publication date: 06/02/2018

We present a novel method for quantifying dependencies in multivariate datasets, based on estimating the Rényi entropy by minimum spanning trees (MSTs). The length of the MSTs can be used to order pairs of variables from strongly to weakly dependent, making it a useful tool for sensitivity analysis with dependent input variables. It is well suited to cases where the input distribution is unknown and only a sample of the inputs is available. We introduce an estimator to quantify dependency based on the MST length and investigate its properties with several numerical examples. To reduce the computational cost of constructing the exact MST for large datasets, we explore methods to compute approximations to the exact MST and find the multilevel approach introduced recently by Zhong et al. (2015) to be the most accurate. We apply our proposed method to an artificial test case based on the Ishigami function, as well as to a real-world test case involving sediment transport in the North Sea. The results are consistent with prior knowledge and heuristic understanding, as well as with variance-based analysis using Sobol indices in the cases where these indices can be computed.
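The quantity at the heart of this approach is the length of a minimum spanning tree. The sketch below computes it exactly with Prim's algorithm in O(n²) time, which is adequate for small samples; the approximate and multilevel MST methods mentioned above are not reproduced. Mapping each variable to its ranks before building the tree (the hypothetical `pair_mst_length` helper) is an assumption made here so that only the dependence structure, not the marginal scales, affects the length; the rescaling that turns the length into a Rényi entropy estimate is not shown.

```python
import math


def mst_length(points):
    """Total Euclidean length of the minimum spanning tree (Prim's algorithm, O(n^2))."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each point to the current tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # add the closest point that is not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        # update the cheapest connection of the remaining points
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total


def ranks01(values):
    """Map a sample to (0, 1] by its ranks (an empirical copula transform)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank / len(values)
    return r


def pair_mst_length(x, y):
    """Hypothetical helper: MST length of the rank-transformed pair (x, y).
    A shorter tree suggests stronger dependence between x and y."""
    return mst_length(list(zip(ranks01(x), ranks01(y))))
```

A strictly increasing relation puts all rank pairs on the diagonal, so for n points the tree is a chain of length √2·(n-1)/n; scrambling one variable scatters the points and lengthens the tree, in line with ordering pairs from strongly to weakly dependent.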

arXiv.org e-Print Archive

CWI's Institutional Repository

Testing serial independence using the sample distribution function

Author: Delgado Miguel A.
Publication date: 01/09/1993

This paper presents and discusses a nonparametric test for detecting serial dependence. We consider a Cramér-von Mises statistic based on the difference between the joint sample distribution function and the product of the marginals. Exact critical values can be approximated from the asymptotic null distribution or by resampling, that is, by randomly permuting the original series. The resampling approximation is more accurate, and the corresponding test enjoys, like other bootstrap-based procedures, excellent level accuracy, with a level error of order $T^{-3/2}$. A Monte Carlo experiment illustrates the test's performance with small and moderate sample sizes. The paper also includes an application testing the random walk hypothesis for exchange rate returns of several currencies.
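A minimal sketch of the idea, assuming lag-one pairs (X_t, X_{t+1}) and a statistic that sums, over the sample pairs, the squared gap between the joint empirical distribution function and the product of the two marginal empirical distribution functions. This is one common Cramér-von Mises form and may differ in normalization and detail from the paper's statistic; the p-value follows the resampling scheme described above, randomly permuting the original series. The names `cvm_serial` and `permutation_pvalue` are hypothetical.

```python
import random


def cvm_serial(x, lag=1):
    """Cramér-von Mises-type distance between the joint empirical CDF of
    (X_t, X_{t+lag}) and the product of its marginal empirical CDFs,
    evaluated at the observed pairs."""
    pairs = list(zip(x[:-lag], x[lag:]))
    m = len(pairs)
    stat = 0.0
    for a, b in pairs:
        joint = sum(1 for u, v in pairs if u <= a and v <= b) / m
        f1 = sum(1 for u, _ in pairs if u <= a) / m
        f2 = sum(1 for _, v in pairs if v <= b) / m
        stat += (joint - f1 * f2) ** 2
    return stat


def permutation_pvalue(x, n_perm=199, lag=1, seed=0):
    """Approximate the null distribution by randomly permuting the original series."""
    rng = random.Random(seed)
    observed = cvm_serial(x, lag)
    series = list(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(series)  # a permutation destroys any serial ordering
        if cvm_serial(series, lag) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

The direct evaluation costs O(n²) per statistic, so this is meant for small illustrative samples; a strictly increasing series such as `[0.0, 1.0, ..., 39.0]` is flagged as serially dependent, with a p-value well below 0.05.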

Universidad Carlos III de Madrid e-Archivo


Testing Downside Risk Efficiency Under Market Distress

Authors: Gonzalo J.
Olmo J.
Publication venue: Department of Economics, City University London
Publication date: 01/01/2008

In moments of distress, downside risk measures such as Lower Partial Moments (LPM) are more appropriate than the standard variance for characterizing risk. The goal of this paper is to study how to compare portfolios in these situations. To do so, we show the close connection between mean-risk efficiency sets and stochastic dominance under distress episodes of the market, and use the latter property to propose a hypothesis test to discriminate between portfolios across risk aversion levels. Our novel family of test statistics for testing stochastic dominance under distress allows for testing orders of dominance higher than zero and for general forms of dependence between portfolios, and it can be extended to residuals of regression models. These results are illustrated in an empirical application to data on US stocks. We show that mean-variance strategies are stochastically dominated by mean-risk efficient sets in episodes of financial distress.
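Lower partial moments have a simple sample form, LPM_n(τ) = (1/T) Σ_t max(τ - r_t, 0)^n, and comparing LPMs of order n across thresholds τ is the usual sample analogue of stochastic dominance of order n + 1 (order 0 reduces to the empirical CDF). The sketch below assumes that "distress" is represented by thresholds in the lower α-tail of the pooled returns (`alpha` and all function names are hypothetical). It only checks the sample inequality; it is not the authors' test statistic and performs no inference.

```python
def lpm(returns, tau, order):
    """Sample lower partial moment of the given order at threshold tau.
    Order 0 is the empirical CDF at tau (share of returns <= tau)."""
    if order == 0:
        return sum(1 for r in returns if r <= tau) / len(returns)
    return sum(max(tau - r, 0.0) ** order for r in returns) / len(returns)


def tail_thresholds(a, b, alpha=0.1):
    """Hypothetical 'distress' region: pooled sample values up to the pooled alpha-quantile."""
    pooled = sorted(a + b)
    cutoff = pooled[max(int(alpha * len(pooled)) - 1, 0)]
    return [x for x in pooled if x <= cutoff]


def dominates_in_tail(a, b, order=1, alpha=0.1):
    """True if LPM_order of portfolio a never exceeds that of b over the distress
    thresholds: a sample analogue of (order + 1)-th degree dominance in the tail."""
    return all(lpm(a, t, order) <= lpm(b, t, order)
               for t in tail_thresholds(a, b, alpha))
```

Shifting a return series down by a constant, as in `dominates_in_tail(a, [r - 0.02 for r in a])`, makes the original series dominate in the tail, while the reverse check fails.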

City Research Online

Universidad Carlos III de Madrid e-Archivo

Secretaría de Estado de Cultura

Random-set methods identify distinct aspects of the enrichment signal in gene-set analysis

Authors: Ahlquist Paul
Boon Johan A. den
Newton Michael A.
Quintana Fernando A.
Sengupta Srikumar
Publication venue: Institute of Mathematical Statistics
Publication date: 31/08/2007

A prespecified set of genes may be enriched, to varying degrees, for genes that have altered expression levels relative to two or more states of a cell. Knowing the enrichment of gene sets defined by functional categories, such as gene ontology (GO) annotations, is valuable for analyzing the biological signals in microarray expression data. A common approach to measuring enrichment is to cross-classify genes according to membership in a functional category and membership on a selected list of significantly altered genes. A small Fisher's exact test $p$-value in this $2\times 2$ table, for example, is indicative of enrichment. Other category analysis methods retain the quantitative gene-level scores and measure significance by referring a category-level statistic to a permutation distribution associated with the original differential expression problem. We describe a class of random-set scoring methods that measure distinct components of the enrichment signal. The class includes Fisher's test based on selected genes and also tests that average gene-level evidence across the category. Averaging and selection methods are compared empirically using Affymetrix data on expression in nasopharyngeal cancer tissue, and theoretically using a location model of differential expression. We find that each method has a domain of superiority in the state space of enrichment problems and that both methods have benefits in practice. Our analysis also addresses two problems related to multiple-category inference: equally enriched categories are not detected with equal probability if they are of different sizes, and there is dependence among category statistics owing to shared genes. Random-set enrichment calculations do not require Monte Carlo for implementation. They are made available in the R package allez. Comment: Published at http://dx.doi.org/10.1214/07-AOAS104 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
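Both ingredients can be written in a few lines. The hypothetical `fisher_enrichment_pvalue` computes the one-sided Fisher's exact test p-value for the 2×2 table from the hypergeometric upper tail, and the hypothetical `random_set_z` standardizes a category's average gene-level score against a set of the same size drawn at random without replacement, using the finite-population variance. This is a sketch of the general idea under those assumptions, not the allez implementation; like the random-set calculations described above, neither function needs Monte Carlo.

```python
from math import comb, sqrt


def fisher_enrichment_pvalue(total, in_category, selected, overlap):
    """One-sided Fisher's exact test: P(X >= overlap), where X, the number of
    selected genes inside the category, is hypergeometric under no enrichment."""
    denom = comb(total, selected)
    upper = min(in_category, selected)
    hits = sum(comb(in_category, k) * comb(total - in_category, selected - k)
               for k in range(overlap, upper + 1))
    return hits / denom


def random_set_z(scores, category):
    """Standardize the category's mean score against a random gene set of the same
    size drawn without replacement from all genes (finite-population correction)."""
    G, m = len(scores), len(category)
    mu = sum(scores) / G
    var_pop = sum((s - mu) ** 2 for s in scores) / G
    var_mean = var_pop / m * (G - m) / (G - 1)
    avg = sum(scores[g] for g in category) / m
    return (avg - mu) / sqrt(var_mean)
```

With 0/1 scores that mark the selected genes, the z-score is the hypergeometric overlap count standardized by its exact mean and variance, which links the averaging and selection views. For example, 10 genes, 3 selected, and a 3-gene category containing all of them give an exact p-value of 1/120 and a z-score of 3.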

arXiv.org e-Print Archive

Crossref

Identifying statistical dependence in genomic sequences via mutual information estimates

Authors: Aktulga HM
Grama AY
Kontoyiannis I
Lyznik LA
Szpankowski L
Szpankowski W
Publication date: 01/01/2007

Questions of understanding and quantifying the representation and amount of information in organisms have become a central part of biological research, as they potentially hold the key to fundamental advances. In this paper, we demonstrate the use of information-theoretic tools for identifying segments of biomolecules (DNA or RNA) that are statistically correlated. We develop a precise and reliable methodology, based on the notion of mutual information, for finding and extracting statistical as well as structural dependencies. A simple threshold function is defined, and its use in quantifying the level of significance of dependencies between biological segments is explored. These tools are used in two specific applications. The first is the identification of correlations between different parts of the maize zmSRp32 gene, where we find significant dependencies between the 5' untranslated region in zmSRp32 and its alternatively spliced exons. This observation may indicate the presence of as-yet unknown alternative splicing mechanisms or structural scaffolds. Second, using data from the FBI's Combined DNA Index System (CODIS), we demonstrate that our approach is particularly well suited to the problem of discovering short tandem repeats, an application of importance in genetic profiling. Comment: Preliminary version. Final version in EURASIP Journal on Bioinformatics and Systems Biology. See http://www.hindawi.com/journals/bsb
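The basic quantity is straightforward to compute. The sketch below gives the plug-in (empirical) mutual information, in nats, between aligned positions of two equal-length segments. Instead of the paper's threshold function, which is not reproduced, it relies on a standard identity: 2n times this estimate equals the likelihood-ratio G statistic for independence in the table of symbol pairs, which for large n is approximately chi-square with (k - 1)² degrees of freedom when both segments use k symbols (9 for DNA). How the paper aligns segments is likewise not reproduced, and the function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(seg_x, seg_y):
    """Plug-in mutual information (nats) between aligned symbols of two
    equal-length sequences."""
    if len(seg_x) != len(seg_y):
        raise ValueError("segments must have equal length")
    n = len(seg_x)
    joint = Counter(zip(seg_x, seg_y))
    px, py = Counter(seg_x), Counter(seg_y)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts
        mi += c / n * math.log(c * n / (px[a] * py[b]))
    return mi


def g_statistic(seg_x, seg_y):
    """2 * n * MI: the likelihood-ratio (G) statistic for independence of the
    aligned symbols; approximately chi-square with (k - 1)^2 df for k symbols."""
    return 2 * len(seg_x) * mutual_information(seg_x, seg_y)
```

Two identical segments over the four DNA letters, each used equally often, give a mutual information of log 4 nats, while a pairing in which every combination of symbols occurs equally often gives exactly zero.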

arXiv.org e-Print Archive

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

CUED - Cambridge University Engineering Department