
330 research outputs found

Beyond Zipf's Law: The Lavalette Rank Function and its Properties

Author: Cocho Germinal
Fontanelli Oscar
Li Wentian
Miramontes Pedro
Yang Yaning
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2016

Although Zipf's law is widespread in natural and social data, one often encounters situations where one or both ends of the ranked data deviate from the power-law function. Previously we proposed the Beta rank function to improve the fitting of data that do not follow a perfect Zipf's law. Here we show that when the two parameters in the Beta rank function have the same value, which gives the Lavalette rank function, the probability density function can be derived analytically. We also show both computationally and analytically that the Lavalette distribution is approximately equal, though not identical, to the lognormal distribution. We illustrate the utility of the Lavalette rank function in several datasets. We also address three analysis issues: the statistical testing of the Lavalette fitting function, the comparison between Zipf's law and the lognormal distribution through the Lavalette function, and the comparison between the lognormal distribution and the Lavalette distribution. Comment: 15 pages, 4 figure
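As a rough illustration of the relationship described in this abstract, the sketch below assumes the commonly written form of the Beta rank function, f(r) = A (N + 1 - r)^b / r^a for ranks r = 1..N, and treats the Lavalette rank function as the special case a = b. The function names and the log-linear least-squares fit are illustrative choices, not the authors' code or their fitting procedure.

```python
import math

def beta_rank(r, n, a, b, scale=1.0):
    """Assumed Beta rank function: scale * (n + 1 - r)**b / r**a, rank r in 1..n."""
    return scale * (n + 1 - r) ** b / r ** a

def lavalette(r, n, b, scale=1.0):
    """Lavalette rank function: the Beta rank function with both exponents equal."""
    return beta_rank(r, n, b, b, scale)

def fit_lavalette(values):
    """Least-squares fit of log f(r) = log(scale) - b * log(r / (n + 1 - r)).

    values must be positive; they are ranked in decreasing order first.
    """
    ranked = sorted(values, reverse=True)
    n = len(ranked)
    xs = [math.log(r / (n + 1 - r)) for r in range(1, n + 1)]
    ys = [math.log(v) for v in ranked]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The slope of the log-log line is -b; the intercept is log(scale).
    return -slope, math.exp(y_mean - slope * x_mean)
```

On noise-free data generated by `lavalette`, the fit recovers the exponent and scale; on real data a log-scale least-squares fit is only one of several reasonable choices.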

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

Correcting for cryptic relatedness by a regression-based genomic control method

Author: Hou Bo
Yan Ting
Yang Yaning
Publication venue: BioMed Central
Publication date: 01/01/2009

BACKGROUND: The genomic control (GC) method is a useful tool to correct for cryptic relatedness in population-based association studies. It was originally proposed for correcting the variance inflation of the Cochran-Armitage additive trend test by using information from unlinked null markers, and was later generalized to other tests with the additional requirement that the null markers are matched with the candidate marker in allele frequencies. However, matching allele frequencies limits the number of available null markers and thus limits the applicability of the GC method. On the other hand, errors in genotype/allele frequencies may cause further bias and variance inflation and thereby aggravate the effect of GC correction. RESULTS: In this paper, we propose a regression-based GC method using null markers that are not necessarily matched in allele frequencies with the candidate marker. Variation of allele frequencies of the null markers is adjusted by a regression method. CONCLUSION: The proposed method can be readily applied to Cochran-Armitage trend tests other than the additive trend test, Pearson's chi-square test, and other robust efficiency tests. Simulation results show that the proposed method is effective in controlling type I error in the presence of population substructure.
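For orientation, the sketch below shows the classic genomic control adjustment that this abstract builds on, not the regression-based method the paper proposes. It assumes 1-df chi-square statistics from unlinked null markers and estimates the inflation factor as the median null statistic divided by the median of a chi-square distribution with one degree of freedom (about 0.4549). The function names are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def inflation_factor(null_stats):
    """Estimate the variance inflation factor from null-marker statistics.

    The estimate is floored at 1 so that statistics are never inflated.
    """
    lam = statistics.median(null_stats) / CHI2_1DF_MEDIAN
    return max(1.0, lam)

def gc_adjust(candidate_stat, null_stats):
    """Deflate the candidate marker's statistic by the estimated inflation."""
    return candidate_stat / inflation_factor(null_stats)
```

The adjusted statistic is then compared against the usual chi-square reference distribution with one degree of freedom.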

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Statistical significance for hierarchical clustering in genetic association and microarray expression studies

Author: Levenstien Mark A
Ott Jürg
Yang Yaning
Publication venue: BioMed Central
Publication date: 01/01/2003

BACKGROUND: With the increasing amount of data generated in molecular genetics laboratories, it is often difficult to make sense of results because of the vast number of different outcomes or variables studied. Examples include expression levels for large numbers of genes and haplotypes at large numbers of loci. It is then natural to group observations into smaller numbers of classes that allow for an easier overview and interpretation of the data. This grouping is often carried out in multiple steps with the aid of hierarchical cluster analysis, each step leading to a smaller number of classes by combining similar observations or classes. At each step, either implicitly or explicitly, researchers tend to interpret results and eventually focus on the set of classes providing the "best" (most significant) result. While this approach makes sense, the overall statistical significance of the experiment must include the clustering process, which modifies the grouping structure of the data and often removes variation. RESULTS: For hierarchically clustered data, we propose considering the strongest result or, equivalently, the smallest p-value as the experiment-wise statistic of interest and evaluating its significance level for a global assessment of statistical significance. We apply our approach to datasets from haplotype association and microarray expression studies where hierarchical clustering has been used. CONCLUSION: In all of the cases we examine, we find that relying on one set of classes in the course of clustering leads to significance levels that are too small when compared with the significance level associated with an overall statistic that incorporates the process of clustering. In other words, relying on one step of clustering may furnish a formally significant result while the overall experiment is not significant.
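One way to evaluate the smallest p-value as an experiment-wise statistic is a permutation scheme, sketched below. This is an assumption about how such a significance level could be computed; the abstract does not specify the procedure. The function name is hypothetical, and each entry of `permuted_pvalue_sets` is assumed to hold the per-step p-values obtained after re-running the whole clustering on one label-permuted copy of the data.

```python
def experimentwise_p(observed_pvalues, permuted_pvalue_sets):
    """Permutation significance level of the smallest observed p-value.

    observed_pvalues: p-values from each clustering step on the real data.
    permuted_pvalue_sets: one list of per-step p-values per permuted dataset.
    """
    observed_min = min(observed_pvalues)
    null_mins = [min(pvalues) for pvalues in permuted_pvalue_sets]
    # Count permutations whose best step is at least as extreme as observed.
    hits = sum(1 for m in null_mins if m <= observed_min)
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (len(null_mins) + 1)
```

Because the minimum is taken over every clustering step in each permuted dataset, the selection of the "best" step is built into the reference distribution.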

Springer - Publisher Connector

PubMed Central

Partial correlation analysis indicates causal relationships between GC-content, exon density and recombination rate in the human genome

Author: Freudengerb Jan
Li Wentian
Wang Mingyi
Yang Yaning
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

BACKGROUND: Several features are known to correlate with the GC-content in the human genome, including recombination rate, gene density and distance to telomere. However, by testing for pairwise correlation only, it is impossible to distinguish direct associations from indirect ones and to distinguish between causes and effects. RESULTS: We use partial correlations to construct partially directed graphs for the following four variables: GC-content, recombination rate, exon density and distance-to-telomere. Recombination rate and exon density are unconditionally uncorrelated, but become inversely correlated by conditioning on GC-content. This pattern indicates a model where recombination rate and exon density are two independent causes of GC-content variation. CONCLUSIONS: Causal inference and graphical models are useful methods to understand genome evolution and the mechanisms of isochore evolution in the human genome.
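The pattern described in this abstract can be illustrated with the standard first-order partial correlation formula, r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The sketch below is a minimal illustration with made-up correlation values, not the authors' analysis; the function name is hypothetical.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Illustrative values only: x and y are uncorrelated, and each correlates
# positively with z. Conditioning on z makes them inversely correlated.
example = partial_corr(0.0, 0.5, 0.5)
```

With these made-up inputs the partial correlation is negative (-1/3), mirroring how two uncorrelated variables can become inversely correlated once a common effect is conditioned on.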

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Effective Sample Size: Quick Estimation of the Effect of Related Samples in Genetic Case-Control Association Analyses

Author: Chukwuma Ogunwole
Daniel Kastner
Elaine L. Remmers
Peter K. Gregersen
Wentian F. Li
Yaning B. Yang
Publication venue
Publication date: 09/07/2007

Correlated samples have frequently been avoided in case-control genetic association studies, in part because the methods for handling them are either not easily implemented or not widely known. We advocate one method for case-control association analysis of correlated samples, the effective sample size method, as a simple and accessible approach that does not require specialized computer programs. The effective sample size method captures the variance inflation of allele frequency estimation exactly, and can be used to modify the chi-square test statistic, p-value, and 95% confidence interval of the odds ratio simply by replacing the apparent allele counts with the effective ones. For genotype frequency estimation, although a single effective sample size is unable to completely characterize the variance inflation, an averaged one can satisfactorily approximate the simulated result. The effective sample size method is applied to the rheumatoid arthritis sibling data collected by the North American Rheumatoid Arthritis Consortium (NARAC) to establish a significant association with the interferon-induced helicase 1 gene (IFIH1), previously identified as a type 1 diabetes susceptibility locus. Connections between the effective sample size method and other methods, such as generalized estimating equations, variance of eigenvalues for correlation matrices, and genomic control, are also discussed.
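As a hedged illustration of the idea, the sketch below uses the textbook result that the mean of n equally weighted, unit-variance observations with pairwise correlations rho_ij has variance (sum over i, j of rho_ij) / n^2, which corresponds to an effective sample size of n^2 divided by the sum of all entries of the correlation matrix. Whether the paper uses exactly this expression is an assumption, and the function name is hypothetical.

```python
def effective_sample_size(corr):
    """Effective sample size n**2 / sum(corr) for an n x n correlation matrix.

    corr is a list of rows with ones on the diagonal and pairwise
    correlations between samples off the diagonal.
    """
    n = len(corr)
    total = sum(sum(row) for row in corr)
    return n * n / total
```

For two samples with correlation 0.5 the function gives 4/3 effective samples, and for uncorrelated samples it returns n, so the apparent count is left unchanged when no relatedness is present.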

Nature Precedings

Likelihood ratio tests in random graph models with increasing dimensions

Author: Li Yuanzhang
Xu Jinfeng
Yan Ting
Yang Yaning
Zhu Ji
Publication venue
Publication date: 09/11/2023

We explore the Wilks phenomena in two random graph models: the $\beta$-model and the Bradley-Terry model. For two increasing dimensional null hypotheses, including a specified null $H_0: \beta_i=\beta_i^0$ for $i=1,\ldots, r$ and a homogenous null $H_0: \beta_1=\cdots=\beta_r$, we reveal high dimensional Wilks' phenomena that the normalized log-likelihood ratio statistic, $[2\{\ell(\widehat{\mathbf{\beta}}) - \ell(\widehat{\mathbf{\beta}}^0)\} -r]/(2r)^{1/2}$, converges in distribution to the standard normal distribution as $r$ goes to infinity. Here, $\ell(\mathbf{\beta})$ is the log-likelihood function on the model parameter $\mathbf{\beta}=(\beta_1, \ldots, \beta_n)^\top$, $\widehat{\mathbf{\beta}}$ is its maximum likelihood estimator (MLE) under the full parameter space, and $\widehat{\mathbf{\beta}}^0$ is the restricted MLE under the null parameter space. For the homogenous null with a fixed $r$, we establish Wilks-type theorems that $2\{\ell(\widehat{\mathbf{\beta}}) - \ell(\widehat{\mathbf{\beta}}^0)\}$ converges in distribution to a chi-square distribution with $r-1$ degrees of freedom, as the total number of parameters, $n$, goes to infinity. When testing the fixed dimensional specified null, we find that its asymptotic null distribution is a chi-square distribution in the $\beta$-model. However, unexpectedly, this is not true in the Bradley-Terry model. By developing several novel technical methods for asymptotic expansion, we explore Wilks type results in a principled manner; these principled methods should be applicable to a class of random graph models beyond the $\beta$-model and the Bradley-Terry model. Simulation studies and real network data applications further demonstrate the theoretical results. Comment: This paper supersedes arxiv article arXiv:2211.10055 titled "Wilks' theorems in the $\beta$-model" by T. Yan, Y. Zhang, J. Xu, Y. Yang and J. Zh
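The normalized statistic in this abstract, twice the log-likelihood difference minus r, divided by the square root of 2r, can be computed directly once the two maximized log-likelihoods are known. The sketch below takes those log-likelihoods as given and does not fit either model. Using the upper tail of the standard normal for the p-value is an assumption, and the function names are hypothetical.

```python
import math

def normalized_lrt(loglik_full, loglik_null, r):
    """Normalized statistic [2 * (l_full - l_null) - r] / sqrt(2 * r)."""
    return (2.0 * (loglik_full - loglik_null) - r) / math.sqrt(2.0 * r)

def upper_tail_p(z):
    """Upper-tail probability of a standard normal variable at z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Illustrative numbers only: log-likelihoods -100 and -130 with r = 50
# give (60 - 50) / 10 = 1.0.
z = normalized_lrt(-100.0, -130.0, 50)
p = upper_tail_p(z)
```

The normal approximation is stated for r going to infinity, so for small r the chi-square reference distribution described in the abstract is the relevant one.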

arXiv.org e-Print Archive

The spatiotemporal response of soil moisture to precipitation and temperature changes in an arid region, China

Author: Chen Yaning
De Maeyer Philippe
Wang Anqian
Wang Yunqian
Yang Jing
Publication venue: 'MDPI AG'
Publication date: 01/01/2018

Soil moisture plays a crucial role in the hydrological cycle and climate system. The reliable estimation of soil moisture in space and time is important to monitor and even predict hydrological and meteorological disasters. Here we studied the spatiotemporal variations of soil moisture and explored the effects of precipitation and temperature on soil moisture in different land cover types within the Tarim River Basin from 2001 to 2015, based on high-spatial-resolution soil moisture data downscaled from the European Space Agency's (ESA) Climate Change Initiative (CCI) soil moisture data. The results show that the spatial average soil moisture increased slightly from 2001 to 2015, and the soil moisture variation in summer contributed most to regional soil moisture change. Among land cover types, the highest soil moisture occurred in forest and the lowest in bare land, and soil moisture showed significant increasing trends in grassland and bare land during 2001-2015. Both partial correlation analysis and multiple linear regression analysis demonstrate that in the study area precipitation had positive effects on soil moisture, while temperature had negative effects, and precipitation made greater contributions to soil moisture variations than temperature. The results of this study can be used for decision making for water management and allocation.

Multidisciplinary Digital Publishing Institute

Ghent University Academic Bibliography

Directory of Open Access Journals

Carbonaceous material fractions in sediments and their effect on the sorption and persistence of organic pollutants in small urban watersheds

Author: Mahler Barbara J.
Van Metre Peter C.
Werth Charles J.
Yang Yaning
Publication venue: University of Illinois at Urbana-Champaign. Water Resources Center
Publication date: 01/05/2009

U.S. Department of the Interior; U.S. Geological Survey; Ope

Illinois Digital Environment for Access to Learning and Scholarship Repository