
    Explicit bounds for the approximation error in Benford's law

    Benford's law states that for many random variables X > 0 the leading digit D = D(X) approximately satisfies P(D = d) = log_{10}(1 + 1/d) for d = 1, 2, ..., 9. This phenomenon follows from another, perhaps more intuitive, fact applied to Y := log_{10}(X): for many real random variables Y, the remainder U := Y - floor(Y) is approximately uniformly distributed on [0, 1). The present paper provides new explicit bounds for the latter approximation in terms of the total variation of the density of Y or of one of its derivatives. These bounds are an interesting alternative to traditional Fourier methods, which mostly yield qualitative results. As a by-product, we obtain explicit bounds for the approximation error in Benford's law. Comment: 16 pages, one figure
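
    As a small illustration of the two facts above (a sketch, not the paper's bounds), the Python below computes the Benford probabilities and reads the leading digit off the fractional part U of Y = log_{10}(X), using D = floor(10^U). If U is uniform on [0, 1), this gives P(D = d) = log_{10}(d + 1) - log_{10}(d) = log_{10}(1 + 1/d). The function names are hypothetical and chosen only for illustration.

```python
import math

def benford_probs():
    """Benford probabilities P(D = d) = log10(1 + 1/d) for d = 1, ..., 9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(x):
    """Leading digit D of x > 0, read off from U = Y - floor(Y) with Y = log10(x)."""
    if x <= 0:
        raise ValueError("x must be positive")
    y = math.log10(x)
    u = y - math.floor(y)              # fractional part, nominally in [0, 1)
    # D = floor(10**U); 10**U can land just below an integer because of
    # floating-point round-off, so add a tiny offset and clamp to 9.
    return min(int(10 ** u + 1e-12), 9)

def empirical_digit_freqs(values):
    """Relative frequency of each leading digit 1..9 in a sample of positive numbers."""
    counts = {d: 0 for d in range(1, 10)}
    for x in values:
        counts[leading_digit(x)] += 1
    return {d: c / len(values) for d, c in counts.items()}
```

    For example, empirical_digit_freqs([2 ** n for n in range(1, 1001)]) comes out close to benford_probs(), because n·log_{10}(2) mod 1 is equidistributed on [0, 1).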

    The optimal rate for resolving a near-polytomy in a phylogeny

    The reconstruction of phylogenetic trees from discrete character data typically relies on models that assume the characters evolve under a continuous-time Markov process operating at some overall rate λ. When λ is too high or too low, it becomes difficult to distinguish a short interior edge from a polytomy (the tree that results from collapsing the edge). In this note, we investigate the rate that maximizes the expected log-likelihood ratio (i.e. the Kullback–Leibler separation) between the four-leaf unresolved (star) tree and a four-leaf binary tree with interior edge length ϵ. For a simple two-state model, we show that as ϵ converges to 0, the optimal rate also converges to 0 when the four pendant edges have equal length. However, when the four pendant edges have unequal lengths, two local optima can arise, and the globally optimal rate can converge to a non-zero constant as ϵ → 0. Moreover, when the four pendant edges have equal lengths and either (i) the two-state model is replaced by an infinite-state model or (ii) the two-state model is retained but the Kullback–Leibler separation is replaced by Euclidean distance as the maximization goal, the optimal rate also converges to a non-zero constant.
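
    To make the optimization problem concrete, here is a minimal Python sketch. It assumes the symmetric two-state model, in which an edge of length t changes the state with probability (1 - exp(-2λt))/2 and the root distribution is uniform. It takes the expected log-likelihood ratio under the binary tree 12|34, i.e. KL(binary || star). The function names, the quartet labeling and the grid search over λ are illustrative assumptions, not the note's analysis.

```python
import itertools
import math

def p_change(rate, t):
    """Two-state symmetric model: probability that the state differs after time t."""
    return 0.5 * (1.0 - math.exp(-2.0 * rate * t))

def trans(a, b, rate, t):
    """Transition probability from state a to state b along an edge of length t."""
    p = p_change(rate, t)
    return p if a != b else 1.0 - p

def pattern_probs(rate, pendant, interior):
    """Probabilities of the 16 site patterns on the quartet 12|34.

    pendant: lengths (t1, t2, t3, t4) of the leaf edges; interior: length of the
    edge separating {1, 2} from {3, 4} (interior = 0 gives the star tree).
    """
    t1, t2, t3, t4 = pendant
    probs = {}
    for x in itertools.product((0, 1), repeat=4):
        total = 0.0
        for a in (0, 1):          # state at the node joining leaves 1 and 2 (root, prior 1/2)
            for b in (0, 1):      # state at the node joining leaves 3 and 4
                total += (0.5 * trans(a, b, rate, interior)
                          * trans(a, x[0], rate, t1) * trans(a, x[1], rate, t2)
                          * trans(b, x[2], rate, t3) * trans(b, x[3], rate, t4))
        probs[x] = total
    return probs

def kl_separation(rate, pendant, eps):
    """KL(binary || star): expected log-likelihood ratio under the binary tree."""
    p = pattern_probs(rate, pendant, eps)
    q = pattern_probs(rate, pendant, 0.0)
    return sum(pi * math.log(pi / q[x]) for x, pi in p.items() if pi > 0)

def best_rate(pendant, eps, rates):
    """Grid search for the rate that maximizes the KL separation."""
    return max(rates, key=lambda r: kl_separation(r, pendant, eps))
```

    Both a zero rate and a very large rate make the two trees indistinguishable. At λ = 0 every site is constant, and as λ → ∞ all 16 patterns become equally likely. The maximum therefore lies at an intermediate rate. Calling best_rate with shrinking ϵ lets one explore numerically how that optimum moves.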

    Analysis of Ratios in Multivariate Morphometry

    The analysis of ratios of body measurements is deeply ingrained in the taxonomic literature. Whether for plants or animals, certain ratios are commonly indicated in identification keys, diagnoses, and descriptions. They often provide the only means of separating cryptic species that largely lack distinguishing qualitative characters. Additionally, they provide an obvious way to study differences in body proportions, as ratios reflect geometric shape differences. However, when it comes to multivariate analysis of body measurements, for instance with linear discriminant analysis (LDA) or principal component analysis (PCA), interpretation in terms of body ratios is difficult. The two techniques are commonly applied for separating similar taxa and for exploring the structure of variation, respectively, and require standardized raw or log-transformed variables as input. Here, we develop statistical procedures for the analysis of body ratios in a consistent multivariate statistical framework. In particular, we present algorithms adapted to LDA and PCA that allow numerical results to be interpreted in terms of body proportions. We first introduce a method called the "LDA ratio extractor," which reveals the ratios that best separate two or more groups with the help of discriminant analysis. We also provide measures for deciding how much of the total difference between individuals or groups of individuals is due to size and how much is due to shape. The second method, a graphical tool called the "PCA ratio spectrum," aims at interpreting principal components in terms of body ratios. Based on a similar idea, we develop the "allometry ratio spectrum," which can be used for studying the allometric behavior of ratios. Because size can be defined in different ways, we discuss several concepts of size. Central to this discussion is Jolicoeur's multivariate generalization of the allometry equation, a concept that had previously been derived only with a heuristic argument.
Here we present a statistical derivation of the allometric size vector using the method of least squares. The application of the above methods is extensively demonstrated using published data sets from parasitic wasps and rock crabs.
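
    Two ingredients of the abstract can be sketched with NumPy under stated assumptions. First, a ratio x_i / x_j is a linear contrast of log-measurements (log x_i - log x_j). Second, Jolicoeur's allometry vector is classically taken as the first principal axis of the covariance matrix of log-transformed measurements. The sketch shows only that classical heuristic definition. It does not reproduce the paper's least-squares derivation, the LDA ratio extractor or the ratio spectra. The function names are hypothetical.

```python
import numpy as np

def log_ratio_contrast(p, i, j):
    """Coefficient vector a with a @ log(x) == log(x[i] / x[j]) for p measurements."""
    a = np.zeros(p)
    a[i], a[j] = 1.0, -1.0
    return a

def jolicoeur_allometry_vector(X):
    """First principal axis of the covariance matrix of log-measurements.

    X: array of shape (n_individuals, p_measurements) with positive entries.
    Returns a unit vector, signed so that its components sum to a non-negative value.
    """
    L = np.log(np.asarray(X, dtype=float))
    C = np.cov(L, rowvar=False)
    _, eigvecs = np.linalg.eigh(C)     # eigenvalues come back in ascending order
    v = eigvecs[:, -1]                 # eigenvector of the largest eigenvalue
    return v if v.sum() >= 0 else -v
```

    Under isometric growth the allometry vector is close to (1, ..., 1)/sqrt(p). Every log-ratio contrast is then nearly orthogonal to it, so such ratios carry almost no size information.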

    An approximate Markov model for the Wright–Fisher diffusion and its application to time series data

    The joint and accurate inference of selection and demography from genetic data is considered a particularly challenging question in population genetics, since both processes may lead to very similar patterns of genetic diversity. However, additional information for disentangling these effects may be obtained by observing changes in allele frequencies over multiple time points. Such data are common in experimental evolution studies, as well as in the comparison of ancient and contemporary samples. Leveraging this information, however, has been computationally challenging, particularly when considering multi-locus data sets. To overcome these issues, we introduce a novel discrete approximation for diffusion processes, termed the mean transition time approximation, which preserves the long-term behavior of the underlying continuous diffusion process. We then derive this approximation for the particular case of inferring selection and demography from time series data under the classic Wright–Fisher model and demonstrate that it describes allele trajectories through time well, even when only a few states are used. We then develop a Bayesian inference approach to jointly infer the population size and locus-specific selection coefficients with high accuracy, and further extend this model to also infer the rates of sequencing errors and mutations. Finally, we apply our approach to recent experimental data on the evolution of drug resistance in influenza virus, identifying likely targets of selection and finding evidence for much larger viral population sizes than previously reported.
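
    For orientation, the sketch below builds the exact discrete Wright–Fisher Markov chain that the diffusion approximates. It assumes a haploid population of constant size N and genic selection with coefficient s > -1, modeled as the frequency update p' = i(1 + s) / (i(1 + s) + N - i) followed by binomial sampling. This is not the mean transition time approximation. Its N + 1 states per locus show why exact computation becomes costly for large N, and why approximations using only a few states are attractive. The function names are hypothetical, and allele counts are assumed to be observed without sampling error.

```python
import math

def wf_transition_matrix(N, s=0.0):
    """Transition matrix P[i][j] = P(j copies next generation | i copies now).

    Haploid Wright-Fisher chain with genic selection (assumes s > -1):
    selection shifts the allele frequency, then N offspring are drawn binomially.
    """
    P = []
    for i in range(N + 1):
        w = i * (1.0 + s)
        p = w / (w + (N - i))              # allele frequency after selection
        P.append([math.comb(N, j) * p ** j * (1.0 - p) ** (N - j)
                  for j in range(N + 1)])
    return P

def propagate(dist, P, generations):
    """Push a distribution over allele counts forward by the given number of generations."""
    n = len(dist)
    for _ in range(generations):
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist

def trajectory_loglik(counts, gaps, N, s):
    """Log-likelihood of allele counts observed (without error) at successive time points.

    gaps[k] is the number of generations between counts[k] and counts[k + 1].
    """
    P = wf_transition_matrix(N, s)
    ll = 0.0
    for a, b, g in zip(counts, counts[1:], gaps):
        start = [0.0] * (N + 1)
        start[a] = 1.0                     # point mass at the earlier observed count
        prob = propagate(start, P, g)[b]
        ll += math.log(prob) if prob > 0 else -math.inf
    return ll
```

    For example, trajectory_loglik([5, 10, 15], [5, 5], 20, s) is larger for s = 0.2 than for s = -0.2. A rising allele is better explained by positive selection.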

    Parallel adaptations to nectarivory in parrots, key innovations and the diversification of the Loriinae

    Specialization to nectarivory is associated with radiations within different bird groups, including parrots. One of these radiations, the Australasian lories, was shown to be unexpectedly species-rich. Their shift to nectarivory may have created an ecological opportunity promoting species proliferation. Several morphological specializations of the feeding tract to nectarivory have been described for parrots. However, they have never been assessed in a quantitative framework that accounts for phylogenetic nonindependence. Using a phylogenetic comparative approach with broad taxon sampling and 15 continuous characters of the digestive tract, we demonstrate that nectarivorous parrots differ from the remaining parrots in several traits. These trait changes indicate phenotype–environment correlations and parallel evolution, and may reflect adaptations for feeding effectively on nectar. Moreover, an alternative statistical approach shows that the diet shift was associated with significant trait shifts at the base of the lory radiation. This diet shift might be considered an evolutionary key innovation that promoted significant non-adaptive lineage diversification through allopatric partitioning of the same new niche. The lack of increased rates of cladogenesis in other nectarivorous parrots indicates that evolutionary innovations need not be associated one-to-one with diversification events.

    Inference of evolutionary jumps in large phylogenies using Lévy processes

    Although it is now widely accepted that the rate of phenotypic evolution may not be constant across large phylogenies, the frequency and phylogenetic position of periods of rapid evolution remain unclear. In his highly influential view of evolution, G. G. Simpson proposed that such evolutionary jumps occur when organisms transition into so-called new adaptive zones, for instance after dispersal into a new geographic area, after rapid climatic changes, or following the appearance of an evolutionary novelty. Only recently have large, accurate, and well-calibrated phylogenies become available that allow this hypothesis to be tested directly, yet inferring evolutionary jumps remains computationally very challenging. Here, we develop a computationally highly efficient algorithm to accurately infer the rate and strength of evolutionary jumps as well as their phylogenetic location. Following previous work, we model evolutionary jumps as a compound process, but we introduce a novel approach to sample jump configurations that does not require matrix inversions and thus naturally scales to large trees. We then use this development to infer evolutionary jumps in Anolis lizards and Loriinii parrots, where we find strong signal for such jumps at the base of clades that transitioned into new adaptive zones, just as postulated by Simpson's hypothesis.
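
    Compound-process models of this kind rely on a conditional structure that can be sketched as follows. The sketch assumes Brownian motion with rate σ² plus normally distributed jumps with mean 0 and variance σ_J². Given the number of jumps on each branch, tip values are multivariate normal. The covariance of two tips is the sum of σ² · (branch length) + σ_J² · (number of jumps) over the branches they share. The tree encoding and names below are hypothetical, and the paper's inversion-free sampling of jump configurations is not reproduced.

```python
def tip_covariance(paths, lengths, jumps, sigma2, jump_var):
    """Covariance of tip trait values conditional on a jump configuration.

    paths:   tip name -> list of branch ids on the path from the root to that tip
    lengths: branch id -> branch length
    jumps:   branch id -> number of jumps on that branch (missing means 0)
    Assumes Brownian motion with rate sigma2 plus independent N(0, jump_var) jumps.
    """
    tips = sorted(paths)
    n = len(tips)
    C = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a, n):
            # branches shared by both root-to-tip paths contribute to the covariance
            shared = set(paths[tips[a]]) & set(paths[tips[b]])
            value = sum((sigma2 * lengths[e] + jump_var * jumps.get(e, 0)
                         for e in shared), 0.0)
            C[a][b] = C[b][a] = value
    return tips, C
```

    A sampler over jump configurations would repeatedly modify the jumps dictionary and recompute this covariance. Evaluating the resulting Gaussian likelihood naively requires inverting C, which is the cost the abstract's approach avoids.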

    Inferring heterozygosity from ancient and low coverage genomes

    While genetic diversity can be quantified accurately from high-coverage sequencing data, it is often desirable to obtain such estimates from low-coverage data, either to save costs or because of low DNA quality, as is typical of ancient samples. Here, we introduce a method to accurately infer heterozygosity probabilistically from the sequences of a single individual with average coverage […]. The method relaxes the infinite sites assumption of previous methods, does not require a reference sequence except for the initial alignment of the sequencing data, and takes into account both variable sequencing errors and potential postmortem damage. It is thus also applicable to nonmodel organisms and ancient genomes. Since error rates as reported by sequencing machines are generally distorted and require recalibration, we also introduce a method to accurately infer recalibration parameters in the presence of postmortem damage. This method does not require knowledge of the underlying genome sequence; instead, it works with haploid data (e.g., from the X chromosome of mammalian males) and integrates over the unknown genotypes. Using extensive simulations, we show that a few megabase pairs of haploid data are sufficient for accurate recalibration, even at average coverages as low as […]. At similar coverages, our method also produces very accurate estimates of heterozygosity down to […] within windows of about 1 Mbp. We further illustrate the usefulness of our approach by inferring genome-wide patterns of diversity for several ancient human samples, finding that 3000–5000-year-old samples showed diversity patterns comparable to those of modern humans. In contrast, two European hunter-gatherer samples exhibited not only considerably lower levels of diversity than modern samples, but also highly distinct distributions of diversity along their genomes.
Interestingly, these distributions were also very different between the two samples, supporting earlier conclusions of a highly diverse and structured population in Europe prior to the arrival of farming.
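
    A deliberately simplified model can illustrate the idea of integrating over unknown genotypes. The model is assumed here, not taken from the paper: biallelic sites, the two homozygotes equally likely a priori, heterozygosity θ, and a single known per-read error rate that turns one allele into the other. It ignores postmortem damage and summarizes each site by its read counts (n reads, k showing allele 1). The sketch sums the read likelihood over the genotypes 0/0, 0/1 and 1/1 and picks θ by grid search. The names are hypothetical.

```python
import math

def site_likelihood(n, k, theta, err):
    """P(k of n reads show allele 1 | theta, err), summed over genotypes 0/0, 0/1, 1/1.

    Assumes a biallelic site, equal prior weight on the two homozygotes,
    and independent reads in which an error turns one allele into the other.
    """
    def reads(p1):
        # binomial probability when each read shows allele 1 with probability p1
        return math.comb(n, k) * p1 ** k * (1.0 - p1) ** (n - k)

    homozygous = 0.5 * (1.0 - theta) * (reads(err) + reads(1.0 - err))
    heterozygous = theta * reads(0.5)
    return homozygous + heterozygous

def estimate_heterozygosity(sites, err, grid_size=999):
    """Maximum-likelihood theta over the grid 1/(grid_size+1), ..., grid_size/(grid_size+1).

    sites: list of (n, k) pairs, one per site.
    """
    best_theta, best_ll = None, -math.inf
    for g in range(1, grid_size + 1):
        theta = g / (grid_size + 1)
        ll = sum(math.log(site_likelihood(n, k, theta, err)) for n, k in sites)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta
```

    Consider 10 sites with 5 of 10 reads showing allele 1 and 90 sites with 0 of 10, at an error rate of 0.01. The estimate comes out close to 0.1.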