246 research outputs found

    Descartes' rule of signs and the identifiability of population demographic models from genomic variation data

    The sample frequency spectrum (SFS) is a widely-used summary statistic of genomic variation in a sample of homologous DNA sequences. It provides a highly efficient dimensional reduction of large-scale population genomic data and its mathematical dependence on the underlying population demography is well understood, thus enabling the development of efficient inference algorithms. However, it has been recently shown that very different population demographies can actually generate the same SFS for arbitrarily large sample sizes. Although in principle this nonidentifiability issue poses a thorny challenge to statistical inference, the population size functions involved in the counterexamples are arguably not so biologically realistic. Here, we revisit this problem and examine the identifiability of demographic models under the restriction that the population sizes are piecewise-defined where each piece belongs to some family of biologically-motivated functions. Under this assumption, we prove that the expected SFS of a sample uniquely determines the underlying demographic model, provided that the sample is sufficiently large. We obtain a general bound on the sample size sufficient for identifiability; the bound depends on the number of pieces in the demographic model and also on the type of population size function in each piece. In the cases of piecewise-constant, piecewise-exponential and piecewise-generalized-exponential models, which are often assumed in population genomic inferences, we provide explicit formulas for the bounds as simple functions of the number of pieces. Lastly, we obtain analogous results for the "folded" SFS, which is often used when there is ambiguity as to which allelic type is ancestral. 
    Our results are proved using a generalization of Descartes' rule of signs for polynomials to the Laplace transform of piecewise continuous functions. Comment: Published at http://dx.doi.org/10.1214/14-AOS1264 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
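As a small illustration of the classical polynomial form of Descartes' rule of signs that the abstract above builds on (not the paper's Laplace-transform generalization), the sketch below counts sign changes in a coefficient sequence. The function name `sign_changes` and the example polynomial are assumptions chosen for illustration. The rule states that the number of positive real roots, counted with multiplicity, is at most this count and differs from it by an even number.

```python
from typing import Sequence


def sign_changes(coeffs: Sequence[float]) -> int:
    """Count sign changes in a coefficient sequence, ignoring zero entries.

    Hypothetical helper for illustration; coefficients may be listed from
    highest to lowest degree or the reverse, since the count is the same.
    """
    signs = [c > 0 for c in coeffs if c != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


# Example polynomial (an assumption, not from the paper):
# p(x) = x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
coeffs = [1, -6, 11, -6]
bound = sign_changes(coeffs)  # upper bound on the number of positive real roots
```

Here the bound is attained, since the example polynomial has three positive roots; for `x^2 + 1` (coefficients `[1, 0, 1]`) the count is zero, consistent with it having no positive real roots.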

    A model and framework for reliable build systems

    Reliable and fast builds are essential for rapid turnaround during development and testing. Popular existing build systems rely on correct manual specification of build dependencies, which can lead to invalid build outputs and nondeterminism. We outline the challenges of developing reliable build systems and explore the design space for their implementation, with a focus on non-distributed, incremental, parallel build systems. We define a general model for resources accessed by build tasks and show its correspondence to the implementation technique of minimum information libraries, APIs that return no information that the application doesn't plan to use. We also summarize preliminary experimental results from several prototype build managers.

    A novel spectral method for inferring general diploid selection from time series genetic data

    The increased availability of time series genetic variation data from experimental evolution studies and ancient DNA samples has created new opportunities to identify genomic regions under selective pressure and to estimate their associated fitness parameters. However, it is a challenging problem to compute the likelihood of nonneutral models for the population allele frequency dynamics, given the observed temporal DNA data. Here, we develop a novel spectral algorithm to analytically and efficiently integrate over all possible frequency trajectories between consecutive time points. This advance circumvents the limitations of existing methods, which require fine-tuning the discretization of the population allele frequency space when numerically approximating requisite integrals. Furthermore, our method is flexible enough to handle general diploid models of selection where the heterozygote and homozygote fitness parameters can take any values, while previous methods focused on only a few restricted models of selection. We demonstrate the utility of our method on simulated data and also apply it to analyze ancient DNA data from genetic loci associated with coat coloration in horses. In contrast to previous studies, our exploration of the full fitness parameter space reveals that a heterozygote advantage form of balancing selection may have been acting on these loci. Comment: Published at http://dx.doi.org/10.1214/14-AOAS764 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Distortion of genealogical properties when the sample is very large

    Study sample sizes in human genetics are growing rapidly, and in due course it will become routine to analyze samples with hundreds of thousands if not millions of individuals. In addition to posing computational challenges, such large sample sizes call for carefully re-examining the theoretical foundation underlying commonly-used analytical tools. Here, we study the accuracy of the coalescent, a central model for studying the ancestry of a sample of individuals. The coalescent arises as a limit of a large class of random mating models and it is an accurate approximation to the original model provided that the population size is sufficiently larger than the sample size. We develop a method for performing exact computation in the discrete-time Wright-Fisher (DTWF) model and compare several key genealogical quantities of interest with the coalescent predictions. For realistic demographic scenarios, we find that there are a significant number of multiple- and simultaneous-merger events under the DTWF model, which are absent in the coalescent by construction. Furthermore, for large sample sizes, there are noticeable differences in the expected number of rare variants between the coalescent and the DTWF model. To balance the tradeoff between accuracy and computational efficiency, we propose a hybrid algorithm that utilizes the DTWF model for the recent past and the coalescent for the more distant past. Our results demonstrate that the hybrid method with only a handful of generations of the DTWF model leads to a frequency spectrum that is quite close to the prediction of the full DTWF model. Comment: 27 pages, 2 tables, 14 figures

    Geometry of the sample frequency spectrum and the perils of demographic inference

    The sample frequency spectrum (SFS), which describes the distribution of mutant alleles in a sample of DNA sequences, is a widely used summary statistic in population genetics. The expected SFS has a strong dependence on the historical population demography and this property is exploited by popular statistical methods to infer complex demographic histories from DNA sequence data. Most, if not all, of these inference methods exhibit pathological behavior, however. Specifically, they often display runaway behavior in optimization, where the inferred population sizes and epoch durations can degenerate to 0 or diverge to infinity, and show undesirable sensitivity of the inferred demography to perturbations in the data. The goal of this paper is to provide theoretical insights into why such problems arise. To this end, we characterize the geometry of the expected SFS for piecewise-constant demographic histories and use our results to show that the aforementioned pathological behavior of popular inference methods is intrinsic to the geometry of the expected SFS. We provide explicit descriptions and visualizations for a toy model with sample size 4, and generalize our intuition to arbitrary sample sizes n using tools from convex and algebraic geometry. We also develop a universal characterization result which shows that the expected SFS of a sample of size n under an arbitrary population history can be recapitulated by a piecewise-constant demography with only k(n) epochs, where k(n) is between n/2 and 2n-1. The set of expected SFS for piecewise-constant demographies with fewer than k(n) epochs is open and non-convex, which causes the above phenomena for inference from data. Comment: 21 pages, 5 figures
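To make the summary statistic in the abstract above concrete, here is a minimal sketch of how an observed SFS could be tabulated from a small binary haplotype matrix, together with the "folded" variant used when the ancestral allele is unknown. The function names, the 0/1 encoding (1 = derived allele), and the toy data are all assumptions for illustration; none of them come from the listed papers, and the sketch does not compute the expected SFS under any demographic model.

```python
from typing import List, Sequence


def sample_frequency_spectrum(haplotypes: Sequence[Sequence[int]]) -> List[int]:
    """Tabulate the unfolded SFS of n sequences.

    haplotypes[j][s] is 1 if sequence j carries the derived allele at site s.
    Entry i-1 of the result counts sites where exactly i of the n sequences
    carry the derived allele, for i = 1, ..., n-1 (segregating sites only).
    """
    n = len(haplotypes)
    sfs = [0] * (n - 1)
    for column in zip(*haplotypes):  # iterate over sites
        count = sum(column)
        if 0 < count < n:            # skip sites that do not segregate
            sfs[count - 1] += 1
    return sfs


def fold(sfs: Sequence[int]) -> List[int]:
    """Fold an SFS by merging the counts for i and n-i copies (minor allele count)."""
    n = len(sfs) + 1
    folded = []
    for i in range(1, n // 2 + 1):
        j = n - i
        folded.append(sfs[i - 1] if i == j else sfs[i - 1] + sfs[j - 1])
    return folded


# Toy data (hypothetical): 4 sequences, 5 sites.
toy = [
    [1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],
]
toy_sfs = sample_frequency_spectrum(toy)
toy_folded = fold(toy_sfs)
```

With sample size 4, as in the toy model the abstract mentions, the unfolded spectrum has three entries and the folded one has two.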

    Strength and Stiffness Characterization of Controlled Low-Strength Material Using Native High-Plasticity Clay

    An attempt was made to design a controlled low-strength material (CLSM) mix that can be used as bedding and haunch material for a pipeline, using the native soil as fine aggregate. Several CLSM mix designs were attempted using native high-plasticity clay as the fine aggregate. Comprehensive material characterization studies, ranging from flowability to strength tests, were performed. These results were analyzed to assess the suitability of each mix for pipe bedding and backfilling zones in pipeline construction. Flowability and density test results were evaluated first, and several mixes were formulated as a result. These mixes were then subjected to engineering characterization studies, and this paper presents those test results. The studies cover setting time, strength, and stiffness results as well as excavatability evaluations of these mixtures. The results indicate that CLSMs can be produced using native high-plasticity soils with strength properties always matching specified requirements. Some relaxation of the setting time periods could further help in developing economical mix designs. CLSMs that meet project specifications are recommended for field implementation.

    Flowability and Density Characteristics of Controlled Low Strength Material (CLSM) Using Native High Plasticity Clay

    In pipeline construction projects, when high-plasticity clayey soils are encountered in the excavated trench material, they are typically landfilled, and better-quality materials are imported from outside quarry sources for use as bedding and haunch zone materials. This practice has detrimental environmental and cost impacts; therefore, efficient reuse of this high-plasticity excavated material to produce controlled low strength materials (CLSMs) for bedding and haunch zones will have major sustainability benefits. As part of an ongoing research study, novel CLSM mix designs were developed using native high-plasticity clayey soils from the excavated trench material. Because of the high plasticity of the soils, it is essential to address both flowability and density requirements before validating the mixes against other engineering properties. Hence, several CLSM mixtures with the native clayey soils as ingredients were initially designed according to the flowability criterion to establish the optimum quantities of chemical binders and water. These mixes were then checked against the density criterion. This technical note presents the step-by-step procedure followed in preparing these mixes, along with test results obtained from the various mixes designed as part of the testing program. These results show that CLSM mixes with high-plasticity clays can be developed that meet both flowability and density criteria. The success of this research supports sustainability efforts in pipeline construction projects, as the study showed that excavated clayey soils can be successfully reused in CLSM applications rather than landfilled.

    Addressing Clay Mineralogy Effects on Performance of Chemically Stabilized Expansive Soils Subjected to Seasonal Wetting and Drying

    Premature failures in chemically stabilized expansive soils cause millions of dollars in maintenance and repair costs. One of the reasons for these failures is the inability of existing stabilization design guidelines to consider the complex interactions between clay minerals and the stabilizers. It is vital to understand these complex interactions, as they are responsible for the strength improvement and swell/shrink reduction in these soils, in turn affecting the overall health of the infrastructure. Hence, this research study examined the longevity of chemically stabilized expansive soils subjected to wetting/drying conditions, with a major focus on clay mineralogy. Eight different natural soils with varying clay mineralogy were subjected to wetting/drying durability studies after stabilization with chemical additives including quicklime and cement. Performance indicators such as volumetric strain and unconfined compressive strength trends were monitored at regular intervals during the wetting/drying process. It was observed that clayey soils dominant in the mineral montmorillonite were susceptible to premature failures. It was also noted that soils dominant in other clay minerals exhibited early failures at lower additive contents. In addition, an attempt was made for the first time to address the field implications of the laboratory studies by developing a correlation that predicts service life in the field based on clay mineralogy and stabilizer dosage.

    Swell and Shrinkage Strain Prediction Models for Expansive Clays

    A comprehensive laboratory investigation was conducted to study the volume change behavior of five different types of expansive clayey soils sampled from various regions in Texas, USA. The laboratory test results, which were presented in an earlier paper, are analyzed here to evaluate existing correlations that can be used to predict swell- and shrink-related displacements in these soils. The test database is also used to develop newer, practical models for predicting volume change-related soil properties. The models developed here use soil plasticity and compaction properties as independent variables. Newer models that rely on seasonal compaction moisture content variations in the subsoils were introduced to estimate both volumetric and vertical swell- and shrinkage-induced soil deformations expected under civil infrastructure. The developed correlations, along with the existing models, were then used to predict vertical soil swell movements in four case studies where swell-induced soil movements were monitored. This comparison showed that making the models depend on volume change test procedural information and on moisture content variation due to seasonal changes leads to better prediction of swell movements in subsoils. Future research directions and recommendations are provided on implementing the developed models for realistic estimation of swell movements in infrastructure construction projects.