    On the Critical Behavior of D1-brane Theories

    We study renormalization-group flow patterns in theories arising on D1-branes in various supersymmetry-breaking backgrounds. We argue that the theory of N D1-branes transverse to an orbifold space can be fine-tuned to flow to the corresponding orbifold conformal field theory in the infrared, for particular values of the couplings and theta angles which we determine using the discrete symmetries of the model. By calculating various nonplanar contributions to the scalar potential in the worldvolume theory, we show that fine-tuning is in fact required at finite N, as would be generically expected. We further comment on the presence of singular conformal field theories (such as those whose target space includes a "throat" described by an exactly solvable CFT) in the non-supersymmetric context. Throughout the analysis two applications are considered: to gauge theory/gravity duality and to linear sigma model techniques for studying worldsheet string theory. Comment: 23 pages in harvmac big, 8 figures

    Non-Gaussianity in Island Cosmology

    In this paper we fully calculate the non-Gaussianity of the primordial curvature perturbation of the island universe by using the second-order perturbation equation. We find that for the spectral index $n_s \simeq 0.96$, which is favored by current observations, the non-Gaussianity level $f_{NL}$ seen in the island universe will generally lie between 30 and 60, which may be tested by upcoming observations. In the landscape, the island universe is one of the anthropically acceptable cosmological histories. Thus the results obtained in some sense mean that upcoming observations, especially measurements of non-Gaussianity, will be significant for clarifying how our position in the landscape is populated. Comment: 5 pages, 1 eps figure, some discussions added, published version

    Open string instantons and relative stable morphisms

    We show how topological open string theory amplitudes can be computed by using relative stable morphisms in the algebraic category. We achieve our goal by explicitly working through an example which has been previously considered by Ooguri and Vafa from the point of view of physics. By using the method of virtual localization, we successfully reproduce their results for multiple covers of a holomorphic disc, whose boundary lies in a Lagrangian submanifold of a Calabi-Yau 3-fold, by Riemann surfaces with arbitrary genera and number of boundary components. In particular we show that in the case we consider there are no open string instantons with more than one boundary component ending on the Lagrangian submanifold. Comment: This is the version published by Geometry & Topology Monographs on 22 April 200

    Descartes' rule of signs and the identifiability of population demographic models from genomic variation data

    The sample frequency spectrum (SFS) is a widely-used summary statistic of genomic variation in a sample of homologous DNA sequences. It provides a highly efficient dimensional reduction of large-scale population genomic data and its mathematical dependence on the underlying population demography is well understood, thus enabling the development of efficient inference algorithms. However, it has been recently shown that very different population demographies can actually generate the same SFS for arbitrarily large sample sizes. Although in principle this nonidentifiability issue poses a thorny challenge to statistical inference, the population size functions involved in the counterexamples are arguably not so biologically realistic. Here, we revisit this problem and examine the identifiability of demographic models under the restriction that the population sizes are piecewise-defined where each piece belongs to some family of biologically-motivated functions. Under this assumption, we prove that the expected SFS of a sample uniquely determines the underlying demographic model, provided that the sample is sufficiently large. We obtain a general bound on the sample size sufficient for identifiability; the bound depends on the number of pieces in the demographic model and also on the type of population size function in each piece. In the cases of piecewise-constant, piecewise-exponential and piecewise-generalized-exponential models, which are often assumed in population genomic inferences, we provide explicit formulas for the bounds as simple functions of the number of pieces. Lastly, we obtain analogous results for the "folded" SFS, which is often used when there is ambiguity as to which allelic type is ancestral. 
    Our results are proved using a generalization of Descartes' rule of signs for polynomials to the Laplace transform of piecewise continuous functions. Comment: Published in at http://dx.doi.org/10.1214/14-AOS1264 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
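
For orientation, the classical polynomial form of Descartes' rule of signs can be sketched in a few lines. This is only the textbook statement (the number of positive real roots is at most the number of sign changes in the coefficient sequence, and differs from it by an even number), not the Laplace-transform generalization developed in the paper. The function name `sign_changes` is a hypothetical helper chosen for this illustration.

```python
def sign_changes(coeffs):
    """Count sign changes in a coefficient sequence, ignoring zeros.

    By the classical Descartes' rule of signs, this count is an upper
    bound on the number of positive real roots of the polynomial.
    """
    signs = [c > 0 for c in coeffs if c != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


# x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3): three positive roots.
cubic = [1, -6, 11, -6]
# x^2 + 1 has no real roots at all.
no_roots = [1, 0, 1]

print(sign_changes(cubic), sign_changes(no_roots))
```

In the first example the bound is attained; in the second it correctly rules out positive roots.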

    Fundamental limits on the accuracy of demographic inference based on the sample frequency spectrum

    The sample frequency spectrum (SFS) of DNA sequences from a collection of individuals is a summary statistic which is commonly used for parametric inference in population genetics. Despite the popularity of SFS-based inference methods, currently little is known about the information-theoretic limit on the estimation accuracy as a function of sample size. Here, we show that using the SFS to estimate the size history of a population has a minimax error of at least $O(1/\log s)$, where $s$ is the number of independent segregating sites used in the analysis. This rate is exponentially worse than known convergence rates for many classical estimation problems in statistics. Another surprising aspect of our theoretical bound is that it does not depend on the dimension of the SFS, which is related to the number of sampled individuals. This means that, for a fixed number $s$ of segregating sites considered, using more individuals does not help to reduce the minimax error bound. Our result pertains to populations that have experienced a bottleneck, and we argue that it can be expected to apply to many populations in nature. Comment: 17 pages, 1 figure
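
As a concrete reminder of what the SFS summarizes, here is a minimal sketch that tabulates it from a small 0/1 haplotype matrix. It assumes rows are sampled sequences, columns are sites, and 1 marks the derived allele. The function names and the toy data are hypothetical and not taken from the paper. The folded version, used when the ancestral allele is unknown, pools derived counts i and n - i.

```python
def sample_frequency_spectrum(haplotypes):
    """Return sfs where sfs[i - 1] counts sites with i derived copies."""
    n = len(haplotypes)
    sfs = [0] * (n - 1)
    for site in zip(*haplotypes):  # iterate over columns (sites)
        k = sum(site)
        if 0 < k < n:  # skip sites that are not segregating in the sample
            sfs[k - 1] += 1
    return sfs


def fold(sfs):
    """Fold an SFS by pooling derived counts i and n - i."""
    n = len(sfs) + 1
    folded = []
    for i in range(1, n // 2 + 1):
        j = n - i
        folded.append(sfs[i - 1] if i == j else sfs[i - 1] + sfs[j - 1])
    return folded


# Toy sample: 4 sequences, 5 sites (the last two sites are not segregating).
toy = [
    [0, 1, 1, 0, 1],
    [0, 0, 1, 0, 1],
    [1, 0, 1, 0, 1],
    [0, 0, 0, 0, 1],
]
spectrum = sample_frequency_spectrum(toy)
print(spectrum, fold(spectrum))
```

The number of segregating sites, written $s$ in the abstract, is the sum of the entries of the spectrum.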

    On a Conjecture of Givental

    These brief notes record our puzzles and findings surrounding Givental's recent conjecture which expresses higher genus Gromov-Witten invariants in terms of the genus-0 data. We limit our considerations to the case of a projective line, whose Gromov-Witten invariants are well-known and easy to compute. We make some simple checks supporting his conjecture. Comment: 13 pages, no figures; v.2: new title, minor changes

    An asymptotic sampling formula for the coalescent with recombination

    The Ewens sampling formula (ESF) is a one-parameter family of probability distributions with a number of intriguing combinatorial connections. This elegant closed-form formula first arose in biology as the stationary probability distribution of a sample configuration at one locus under the infinite-alleles model of mutation. Since its discovery in the early 1970s, the ESF has been used in various biological applications, and has sparked several interesting mathematical generalizations. In the population genetics community, extending the underlying random-mating model to include recombination has received much attention in the past, but no general closed-form sampling formula is currently known even for the simplest extension, that is, a model with two loci. In this paper, we show that it is possible to obtain useful closed-form results in the case where the population-scaled recombination rate $\rho$ is large but not necessarily infinite. Specifically, we consider an asymptotic expansion of the two-locus sampling formula in inverse powers of $\rho$ and obtain closed-form expressions for the first few terms in the expansion. Our asymptotic sampling formula applies to arbitrary sample sizes and configurations. Comment: Published in at http://dx.doi.org/10.1214/09-AAP646 the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)
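
For reference, the one-locus ESF itself is short enough to evaluate directly. The sketch below uses the standard form $P(a_1,\dots,a_n) = \frac{n!}{\theta(\theta+1)\cdots(\theta+n-1)} \prod_{j=1}^{n} \frac{\theta^{a_j}}{j^{a_j}\, a_j!}$, where $a_j$ is the number of allelic types observed exactly $j$ times and $\theta$ is the mutation parameter. It does not implement the two-locus asymptotic expansion from the paper, and the function name `ewens_probability` is a hypothetical choice for this illustration.

```python
from math import factorial, prod


def ewens_probability(counts, theta):
    """One-locus Ewens sampling formula.

    counts[j - 1] is a_j, the number of allelic types seen exactly j times;
    the sample size n is implied by the counts.
    """
    n = sum(j * a for j, a in enumerate(counts, start=1))
    rising = prod(theta + k for k in range(n))  # theta (theta+1) ... (theta+n-1)
    weight = prod(
        (theta / j) ** a / factorial(a) for j, a in enumerate(counts, start=1)
    )
    return factorial(n) / rising * weight


# All allele configurations for a sample of size 3.
theta = 2.5
configs = [[3, 0, 0], [1, 1, 0], [0, 0, 1]]
total = sum(ewens_probability(c, theta) for c in configs)
print(total)
```

Summing over every configuration of a fixed sample size is a quick sanity check that the probabilities are normalized.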

    A novel spectral method for inferring general diploid selection from time series genetic data

    The increased availability of time series genetic variation data from experimental evolution studies and ancient DNA samples has created new opportunities to identify genomic regions under selective pressure and to estimate their associated fitness parameters. However, it is a challenging problem to compute the likelihood of nonneutral models for the population allele frequency dynamics, given the observed temporal DNA data. Here, we develop a novel spectral algorithm to analytically and efficiently integrate over all possible frequency trajectories between consecutive time points. This advance circumvents the limitations of existing methods which require fine-tuning the discretization of the population allele frequency space when numerically approximating requisite integrals. Furthermore, our method is flexible enough to handle general diploid models of selection where the heterozygote and homozygote fitness parameters can take any values, while previous methods focused on only a few restricted models of selection. We demonstrate the utility of our method on simulated data and also apply it to analyze ancient DNA data from genetic loci associated with coat coloration in horses. In contrast to previous studies, our exploration of the full fitness parameter space reveals that a heterozygote advantage form of balancing selection may have been acting on these loci. Comment: Published in at http://dx.doi.org/10.1214/14-AOAS764 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Multi-locus analysis of genomic time series data from experimental evolution

    Genomic time series data generated by evolve-and-resequence (E&R) experiments offer a powerful window into the mechanisms that drive evolution. However, standard population genetic inference procedures do not account for sampling serially over time, and new methods are needed to make full use of modern experimental evolution data. To address this problem, we develop a Gaussian process approximation to the multi-locus Wright-Fisher process with selection over a time course of tens of generations. The mean and covariance structure of the Gaussian process are obtained by computing the corresponding moments in discrete-time Wright-Fisher models conditioned on the presence of a linked selected site. This enables our method to account for the effects of linkage and selection, both along the genome and across sampled time points, in an approximate but principled manner. We first use simulated data to demonstrate the power of our method to correctly detect, locate and estimate the fitness of a selected allele from among several linked sites. We study how this power changes for different values of selection strength, initial haplotypic diversity, population size, sampling frequency, experimental duration, number of replicates, and sequencing coverage depth. In addition to providing quantitative estimates of selection parameters from experimental evolution data, our model can be used by practitioners to design E&R experiments with requisite power. We also explore how our likelihood-based approach can be used to infer other model parameters, including effective population size and recombination rate. Then, we apply our method to analyze genome-wide data from a real E&R experiment designed to study the adaptation of D. melanogaster to a new laboratory environment with alternating cold and hot temperatures

    Distortion of genealogical properties when the sample is very large

    Study sample sizes in human genetics are growing rapidly, and in due course it will become routine to analyze samples with hundreds of thousands if not millions of individuals. In addition to posing computational challenges, such large sample sizes call for carefully re-examining the theoretical foundation underlying commonly-used analytical tools. Here, we study the accuracy of the coalescent, a central model for studying the ancestry of a sample of individuals. The coalescent arises as a limit of a large class of random mating models and it is an accurate approximation to the original model provided that the population size is sufficiently larger than the sample size. We develop a method for performing exact computation in the discrete-time Wright-Fisher (DTWF) model and compare several key genealogical quantities of interest with the coalescent predictions. For realistic demographic scenarios, we find that there are a significant number of multiple- and simultaneous-merger events under the DTWF model, which are absent in the coalescent by construction. Furthermore, for large sample sizes, there are noticeable differences in the expected number of rare variants between the coalescent and the DTWF model. To balance the tradeoff between accuracy and computational efficiency, we propose a hybrid algorithm that utilizes the DTWF model for the recent past and the coalescent for the more distant past. Our results demonstrate that the hybrid method with only a handful of generations of the DTWF model leads to a frequency spectrum that is quite close to the prediction of the full DTWF model. Comment: 27 pages, 2 tables, 14 figures
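
To illustrate why the sample size matters relative to the population size, the following sketch computes single-generation merger probabilities in a deliberately simplified setting: a haploid Wright-Fisher population of constant size N, in which each of n sampled lineages picks its parent uniformly at random. This is an assumption made for illustration only. It is not the exact computational method or the demographic scenarios of the paper, and the function name is hypothetical. Events other than "no merger" or "exactly one pairwise merger" are those the coalescent excludes by construction.

```python
from math import comb


def dtwf_one_generation(n, N):
    """Return (P(no merger), P(exactly one pairwise merger), P(anything else))
    for n lineages going back one generation in a haploid population of size N.
    """
    # All n lineages choose distinct parents.
    p_none = 1.0
    for i in range(1, n):
        p_none *= 1 - i / N
    # One chosen pair shares a parent; the resulting n - 1 parents are distinct.
    p_pair = comb(n, 2) / N
    for i in range(1, n - 1):
        p_pair *= 1 - i / N
    # Multiple or simultaneous mergers, which the coalescent does not allow.
    p_other = 1.0 - p_none - p_pair
    return p_none, p_pair, p_other


# The excluded events are negligible when n << N but not when n is large.
for n in (10, 100, 1000):
    print(n, dtwf_one_generation(n, 10_000))
```

Under these assumptions the third probability grows quickly with n for fixed N, which is the qualitative effect the abstract describes.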