
### On the sub-permutations of pattern avoiding permutations

There is a deep connection between permutations and trees. Certain
sub-structures of permutations, called sub-permutations, bijectively map to
sub-trees of binary increasing trees. This opens a powerful tool set to study
enumerative and probabilistic properties of sub-permutations and to investigate
the relationships between 'local' and 'global' features using the concept of
pattern avoidance. First, given a pattern $\mu$, we study how the avoidance of
$\mu$ in a permutation $\pi$ affects the presence of other patterns in the
sub-permutations of $\pi$. More precisely, considering patterns of length 3, we
solve instances of the following problem: given a class of permutations $K$ and a
pattern $\mu$, we ask for the number of permutations $\pi \in Av_n(\mu)$ whose
sub-permutations in $K$ satisfy certain additional constraints on their size.
Second, we study the probability for a generic pattern to be contained in a
random permutation $\pi$ of size $n$ without being present in the
sub-permutations of $\pi$ generated by the entry $1 \leq k \leq n$. These
theoretical results can be useful to define efficient randomized pattern-search
procedures based on classical algorithms of pattern-recognition, while the
general problem of pattern-search is NP-complete.
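
As a rough illustration of pattern avoidance (not the tree-based method of the paper), the sketch below checks pattern containment by brute force and counts the permutations in $Av_n(\mu)$ for small $n$. The function names `contains` and `count_avoiders` are hypothetical, and the search is exponential in the pattern length, so it is only meant for toy sizes.

```python
from itertools import combinations, permutations


def argsort(values):
    """Return the indices that would sort `values` (all values distinct)."""
    return tuple(sorted(range(len(values)), key=values.__getitem__))


def contains(perm, pattern):
    """True if some subsequence of `perm` is order-isomorphic to `pattern`."""
    k = len(pattern)
    target = argsort(pattern)
    for idx in combinations(range(len(perm)), k):
        # Two sequences of distinct values are order-isomorphic
        # exactly when their argsorts coincide.
        if argsort([perm[i] for i in idx]) == target:
            return True
    return False


def count_avoiders(n, pattern):
    """Number of permutations of 1..n that avoid `pattern` (brute force)."""
    return sum(
        1 for p in permutations(range(1, n + 1)) if not contains(p, pattern)
    )


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, count_avoiders(n, (1, 3, 2)))
```

For any pattern of length 3 the counts follow the Catalan numbers, which gives a quick sanity check on the brute-force search.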

### Yule-generated trees constrained by node imbalance

The Yule process generates a class of binary trees which is fundamental to
population genetic models and other applications in evolutionary biology. In
this paper, we introduce a family of sub-classes of ranked trees, called
Omega-trees, which are characterized by imbalance of internal nodes. The degree
of imbalance is defined by an integer $w \geq 0$. For caterpillars, the extreme
case of unbalanced trees, $w = 0$. Under models of neutral evolution, for
instance the Yule model, trees with small $w$ are unlikely to occur by chance.
Indeed, imbalance can be a signature of permanent selection pressure, such as
observable in the genealogies of certain pathogens. From a mathematical point
of view it is interesting to observe that the space of Omega-trees maintains
several statistical invariants although it is drastically reduced in size
compared to the space of unconstrained Yule trees. Using generating functions,
we study here some basic combinatorial properties of Omega-trees. We focus on
the distribution of the number of subtrees with two leaves. We show that
expectation and variance of this distribution match those for unconstrained
trees already for very small values of $w$.
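
The statistic studied here, the number of subtrees with two leaves, can be explored numerically. The sketch below grows unconstrained Yule trees by repeatedly splitting a uniformly chosen leaf and counts two-leaf subtrees; it also builds a caterpillar for comparison. It does not implement the Omega-tree constraint, whose precise definition is not given in the abstract, and all function names are hypothetical.

```python
import random


def yule_tree(n, rng):
    """Grow a Yule tree with n leaves; returns {internal_node: (left, right)}."""
    children = {}
    leaves = [0]          # node 0 is the root, initially a leaf
    next_id = 1
    while len(leaves) < n:
        pos = rng.randrange(len(leaves))   # pick a leaf uniformly at random
        parent = leaves[pos]
        left, right = next_id, next_id + 1
        next_id += 2
        children[parent] = (left, right)   # the chosen leaf becomes internal
        leaves[pos] = left
        leaves.append(right)
    return children


def caterpillar(n):
    """Maximally unbalanced tree with n leaves, in the same representation."""
    children = {}
    node, next_id = 0, 1
    for _ in range(n - 1):
        children[node] = (next_id, next_id + 1)
        node = next_id + 1                 # always split the newest right child
        next_id += 2
    return children


def count_two_leaf_subtrees(children):
    """Count internal nodes whose two children are both leaves."""
    return sum(
        1
        for left, right in children.values()
        if left not in children and right not in children
    )


if __name__ == "__main__":
    rng = random.Random(0)
    n, reps = 30, 2000
    counts = [count_two_leaf_subtrees(yule_tree(n, rng)) for _ in range(reps)]
    print("simulated mean:", sum(counts) / reps)
    print("caterpillar:", count_two_leaf_subtrees(caterpillar(n)))
```

A caterpillar always has exactly one such subtree, whereas the simulated Yule mean should be close to $n/3$.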

### Counting, grafting and evolving binary trees

Binary trees are fundamental objects in models of evolutionary biology and population genetics. Here, we discuss some of their combinatorial and structural properties as they depend on the tree class considered. Furthermore, the process by which trees are generated determines the probability distribution in tree space. Yule trees, for instance, are generated by a pure birth process. When considered as unordered, they have neither a closed-form enumeration nor a simple probability distribution, but their ordered siblings have both. Ordered trees are therefore the object of choice when studying tree structure in the framework of evolving genealogies.

### Processes determining genetic variability: mutations in sequence space and hitchhiking

Departing from the classical model of the so-called error threshold of mutating macromolecules, I have reformulated the model in the context of diploid organisms evolving in sequence space and under conditions of a finite population size. I found, for instance, that dominance properties have a substantial impact on the details of the error threshold (Chpt. 1). I then asked whether error thresholds can also be observed in more general fitness landscapes than the original single-peaked landscape. For smooth landscapes the answer is negative (Chpt. 2). Studying diploid organisms, I also investigated the impact of recombination on the evolutionary dynamics and on the possibility for a population to reach a fitness maximum. I concluded that the recombination rate, i.e., the chromosomal distance between interacting genetic loci, plays a much more important role in generating fitness-conferring allele combinations than manipulating the mutation rate does (Chpt. 3). Finally, considering a two-locus model in which one locus experiences beneficial mutations and a second locus is selectively neutral, I investigated the much discussed model of genetic hitchhiking. Using diffusion theory, I predicted the impact on the level of neutral polymorphism imposed by a beneficial mutation on a neighbouring genetic locus (Chpt. 4) and compared the predictions to experimental data on observed genetic variability in the fruit fly Drosophila. This led to an estimate of the rate and strength with which beneficial substitutions occur in natural populations (Chpt. 5).

### Exact enumeration of cherries and pitchforks in ranked trees under the coalescent model

We consider exact enumerations and probabilistic properties of ranked trees
when generated under the random coalescent process. Using a new approach, based
on generating functions, we derive several statistics such as the exact
probability of finding k cherries in a ranked tree of fixed size n. We then
extend our method to consider also the number of pitchforks. We find a
recursive formula to calculate the joint and conditional probabilities of
cherries and pitchforks when the size of the tree is fixed.
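
For orientation, the exact cherry distribution can also be obtained with a standard leaf-splitting recursion, which is a different technique from the generating-function approach of the paper. The sketch assumes that a tree with $m$ leaves grows by splitting a uniformly chosen leaf: if that leaf belongs to one of the $k$ cherries (probability $2k/m$) the cherry count is unchanged, otherwise it increases by one. The function name `cherry_distribution` is hypothetical, and pitchforks are not covered.

```python
from fractions import Fraction


def cherry_distribution(n):
    """Exact distribution {k: P(k cherries)} for a random tree with n >= 2 leaves."""
    dist = {1: Fraction(1)}            # a tree with 2 leaves has exactly 1 cherry
    for m in range(2, n):              # grow from m leaves to m + 1 leaves
        nxt = {}
        for k, p in dist.items():
            stay = Fraction(2 * k, m)  # the split leaf sits inside a cherry
            if stay > 0:
                nxt[k] = nxt.get(k, Fraction(0)) + p * stay
            if stay < 1:
                nxt[k + 1] = nxt.get(k + 1, Fraction(0)) + p * (1 - stay)
        dist = nxt
    return dist


def mean_and_variance(dist):
    """Mean and variance of a distribution given as {value: probability}."""
    mean = sum(k * p for k, p in dist.items())
    var = sum(k * k * p for k, p in dist.items()) - mean * mean
    return mean, var


if __name__ == "__main__":
    for n in (4, 5, 10):
        dist = cherry_distribution(n)
        print(n, dict(sorted(dist.items())), mean_and_variance(dist))
```

Exact rational arithmetic keeps the probabilities free of rounding error, so the output can be compared directly with closed-form results.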

### Decomposing the site frequency spectrum: the impact of tree topology on neutrality tests

We investigate the dependence of the site frequency spectrum (SFS) on the
topological structure of genealogical trees. We show that basic population
genetic statistics - for instance estimators of $\theta$ or neutrality tests
such as Tajima's $D$ - can be decomposed into components of waiting times
between coalescent events and of tree topology. Our results clarify the
relative impact of the two components on these statistics. We provide a
rigorous interpretation of positive or negative values of an important class of
neutrality tests in terms of the underlying tree shape. In particular, we show
that values of Tajima's $D$ and Fay and Wu's $H$ depend in a direct way on a
peculiar measure of tree balance which is mostly determined by the root balance
of the tree. We present a new test for selection in the same class as Fay and
Wu's $H$ and discuss its interpretation and power. Finally, we determine the
trees corresponding to extreme expected values of these neutrality tests and
present formulae for these extreme values as a function of sample size and
number of segregating sites.

Comment: 23 pages, 8 figures
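
To make the statistics named above concrete, the sketch below computes Watterson's estimator, the pairwise-diversity estimator, Tajima's $D$ and the unnormalized Fay and Wu's $H$ from an unfolded SFS, using the standard textbook definitions. It does not reproduce the decomposition into waiting times and topology, nor the new test proposed in the paper. The input convention (`sfs[i-1]` is the number of sites whose derived allele appears in `i` of `n` sequences) and the function names are assumptions made for this example.

```python
from math import comb, sqrt


def theta_estimators(sfs):
    """Return (theta_W, theta_pi, theta_H) from an unfolded SFS of length n - 1."""
    n = len(sfs) + 1
    pairs = comb(n, 2)
    a1 = sum(1.0 / i for i in range(1, n))
    theta_w = sum(sfs) / a1
    theta_pi = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / pairs
    theta_h = sum(i * i * x for i, x in enumerate(sfs, start=1)) / pairs
    return theta_w, theta_pi, theta_h


def tajimas_d(sfs):
    """Tajima's D with the usual variance normalization."""
    n = len(sfs) + 1
    s = sum(sfs)                       # number of segregating sites
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    theta_w, theta_pi, _ = theta_estimators(sfs)
    return (theta_pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))


def fay_wu_h(sfs):
    """Unnormalized Fay and Wu's H = theta_pi - theta_H."""
    _, theta_pi, theta_h = theta_estimators(sfs)
    return theta_pi - theta_h


if __name__ == "__main__":
    neutral = [60, 30, 20, 15, 12]     # proportional to 1/i for n = 6
    print(tajimas_d(neutral), fay_wu_h(neutral))
    print(tajimas_d([100, 5, 5, 5, 5]))
```

An SFS proportional to $1/i$ gives values of $D$ and $H$ at zero up to rounding, while an excess of singletons pushes $D$ below zero.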

### The expected neutral frequency spectrum of linked sites

We present an exact, closed expression for the expected neutral Site
Frequency Spectrum for two neutral sites, 2-SFS, without recombination. This
spectrum is the immediate extension of the well known single site $\theta/f$
neutral SFS. Similar formulae are also provided for the case of the expected
SFS of sites that are linked to a focal neutral mutation of known frequency.
Formulae for finite samples are obtained by coalescent methods and remarkably
simple expressions are derived for the SFS of a large population, which are
also solutions of the multi-allelic Kolmogorov equations. Besides the general
interest of these new spectra, they relate to interesting biological cases such
as structural variants and introgressions. As an example, we present the
expected neutral frequency spectrum of regions with a chromosomal inversion.

Comment: 26 pages, 5 figures

### Genome comparison without alignment using shortest unique substrings

BACKGROUND: Sequence comparison by alignment is a fundamental tool of molecular biology. In this paper we show how a number of sequence comparison tasks, including the detection of unique genomic regions, can be accomplished efficiently without an alignment step. Our procedure for nucleotide sequence comparison is based on shortest unique substrings. These are substrings which occur only once within the sequence or set of sequences analysed and which cannot be further reduced in length without losing the property of uniqueness. Such substrings can be detected using generalized suffix trees.

RESULTS: We find that the shortest unique substrings in Caenorhabditis elegans, human and mouse are no longer than 11 bp in the autosomes of these organisms. In mouse and human these unique substrings are significantly clustered in upstream regions of known genes. Moreover, the probability of finding such short unique substrings in the genomes of human or mouse by chance is extremely small. We derive an analytical expression for the null distribution of shortest unique substrings, given the GC-content of the query sequences. Furthermore, we apply our method to rapidly detect unique genomic regions in the genome of Staphylococcus aureus strain MSSA476 compared to four other staphylococcal genomes.

CONCLUSION: We combine a method to rapidly search for shortest unique substrings in DNA sequences and a derivation of their null distribution. We show that unique regions in an arbitrary sample of genomes can be efficiently detected with this method. The corresponding programs shustring (SHortest Unique subSTRING) and shulen are written in C and available at
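
The notion of a shortest unique substring can be illustrated on a toy sequence. The sketch below is a naive k-mer counting search, not the generalized suffix tree approach described in the abstract and not the shustring program itself. It returns the smallest length at which some substring occurs exactly once, together with all unique substrings of that length. The function name is hypothetical and the running time is quadratic or worse, so it is unsuitable for genomes.

```python
from collections import Counter


def shortest_unique_substrings(seq):
    """Return (k, substrings): the minimal length k with a substring occurring once."""
    for k in range(1, len(seq) + 1):
        # Count every substring of length k in the sequence.
        counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
        unique = sorted(s for s, c in counts.items() if c == 1)
        if unique:
            # Nothing shorter is unique, so these cannot be reduced further.
            return k, unique
    return 0, []                       # only reached for an empty sequence


if __name__ == "__main__":
    print(shortest_unique_substrings("AACAAC"))
    print(shortest_unique_substrings("ACGTAC"))
```

In `"AACAAC"` every single nucleotide is repeated, so the search moves on to length 2, where `"CA"` is the only substring that occurs once.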
