1,249 research outputs found
Review The species concept for prokaryotes
The species concept is a recurrent controversial issue that preoccupies philosophers as well as biologists of all disciplines. The prokaryotic species concept has its own history and results from a series of empirical improvements parallel to the development of the techniques of analysis. Among microbial taxonomists, there is general agreement that the species concept currently in use is useful, pragmatic and universally applicable within the prokaryotic world. However, this empirically designed concept is not encompassed by any of the at least 22 concepts described for eukaryotes. The species could be described as ‘a monophyletic and genomically coherent cluster of individual organisms that show a high degree of overall similarity in many independent characteristics, and is diagnosable by a discriminative phenotypic property’. We suggest referring to it as a phylo-phenetic species concept. Here, we discuss the validity of the concept in use, which we believe is more pragmatic than the concepts described for eukaryotes.
A rapid and scalable method for multilocus species delimitation using Bayesian model comparison and rooted triplets
Multilocus sequence data provide far greater power to resolve species limits than the single-locus data typically used for broad surveys of clades. However, current statistical methods based on a multispecies coalescent framework are computationally demanding, because of the number of possible delimitations that must be compared and the time-consuming likelihood calculations. New methods are therefore needed to open up the power of multilocus approaches to larger systematic surveys. Here, we present a rapid and scalable method that introduces two innovations. First, the method reduces the complexity of likelihood calculations by decomposing the tree into rooted triplets. The distribution of topologies for a triplet across multiple loci has a uniform trinomial distribution when the three individuals belong to the same species, but a skewed distribution if they belong to separate species, with a form that is specified by the multispecies coalescent. A Bayesian model comparison framework was developed and the best delimitation found by comparing the product of posterior probabilities of all triplets. The second innovation is a new dynamic programming algorithm for finding the optimum delimitation from all those compatible with a guide tree by successively analyzing subtrees defined by each node. This algorithm removes the need for the heuristic searches used by current methods and guarantees that the best solution is found; it could potentially also be used in other systematic applications. We assessed the performance of the method with simulated, published and newly generated data. Analyses of simulated data demonstrate that the combined method has favourable statistical properties and scalability with increasing sample sizes. Analyses of empirical data from both eukaryotes and prokaryotes demonstrate its potential for delimiting species in real cases.
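The two ideas in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The first part compares, for a single rooted triplet, a uniform trinomial (same species) against the standard multispecies coalescent topology probabilities (separate species), averaging over a flat grid of internal branch lengths; the grid, the flat prior, and the convention that the first count is the topology matching the guide tree are all assumptions made here. The second part is a generic dynamic program over a binary guide tree that assumes a hypothetical additive per-cluster `score` function supplied by the caller.

```python
import math

def log_multinomial(counts, probs):
    """Log-probability of topology counts under a multinomial with the given probabilities."""
    out = math.lgamma(sum(counts) + 1)
    for c, p in zip(counts, probs):
        out -= math.lgamma(c + 1)
        if c > 0:
            out += c * math.log(p)
    return out

def triplet_probs(t):
    """Topology probabilities for three lineages from separate species.
    t is the internal branch length in coalescent units; the first entry
    is the topology that matches the species tree."""
    discordant = math.exp(-t) / 3.0
    return (1.0 - 2.0 * discordant, discordant, discordant)

def log_bayes_factor(counts, t_grid=None):
    """log P(counts | separate species) - log P(counts | same species).
    The alternative is averaged over a flat grid of t values (an assumption)."""
    if t_grid is None:
        t_grid = [0.05 * i for i in range(1, 101)]
    logs = [log_multinomial(counts, triplet_probs(t)) for t in t_grid]
    peak = max(logs)
    log_marginal = peak + math.log(sum(math.exp(v - peak) for v in logs) / len(logs))
    log_null = log_multinomial(counts, (1 / 3, 1 / 3, 1 / 3))
    return log_marginal - log_null

def best_delimitation(tree, score):
    """Best partition compatible with a binary guide tree.
    tree: nested 2-tuples with string leaves; score: hypothetical function
    mapping a frozenset of leaves to the score of lumping them as one species.
    At each node, either lump the whole subtree or keep the best solutions
    of the two child subtrees."""
    def leaves(node):
        return [node] if isinstance(node, str) else leaves(node[0]) + leaves(node[1])

    def solve(node):
        members = leaves(node)
        lumped = (score(frozenset(members)), [sorted(members)])
        if isinstance(node, str):
            return lumped
        left_score, left_parts = solve(node[0])
        right_score, right_parts = solve(node[1])
        split = (left_score + right_score, left_parts + right_parts)
        return lumped if lumped[0] >= split[0] else split

    return solve(tree)
```

A skewed count vector such as `(30, 5, 5)` gives a positive log Bayes factor, while a near-uniform one such as `(13, 14, 13)` gives a negative one. Because each node is resolved from its children's optimal solutions, the tree search needs no heuristic, which is the property the abstract attributes to its own algorithm.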
A general and efficient representation of ancestral recombination graphs
As a result of recombination, adjacent nucleotides can have different paths of genetic inheritance and therefore the genealogical trees for a sample of DNA sequences vary along the genome. The structure capturing the details of these intricately interwoven paths of inheritance is referred to as an ancestral recombination graph (ARG). Classical formalisms have focused on mapping coalescence and recombination events to the nodes in an ARG. However, this approach is out of step with some modern developments, which do not represent genetic inheritance in terms of these events or explicitly infer them. We present a simple formalism that defines an ARG in terms of specific genomes and their intervals of genetic inheritance, and show how it generalizes these classical treatments and encompasses the outputs of recent methods. We discuss nuances arising from this more general structure, and argue that it forms an appropriate basis for a software standard in this rapidly growing field.
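As a rough illustration of defining an ARG by genomes and intervals of inheritance, the Python sketch below stores one record per (parent genome, child genome, genomic interval) and recovers the local genealogy at any position. The `Edge` class, the `local_tree` helper, and the toy data are hypothetical names chosen for this example; the field layout loosely follows the left/right/parent/child columns of tskit's edge table, but this is not that library's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    """Child genome inherits the half-open interval [left, right) from parent genome."""
    left: float
    right: float
    parent: int
    child: int

def local_tree(edges, position):
    """Return a child -> parent map for the genealogy at one genomic position."""
    return {e.child: e.parent for e in edges if e.left <= position < e.right}

# Toy ARG: sample genomes 0, 1, 2; ancestral genomes 3, 4, 5; one breakpoint at 5.
edges = [
    Edge(0, 5, 3, 0), Edge(0, 5, 3, 1),    # left of the breakpoint, 0 and 1 join at 3
    Edge(0, 5, 5, 2), Edge(0, 5, 5, 3),    # ... and 3 joins 2 at the root, 5
    Edge(5, 10, 4, 1), Edge(5, 10, 4, 2),  # right of the breakpoint, 1 and 2 join at 4
    Edge(5, 10, 5, 0), Edge(5, 10, 5, 4),  # ... and 4 joins 0 at the root, 5
]

left_tree = local_tree(edges, 2.0)   # topology ((0,1),2)
right_tree = local_tree(edges, 7.0)  # topology (0,(1,2))
```

No recombination or coalescence event is recorded explicitly; the change in topology at position 5 is implied by where the inheritance intervals start and stop.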
The era of the ARG: an empiricist's guide to ancestral recombination graphs
In the presence of recombination, the evolutionary relationships between a
set of sampled genomes cannot be described by a single genealogical tree.
Instead, the genomes are related by a complex, interwoven collection of
genealogies formalized in a structure called an ancestral recombination graph
(ARG). An ARG extensively encodes the ancestry of the genome(s) and thus is
replete with valuable information for addressing diverse questions in
evolutionary biology. Despite its potential utility, technological and
methodological limitations, along with a lack of approachable literature, have
severely restricted awareness and application of ARGs in empirical evolution
research. Excitingly, recent progress in ARG reconstruction and simulation has
made ARG-based approaches feasible for many questions and systems. In this
review, we provide an accessible introduction and exploration of ARGs, survey
recent methodological breakthroughs, and describe the potential for ARGs to
further existing goals and open avenues of inquiry that were previously
inaccessible in evolutionary genomics. Through this discussion, we aim to more
widely disseminate the promise of ARGs in evolutionary genomics and encourage
the broader development and adoption of ARG-based inference.
Comment: 34 pages, 3 figures, 3 tables
Efficient ancestry and mutation simulation with msprime 1.0
Stochastic simulation is a key tool in population genetics, since the models involved are often analytically intractable and simulation is usually the only way of obtaining ground-truth data to evaluate inferences. Because of this, a large number of specialized simulation programs have been developed, each filling a particular niche, but with largely overlapping functionality and a substantial duplication of effort. Here, we introduce msprime version 1.0, which efficiently implements ancestry and mutation simulations based on the succinct tree sequence data structure and the tskit library. We summarize msprime’s many features, and show that its performance is excellent, often many times faster and more memory efficient than specialized alternatives. These high-performance features have been thoroughly tested and validated, and built using a collaborative, open source development model, which reduces duplication of effort and promotes software quality via community engagement.
A single polyploidization event at the origin of the tetraploid genome of Coffea arabica is responsible for the extremely low genetic variation in wild and cultivated germplasm
The genome of the allotetraploid species Coffea arabica L. was sequenced to assemble independently the two component subgenomes (putatively deriving from C. canephora and C. eugenioides) and to perform a genome-wide analysis of the genetic diversity in cultivated coffee germplasm and in wild populations growing in the center of origin of the species. We assembled a total length of 1.536 Gbp, 444 Mb and 527 Mb of which were assigned to the canephora and eugenioides subgenomes, respectively, and predicted 46,562 gene models, 21,254 and 22,888 of which were assigned to the canephora and to the eugenioides subgenome, respectively. Through genome-wide SNP genotyping of 736 C. arabica accessions, we analyzed the genetic diversity in the species and its relationship with geographic distribution and historical records. We observed a weak population structure due to low-frequency derived alleles and highly negative values of Tajima's D, suggesting a recent and severe bottleneck, most likely resulting from a single event of polyploidization, not only for the cultivated germplasm but also for the entire species. This conclusion is strongly supported by forward simulations of mutation accumulation. However, PCA revealed a cline of genetic diversity reflecting a west-to-east geographical distribution from the center of origin in East Africa to the Arabian Peninsula. The extremely low levels of variation observed in the species, as a consequence of the polyploidization event, make the exploitation of diversity within the species for breeding purposes less interesting than in most crop species and stress the need for introgression of new variability from the diploid progenitors.
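To make the link between low-frequency derived alleles and negative Tajima's D concrete, here is a minimal Python sketch of the standard Tajima (1989) statistic computed from per-site derived-allele counts. The function name, the input format, and the toy counts are assumptions for illustration and are not taken from the study's data or pipeline.

```python
import math

def tajimas_d(derived_counts, n):
    """Tajima's D from per-site derived-allele counts in a sample of n sequences."""
    if n < 4:
        raise ValueError("need at least 4 sequences for a non-zero variance")
    counts = [k for k in derived_counts if 0 < k < n]  # segregating sites only
    s = len(counts)
    if s == 0:
        raise ValueError("no segregating sites")

    # Mean number of pairwise differences (pi), summed over sites.
    pairs = n * (n - 1) / 2
    pi = sum(k * (n - k) for k in counts) / pairs

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Difference between pi and Watterson's estimator, scaled by its standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))

# Toy inputs: ten sites, ten sequences.
d_singletons = tajimas_d([1] * 10, n=10)    # every variant is rare
d_intermediate = tajimas_d([5] * 10, n=10)  # every variant is at 50%
```

When all variants are singletons, pairwise diversity falls below Watterson's estimate and D is negative, the pattern the abstract interprets as a signature of a recent bottleneck; intermediate-frequency variants push D positive.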
Reticulate Evolution: Symbiogenesis, Lateral Gene Transfer, Hybridization and Infectious heredity
Molecular Characterization and Comparative Genomics of Clinical Hybrid Shiga Toxin-Producing and Enterotoxigenic Escherichia coli (STEC/ETEC) Strains in Sweden
Hybrid E. coli pathotypes represent emerging public health threats with enhanced virulence from different pathotypes. Hybrids of Shiga toxin-producing and enterotoxigenic E. coli (STEC/ETEC) have been reported to be associated with diarrheal disease and hemolytic uremic syndrome (HUS) in humans. Here, we identified and characterized four clinical STEC/ETEC hybrids from diarrheal patients with or without fever or abdominal pain and healthy contact in Sweden. Rare stx2 subtypes were present in STEC/ETEC hybrids. Stx2 production was detectable in stx2a- and stx2e-containing strains. Different copies of the ETEC virulence marker gene sta were found in two hybrids. Three sta subtypes, namely sta1, sta4 and sta5, were designated, with sta4 being predominant. The hybrids represented diverse and rare serotypes (O15:H16, O187:H28, O100:H30, and O136:H12). Genome-wide phylogeny revealed that these hybrids exhibited close relatedness with certain ETEC, STEC/ETEC hybrid and commensal E. coli strains, implying the potential acquisition of Stx-phages and/or ETEC virulence genes in the emergence of STEC/ETEC hybrids. Given the emergence and public health significance of hybrid pathotypes, a broader range of virulence markers should be considered in E. coli pathotype diagnostics, and targeted follow-up of cases is suggested to better understand the hybrid infection.