686 research outputs found

    Minimizing recombinations in consensus networks for phylogeographic studies

    Background: We address the problem of studying recombinational variations in (human) populations. In this paper, our focus is on one computational aspect of the general task: given two networks G1 and G2, with both mutation and recombination events, defined on overlapping sets of extant units, the objective is to compute a consensus network G3 with the minimum number of additional recombinations. We describe a polynomial time algorithm with a guarantee that the number of computed new recombination events is within ϵ = sz(G1, G2) (function sz is a well-behaved function of the sizes and topologies of G1 and G2) of the optimal number of recombinations. To date, this is the best known result for a network consensus problem.
    Results: Although the network consensus problem can be applied to a variety of domains, here we focus on the structure of human populations. With our preliminary analysis on a segment of the human Chromosome X data we are able to infer ancient recombinations, population-specific recombinations and more, which also support the widely accepted 'Out of Africa' model. These results have been verified independently using traditional manual procedures. To the best of our knowledge, this is the first recombinations-based characterization of human populations.
    Conclusion: We show that our mathematical model identifies recombination spots in the individual haplotypes; the aggregate of these spots over a set of haplotypes defines a recombinational landscape that has enough signal to detect continental as well as population divide based on a short segment of Chromosome X. In particular, we are able to infer ancient recombinations, population-specific recombinations and more, which also support the widely accepted 'Out of Africa' model. The agreement with mutation-based analysis can be viewed as an indirect validation of our results and the model. Since the model in principle gives us more information embedded in the networks, in our future work, we plan to investigate more non-traditional questions via these structures computed by our methodology.

    Transkingdom Networks: A Systems Biology Approach to Identify Causal Members of Host-Microbiota Interactions

    Improvements in sequencing technologies and reduced experimental costs have resulted in a vast number of studies generating high-throughput data. Although the number of methods to analyze these "omics" data has also increased, computational complexity and lack of documentation hinder researchers from analyzing their high-throughput data to its true potential. In this chapter we detail our data-driven, transkingdom network (TransNet) analysis protocol to integrate and interrogate multi-omics data. This systems biology approach has allowed us to successfully identify important causal relationships between different taxonomic kingdoms (e.g. mammals and microbes) using diverse types of data.

    Turnip mosaic potyvirus probably first spread to Eurasian brassica crops from wild orchids about 1000 years ago

    Turnip mosaic potyvirus (TuMV) is probably the most widespread and damaging virus that infects cultivated brassicas worldwide. Previous work has indicated that the virus originated in western Eurasia, with all of its closest relatives being viruses of monocotyledonous plants. Here we report that we have identified a sister lineage of TuMV-like potyviruses (TuMV-OM) from European orchids. The isolates of TuMV-OM form a monophyletic sister lineage to the brassica-infecting TuMVs (TuMV-BIs), and are nested within a clade of monocotyledon-infecting viruses. Extensive host-range tests showed that all of the TuMV-OMs are biologically similar to, but distinct from, TuMV-BIs and do not readily infect brassicas. We conclude that it is more likely that TuMV evolved from a TuMV-OM-like ancestor than the reverse. We did Bayesian coalescent analyses using a combination of novel and published sequence data from four TuMV genes [helper component-proteinase protein (HC-Pro), protein 3 (P3), nuclear inclusion b protein (NIb), and coat protein (CP)]. Three genes (HC-Pro, P3, and NIb), but not the CP gene, gave results indicating that the TuMV-BI viruses diverged from TuMV-OMs around 1000 years ago. Only 150 years later, the four lineages of the present global population of TuMV-BIs diverged from one another. These dates are congruent with historical records of the spread of agriculture in Western Europe. From about 1200 years ago, there was a warming of the climate, and agriculture and the human population of the region greatly increased. Farming replaced woodlands, fostering viruses and aphid vectors that could invade the crops, which included several brassica cultivars and weeds. Later, starting 500 years ago, inter-continental maritime trade probably spread the TuMV-BIs to the remainder of the world.

    Fusion of the subunits α and β of succinyl-CoA synthetase as a phylogenetic marker for Pezizomycotina fungi

    Gene fusions, yielding the formation of multidomain proteins, are evolutionary events that can be utilized as phylogenetic markers. Here we describe a fusion gene comprising the α and β subunits of succinyl-CoA synthetase, an enzyme of the TCA cycle, in Pezizomycotina fungi. This fusion is present in all Pezizomycotina with complete genome sequences and absent from all other organisms. Phylogenetic analysis of the α and β subunits of succinyl-CoA synthetase suggests that both subunits were duplicated and retained in Pezizomycotina while one copy was lost from other fungi. One of the duplicated copies was then fused in Pezizomycotina. Our results suggest that the fusion of the α and β subunits of succinyl-CoA synthetase can be used as a molecular marker for membership in the Pezizomycotina subphylum. If a species has the fusion it can be reliably classified as Pezizomycotina, while the absence of the fusion suggests that the species is not a member of this subphylum.

    Inference of population splits and mixtures from genome-wide allele frequency data

    Many aspects of the historical relationships between populations in a species are reflected in genetic data. Inferring these relationships from genetic data, however, remains a challenging task. In this paper, we present a statistical model for inferring the patterns of population splits and mixtures in multiple populations. In this model, the sampled populations in a species are related to their common ancestor through a graph of ancestral populations. Using genome-wide allele frequency data and a Gaussian approximation to genetic drift, we infer the structure of this graph. We applied this method to a set of 55 human populations and a set of 82 dog breeds and wild canids. In both species, we show that a simple bifurcating tree does not fully describe the data; in contrast, we infer many migration events. While some of the migration events that we find have been detected previously, many have not. For example, in the human data we infer that Cambodians trace approximately 16% of their ancestry to a population ancestral to other extant East Asian populations. In the dog data, we infer that both the boxer and basenji trace a considerable fraction of their ancestry (9% and 25%, respectively) to wolves subsequent to domestication, and that East Asian toy breeds (the Shih Tzu and the Pekingese) result from admixture between modern toy breeds and "ancient" Asian breeds. Software implementing the model described here, called TreeMix, is available at http://treemix.googlecode.com
    Comment: 28 pages, 6 figures in main text. Attached supplement is 22 pages, 15 figures. This is an updated version of the preprint available at http://precedings.nature.com/documents/6956/version/

    The genome sequence of Trypanosoma brucei gambiense, causative agent of chronic Human African Trypanosomiasis

    Background: Trypanosoma brucei gambiense is the causative agent of chronic Human African Trypanosomiasis or sleeping sickness, a disease endemic across often poor and rural areas of Western and Central Africa. We have previously published the genome sequence of a T. b. brucei isolate, and have now employed a comparative genomics approach to understand the scale of genomic variation between T. b. gambiense and the reference genome. We sought to identify features that were uniquely associated with T. b. gambiense and its ability to infect humans.
    Methods and findings: An improved high-quality draft genome sequence for the group 1 T. b. gambiense DAL 972 isolate was produced using a whole-genome shotgun strategy. Comparison with T. b. brucei showed that sequence identity averages 99.2% in coding regions, and gene order is largely collinear. However, variation associated with segmental duplications and tandem gene arrays suggests some reduction of functional repertoire in T. b. gambiense DAL 972. A comparison of the variant surface glycoproteins (VSG) in T. b. brucei with all T. b. gambiense sequence reads showed that the essential structural repertoire of VSG domains is conserved across T. brucei.
    Conclusions: This study provides the first estimate of intraspecific genomic variation within T. brucei, and so has important consequences for future population genomics studies. We have shown that the T. b. gambiense genome corresponds closely with the reference, which should therefore be an effective scaffold for any T. brucei genome sequence data. As VSG repertoire is also well conserved, it may be feasible to describe the total diversity of variant antigens. While we describe several as yet uncharacterized gene families with predicted cell surface roles that were expanded in number in T. b. brucei, no T. b. gambiense-specific gene was identified outside of the subtelomeres that could explain the ability to infect humans.

    Phylogenomics: Gene Duplication, Unrecognized Paralogy and Outgroup Choice

    Comparative genomics has revealed the ubiquity of gene and genome duplication and subsequent gene loss. In the case of gene duplication and subsequent loss, gene trees can differ from species trees; thus, frequent gene duplication poses a challenge for reconstruction of species relationships. Here I address the case of multi-gene sets of putative orthologs that include some unrecognized paralogs due to ancestral gene duplication, and ask how outgroups should best be chosen to reduce the degree of non-species tree (NST) signal. Consideration of expected internal branch lengths supports several conclusions: (i) when a single outgroup is used, the degree of NST signal arising from gene duplication is either independent of outgroup choice, or is minimized by use of a maximally closely related post-duplication (MCRPD) outgroup; (ii) when two outgroups are used, NST signal is minimized by using one MCRPD outgroup, while the position of the second outgroup is of lesser importance; and (iii) when two outgroups are used, the ability to detect gene trees that are inconsistent with known aspects of the species tree is maximized by use of one MCRPD outgroup, and is either independent of the position of the second outgroup, or is maximized for a more distantly related second outgroup. Overall, these results generalize the utility of closely related outgroups for phylogenetic analysis.

    Extensive Copy-Number Variation of Young Genes across Stickleback Populations

    MM received funding from the Max Planck innovation funds for this project. PGDF was supported by a Marie Curie European Reintegration Grant (proposal nr 270891). CE was supported by German Science Foundation grants (DFG, EI 841/4-1 and EI 841/6-1). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.