430 research outputs found

    Iterative pruning PCA improves resolution of highly structured populations

    BACKGROUND: Non-random patterns of genetic variation exist among individuals in a population owing to a variety of evolutionary factors. Therefore, populations are structured into genetically distinct subpopulations. As genotypic datasets become ever larger, it is increasingly difficult to correctly estimate the number of subpopulations and assign individuals to them. The computationally efficient non-parametric, chiefly Principal Components Analysis (PCA)-based methods are thus becoming increasingly relied upon for population structure analysis. Current PCA-based methods can accurately detect structure; however, the accuracy in resolving subpopulations and assigning individuals to them is wanting. When subpopulations are closely related to one another, they overlap in PCA space and appear as a conglomerate. This problem is exacerbated when some subpopulations in the dataset are genetically far removed from others. We propose a novel PCA-based framework which addresses this shortcoming. RESULTS: A novel population structure analysis algorithm called iterative pruning PCA (ipPCA) was developed which assigns individuals to subpopulations and infers the total number of subpopulations present. Genotypic data from simulated and real population datasets with different degrees of structure were analyzed. For datasets with simple structures, the subpopulation assignments of individuals made by ipPCA were largely consistent with the STRUCTURE, BAPS and AWclust algorithms. On the other hand, highly structured populations containing many closely related subpopulations could be accurately resolved only by ipPCA, and not by other methods. CONCLUSION: The algorithm is computationally efficient and not constrained by the dataset complexity. This systematic subpopulation assignment approach removes the need for prior population labels, which could be advantageous when cryptic stratification is encountered in datasets containing individuals otherwise assumed to belong to a homogeneous population.

    Genetic analysis of Thai cattle reveals a Southeast Asian indicine ancestry

    Cattle commonly raised in Thailand have characteristics of Bos indicus (zebu). We do not know when or how cattle domestication in Thailand occurred, and so questions remain regarding their origins and relationships to other breeds. We obtained genome-wide SNP genotypic data of 28 bovine individuals sampled from four regions: North (Kho-Khaolampoon), Northeast (Kho-Isaan), Central (Kho-Lan) and South (Kho-Chon) Thailand. These regional varieties have distinctive traits suggestive of breed-like genetic variations. From these data, we confirmed that all four Thai varieties are Bos indicus and that they are distinct from other indicine breeds. Among these Thai cattle, a distinctive ancestry pattern is apparent, which is the purest within Kho-Chon individuals. This ancestral component is only present outside of Thailand among other indicine breeds in Southeast Asia. From this pattern, we conclude that a unique Bos indicus ancestor originated in Southeast Asia, and native Kho-Chon Thai cattle retain the signal of this ancestry with limited admixture of other bovine ancestors.

    Taxonomic evidence applying algorithms of intelligent data mining: Asteroids families

    Numerical Taxonomy aims to group operational taxonomic units (OTUs, or taxa) into clusters through numerical methods, using so-called structure analysis. Forming clusters that constitute families was the purpose of this latest series of projects. Structural analysis, based on phenotypic characteristics, exhibits the relationships, in terms of degrees of similarity, between two or more OTUs. Entities formed by dynamic domains of attributes change according to taxonomical requirements: the classification of objects to form families. Taxonomic objects are represented by applying the semantics of the Dynamic Relational Database Model. Families of OTUs are obtained employing as tools i) the Euclidean distance and ii) nearest neighbor techniques. Thus taxonomic evidence is gathered so as to quantify the similarity for each pair of OTUs (pair-group method) obtained from the basic data matrix. The main contribution up until now is to introduce the concept of the spectrum of the OTUs, based on the states of their characters. The concept of families’ spectra emerges if the superposition principle is applied to the spectra of the OTUs, and the groups are delimited through the maximum of the Bienaymé-Tchebycheff relation, which determines invariants (centroid, variance and radius). A new taxonomic criterion is thereby formulated. An astronomic application is worked out. The result is a new criterion for the classification of asteroids in the hyperspace of orbital proper elements. Thus, a new approach to Computational Taxonomy is presented, one that has already been employed with reference to Data Mining. This paper analyses the application of Machine Learning techniques to Data Mining. We focused our interest on the TDIDT (Top-Down Induction of Decision Trees) family of methods for induction from pre-classified data, and in particular on the ID3 and C4.5 algorithms, created by Quinlan. We tried to determine the degree of efficiency achieved by the TDIDT family’s algorithms when applied in data mining to generate valid models of the data in classification problems using the gain of entropy. Informatics (Data Mining and Computational Taxonomy) remains the original objective of our research. Track: Databases. Red de Universidades con Carreras en Informática (RedUNCI).
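    The entropy gain that the abstract above mentions for ID3 and C4.5 can be illustrated with a minimal sketch. It is not taken from the paper: the function names `entropy` and `information_gain` and the toy rows are assumptions chosen for illustration, and the code shows only the standard information-gain criterion that ID3 uses to choose a splitting attribute.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute.

    rows      -- list of dicts mapping attribute name to value (hypothetical format)
    labels    -- class label of each row, in the same order
    attribute -- name of the attribute to evaluate
    """
    # Group the class labels by the value each row takes for the attribute.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy, made-up data: attribute "a" separates the classes perfectly, "b" not at all.
rows = [
    {"a": "x", "b": "u"},
    {"a": "x", "b": "v"},
    {"a": "y", "b": "u"},
    {"a": "y", "b": "v"},
]
labels = ["p", "p", "n", "n"]
gain_a = information_gain(rows, labels, "a")
gain_b = information_gain(rows, labels, "b")
```

    An ID3-style learner would split on the attribute with the largest gain (here "a") and recurse on each subset; C4.5 normalises the criterion into a gain ratio, which this sketch does not cover.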

    High amino acid diversity and positive selection at a putative coral immunity gene (tachylectin-2)

    BACKGROUND: Genes involved in immune functions, including pathogen recognition and the activation of innate defense pathways, are among the most genetically variable known, and the proteins that they encode are often characterized by high rates of amino acid substitutions, a hallmark of positive selection. The high levels of variation characteristic of immunity genes make them useful tools for conservation genetics. To date, highly variable immunity genes have yet to be found in corals, keystone organisms of the world's most diverse marine ecosystem, the coral reef. Here, we examine variation in and selection on a putative innate immunity gene from Oculina, a coral genus previously used as a model for studies of coral disease and bleaching. RESULTS: In a survey of 244 Oculina alleles, we find high nonsynonymous variation and a signature of positive selection, consistent with a putative role in immunity. Using computational protein structure prediction, we generate a structural model of the Oculina protein that closely matches the known structure of tachylectin-2 from the Japanese horseshoe crab (Tachypleus tridentatus), a protein with demonstrated function in microbial recognition and agglutination. We also demonstrate that at least three other genera of anthozoan cnidarians (Acropora, Montastrea and Nematostella) possess proteins structurally similar to tachylectin-2. CONCLUSIONS: Taken together, the evidence of high amino acid diversity, positive selection and structural correspondence to the horseshoe crab tachylectin-2 suggests that this protein is 1) part of Oculina's innate immunity repertoire, and 2) evolving adaptively, possibly under selective pressure from coral-associated microorganisms. Tachylectin-2 may serve as a candidate locus to screen coral populations for their capacity to respond adaptively to future environmental change

    Draft genome sequence of Xanthomonas fragariae reveals reductive evolution and distinct virulence-related gene content

    Background: Xanthomonas fragariae (Xf) is a bacterial strawberry pathogen and an A2 quarantine organism on strawberry planting stock in the EU. It is taxonomically and metabolically distinct within the genus Xanthomonas, and known for its host specificity. As part of a broader pathogenicity study, the genome of a Belgian, virulent Xf strain (LMG 25863) was assembled to draft status and examined for its pathogenicity-related gene content. Results: The Xf draft genome (4.2 Mb) was considerably smaller than most known Xanthomonas genomes (~5 Mb). Only half of the genes coding for TonB-dependent transporters and cell-wall degrading enzymes that are typically present in other Xanthomonas genomes were found in Xf. Other missing genes/regions with a possible impact on its plant-host interaction were: i) the three loci for xylan degradation and metabolism, ii) a locus coding for a beta-ketoadipate phenolics catabolism pathway, iii) xcs, one of two Type II Secretion System coding regions in Xanthomonas, and iv) the genes coding for the glyoxylate shunt pathway. Conversely, the Xf genome revealed a high content of externally derived DNA and several uncommon, possibly virulence-related features: a Type VI Secretion System, a second Type IV Secretion System and a distinct Type III Secretion System effector repertoire comprised of multiple rare effectors and several putative new ones. Conclusions: The draft genome sequence of LMG 25863 confirms the distinct phylogenetic position of Xf within the genus Xanthomonas and reveals a patchwork of both lost and newly acquired genomic features. These features may help explain the specific, mostly endophytic association of Xf with the strawberry plant.

    A Morphological and Genetic Study of Taxonomy and Evolutionary Divergence in Xanthisma Gracile and Xanthisma Spinulosum

    Discerning the basis of phenotypic and genotypic differences within and between taxa is crucial for understanding the evolution of species, subspecies or varieties and races. In this dissertation, I have presented three studies, which use morphological characters and genetic Amplified Fragment Length Polymorphisms (AFLPs) to differentiate cytotypes, populations and species of the genus Xanthisma. The first study is aimed at clarifying the species status of Haplopappus ravenii, which has been considered to be a separate species by some taxonomists and a race of Xanthisma gracile by other researchers. Considering the morphological species concept and the genotypic cluster definition of a species, there was insufficient distinction in either dataset to support these taxa as distinct species. It was found that H. ravenii is more appropriately classified as a cytotype or a race of X. gracile. In the second study, the genetic structure of X. gracile was quantified across populations occupying distinct habitat types (desert, grasslands, and pinyon juniper woodlands) in order to test the hypothesis of local adaptation and to determine the potential for intraspecific divergence. Samples from desert habitats showed higher genetic divergence than samples in the other two habitats. This study indicates local adaptation of populations and suggests that changes in climate and habitat play a very important role in the genetic differentiation of plant systems. The third study evaluated the taxonomy of Xanthisma spinulosum and three of its subspecies that co-occur in Arizona. Herbarium specimens representative of the three subspecies were used to test for significant morphological and genetic divergence that would support their recognition. The morphological characters originally utilized by taxonomists who named these taxa were not significantly different among the three taxa. This finding was further supported by the molecular data, suggesting the presence of one contiguous species. This dissertation aims at stressing the importance of taxonomic status and understanding the role that environment can play in shaping differentiation between taxa.

    Structural Basis for Broad Neutralization of Hepatitis C Virus Quasispecies

    Monoclonal antibodies directed against hepatitis C virus (HCV) E2 protein can neutralize cell-cultured HCV and pseudoparticles expressing envelopes derived from multiple HCV subtypes. For example, based on antibody blocking experiments and alanine scanning mutagenesis, it was proposed that the AR3B monoclonal antibody recognized a discontinuous conformational epitope comprised of amino acid residues 396–424, 436–447, and 523–540 of HCV E2 envelope protein. Intriguingly, one of these segments (436–447) overlapped with hypervariable region 3 (HVR3), a domain that exhibited significant intrahost and interhost genetic diversity. To reconcile these observations, amino-acid sequence variability was examined and homology-based structural modelling of E2 based on tick-borne encephalitis virus (TBEV) E protein was performed based on 413 HCV sequences derived from 18 subjects with chronic hepatitis C. Here we report that despite a high degree of amino-acid sequence variability, the three-dimensional structure of E2 is remarkably conserved, suggesting broad recognition of structural determinants rather than specific residues. Regions 396–424 and 523–540 were largely exposed and in close spatial proximity at the surface of E2. In contrast, region 436–447, which overlaps with HVR3, was >35 Å away, and estimates of buried surface were inconsistent with HVR3 being part of the AR3B binding interface. High-throughput structural analysis of HCV quasispecies could facilitate the development of novel vaccines that target conserved structural features of HCV envelope and elicit neutralizing antibody responses that are less vulnerable to viral escape

    Cryptic Hybridization in the Temperate Bamboos: Is Pleioblastus simonii a Species of Hybrid Origin?

    Japanese river bamboo (Pleioblastus simonii, ‘medake,’ ‘kawadake’) is an ecologically important species of temperate bamboo native to Japan. This species is widely known and historically important in Japanese rural farm life. Based on morphological data, Japanese river bamboo is classified in Pleioblastus section Medakea (Poaceae: Bambusoideae) along with five other Japanese species, which are collectively considered to represent a phylogenetically distinct lineage. However, recent studies suggest that Japanese river bamboo may have arisen as a result of previously undetected hybridization (i.e., cryptic hybridization), while also calling into question the diversity of section Medakea. The role of hybridization in natural plant populations has been studied since the 1950s; however, little is known about this phenomenon in the evolution of bamboos. Species of Pleioblastus share an issue common to bamboo taxonomy in that they exhibit overlapping variation in leaf and stem characteristics, making them difficult to identify based on morphology alone. One potential factor contributing to, and exacerbating, this issue is cryptic hybridization. The objective of this study was to analyze molecular data, including amplified fragment length polymorphism (AFLP) and nuclear DNA (nDNA) sequence data, to test the hypothesis that P. simonii is a species of hybrid origin. The results provide compelling evidence in support of this hypothesis, while also suggesting that ongoing diversification has obscured bamboo ancestry. Moreover, these findings highlight the importance of using up-to-date analytical techniques from population genetics and phylogenetics to shed light on how to navigate the complexities of bamboo taxonomy. This study provides an example of reticulate evolution in the origin of plant diversity and helps to reveal why molecular data are important tools for plant taxonomy and systematics.