    Clustering and variable selection for categorical multivariate data

    This article investigates unsupervised classification techniques for categorical multivariate data. The study employs multivariate multinomial mixture modeling, which is a type of model particularly applicable to multilocus genotypic data. A model selection procedure is used to simultaneously select the number of components and the relevant variables. A non-asymptotic oracle inequality is obtained, leading to the proposal of a new penalized maximum likelihood criterion. The selected model proves to be asymptotically consistent under weak assumptions on the true probability underlying the observations. The main theoretical result obtained in this study suggests a penalty function defined to within a multiplicative parameter. In practice, the data-driven calibration of the penalty function is made possible by slope heuristics. Based on simulated data, this procedure is found to improve the performance of the selection procedure with respect to classical criteria such as BIC and AIC. The new criterion provides an answer to the question "Which criterion for which sample size?" Examples of real dataset applications are also provided.

    Multiple locus VNTR analysis highlights that geographical clustering and distribution of Dichelobacter nodosus, the causal agent of footrot in sheep, correlates with inter-country movements

    Dichelobacter nodosus is a Gram-negative, anaerobic bacterium and the causal agent of footrot in sheep. Multiple locus variable number tandem repeat (VNTR) analysis (MLVA) is a portable technique that involves the identification and enumeration of polymorphic tandem repeats across the genome. The aims of this study were to develop an MLVA scheme for D. nodosus suitable for use as a molecular typing tool, and to apply it to a global collection of isolates. Seventy-seven isolates selected from regions with a long history of footrot (GB, Australia) and regions where footrot has recently been reported (India, Scandinavia) were characterised. From an initial 61 potential VNTR regions, four loci were identified as usable and in combination had the attributes required of a typing method for use in bacterial epidemiology: high discriminatory power (D > 0.95), typeability and reproducibility. Results from the analysis indicate that D. nodosus appears to have evolved via recombinational exchanges and clonal diversification. This has resulted in some clonal complexes that contain isolates from multiple countries and continents, and others that contain isolates from a single geographic location (country or region). The distribution of alleles between countries matches historical accounts of sheep movements, suggesting that the MLVA technique is sufficiently specific and sensitive for an epidemiological investigation of the global distribution of D. nodosus.

    Migration-selection balance at multiple loci and selection on dominance and recombination

    A steady influx of a single deleterious multilocus genotype will impose genetic load on the resident population and leave multiple descendants carrying various numbers of the foreign alleles. Provided that the foreign types are rare at equilibrium, and that all immigrant genes will eventually be eliminated by selection, the population structure can be inferred explicitly from the deterministic branching process taking place within a single immigrant lineage. Unless the migration and recombination rates were high, this simple method was a very close approximation to the simulated migration-selection balance with all possible multilocus genotypes considered. Comment: includes 6 figures and a Supporting Information. The Mathematica notebook in which the numerical results were obtained is available upon request.

    An expanded multilocus sequence typing scheme for Propionibacterium acnes: investigation of 'pathogenic', 'commensal' and antibiotic resistant strains

    The Gram-positive bacterium Propionibacterium acnes is a member of the normal human skin microbiota and is associated with various infections and clinical conditions. There is tentative evidence to suggest that certain lineages may be associated with disease and others with health. We recently described a multilocus sequence typing scheme (MLST) for P. acnes based on seven housekeeping genes (http://pubmlst.org/pacnes). We now describe an expanded eight gene version based on six housekeeping genes and two ‘putative virulence’ genes (eMLST) that provides improved high resolution typing (91 eSTs from 285 isolates), and generates phylogenies congruent with those based on whole genome analysis. When compared with the nine gene MLST scheme developed at the University of Bath, UK, and utilised by researchers at Aarhus University, Denmark, the eMLST method offers greater resolution. Using the scheme, we examined 208 isolates from disparate clinical sources, and 77 isolates from healthy skin. Acne was predominately associated with type IA1 clonal complexes CC1, CC3 and CC4, with eST1 and eST3 lineages being highly represented. In contrast, type IA2 strains were recovered at a rate similar to type IB and II organisms. Ophthalmic infections were predominately associated with type IA1 and IA2 strains, while type IB and II were more frequently recovered from soft tissue and retrieved medical devices. Strains with rRNA mutations conferring resistance to antibiotics used in acne treatment were dominated by eST3, with some evidence for intercontinental spread. In contrast, despite its high association with acne, only a small number of resistant CC1 eSTs were identified. A number of eSTs were only recovered from healthy skin, particularly eSTs representing CC72 (type II) and CC77 (type III). Collectively our data lend support to the view that pathogenic versus truly commensal lineages of P. acnes may exist. This is likely to have important therapeutic and diagnostic implications.

    Ecological host fitting of Trypanosoma cruzi TcI in Bolivia: mosaic population structure, hybridization and a role for humans in Andean parasite dispersal.

    An improved understanding of how a parasite species exploits its genetic repertoire to colonize novel hosts and environmental niches is crucial to establish the epidemiological risk associated with emergent pathogenic genotypes. Trypanosoma cruzi, a genetically heterogeneous, multi-host zoonosis, provides an ideal system to examine the sylvatic diversification of parasitic protozoa. In Bolivia, T. cruzi I, the oldest and most widespread genetic lineage, is pervasive across a range of ecological clines. High-resolution nuclear (26 loci) and mitochondrial (10 loci) genotyping of 199 contemporaneous sylvatic TcI clones was undertaken to provide insights into the biogeographical basis of T. cruzi evolution. Three distinct sylvatic parasite transmission cycles were identified: one highland population among terrestrial rodent and triatomine species, composed of genetically homogenous strains (Ar = 2.95; PA/L = 0.61; DAS = 0.151), and two highly diverse parasite assemblages circulating among predominantly arboreal mammals and vectors in the lowlands (Ar = 3.40 and 3.93; PA/L = 1.12 and 0.60; DAS = 0.425 and 0.311, respectively). Very limited gene flow between neighbouring terrestrial highland and arboreal lowland areas (distance ~220 km; FST = 0.42 and 0.35) but strong connectivity between ecologically similar but geographically disparate terrestrial highland ecotopes (distance >465 km; FST = 0.016-0.084) strongly supports ecological host fitting as the predominant mechanism of parasite diversification. Dissimilar heterozygosity estimates (excess in highlands, deficit in lowlands) and mitochondrial introgression among lowland strains may indicate fundamental differences in mating strategies between populations. Finally, accelerated parasite dissemination between densely populated highland areas, compared to uninhabited lowland foci, likely reflects passive, long-range anthroponotic dispersal. The impact of humans on the risk of epizootic Chagas disease transmission in Bolivia is discussed.

    Multiple paternity and hybridization in two smooth-hound sharks

    Multiple paternity appears to be a common trait of elasmobranch mating systems, with its occurrence likely driven by convenience, due to females seeking to minimize the stress of male harassment. Here we use molecular markers to analyse the frequency of multiple paternity in two related viviparous sharks, Mustelus mustelus and Mustelus punctulatus. We first applied molecular methods to assign pregnant females, embryos and additional reference adults (N = 792) to one of the two species. Paternity analysis was performed using a total of 9 polymorphic microsatellites on 19 females and 204 embryos of M. mustelus, and on 13 females and 303 embryos of M. punctulatus. Multiple paternity occurs in both species, with 47% of M. mustelus and 54% of M. punctulatus litters sired by at least two fathers. Female fecundity is not influenced by multiple mating and in 56% of polyandrous litters paternity is skewed, with one male siring most of the pups. Genetic analyses also revealed hybridization between the two species, with a M. punctulatus female bearing pups sired by a M. mustelus male. The frequency of polyandrous litters in these species is consistent with aspects of their reproductive biology, such as synchronous ovulation and possible occurrence of breeding aggregations.