
    Flexible parametric bootstrap for testing homogeneity against clustering and assessing the number of clusters

    There are two notoriously hard problems in cluster analysis: estimating the number of clusters, and checking whether the population to be clustered is in fact homogeneous. Given a dataset, a clustering method and a cluster validation index, this paper proposes to set up null models that capture structural features of the data that cannot be interpreted as indicating clustering. Artificial datasets are sampled from the null model with parameters estimated from the original dataset. This can be used for testing the null hypothesis of a homogeneous population against a clustering alternative. It can also be used to calibrate the validation index for estimating the number of clusters, by taking into account the expected distribution of the index under the null model for any given number of clusters. The approach is illustrated by three examples, involving various clustering techniques (partitioning around medoids, hierarchical methods, a Gaussian mixture model), validation indexes (average silhouette width, prediction strength and BIC), and issues such as mixed-type data and temporal and spatial autocorrelation.
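
    The testing idea in this abstract can be sketched in a few lines. The code below is only an illustration under stated assumptions, not the paper's implementation: the null model is taken to be a single multivariate Gaussian fitted to the data, the clustering method is a plain k-means (the paper's examples use other methods), and the validation index is the average silhouette width. The function names `kmeans`, `avg_silhouette` and `homogeneity_test` are hypothetical.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's algorithm (assumed clustering method); returns labels."""
    centers = X[rng.choice(len(X), size=k, replace=False)]  # fancy index -> copy
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):          # leave an empty cluster's center as is
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def avg_silhouette(X, labels):
    """Average silhouette width; 0.0 if fewer than two clusters are present."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    ks = np.unique(labels)
    if len(ks) < 2:
        return 0.0
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue                          # singleton: silhouette 0 by convention
        a = D[i, own].sum() / (n_own - 1)     # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in ks if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

def homogeneity_test(X, k=2, B=99, seed=0):
    """Parametric bootstrap test of homogeneity against clustering.

    Assumed null model: one Gaussian with mean and covariance estimated from X.
    Returns the observed index and a Monte Carlo p-value.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    observed = avg_silhouette(X, kmeans(X, k, rng))
    mean, cov = X.mean(axis=0), np.cov(X, rowvar=False)
    null = np.empty(B)
    for b in range(B):
        Xb = rng.multivariate_normal(mean, cov, size=len(X))  # artificial dataset
        null[b] = avg_silhouette(Xb, kmeans(Xb, k, rng))
    p_value = (1 + np.sum(null >= observed)) / (B + 1)
    return observed, p_value

# Usage on synthetic data with two well-separated groups (illustrative only).
demo_rng = np.random.default_rng(1)
demo = np.vstack([demo_rng.normal(0, 1, (30, 2)), demo_rng.normal(8, 1, (30, 2))])
print(homogeneity_test(demo))
```

    Repeating the same comparison for each candidate number of clusters, and judging the observed index against its null distribution at that number, would correspond to the calibration use mentioned in the abstract.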

    Clustering Genes of Common Evolutionary History.

    Phylogenetic inference can potentially result in a more accurate tree using data from multiple loci. However, if the loci are incongruent (due to events such as incomplete lineage sorting or horizontal gene transfer), it can be misleading to infer a single tree. To address this, many previous contributions have taken a mechanistic approach, by modeling specific processes. Alternatively, one can cluster loci without assuming how these incongruencies might arise. Such "process-agnostic" approaches typically infer a tree for each locus and cluster these. There are, however, many possible combinations of tree distance and clustering methods; their comparative performance in the context of tree incongruence is largely unknown. Furthermore, because standard model selection criteria such as AIC cannot be applied to problems with a variable number of topologies, the issue of inferring the optimal number of clusters is poorly understood. Here, we perform a large-scale simulation study of phylogenetic distances and clustering methods to infer loci of common evolutionary history. We observe that the best-performing combinations are distances accounting for branch lengths followed by spectral clustering or Ward's method. We also introduce two statistical tests to infer the optimal number of clusters and show that they strongly outperform the silhouette criterion, a general-purpose heuristic. We illustrate the usefulness of the approach by 1) identifying errors in a previous phylogenetic analysis of yeast species and 2) identifying topological incongruence among newly sequenced loci of the globeflower fly genus Chiastocheta. We release treeCl, a new program to cluster genes of common evolutionary history (http://git.io/treeCl).

    The radiation of cynodonts and the ground plan of mammalian morphological diversity

    Cynodont therapsids diversified extensively after the Permo-Triassic mass extinction event, and gave rise to mammals in the Jurassic. We use an enlarged and revised dataset of discrete skeletal characters to build a new phylogeny for all main cynodont clades from the Late Permian to the Early Jurassic, and we analyse models of morphological diversification in the group. Basal taxa and epicynodonts are paraphyletic relative to eucynodonts, and the latter are divided into cynognathians and probainognathians, with tritylodonts and mammals forming sister groups. Disparity analyses reveal a heterogeneous distribution of cynodonts in a morphospace derived from cladistic characters. Pairwise morphological distances are weakly correlated with phylogenetic distances. Comparisons of disparity by groups and through time are non-significant, especially after the data are rarefied. A disparity peak occurs in the Early/Middle Triassic, after which period the mean disparity fluctuates little. Cynognathians were characterized by high evolutionary rates and high diversity early in their history, whereas probainognathian rates were low. Community structure may have been instrumental in imposing different rates on the two clades.

    Large-scale ocean connectivity and planktonic body size

    Villarino, Ernesto ... et al. 13 pages, 5 figures, 5 tables, supplementary material. https://dx.doi.org/10.1038/s41467-017-02535-8
    Global patterns of planktonic diversity are mainly determined by the dispersal of propagules with ocean currents. However, the role that abundance and body size play in determining spatial patterns of diversity remains unclear. Here we analyse spatial community structure (β-diversity) for several planktonic and nektonic organisms from prokaryotes to small mesopelagic fishes collected during the Malaspina 2010 Expedition. β-diversity was compared to surface ocean transit times derived from a global circulation model, revealing a significant negative relationship that is stronger than environmental differences. Estimated dispersal scales for different groups show a negative correlation with body size, where less abundant large-bodied communities have significantly shorter dispersal scales and larger species spatial turnover rates than more abundant small-bodied plankton. Our results confirm that the dispersal scale of planktonic and micro-nektonic organisms is determined by local abundance, which scales with body size, ultimately setting global spatial patterns of diversity.
    This research was funded by the project Malaspina 2010 Circumnavigation Expedition (Consolider-Ingenio 2010, CSD2008-00077) and co-funded by the Basque Government (Department Deputy of Agriculture, Fishing and Food Policy). [...] E.V. was supported by a PhD Scholarship granted by the Iñaki Goenaga-Technology Centres Foundation.
    Peer reviewed

    Paleodistributions and Comparative Molecular Phylogeography of Leafcutter Ants (Atta spp.) Provide New Insight into the Origins of Amazonian Diversity

    The evolutionary basis for high species diversity in tropical regions of the world remains unresolved. Much research has focused on the biogeography of speciation in the Amazon Basin, which harbors the greatest diversity of terrestrial life. The leading hypotheses on allopatric diversification of Amazonian taxa are the Pleistocene refugia, marine incursion, and riverine barrier hypotheses. Recent advances in the fields of phylogeography and species-distribution modeling permit a modern re-evaluation of these hypotheses. Our approach combines comparative, molecular phylogeographic analyses using mitochondrial DNA sequence data with paleodistribution modeling of species ranges at the last glacial maximum (LGM) to test these hypotheses for three co-distributed species of leafcutter ants (Atta spp.). The cumulative results of all tests reject every prediction of the riverine barrier hypothesis, but are unable to reject several predictions of the Pleistocene refugia and marine incursion hypotheses. Coalescent dating analyses suggest that population structure formed recently (Pleistocene-Pliocene), but are unable to reject the possibility that Miocene events may be responsible for structuring populations in two of the three species examined. The available data therefore suggest that either marine incursions in the Miocene or climate changes during the Pleistocene, or both, have shaped the population structure of the three species examined. Our results also reconceptualize the traditional Pleistocene refugia hypothesis, and offer a novel framework for future research into the area.

    Multivariate evaluation of the effectiveness of treatment efficacy of cypermethrin against sea lice (Lepeophtheirus salmonis) in Atlantic salmon (Salmo salar)

    Background: The sea louse Lepeophtheirus salmonis is the most important ectoparasite of farmed Atlantic salmon (Salmo salar) in Norwegian aquaculture. Control of sea lice is primarily dependent on the use of delousing chemotherapeutants, which are both expensive and toxic to other wildlife. The method most commonly used for monitoring treatment effectiveness relies on measuring the percentage reduction in the mobile stages of Lepeophtheirus salmonis only. However, this does not account for changes in the other sea lice stages and may result in misleading or incomplete interpretation regarding the effectiveness of treatment. With the aim of improving the evaluation of delousing treatments, we explored multivariate analyses of bath treatments using the topical pyrethroid, cypermethrin, in salmon pens at five Norwegian production sites. Results: Conventional univariate analysis indicated reductions of over 90% in mobile stages at all sites. In contrast, multivariate analyses indicated differing treatment effectiveness between sites (p-value < 0.01) based on changes in the proportion and abundance of the chalimus and PAAM (pre-adult and adult males) stages. Low water temperatures and shortened intervals between sampling after treatment may account for the differences in the composition of chalimus and PAAM stage groups following treatment. Using multivariate analysis, such factors could be separated from those which were attributable to inadequate treatment or chemotherapeutant failure. Conclusions: Multivariate analyses for evaluation of treatment effectiveness against multiple life cycle stages of L. salmonis yield additional information beyond that derivable from univariate methods. This can aid in the identification of causes of apparent treatment failure in salmon aquaculture.