
    Community Detection and Classification Guarantees Using Embeddings Learned by Node2Vec

    Embedding the nodes of a large network into a Euclidean space is a common objective in modern machine learning, with a variety of tools available. These embeddings can then be used as features for tasks such as community detection/node clustering or link prediction, where they achieve state-of-the-art performance. With the exception of spectral clustering methods, there is little theoretical understanding of other commonly used approaches to learning embeddings. In this work we examine the theoretical properties of the embeddings learned by node2vec. Our main result shows that the use of k-means clustering on the embedding vectors produced by node2vec gives weakly consistent community recovery for the nodes in (degree corrected) stochastic block models. We also discuss the use of these embeddings for node and link prediction tasks. We demonstrate this result empirically and examine how it relates to other embedding tools for network data.
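    To make the clustering step concrete, here is a minimal sketch in Python (NumPy) of k-means applied to node embedding vectors. It does not run node2vec: as an assumption, two Gaussian blobs of hypothetical 16-dimensional vectors stand in for the embeddings of nodes from a two-community block model, and the function name `kmeans` and its farthest-first initialisation are illustrative choices, not the paper's code. The misclustering rate computed at the end is the quantity that weak consistency requires to vanish as the network grows.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with farthest-first initialisation; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Farthest-first seeding: start from a random point, then repeatedly add
    # the point farthest from all centroids chosen so far.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d2.argmax()])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each centroid moves to the mean of its points
        # (an emptied cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical stand-in for node2vec output: 200 nodes in two communities,
# 16-dimensional embeddings whose community means differ along one axis.
rng = np.random.default_rng(1)
n_per_block, dim = 100, 16
truth = np.repeat([0, 1], n_per_block)
block_means = np.zeros((2, dim))
block_means[1, 0] = 6.0
X = block_means[truth] + rng.normal(size=(2 * n_per_block, dim))

labels, _ = kmeans(X, k=2)
# Cluster labels are only defined up to permutation, so take the better matching.
error_rate = min(np.mean(labels != truth), np.mean(labels == truth))
print(f"fraction of misclustered nodes: {error_rate:.3f}")
```

    With real data, `X` would be the matrix of learned node2vec vectors, one row per node. Library implementations such as scikit-learn's `KMeans` add k-means++ seeding and multiple restarts, which matter more when communities are less well separated than in this toy example.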

    kinisi: Bayesian analysis of mass transport from molecular dynamics simulations

    kinisi is a Python package for estimating transport coefficients (e.g., self-diffusion coefficients, $D^*$) and their corresponding uncertainties from molecular dynamics simulation data. It includes an implementation of the approximate Bayesian regression scheme described in McCluskey et al. (2023), wherein the mean-squared displacement (MSD) of mobile atoms is modelled as a multivariate normal distribution that is parametrised from the input simulation data. kinisi uses Markov-chain Monte Carlo (Foreman-Mackey et al., 2019; Goodman & Weare, 2010) to sample this model multivariate normal distribution, giving a posterior distribution of linear model ensemble MSDs that are compatible with the observed simulation data. For each linear ensemble MSD, $\mathbf{x}(t)$, a corresponding estimate of the diffusion coefficient, $\hat{D}^*$, is given via the Einstein relation, $\hat{D}^* = \frac{1}{6}\frac{\mathrm{d}\,\mathbf{x}(t)}{\mathrm{d}t}$, where $t$ is time. The posterior distribution of compatible model ensemble MSDs calculated by kinisi gives a point estimate for the most probable value of $D^*$, given the observed simulation data, and an estimate of the corresponding uncertainty in $\hat{D}^*$. kinisi also provides equivalent functionality for estimating collective transport coefficients, i.e., jump-diffusion coefficients and ionic conductivities.
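    The Einstein-relation step can be illustrated with a plain least-squares sketch in Python (NumPy). This is not kinisi's API or its Bayesian regression scheme: it simulates hypothetical three-dimensional Brownian trajectories with a known diffusion coefficient, computes the MSD from the first time origin only, and takes one sixth of an ordinary least-squares slope as the estimate of the diffusion coefficient. The parameter values (`D_true`, `dt`, the numbers of atoms and steps) are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
D_true = 0.5              # hypothetical diffusion coefficient (length^2 / time)
dt = 1.0                  # hypothetical time between frames
n_atoms, n_steps = 2000, 100

# Brownian steps: each Cartesian component has variance 2 * D * dt.
steps = rng.normal(scale=np.sqrt(2 * D_true * dt), size=(n_steps, n_atoms, 3))
# positions[i] is the displacement from r(0) = 0 after i + 1 steps.
positions = np.cumsum(steps, axis=0)

# MSD(t) = < |r(t) - r(0)|^2 >, averaged over atoms.
times = dt * np.arange(1, n_steps + 1)
msd = (positions ** 2).sum(axis=2).mean(axis=1)

# Einstein relation in 3D: MSD = 6 D t, so D = (1/6) d(MSD)/dt.
slope, _ = np.polyfit(times, msd, 1)
D_hat = slope / 6
print(f"D_hat = {D_hat:.3f} (true value {D_true})")
```

    Ordinary least squares treats the MSD values at different times as independent, although successive values are strongly correlated because they share the same trajectories. This is why a single fitted slope gives an unreliable uncertainty, and it is the kind of structure that modelling the MSD as a multivariate normal distribution can capture.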

    Gene duplication and divergence produce divergent MHC genotypes without disassortative mating

    Genes of the major histocompatibility complex (MHC) exhibit heterozygote advantage in immune defence, which in turn can select for MHC-disassortative mate choice. However, many species lack this expected pattern of MHC-disassortative mating. A possible explanation lies in evolutionary processes following gene duplication: if two duplicated MHC genes become functionally diverged from each other, offspring will inherit diverse multilocus genotypes even under random mating. We used locus-specific primers for high-throughput sequencing of two expressed MHC Class II B genes in Leach's storm-petrels, Oceanodroma leucorhoa, and found that exon 2 alleles fall into two gene-specific monophyletic clades. We tested for disassortative vs. random mating at these two functionally diverged Class II B genes, using multiple metrics and different subsets of exon 2 sequence data. With good statistical power, we consistently found random assortment of mates at MHC. Despite random mating, birds had MHC genotypes with functionally diverged alleles, averaging 13 amino acid differences in pairwise comparisons of exon 2 alleles within individuals. To test whether this high MHC diversity in individuals is driven by evolutionary divergence of the two duplicated genes, we built a phylogenetic permutation model. The model showed that genotypic diversity was strongly impacted by sequence divergence between the most common allele of each gene, with a smaller additional impact of monophyly of the two genes. Divergence of allele sequences between genes may have reduced the benefits of actively seeking MHC-dissimilar mates, in which case the evolutionary history of duplicated genes is shaping the adaptive landscape of sexual selection.
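    The within-individual diversity measure can be illustrated by counting amino acid differences between aligned allele sequences. The Python sketch below uses short made-up sequences (hypothetical toy data, not storm-petrel alleles), and the helper names `aa_differences` and `mean_within_individual` are illustrative. It averages over all allele pairs within each individual and then across individuals, which is one plausible reading of the averaging described above, not necessarily the authors' exact procedure.

```python
from itertools import combinations


def aa_differences(a, b):
    """Number of positions at which two aligned amino acid sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_within_individual(alleles):
    """Mean pairwise amino acid differences among one individual's alleles."""
    pairs = list(combinations(alleles, 2))
    if not pairs:
        raise ValueError("need at least two alleles to compare")
    return sum(aa_differences(a, b) for a, b in pairs) / len(pairs)


# Hypothetical translated exon 2 fragments (toy data, not real MHC sequences).
genotypes = {
    "bird_1": ["RFLEYSTS", "RFLEYATS", "HVQEFKGE"],
    "bird_2": ["RFLEYSTS", "HVQEFKGE"],
}
per_bird = {bird: mean_within_individual(al) for bird, al in genotypes.items()}
population_mean = sum(per_bird.values()) / len(per_bird)
print(per_bird, population_mean)
```

    Because any individual carrying one allele from each of two diverged genes contributes large pairwise differences, divergence between the genes raises this average even when mates pair at random.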