
    Heavy-tailed kernels reveal a finer cluster structure in t-SNE visualisations

    T-distributed stochastic neighbour embedding (t-SNE) is a widely used data visualisation technique. It differs from its predecessor SNE in the low-dimensional similarity kernel: the Gaussian kernel was replaced by the heavy-tailed Cauchy kernel, solving the "crowding problem" of SNE. Here, we develop an efficient implementation of t-SNE for a $t$-distribution kernel with an arbitrary degree of freedom $\nu$, with $\nu\to\infty$ corresponding to SNE and $\nu=1$ corresponding to standard t-SNE. Using theoretical analysis and toy examples, we show that $\nu<1$ can further reduce the crowding problem and reveal finer cluster structure that is invisible in standard t-SNE. We further demonstrate the striking effect of heavier-tailed kernels on large real-life data sets such as MNIST, single-cell RNA-sequencing data, and the HathiTrust library. We use domain knowledge to confirm that the revealed clusters are meaningful. Overall, we argue that modifying the tail heaviness of the t-SNE kernel can yield additional insight into the cluster structure of the data.
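
    To make the role of $\nu$ concrete, here is a minimal Python sketch. It assumes the parameterisation $w(d) = (1 + d^2/\nu)^{-\nu}$ for the low-dimensional kernel, which gives the Cauchy kernel $1/(1+d^2)$ at $\nu=1$ and tends to the Gaussian $e^{-d^2}$ as $\nu\to\infty$. The names `t_kernel` and `low_dim_affinities` are hypothetical; this is not the authors' optimised implementation, and no gradient or optimisation step is shown.

```python
import math


def t_kernel(d: float, nu: float) -> float:
    """Heavy-tailed similarity kernel w(d) = (1 + d^2 / nu) ** (-nu).

    Assumed parameterisation: nu = 1 gives the Cauchy kernel of standard
    t-SNE; as nu -> infinity it approaches the Gaussian exp(-d^2) of SNE.
    """
    if nu <= 0:
        raise ValueError("nu must be positive")
    return (1.0 + d * d / nu) ** (-nu)


def low_dim_affinities(points, nu):
    """Normalised low-dimensional affinities q_ij = w_ij / sum_{k != l} w_kl."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # the diagonal is excluded, as in t-SNE
                w[i][j] = t_kernel(math.dist(points[i], points[j]), nu)
    total = sum(sum(row) for row in w)
    return [[wij / total for wij in row] for row in w]


if __name__ == "__main__":
    # Kernel value at a fixed distance for progressively lighter tails.
    for nu in (0.5, 1.0, 100.0):
        print(nu, t_kernel(3.0, nu))
```

    Evaluating `t_kernel(3.0, nu)` for `nu` = 0.5, 1 and 100 shows the similarity assigned to a distant pair shrinking as `nu` grows: for large $d$ the kernel decays like $d^{-2\nu}$, so $\nu<1$ is heavier-tailed than standard t-SNE.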

    MOBAS: identification of disease-associated protein subnetworks using modularity-based scoring

    Network-based analyses are commonly used as powerful tools to interpret the findings of genome-wide association studies (GWAS) in a functional context. In particular, identifying disease-associated functional modules, i.e., highly connected protein-protein interaction (PPI) subnetworks with high aggregate disease association, has been shown to be promising for uncovering the functional relationships among genes and proteins associated with diseases. An important issue in this regard is the scoring of subnetworks, which must integrate two quantities: the disease association of individual gene products and the network connectivity among proteins. Current scoring schemes either disregard the level of connectivity and focus on the aggregate disease association of connected proteins, or use a linear combination of these two quantities. However, such schemes may produce arbitrarily large subnetworks that are often not statistically significant, or they require tuning of the parameters used to weigh the contributions of network connectivity and disease association. Here, we propose a parameter-free scoring scheme that scores subnetworks by assessing the disease association of interactions between pairs of gene products. We also incorporate the statistical significance of network connectivity and disease association into the scoring function. We test the proposed scoring scheme on a GWAS dataset for two complex diseases, type II diabetes (T2D) and psoriasis (PS). Our results suggest that subnetworks identified by commonly used methods may fail tests of statistical significance after correction for multiple hypothesis testing. In contrast, the proposed scoring scheme yields highly significant subnetworks, which contain biologically relevant proteins that cannot be identified by analysis of genome-wide association data alone. We also show that the proposed scoring scheme identifies subnetworks that are reproducible across different cohorts and that it can robustly recover relevant subnetworks at lower sampling rates.
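
    The abstract does not give the MOBAS formula, so the following Python sketch only illustrates the general idea of a modularity-based, parameter-free subnetwork score. It assumes (hypothetically) that each PPI edge is weighted by the product of its endpoints' disease-association scores, and it scores a candidate subnetwork by its Newman modularity contribution in that weighted graph; significance is estimated with a simple node-permutation test against random node sets of the same size, which is our assumption rather than the paper's null model. The functions `edge_weights`, `module_modularity` and `permutation_pvalue` are illustrative names, not part of any published tool.

```python
import random


def edge_weights(edges, assoc):
    """Weight each interaction (u, v) by assoc[u] * assoc[v] (assumed weighting)."""
    return {frozenset((u, v)): assoc[u] * assoc[v] for u, v in edges}


def module_modularity(subset, weights):
    """Newman modularity contribution of one module S in a weighted graph.

    Q_S = W_in / M - (K_S / (2 * M)) ** 2, where W_in is the total weight of
    edges with both ends in S, K_S is the summed weighted degree of the nodes
    in S and M is the total edge weight of the network.
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    s = set(subset)
    w_in = sum(w for e, w in weights.items() if e <= s)
    # Each edge adds its weight once per endpoint that lies in S.
    k_s = sum(w * len(e & s) for e, w in weights.items())
    return w_in / total - (k_s / (2 * total)) ** 2


def permutation_pvalue(subset, weights, nodes, n_perm=1000, seed=0):
    """Empirical p-value: how often a random node set of equal size scores at least as high."""
    rng = random.Random(seed)
    observed = module_modularity(subset, weights)
    k = len(set(subset))
    pool = sorted(nodes)
    hits = sum(
        module_modularity(rng.sample(pool, k), weights) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy network: two triangles joined by the edge (c, d).
    edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
             ("d", "e"), ("e", "f"), ("d", "f")]
    nodes = {"a", "b", "c", "d", "e", "f"}
    uniform = edge_weights(edges, {n: 1.0 for n in nodes})
    print(module_modularity({"a", "b", "c"}, uniform))
    print(permutation_pvalue({"a", "b", "c"}, uniform, nodes, n_perm=200))
```

    With all scores set to 1, each triangle scores $3/7 - (7/14)^2 = 5/28$, and the two contributions sum to the graph's modularity of $5/14$. Replacing the uniform scores with per-gene association values shifts the score towards modules whose interactions link strongly disease-associated proteins, without introducing a weighting parameter.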

    Black root rot: a long known but little understood disease

    Black root rot caused by the pathogen Thielaviopsis basicola has been known since the mid-1800s. The disease is important on many agricultural and ornamental plant species and has been found in at least 31 countries. Since its description, the pathogen has had a complex taxonomic history that has resulted in a confused literature. A recent revision of the Ceratocystidaceae following the advent of DNA sequencing technology has made it possible to resolve this confusion. Importantly, it has also shown that there are two pathogens in the Ceratocystidaceae that cause black root rot. They reside in the newly established genus Berkeleyomyces and are now known as B. basicola and B. rouxiae. This review considers the taxonomic history of the black root rot pathogens and their global distribution. Prospects relating to the serious diseases that they cause, and the likely impact that the era of genomics will have on our understanding of the pathogens, are also highlighted.
    Table S1. Hosts reported to be susceptible to black root rot infection.
    Table S2. Variation in host susceptibility to black root rot infection by the fungus formerly known as Thielaviopsis basicola.
    The University of Pretoria, the members of the Tree Protection Co‐operative Programme (TPCP), the DST‐NRF Centre of Excellence in Tree Health Biotechnology (CTHB) and the National Research Foundation.
    https://onlinelibrary.wiley.com/journal/13653059 (2020-06-01)
    Biochemistry; Forestry and Agricultural Biotechnology Institute (FABI); Genetics; Microbiology and Plant Pathology