
    Perplexity-free Parametric t-SNE

    Full text link
    The t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm is a ubiquitously employed dimensionality reduction (DR) method. Its non-parametric nature and impressive efficacy motivated its parametric extension. It is, however, tied to a user-defined perplexity parameter, restricting its DR quality compared to recently developed multi-scale perplexity-free approaches. This paper hence proposes a multi-scale parametric t-SNE scheme, relieved from perplexity tuning, with a deep neural network implementing the mapping. It produces reliable embeddings with out-of-sample extensions, competitive with the best perplexity adjustments in terms of neighborhood preservation on multiple data sets.
    Comment: ESANN 2020 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Online event, 2-4 October 2020, i6doc.com publ., ISBN 978-2-87587-074-2. Available from http://www.i6doc.com/en
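    As a rough illustration of the scheme described above (not the authors' implementation), the sketch below computes multi-scale, perplexity-free input similarities by averaging Gaussian neighbourhoods over dyadic neighbourhood sizes and trains a small PyTorch network to minimise the Kullback-Leibler divergence against Student-t similarities in the embedding; the architecture, number of scales and optimiser settings are illustrative assumptions.

    import torch
    import torch.nn as nn

    def multiscale_affinities(X, n_scales=4):
        """Average Gaussian similarities over neighbourhood sizes 2, 4, 8, ... (no perplexity)."""
        D = torch.cdist(X, X) ** 2
        n = X.shape[0]
        P = torch.zeros(n, n)
        for h in range(1, n_scales + 1):
            k = min(2 ** h, n - 1)
            # per-point bandwidth: squared distance to the k-th nearest neighbour
            sigma2 = torch.kthvalue(D, k + 1, dim=1).values.clamp_min(1e-12)
            Ph = torch.exp(-D / sigma2[:, None])
            Ph.fill_diagonal_(0)
            P = P + Ph / Ph.sum(dim=1, keepdim=True)
        P = P / n_scales
        P = (P + P.T) / (2 * n)                    # symmetrise into a joint distribution
        return P.clamp_min(1e-12)

    def fit_parametric_tsne(X, dim=2, epochs=500, lr=1e-3):
        """Train an MLP mapping so that new points can be embedded out of sample."""
        X = torch.as_tensor(X, dtype=torch.float32)
        P = multiscale_affinities(X)
        mask = ~torch.eye(X.shape[0], dtype=torch.bool)
        net = nn.Sequential(nn.Linear(X.shape[1], 128), nn.ReLU(),
                            nn.Linear(128, 128), nn.ReLU(),
                            nn.Linear(128, dim))
        opt = torch.optim.Adam(net.parameters(), lr=lr)
        for _ in range(epochs):
            Y = net(X)
            Q = 1.0 / (1.0 + torch.cdist(Y, Y) ** 2)   # Student-t kernel in the embedding
            Q = Q * mask                               # drop self-similarities
            Q = (Q / Q.sum()).clamp_min(1e-12)
            loss = torch.sum(P * (P.log() - Q.log()))  # KL(P || Q)
            opt.zero_grad()
            loss.backward()
            opt.step()
        return net   # net(new_points) gives the out-of-sample embedding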

    Optimizing graph layout by t-SNE perplexity estimation

    Full text link
    Perplexity is one of the key parameters of the t-distributed stochastic neighbor embedding (t-SNE) dimensionality reduction algorithm. In this paper, we investigate the relationship between t-SNE perplexity and graph layout evaluation metrics, including graph stress, preserved neighborhood information and visual inspection. We found that a small perplexity correlates with relatively higher normalized stress while preserving neighborhood information with higher precision but less global structure, and we propose a method to estimate an appropriate perplexity based either on a modified standard t-SNE or on the scikit-learn Barnes–Hut TSNE. Experimental results demonstrate the effectiveness and ease of use of our approach on a set of benchmark datasets.
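    As a hedged sketch of this kind of perplexity scan (not the paper's estimator), the snippet below runs scikit-learn's Barnes-Hut TSNE over a grid of candidate perplexities and reports a scale-invariant normalized stress together with k-nearest-neighbour preservation, the two quantities whose trade-off the abstract describes; the candidate grid and k are illustrative assumptions.

    import numpy as np
    from sklearn.manifold import TSNE
    from sklearn.metrics import pairwise_distances
    from sklearn.neighbors import NearestNeighbors

    def normalized_stress(D_high, D_low):
        """Stress between distance matrices, after rescaling the embedding distances."""
        alpha = np.sum(D_high * D_low) / np.sum(D_low ** 2)
        return np.sqrt(np.sum((D_high - alpha * D_low) ** 2) / np.sum(D_high ** 2))

    def knn_preservation(X, Y, k=10):
        """Average fraction of each point's k nearest neighbours kept in the embedding."""
        nn_high = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(return_distance=False)
        nn_low = NearestNeighbors(n_neighbors=k).fit(Y).kneighbors(return_distance=False)
        return np.mean([len(set(a) & set(b)) for a, b in zip(nn_high, nn_low)]) / k

    def scan_perplexities(X, perplexities=(5, 10, 30, 50, 100), k=10):
        D_high = pairwise_distances(X)
        results = []
        for p in perplexities:
            Y = TSNE(perplexity=p, method="barnes_hut", init="pca",
                     random_state=0).fit_transform(X)
            results.append((p, normalized_stress(D_high, pairwise_distances(Y)),
                            knn_preservation(X, Y, k)))
        return results   # compare stress vs. neighbourhood preservation per perplexity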

    Unmasking Clever Hans Predictors and Assessing What Machines Really Learn

    Full text link
    Current learning machines have successfully solved hard application problems, reaching high accuracy and displaying seemingly "intelligent" behavior. Here we apply recent techniques for explaining decisions of state-of-the-art learning machines and analyze various tasks from computer vision and arcade games. This showcases a spectrum of problem-solving behaviors ranging from naive and short-sighted to well-informed and strategic. We observe that standard performance evaluation metrics can be oblivious to distinguishing these diverse problem-solving behaviors. Furthermore, we propose our semi-automated Spectral Relevance Analysis, which provides a practically effective way of characterizing and validating the behavior of nonlinear learning machines. This helps to assess whether a learned model indeed delivers reliably for the problem it was conceived for. Finally, our work intends to add a voice of caution to the ongoing excitement about machine intelligence and pledges to evaluate and judge some of these recent successes in a more nuanced manner.
    Comment: Accepted for publication in Nature Communications
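    To make the analysis step concrete, here is a compressed sketch of the kind of spectral analysis described above: it assumes per-sample relevance heatmaps have already been computed with an explanation method (e.g. layer-wise relevance propagation), downsizes them and clusters them with spectral clustering so that groups of samples sharing a common decision strategy, such as a "Clever Hans" artefact, stand out; the heatmap shapes and cluster count are assumptions, not the authors' exact pipeline.

    import numpy as np
    from scipy.ndimage import zoom
    from sklearn.cluster import SpectralClustering

    def strategy_clusters(heatmaps, target_size=(16, 16), n_clusters=4):
        """heatmaps: array of shape (n_samples, H, W) with per-pixel relevance scores."""
        n, H, W = heatmaps.shape
        factors = (target_size[0] / H, target_size[1] / W)
        # downsize each heatmap so clustering focuses on coarse relevance structure
        small = np.stack([zoom(h, factors, order=1) for h in heatmaps])
        labels = SpectralClustering(n_clusters=n_clusters,
                                    affinity="nearest_neighbors",
                                    random_state=0).fit_predict(small.reshape(n, -1))
        return labels   # inspect each cluster's heatmaps to judge the learned strategy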

    Perplexity-free t-SNE and twice Student tt-SNE

    No full text
    In dimensionality reduction and data visualisation, t-SNE has become a popular method. In this paper, we propose two variants of the Gaussian similarities used to characterise the neighbourhoods around each high-dimensional datum in t-SNE. The first alternative uses t-distributions, as already employed in the low-dimensional embedding space, with a variable degree of freedom that accounts for the intrinsic dimensionality of the data. The second variant relies on compounds of Gaussian neighbourhoods with growing widths, removing the need for the user to adjust a single size or perplexity. In both cases, heavy-tailed distributions thus characterise the neighbourhood relationships in the data space. Experiments show that both variants are competitive with t-SNE, at no extra cost.
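    As a small, hedged illustration of the first variant (not the paper's exact formulation), the sketch below builds heavy-tailed Student-t input similarities whose degrees of freedom are tied to a crude intrinsic-dimensionality estimate; the two-nearest-neighbour estimator and the single global bandwidth are simplifying assumptions.

    import numpy as np
    from sklearn.metrics import pairwise_distances

    def twonn_intrinsic_dim(X):
        """Rough intrinsic-dimension estimate from ratios of 1st/2nd neighbour distances."""
        D = pairwise_distances(X)
        np.fill_diagonal(D, np.inf)
        r = np.sort(D, axis=1)[:, :2]           # distances to the two nearest neighbours
        mu = r[:, 1] / r[:, 0]
        return len(mu) / np.sum(np.log(mu))

    def student_affinities(X, dof=None):
        """Heavy-tailed (Student-t) input similarities in place of Gaussian ones."""
        if dof is None:
            dof = max(twonn_intrinsic_dim(X), 1.0)
        D2 = pairwise_distances(X) ** 2
        sigma2 = np.median(D2)                  # single global bandwidth, for simplicity
        P = (1.0 + D2 / (dof * sigma2)) ** (-(dof + 1.0) / 2.0)
        np.fill_diagonal(P, 0.0)
        return P / P.sum()                      # joint probability matrix over pairs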

    Inhibitor selectivity: profiling and prediction

    Get PDF
    Fewer than 1 in 10 drug candidates that enter phase 1 clinical trials are ultimately approved for human use. The high failure rate is in part due to unforeseen side effects or toxicity. A better understanding of the role of selectivity and better insight into the off-target activities of drug candidates could greatly aid in preventing candidates from failing for these reasons. This thesis addresses some aspects of this challenging part of drug discovery. The use of activity-based protein profiling (ABPP), as presented in Chapters 2 and 3 for drug discovery and hit-to-lead optimization, and in Chapters 5 and 6 for the interaction profiling of a drug candidate, highlights the versatility and importance of this chemical biology technique. Combined with knowledge derived from biochemical assays, such as the one developed in Chapter 4, ABPP can greatly aid the medicinal chemist. The recent surge in popularity of machine learning algorithms, backed by exponential growth in the amount of biological data available, holds great promise for drug discovery. Chapters 7 and 8 showed the applicability of one such algorithm, which was able to predict interaction profiles quite reliably. The challenges in finding, determining and predicting selectivity are far from solved, but by incrementally expanding our understanding of the binding of small molecules to their (off-)targets, truly selective inhibitors might at some point become a reality, or their necessity might be mitigated.

    Analysing and comparing problem landscapes for black-box optimization via length scale

    Get PDF