514,587 research outputs found

    Similarity Learning via Kernel Preserving Embedding

    Full text link
    Data similarity is a key concept in many data-driven applications. Many algorithms are sensitive to similarity measures. To tackle this fundamental problem, automatically learning of similarity information from data via self-expression has been developed and successfully applied in various models, such as low-rank representation, sparse subspace learning, semi-supervised learning. However, it just tries to reconstruct the original data and some valuable information, e.g., the manifold structure, is largely ignored. In this paper, we argue that it is beneficial to preserve the overall relations when we extract similarity information. Specifically, we propose a novel similarity learning framework by minimizing the reconstruction error of kernel matrices, rather than the reconstruction error of original data adopted by existing work. Taking the clustering task as an example to evaluate our method, we observe considerable improvements compared to other state-of-the-art methods. More importantly, our proposed framework is very general and provides a novel and fundamental building block for many other similarity-based tasks. Besides, our proposed kernel preserving opens up a large number of possibilities to embed high-dimensional data into low-dimensional space.Comment: Published in AAAI 201

    Ranked List Loss for Deep Metric Learning

    Full text link
    The objective of deep metric learning (DML) is to learn embeddings that can capture semantic similarity and dissimilarity information among data points. Existing pairwise or tripletwise loss functions used in DML are known to suffer from slow convergence due to a large proportion of trivial pairs or triplets as the model improves. To improve this, ranking-motivated structured losses are proposed recently to incorporate multiple examples and exploit the structured information among them. They converge faster and achieve state-of-the-art performance. In this work, we unveil two limitations of existing ranking-motivated structured losses and propose a novel ranked list loss to solve both of them. First, given a query, only a fraction of data points is incorporated to build the similarity structure. Consequently, some useful examples are ignored and the structure is less informative. To address this, we propose to build a set-based similarity structure by exploiting all instances in the gallery. The learning setting can be interpreted as few-shot retrieval: given a mini-batch, every example is iteratively used as a query, and the rest ones compose the gallery to search, i.e., the support set in few-shot setting. The rest examples are split into a positive set and a negative set. For every mini-batch, the learning objective of ranked list loss is to make the query closer to the positive set than to the negative set by a margin. Second, previous methods aim to pull positive pairs as close as possible in the embedding space. As a result, the intraclass data distribution tends to be extremely compressed. In contrast, we propose to learn a hypersphere for each class in order to preserve useful similarity structure inside it, which functions as regularisation. Extensive experiments demonstrate the superiority of our proposal by comparing with the state-of-the-art methods.Comment: Accepted to T-PAMI. Therefore, to read the offical version, please go to IEEE Xplore. Fine-grained image retrieval task. Our source code is available online: https://github.com/XinshaoAmosWang/Ranked-List-Loss-for-DM

    Kernel learning for ligand-based virtual screening: discovery of a new PPARgamma agonist

    Get PDF
    Poster presentation at 5th German Conference on Cheminformatics: 23. CIC-Workshop Goslar, Germany. 8-10 November 2009 We demonstrate the theoretical and practical application of modern kernel-based machine learning methods to ligand-based virtual screening by successful prospective screening for novel agonists of the peroxisome proliferator-activated receptor gamma (PPARgamma) [1]. PPARgamma is a nuclear receptor involved in lipid and glucose metabolism, and related to type-2 diabetes and dyslipidemia. Applied methods included a graph kernel designed for molecular similarity analysis [2], kernel principle component analysis [3], multiple kernel learning [4], and, Gaussian process regression [5]. In the machine learning approach to ligand-based virtual screening, one uses the similarity principle [6] to identify potentially active compounds based on their similarity to known reference ligands. Kernel-based machine learning [7] uses the "kernel trick", a systematic approach to the derivation of non-linear versions of linear algorithms like separating hyperplanes and regression. Prerequisites for kernel learning are similarity measures with the mathematical property of positive semidefiniteness (kernels). The iterative similarity optimal assignment graph kernel (ISOAK) [2] is defined directly on the annotated structure graph, and was designed specifically for the comparison of small molecules. In our virtual screening study, its use improved results, e.g., in principle component analysis-based visualization and Gaussian process regression. Following a thorough retrospective validation using a data set of 176 published PPARgamma agonists [8], we screened a vendor library for novel agonists. Subsequent testing of 15 compounds in a cell-based transactivation assay [9] yielded four active compounds. The most interesting hit, a natural product derivative with cyclobutane scaffold, is a full selective PPARgamma agonist (EC50 = 10 ± 0.2 microM, inactive on PPARalpha and PPARbeta/delta at 10 microM). We demonstrate how the interplay of several modern kernel-based machine learning approaches can successfully improve ligand-based virtual screening results

    Integrating Segmentation and Similarity in Melodic Analysis

    Get PDF
    The recognition of melodic structure depends on both the segmentation into structural units, the melodic motifs, and relations of motifs which are mainly determined by similarity. Existing models and studies of segmentation and motivic similarity cover only certain aspects and do not provide a comprehensive or coherent theory. In this paper an Integrated Segmentation and Similarity Model (ISSM) for melodic analysis is introduced. The ISSM yields an interpretation similar to a paradigmatic analysis for a given melody. An interpretation comprises a segmentation, assignments of related motifs and notes, and detailed information on the differences of assigned motifs and notes. The ISSM is based on generating and rating interpretations to find the most adequate one. For this rating a neuro-fuzzy-system is used, which combines knowledge with learning from data. The ISSM is an extension of a system for rhythm analysis. This paper covers the model structure and the features relevant for melodic and motivic analysis. Melodic segmentation and similarity ratings are described and results of a small experiment which show that the ISSM can learn structural interpretations from data and that integrating similarity improves segmentation performance of the model
    corecore