Tree Edit Distance Learning via Adaptive Symbol Embeddings

Author: Gallicchio Claudio
Hammer Barbara
Micheli Alessio
Paaßen Benjamin
Publication date: 01/01/2018

Metric learning aims to improve classification accuracy by learning a distance measure that brings data points from the same class closer together and pushes data points from different classes farther apart. Recent research has demonstrated that metric learning approaches can also be applied to trees, such as molecular structures, abstract syntax trees of computer programs, or syntax trees of natural language, by learning the cost function of an edit distance, i.e., the costs of replacing, deleting, or inserting nodes in a tree. However, learning such costs directly may yield an edit distance that violates the metric axioms, is challenging to interpret, and may not generalize well. In this contribution, we propose a novel metric learning approach for trees, called embedding edit distance learning (BEDL), which learns an edit distance indirectly by embedding the tree nodes as vectors such that the Euclidean distance between those vectors supports class discrimination. We learn such embeddings by reducing the distance to prototypical trees from the same class and increasing the distance to prototypical trees from different classes. In our experiments, we show that BEDL improves upon the state of the art in metric learning for trees on six benchmark data sets, spanning computer science and biomedical data as well as a natural-language processing data set containing over 300,000 nodes.
Comment: Paper at the International Conference on Machine Learning (2018), 2018-07-10 to 2018-07-15 in Stockholm, Sweden

arXiv.org e-Print Archive

Archivio della Ricerca - Università di Pisa
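
The abstract's central idea, deriving edit costs from distances between symbol embeddings, can be illustrated with a minimal sketch. To keep it short, the sketch uses a sequence (string) edit distance instead of a tree edit distance, the embeddings are fixed toy vectors instead of learned ones, and the convention that deleting or inserting a symbol costs its distance to the origin is an assumption of this sketch, not necessarily the paper's definition. The function name `embedding_edit_distance` is hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Tuple[float, ...]


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def embedding_edit_distance(x: Sequence[str], y: Sequence[str],
                            emb: Dict[str, Vector]) -> float:
    """Sequence edit distance whose operation costs come from symbol embeddings.

    Assumptions (illustrative only): replacing a by b costs ||emb[a] - emb[b]||,
    and deleting or inserting a costs ||emb[a]||, i.e. the gap sits at the origin.
    Sequences stand in for trees to keep the sketch short.
    """
    origin = tuple(0.0 for _ in next(iter(emb.values())))
    m, n = len(x), len(y)
    # d[i][j] = cheapest way to turn x[:i] into y[:j]
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + euclidean(emb[x[i - 1]], origin)
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + euclidean(emb[y[j - 1]], origin)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + euclidean(emb[x[i - 1]], origin),  # delete x[i-1]
                d[i][j - 1] + euclidean(emb[y[j - 1]], origin),  # insert y[j-1]
                d[i - 1][j - 1] + euclidean(emb[x[i - 1]], emb[y[j - 1]]),  # replace
            )
    return d[m][n]


emb = {"A": (1.0, 0.0), "B": (0.0, 1.0), "C": (1.0, 1.0)}
print(embedding_edit_distance("AB", "AB", emb))  # 0.0
print(embedding_edit_distance("A", "B", emb))    # replacement costs sqrt(2), about 1.414
```

Because every cost here is a Euclidean distance (with the gap placed at the origin), the symbol-level cost function satisfies the metric axioms as long as distinct symbols receive distinct, non-zero embeddings; this is the kind of guarantee the abstract contrasts with learning the costs directly.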

A survey of kernel and spectral methods for clustering

Author: Francesco Camastra
Francesco Masulli
Maurizio Filippone
Stefano Rovetta
Publication venue: Elsevier BV
Publication date: 01/01/2007

Clustering algorithms are a useful tool to explore data structures and have been employed in many disciplines. The focus of this paper is the partitioning clustering problem, with a special interest in two recent approaches: kernel and spectral methods. The aim of this paper is to present a survey of kernel and spectral clustering methods, two approaches able to produce nonlinear separating hypersurfaces between clusters. The presented kernel clustering methods are kernel versions of many classical clustering algorithms, e.g., K-means, SOM, and neural gas. Spectral clustering arises from concepts in spectral graph theory, and the clustering problem is configured as a graph-cut problem in which an appropriate objective function has to be optimized. An explicit proof that these two paradigms have the same objective is reported, since it has been shown that these two seemingly different approaches share the same mathematical foundation. In addition, fuzzy kernel clustering methods are presented as extensions of the kernel K-means clustering algorithm. (C) 2007 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved

CiteSeerX

Archivio della ricerca - Università degli studi di Napoli "Parthenope"

Crossref

Enlighten

Archivio istituzionale della ricerca - Università di Genova

White Rose Research Online
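
The kernel K-means algorithm mentioned in the abstract can be sketched without ever computing feature vectors explicitly, since distances to cluster centroids in feature space are expressible through kernel values alone. This is a minimal, unoptimized illustration under stated assumptions: a Gaussian kernel with a hand-picked `gamma`, a caller-supplied initial labeling, and toy one-dimensional data. It is not the survey's code, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def rbf(x: Sequence[float], y: Sequence[float], gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_kmeans(points: List[Sequence[float]], labels: List[int], k: int,
                  gamma: float = 1.0, max_iter: int = 100) -> List[int]:
    """Kernel K-means: Lloyd-style reassignment computed only through the kernel.

    Squared feature-space distance of x to the centroid of cluster c:
    K(x,x) - 2/|c| * sum_j K(x,j) + 1/|c|^2 * sum_{j,l} K(j,l).
    """
    n = len(points)
    K = [[rbf(points[i], points[j], gamma) for j in range(n)] for i in range(n)]
    labels = list(labels)
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Centroid self-similarity term, computed once per cluster
        self_term = [
            sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
            for m in members
        ]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], math.inf
            for c, m in enumerate(members):
                if not m:
                    continue  # an empty cluster has no centroid
                d = K[i][i] - 2 * sum(K[i][j] for j in m) / len(m) + self_term[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
    return labels


pts = [[0.0], [0.2], [0.4], [5.0], [5.2], [5.4]]
print(kernel_kmeans(pts, labels=[0, 0, 0, 0, 1, 1], k=2))  # [0, 0, 0, 1, 1, 1]
```

Only the kernel matrix `K` is ever used after the first line of `kernel_kmeans`, which is what makes the same loop work for any positive semi-definite kernel.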

Scalable embedding of multiple perspectives for indefinite life-science data analysis

Author: Heilig Simon
Munch Maximilian
Schleif Frank Michael
Vath Philipp
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 24/01/2021

Life-science data analysis frequently encounters particular challenges that cannot be solved with classical techniques from data analytics or machine learning. The complex inherent structure of the data, and especially its encoding in non-standard ways (e.g., as genome or protein sequences, graph structures, or histograms), often limit the development of appropriate classification models. To address these limitations, domain-specific expert similarity measures have received considerable attention in the past. However, such expert measures suffer from two major drawbacks: (a) no single similarity measure guarantees success in all application scenarios, and (b) such similarity functions often lead to indefinite data that cannot be processed by classical machine learning methods. To tackle both limitations, this paper presents a method to embed indefinite life-science data, described by several similarity measures at the same time, into a complex-valued vector space. We test our approach on various life-science data sets and evaluate its performance against other competitive methods to show its efficiency.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen
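
The abstract does not spell out how the embedding is constructed, so the sketch below only illustrates the generic principle behind complex-valued embeddings of indefinite similarity data, not the paper's scalable, multi-perspective method: eigendecompose a symmetric similarity matrix S = V Λ Vᵀ and set x_i = V_i · sqrt(Λ) with complex square roots, so that plain (non-conjugated) products x_i · x_j reproduce S even when some eigenvalues are negative. The Jacobi eigensolver is included only to keep the example dependency-free; a library routine such as NumPy's `numpy.linalg.eigh` would be the usual choice. The function names are hypothetical.

```python
import cmath
import math
from typing import List, Tuple

Matrix = List[List[float]]


def jacobi_eigh(a: Matrix, tol: float = 1e-20, max_sweeps: int = 100) -> Tuple[List[float], Matrix]:
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V) with the eigenvectors stored as the columns of V.
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation chosen so that the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def complex_embedding(s: Matrix) -> List[List[complex]]:
    """Rows x_i with sum_k x_i[k] * x_j[k] == s[i][j] (plain product, no conjugate)."""
    lam, v = jacobi_eigh(s)
    n = len(s)
    # The square root of a negative eigenvalue is imaginary, absorbing the indefiniteness
    return [[v[i][k] * cmath.sqrt(lam[k]) for k in range(n)] for i in range(n)]


S = [[1.0, 2.0, 0.0], [2.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # indefinite: eigenvalues 3, -1, 1
X = complex_embedding(S)
recon = [[sum(X[i][k] * X[j][k] for k in range(3)) for j in range(3)] for i in range(3)]
print([[round(z.real, 6) for z in row] for row in recon])
```

A real-valued embedding with an ordinary dot product cannot reproduce a matrix with negative eigenvalues, which is why the imaginary coordinates are needed here.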

Local protein structure prediction using discriminative models

Author: Lengauer Thomas
Sander Oliver
Sommer Ingolf
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: In recent years, protein structure prediction methods using local structure information have shown promising improvements. The quality of new fold predictions has risen significantly, and in fold recognition, incorporating local structure predictions has improved the accuracy of results. We developed a local structure prediction method to be integrated into either fold recognition or new fold prediction methods. For each local sequence window of a protein sequence, the method predicts probability estimates that the window adopts particular local structures from a set of predefined local structure candidates. The first step defines a set of local structure representatives by clustering recurrent local structures. In the second step, a discriminative model is trained to predict the local structure representative given local sequence information.
RESULTS: Clustering the local structures yields an average RMSD quantization error of 1.19 Å for 27 structural representatives (for a fragment length of 7 residues). In the prediction step, the area under the ROC curve for detecting the 27 classes ranges from 0.68 to 0.88.
CONCLUSION: The described method yields probability estimates for local protein structure candidates, giving signals for all kinds of local structure. These local structure predictions can be incorporated either into fold recognition algorithms, to improve alignment quality and overall prediction accuracy, or into new fold prediction methods.

Springer - Publisher Connector

PubMed Central

MPG.PuRe
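
The abstract reports a per-class area under the ROC curve for the 27 structural classes. A minimal sketch of how such a one-vs-rest AUC can be computed for a single class is shown below, using the pairwise (Mann-Whitney) formulation; the scores and labels are made-up toy values, not data from the paper, and the O(P·N) pair loop is meant for illustration rather than large data sets.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


# One-vs-rest for a single structural class (toy values): the predicted probability
# that each window belongs to the class, and whether that is the window's true class.
probs = [0.9, 0.8, 0.4, 0.35, 0.1]
truth = [True, False, True, False, False]
print(roc_auc(probs, truth))  # 5 of 6 positive/negative pairs ranked correctly, about 0.833
```

Repeating this once per class, with that class as the positive label, gives one AUC per structural representative, which is the form in which the abstract reports its 0.68 to 0.88 range.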

Persistence Bag-of-Words for Topological Data Analysis

Author: Dłotko Paweł
Juda Mateusz
Lipiński Michał
Zeppelzauer Matthias
Zieliński Bartosz
Publication date: 01/01/2019

Persistent homology (PH) is a rigorous mathematical theory that provides a robust descriptor of data in the form of persistence diagrams (PDs). However, PDs exhibit a complex structure and are difficult to integrate into today's machine learning workflows. This paper introduces persistence bag-of-words: a novel and stable vectorized representation of PDs that enables seamless integration with machine learning. Comprehensive experiments show that the new representation achieves state-of-the-art performance and beyond, in much less time than alternative approaches.
Comment: Accepted for the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19). arXiv admin note: substantial text overlap with arXiv:1802.0485

arXiv.org e-Print Archive

Crossref

Jagiellonian University Repository
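
The core idea of a bag-of-words representation for persistence diagrams can be sketched as follows: each diagram point is assigned to its nearest codeword, and the diagram becomes a normalized histogram of codeword counts, i.e. a fixed-length vector. This is a minimal sketch under assumptions not stated in the abstract: the codebook is given (for example, cluster centres learned from training diagrams), points are compared in (birth, persistence) coordinates, and assignment is hard. The paper's actual codebook construction and any weighting or soft-assignment variants are not reproduced, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def to_birth_persistence(diagram: Sequence[Point]) -> List[Point]:
    """Map (birth, death) pairs to (birth, persistence = death - birth)."""
    return [(b, d - b) for b, d in diagram]


def persistence_bow(diagram: Sequence[Point], codebook: Sequence[Point]) -> List[float]:
    """Normalized histogram of nearest-codeword assignments (hard assignment)."""
    counts = [0] * len(codebook)
    for pt in to_birth_persistence(diagram):
        nearest = min(range(len(codebook)), key=lambda c: math.dist(pt, codebook[c]))
        counts[nearest] += 1
    total = sum(counts)
    # An empty diagram maps to the all-zero vector
    return [c / total for c in counts] if total else [0.0] * len(codebook)


codebook = [(0.0, 1.0), (1.0, 3.0)]  # assumed: e.g. cluster centres from training diagrams
diagram = [(0.0, 1.0), (0.1, 1.0), (1.0, 4.0)]
print(persistence_bow(diagram, codebook))  # [0.6666666666666666, 0.3333333333333333]
```

Because every diagram, whatever its number of points, maps to a vector of length `len(codebook)`, the output can be fed directly to standard classifiers.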

When loss-of-function is loss of function: assessing mutational signatures and impact of loss-of-function genetic variants

Author: David N Cooper
Guan Ning Lin
Hyun-Jun Nam
Jonathan Sebat
Kymberleigh A Pagel
Lilia M Iakoucheva
Matthew Mort
Predrag Radivojac
Sean D Mooney
Vikas Pejaver
Publication venue: Oxford University Press (OUP)
Publication date: 12/07/2017

Crossref

Online Research @ Cardiff