1,457 research outputs found

Partitioning Relational Matrices of Similarities or Dissimilarities using the Value of Information

Author: Principe Jose C.
Sledge Isaac J.
Publication date: 27/10/2017

In this paper, we provide an approach to clustering relational matrices whose entries correspond to either similarities or dissimilarities between objects. Our approach is based on the value of information, a parameterized, information-theoretic criterion that measures the change in costs associated with changes in information. Optimizing the value of information yields a deterministic annealing style of clustering with many benefits. For instance, investigators need not specify the number of clusters a priori, as the partitions naturally undergo phase changes during the annealing process, whereby the number of clusters changes in a data-driven fashion. The global-best partition can also often be identified. Comment: Submitted to the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

arXiv.org e-Print Archive

Crossref

Supervised Classification: Quite a Brief Overview

Author: Aizerman
Ben-David
Besag
Beygelzimer
Bishop
Boser
Bottou
Bradley
Braga-Neto
Breiman
Carbonneau
Chandola
Chapelle
Cheplygina
Chow
Christianini
Cohen
Cohn
Cortes
Cover
Devroye
Dietterich
Dubuisson
Duda
Duin
Dwork
Efron
Fanelli
Fawcett
Fedorov
Fisher
Fix
Freund
Fu
Galar
Geman
Girosi
Guyon
Hand
Hastie
Hinton
Ho
Hoerl
Hoffgen
Ioannidis
Isaksson
Jahrer
Jain
Kahneman
Krijthe
Kuncheva
Lachenbruch
Lafferty
Landgrebe
Langley
Lavrač
Leek
Levine
Li
Little
Loog
Markou
Maron
McLachlan
Minka
Moonesinghe
Nair
Niemeijer
Nissen
Pan
Poggio
Polikar
Provost
Pękalska
Quinlan
Quiñonero-Candela
Rasmussen
Ripley
Rosenblatt
Rubinstein
Schaffer
Schiavo
Schmidhuber
Schölkopf
Settles
Shrivastava
Smola
Suykens
Tax
Tibshirani
Vapnik
Wahba
Wald
White
Wolpert
Yang
Zhou
Zhu
Publication date: 25/10/2017

The original problem of supervised classification considers the task of automatically assigning objects to their respective classes on the basis of numerical measurements derived from these objects. Classifiers are the tools that implement the actual functional mapping from these measurements (also called features or inputs) to the so-called class label (or output). The fields of pattern recognition and machine learning study ways of constructing such classifiers. The main idea behind supervised methods is that of learning from examples: given a number of example input-output relations, to what extent can the general mapping be learned that takes any new and unseen feature vector to its correct class? This chapter provides a basic introduction to the underlying ideas of how to come to a supervised classification problem. In addition, it provides an overview of some specific classification techniques, delves into the issues of object representation and classifier evaluation, and (very) briefly covers some variations on the basic supervised classification task that may also be of interest to the practitioner.

arXiv.org e-Print Archive

Crossref

Adaptive dissimilarity measures, dimension reduction and visualization

Author: Bunte Kerstin
Publication venue: s.n.
Publication date: 01/01/2011

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Proceedings - University of Groningen

Median topographic maps for biomedical data sets

Author: Biehl M.
Hammer B.
Hammer Barbara
Hasenfuss A.
Rossi F.
Verleysen M.
Villmann T.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Median clustering extends popular neural data analysis methods such as the self-organizing map or neural gas to general data structures given by a dissimilarity matrix only. This offers flexible and robust global data inspection methods which are particularly suited to the variety of data that occur in biomedical domains. In this chapter, we give an overview of median clustering and its properties and extensions, with a particular focus on efficient implementations adapted to large-scale data analysis.

arXiv.org e-Print Archive

CiteSeerX

Publications at Bielefeld University
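
The core idea in the "Median topographic maps" abstract above, clustering from a dissimilarity matrix alone with prototypes restricted to data points, can be illustrated with a minimal sketch. This is a plain median k-means loop, not the median self-organizing map or median neural gas the chapter covers; the function name `median_kmeans` and its parameters are assumptions made for illustration only.

```python
import numpy as np

def median_kmeans(D, k, n_iter=20, seed=0):
    """Cluster objects given only a square dissimilarity matrix D.

    Prototypes are indices of data points (generalized medians),
    so no vector-space representation of the objects is needed.
    Hypothetical helper, not taken from the chapter.
    """
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    prototypes = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # Assign each object to its closest prototype.
        labels = np.argmin(D[:, prototypes], axis=1)
        new_prototypes = prototypes.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old prototype for an empty cluster
            # Generalized median: the member with the smallest summed
            # dissimilarity to all other members of its cluster.
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_prototypes[c] = members[np.argmin(costs)]
        if np.array_equal(new_prototypes, prototypes):
            break  # converged
        prototypes = new_prototypes
    labels = np.argmin(D[:, prototypes], axis=1)
    return prototypes, labels
```

Calling `median_kmeans(D, k=2)` on a precomputed matrix `D` returns the prototype indices and a cluster label per object. Each update scans all pairs inside a cluster, which is why efficient implementations matter for large data sets.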

Tree Edit Distance Learning via Adaptive Symbol Embeddings

Author: Gallicchio Claudio
Hammer Barbara
Micheli Alessio
Paaßen Benjamin
Publication date: 01/01/2018

Metric learning aims to improve classification accuracy by learning a distance measure which brings data points from the same class closer together and pushes data points from different classes further apart. Recent research has demonstrated that metric learning approaches can also be applied to trees, such as molecular structures, abstract syntax trees of computer programs, or syntax trees of natural language, by learning the cost function of an edit distance, i.e. the costs of replacing, deleting, or inserting nodes in a tree. However, learning such costs directly may yield an edit distance which violates metric axioms, is challenging to interpret, and may not generalize well. In this contribution, we propose a novel metric learning approach for trees which we call embedding edit distance learning (BEDL) and which learns an edit distance indirectly by embedding the tree nodes as vectors, such that the Euclidean distance between those vectors supports class discrimination. We learn such embeddings by reducing the distance to prototypical trees from the same class and increasing the distance to prototypical trees from different classes. In our experiments, we show that BEDL improves upon the state-of-the-art in metric learning for trees on six benchmark data sets, ranging from computer science over biomedical data to a natural-language processing data set containing over 300,000 nodes. Comment: Paper at the International Conference of Machine Learning (2018), 2018-07-10 to 2018-07-15 in Stockholm, Sweden

arXiv.org e-Print Archive

Archivio della Ricerca - Università di Pisa

A dissimilarity representation approach to designing systems for signature verification and bio-cryptography

Author: Ekladious George S. Eskander
Publication venue: École de technologie supérieure

Automation of legal and financial processes requires enforcing the authenticity, confidentiality, and integrity of the involved transactions. This Thesis focuses on developing offline signature verification (OLSV) systems for enforcing authenticity of transactions. In addition, bio-cryptography systems are developed based on the offline handwritten signature images for enforcing confidentiality and integrity of transactions. Design of OLSV systems is challenging, as signatures are behavioral biometric traits that have intrinsic intra-personal variations and inter-personal similarities. Standard OLSV systems are designed in the feature representation (FR) space, where high-dimensional feature representations are needed to capture the invariance of the signature images. With the numerous users found in real-world applications, e.g., banking systems, decision boundaries in the high-dimensional FR spaces become complex. Accordingly, a large number of training samples is required to design complex classifiers, which is not practical in typical applications. In contrast, design of bio-cryptography systems based on the offline signature images is more challenging. In these systems, signature images lock the cryptographic keys, and a user retrieves his key by applying a query signature sample. For practical bio-cryptographic schemes, the locking feature vector should be concise. In addition, such schemes employ simple error correction decoders, and therefore no complex classification rules can be employed. In this Thesis, the challenging problems of designing OLSV and bio-cryptography systems are addressed by employing the dissimilarity representation (DR) approach. Instead of designing classifiers in the feature space, the DR approach provides a classification space that is defined by some proximity measure.
This way, a multi-class classification problem, with few samples per class, is transformed to a more tractable two-class problem with a large number of training samples. Since many feature extraction techniques have already been proposed for OLSV applications, a DR approach based on FR is employed. In this case, proximity between two signatures is measured by applying a dissimilarity measure on their feature vectors. The main hypothesis of this Thesis is as follows. The FRs and dissimilarity measures should be properly designed, so that signatures belonging to the same writer are close, while signatures of different writers are well separated in the resulting DR spaces. In that case, more cost-effective classifiers, and therefore simpler OLSV and bio-cryptography systems, can be designed. To this end, in Chapter 2, an approach for optimizing FR-based DR spaces is proposed such that concise representations are discriminant, and simple classification thresholds are sufficient. High-dimensional feature representations are translated to an intermediate DR space, where pairwise feature distances are the space constituents. Then, a two-step boosting feature selection (BFS) algorithm is applied. The first step uses samples from a development database, and aims to produce a universal space of reduced dimensionality. The resulting universal space is further reduced and tuned for specific users through a second BFS step using a user-specific training set. In the resulting space, feature variations are modeled and an adaptive dissimilarity measure is designed. This measure generates the final DR space, where discriminant prototypes are selected for enhanced representation. The OLSV and bio-cryptographic systems are formulated as simple threshold classifiers that operate in the designed DR space. Proof of concept simulations on the Brazilian signature database indicate the viability of the proposed approach. Concise DRs with few features and a single prototype are produced.
Employing a simple threshold classifier, the DRs have shown state-of-the-art accuracy of about 7% AER, comparable to complex systems in the literature. In Chapter 3, the OLSV problem is further studied. Although the aforementioned OLSV implementation has shown acceptable recognition accuracy, the resulting systems are not secure as signature templates must be stored for verification. For enhanced security, we modified the previous implementation as follows. The first BFS step is implemented as before, producing a writer-independent (WI) system. This enables starting system operation, even if users provide a single signature sample in the enrollment phase. However, the second BFS is modified to run in a FR space instead of a DR space, so that no signature templates are used for verification. To this end, the universal space is translated back to a FR space of reduced dimensionality, so that designing a writer-dependent (WD) system from the few user-specific samples is tractable in the reduced space. Simulation results on two real-world offline signature databases confirm the feasibility of the proposed approach. The initial universal (WI) verification mode showed comparable performance to that of state-of-the-art OLSV systems. The final secure WD verification mode showed enhanced accuracy with decreased computational complexity. A single compact classifier produced a similar level of accuracy (AER of about 5.38% and 13.96% for the Brazilian and the GPDS signature databases, respectively) to complex WI and WD systems in the literature. Finally, in Chapter 4, a key-binding bio-cryptographic scheme known as the fuzzy vault (FV) is implemented based on the offline signature images. The proposed DR-based two-step BFS technique is employed for selecting a compact and discriminant user-specific FR from a large number of feature extractions. This representation is used to generate the FV locking/unlocking points.
Representation variability modeled in the DR space is considered for matching the unlocking and locking points during FV decoding. Proof of concept simulations on the Brazilian signature database have shown FV recognition accuracy of 3% AER and system entropy of about 45 bits. For enhanced security, an adaptive chaff generation method is proposed, where the modeled variability controls the chaff generation process. Similar recognition accuracy is reported, with enhanced entropy of about 69 bits.

Espace ÉTS
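
The thesis abstract above describes turning a multi-class problem with few samples per writer into a two-class problem over pairwise feature distances. The following is a minimal sketch of that transformation only. The function name `to_dissimilarity_pairs`, the variables `X` and `writers`, and the use of the element-wise absolute difference as the dissimilarity are all assumptions for illustration; the thesis designs its own adaptive measure and feature selection.

```python
import numpy as np
from itertools import combinations

def to_dissimilarity_pairs(X, writers):
    """Map feature vectors with writer ids to two-class pairwise samples.

    Hypothetical helper: each output row holds the per-feature distances
    between two signatures; the label is 1 when both signatures come from
    the same writer and 0 otherwise.
    """
    X = np.asarray(X, dtype=float)
    pairs, labels = [], []
    for i, j in combinations(range(len(X)), 2):
        # Assumed dissimilarity: element-wise absolute feature difference.
        pairs.append(np.abs(X[i] - X[j]))
        labels.append(int(writers[i] == writers[j]))
    return np.array(pairs), np.array(labels)
```

With n signatures the function yields n(n-1)/2 labelled pairs, which is how a handful of samples per writer becomes a large two-class training set on which a simple threshold classifier could be fitted.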

Explanation of Siamese Neural Networks for Weakly Supervised Learning

Author: Kasimov Ernest
Kovalev Maxim
Utkin Lev
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 20/05/2021

A new method for explaining the Siamese neural network (SNN) as a black-box model for weakly supervised learning is proposed under the condition that the output of every subnetwork of the SNN is a vector which is accessible. The main problem of the explanation is that the perturbation technique cannot be used directly for input instances because only their semantic similarity or dissimilarity is known. Moreover, there is no "inverse" map between the SNN output vector and the corresponding input instance. Therefore, a special autoencoder is proposed, which takes into account the proximity of its hidden representation and the SNN outputs. Its pre-trained decoder part as well as the encoder are used to reconstruct original instances from the SNN perturbed output vectors. The important features of the explained instances are determined by averaging the corresponding changes of the reconstructed instances. Numerical experiments with synthetic data and with the well-known MNIST dataset illustrate the proposed method.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)