53 research outputs found

Semi-supervised Naive Hubness-Bayesian k-Nearest Neighbor for Gene Expression Data

Author: Buza Krisztián Antal
Publication venue: Springer International Publishing
Publication date: 01/01/2015

One-class classifiers based on entropic spanning graphs

Author: Alippi Cesare
Livi Lorenzo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 12/08/2016

One-class classifiers offer valuable tools to assess the presence of outliers in data. In this paper, we propose a design methodology for one-class classifiers based on entropic spanning graphs. Our approach also allows non-numeric data to be processed by means of an embedding procedure. The spanning graph is learned on the embedded input data and the resulting partition of vertices defines the classifier. The final partition is derived by exploiting a criterion based on mutual information minimization. Here, we compute the mutual information by using a convenient formulation provided in terms of the $\alpha$-Jensen difference. Once training is completed, in order to associate a confidence level with the classifier decision, a graph-based fuzzy model is constructed. The fuzzification process is based only on topological information of the vertices of the entropic spanning graph. As such, the proposed one-class classifier is also suitable for data characterized by complex geometric structures. We provide experiments on well-known benchmarks containing both feature vectors and labeled graphs. In addition, we apply the method to the protein solubility recognition problem by considering several representations for the input samples. Experimental results demonstrate the effectiveness and versatility of the proposed method with respect to other state-of-the-art approaches. Comment: Extended and revised version of the paper "One-Class Classification Through Mutual Information Minimization" presented at the 2016 IEEE IJCNN, Vancouver, Canada.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Open Research Exeter

Classification of Electroencephalograph Data: A Hubness-aware Approach

Author: Buza Krisztián Antal
Koller Júlia
Publication venue: Óbudai Egyetem
Publication date: 01/01/2016

Classification of electroencephalograph (EEG) data is the common denominator in various recognition tasks related to EEG signals. Automated recognition systems are especially useful in cases when continuous, long-term EEG is recorded and the resulting data, due to its huge amount, cannot be analyzed by human experts in depth. EEG-related recognition tasks may support medical diagnosis and they are core components of EEG-controlled devices such as web browsers or spelling devices for paralyzed patients. State-of-the-art solutions are based on machine learning. In this paper, we show that EEG datasets contain hubs, i.e., signals that appear as nearest neighbors of surprisingly many signals. This paper is the first to document this observation for EEG datasets. Next, we argue that the presence of hubs has to be taken into account for the classification of EEG signals; therefore, we adapt hubness-aware classifiers to EEG data. Finally, we present the results of our empirical study on a large, publicly available collection of EEG signals and show that hubness-aware classifiers outperform the state-of-the-art time-series classifier.
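
The notion of a hub used in this abstract (an instance that appears as a nearest neighbor of surprisingly many other instances) can be illustrated with a small sketch. The code below is not taken from the paper: the function name `k_occurrence`, the use of Euclidean distance, and the toy points are assumptions made purely for illustration. It counts, for every point, how often that point occurs among the k nearest neighbors of the other points.

```python
import math
from collections import Counter


def k_occurrence(points, k):
    """Return, for each point, how many other points list it among their k nearest neighbors.

    Hypothetical helper for illustration; Euclidean distance is an assumption.
    """
    counts = Counter()
    for i, p in enumerate(points):
        # Distances from p to every other point, paired with that point's index.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        # Credit each of the k closest points with one occurrence.
        for _, j in dists[:k]:
            counts[j] += 1
    return [counts[j] for j in range(len(points))]


# Toy data: three points near the origin and one far away.
toy_points = [(0, 0), (1, 0), (0, 2), (6, 6)]
occurrences = k_occurrence(toy_points, k=1)
```

In this toy example the first point collects the most occurrences, while the distant point is nobody's nearest neighbor. Instances with unusually high counts are the candidates for hubs; the counts always sum to the number of points times k.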

Repository of the Academy's Library

Correcting the Hub Occurrence Prediction Bias in Many Dimensions

Author: Dunja Mladenic
Krisztian Buza
Tomasev Nenad
Publication date: 01/01/2016

Data reduction is a common pre-processing step for k-nearest neighbor classification (kNN). The existing prototype selection methods implement different criteria for selecting relevant points to use in classification, which constitutes a selection bias. This study examines the nature of the instance selection bias in intrinsically high-dimensional data. In high-dimensional feature spaces, hubs are known to emerge as centers of influence in kNN classification. These points dominate most kNN sets and are often detrimental to classification performance. Our experiments reveal that different instance selection strategies bias the predictions of the behavior of hub-points in high-dimensional data in different ways. We propose to introduce an intermediate un-biasing step when training the neighbor occurrence models and we demonstrate promising improvements in various hubness-aware classification methods, on a wide selection of high-dimensional synthetic and real-world datasets.

Repository of the Academy's Library

Hubness-aware kNN classification of high-dimensional data in presence of label noise

Author: Angluin
Aucouturier
Bellman
Bootkrajang
Brodley
Frenay
Georgios
Guan
Ipeirotis
Karmaker
Keller
Khoshgoftaar
Ko
Krisztian Buza
Long
Lowe
Nenad Tomašev
Radovanović
Tan
Tomašev
Wang
Yu
Zeng
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015

Crossref

Repository of the Academy's Library

Semmelweis Repository

Classification of Gene Expression Data: A Hubness-aware Semi-supervised Approach

Author: Buza Krisztian
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016

Background and Objective. Classification of gene expression data is the common denominator of various biomedical recognition tasks. However, obtaining class labels for large training samples may be difficult or even impossible in many cases. Therefore, semi-supervised classification techniques are required, as semi-supervised classifiers take advantage of unlabeled data. Methods. Gene expression data is high-dimensional, which gives rise to the phenomena known under the umbrella of the curse of dimensionality, one of its recently explored aspects being the presence of hubs, or hubness for short. Therefore, hubness-aware classifiers have been developed recently, such as Naive Hubness-Bayesian k-Nearest Neighbor (NHBNN). In this paper, we propose a semi-supervised extension of NHBNN which follows the self-training schema. As one of the core components of self-training is the certainty score, we propose a new hubness-aware certainty score. Results. We performed experiments on publicly available gene expression data. These experiments show that the proposed classifier outperforms its competitors. We investigated the impact of each of the components (classification algorithm, semi-supervised technique, hubness-aware certainty score) separately and showed that each of these components is relevant to the performance of the proposed approach. Conclusions. Our results imply that our approach may increase classification accuracy and reduce computational costs (i.e., runtime). Based on the promising results presented in the paper, we envision that hubness-aware techniques will be used in various other biomedical machine learning tasks. In order to accelerate this process, we made an implementation of hubness-aware machine learning techniques publicly available in the PyHubs software package (http://www.biointelligence.hu/pyhubs) implemented in Python, one of the most popular programming languages of data science.
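
The self-training schema mentioned in this abstract can be sketched in a few lines. This is a generic illustration, not the paper's method and not the PyHubs API: a plain kNN majority vote stands in for NHBNN, the vote fraction stands in for the hubness-aware certainty score, and the names `knn_predict` and `self_train` are hypothetical. In each round the current model scores all unlabeled instances, and the one it is most certain about is moved into the labeled set with its predicted label.

```python
import math
from collections import Counter


def knn_predict(labeled, x, k):
    """Return (label, certainty) for x from a plain kNN vote over the labeled set.

    Certainty is the fraction of the nearest neighbors voting for the winning
    label; this is a stand-in, not the hubness-aware score from the paper.
    """
    nearest = sorted(labeled, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)


def self_train(labeled, unlabeled, k=3):
    """Self-training loop: repeatedly pseudo-label the most certain unlabeled instance."""
    labeled = list(labeled)
    pool = list(unlabeled)
    while pool:
        # Score every remaining unlabeled instance with the current labeled set.
        scored = [(knn_predict(labeled, x, k), x) for x in pool]
        # Pick the instance with the highest certainty (first one on ties).
        (label, _), best = max(scored, key=lambda s: s[0][1])
        labeled.append((best, label))
        pool.remove(best)
    return labeled


# Toy data: one labeled instance per class and three unlabeled instances.
seed = [((0, 0), "a"), ((10, 10), "b")]
result = dict(self_train(seed, [(1, 0), (9, 10), (2, 1)], k=1))
```

Because each pseudo-labeled instance joins the labeled set, later predictions can rely on it; a poor certainty score can therefore propagate early mistakes, which is why the choice of score matters in this schema.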

Repository of the Academy's Library

Exploring and exploiting hubness priors for high-quality GAN latent sampling

Author: Lai Yukun
Liang Yuanbang
Qin Yipeng
Wu Jing

Online Research @ Cardiff

Acta Polytechnica Hungarica 2016

Publication venue: Óbudai Egyetem, IEEE Hungary
Publication date: 01/01/2016

REAL-J