20 research outputs found

    Study and Observation of the Variation of Accuracies of KNN, SVM, LMNN, ENN Algorithms on Eleven Different Datasets from UCI Machine Learning Repository

    Machine learning enables computers to learn from data without being explicitly programmed [1, 2]. Machine learning can be classified into supervised and unsupervised learning. In supervised learning, computers learn a function that maps an input to an output based on example input-output pairs [3]. Among the most efficient and widely used supervised learning algorithms are K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Large Margin Nearest Neighbor (LMNN), and Extended Nearest Neighbor (ENN). The main contribution of this paper is to implement these learning algorithms on eleven different datasets from the UCI machine learning repository and observe how the accuracy of each algorithm varies across the datasets. Analyzing the accuracies gives a first indication of the relationship between the machine learning algorithms and the dimensionality of the data. All the algorithms are implemented in Matlab. Based on the observed accuracies, KNN, SVM, LMNN, and ENN can then be compared with respect to their performance on each dataset. Comment: To be published in the 4th IEEE International Conference on Electrical Engineering and Information & Communication Technology (iCEEiCT 2018).
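    A minimal sketch of the kind of per-dataset accuracy comparison described above, written in Python with scikit-learn rather than the authors' Matlab code; the datasets, classifiers, and hyperparameters below are illustrative stand-ins, not the eleven UCI datasets used in the paper.

```python
# Sketch: compare KNN and SVM test accuracy across several datasets and
# report each dataset's dimensionality alongside the scores.
from sklearn.datasets import load_iris, load_wine, load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

datasets = {"iris": load_iris(), "wine": load_wine(), "breast_cancer": load_breast_cancer()}
models = {"KNN": KNeighborsClassifier(n_neighbors=5), "SVM": SVC(kernel="rbf")}

for name, data in datasets.items():
    X_train, X_test, y_train, y_test = train_test_split(
        data.data, data.target, test_size=0.3, random_state=0)
    for model_name, model in models.items():
        model.fit(X_train, y_train)
        acc = accuracy_score(y_test, model.predict(X_test))
        print(f"{name:>14s}  {model_name}: {acc:.3f}  (dim={data.data.shape[1]})")
```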

    An Efficient Dual Approach to Distance Metric Learning

    Distance metric learning is of fundamental interest in machine learning because the distance metric employed can significantly affect the performance of many learning methods. Quadratic Mahalanobis metric learning is a popular approach to the problem, but typically requires solving a semidefinite programming (SDP) problem, which is computationally expensive. Standard interior-point SDP solvers typically have a complexity of O(D^6.5) (with D the dimension of the input data), and can thus only practically solve problems with fewer than a few thousand variables. Since the number of variables is D(D+1)/2, this implies a practical limit of around a few hundred dimensions. The complexity of the popular quadratic Mahalanobis metric learning approach thus limits the size of problem to which metric learning can be applied. Here we propose a significantly more efficient approach to the metric learning problem based on the Lagrange dual formulation of the problem. The proposed formulation is much simpler to implement, and therefore allows much larger Mahalanobis metric learning problems to be solved. The time complexity of the proposed method is O(D^3), which is significantly lower than that of the SDP approach. Experiments on a variety of datasets demonstrate that the proposed method achieves an accuracy comparable to the state of the art, but is applicable to significantly larger problems. We also show that the proposed method can be applied to solve more general Frobenius-norm regularized SDP problems approximately.
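    A hedged sketch of the O(D^3) primitive that underlies many eigendecomposition-based metric learning solvers: projecting a symmetric matrix onto the positive semidefinite cone. This is not the paper's full Lagrange-dual algorithm, only an illustration of why a per-step cost of O(D^3) is attainable; NumPy is assumed.

```python
# Project a symmetric matrix onto the PSD cone by clipping negative eigenvalues.
# The eigendecomposition is the O(D^3) step.
import numpy as np

def project_psd(A):
    """Return the nearest (Frobenius-norm) PSD matrix to the symmetric part of A."""
    A_sym = (A + A.T) / 2.0                   # symmetrize to guard against round-off
    eigvals, eigvecs = np.linalg.eigh(A_sym)  # O(D^3) eigendecomposition
    eigvals_clipped = np.clip(eigvals, 0.0, None)
    return (eigvecs * eigvals_clipped) @ eigvecs.T

# Example: a random indefinite symmetric matrix becomes a valid Mahalanobis matrix.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
M_psd = project_psd(M + M.T)
print(np.linalg.eigvalsh(M_psd).min() >= -1e-10)  # True: all eigenvalues non-negative
```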

    Using Asymmetric Classification Cost Matrices in Predicting Diabetes

    Often there is a need to introduce classification costs into a classifier for predicting disease. This is determined by the type of disease, its associated classification cost matrix, and/or the target population on which the classifier will be used. Diabetes has higher costs associated with false negatives than with true positives, as the disease can progress very rapidly when left untreated. There are two ways to skew a classifier towards a given classification cost matrix: (1) by changing the classification probability threshold P* based on the classification cost matrix, or (2) by rebalancing the training set to introduce more negative cases. Using a diabetes data set, this paper compares the two methods. The results indicate comparable values of predictive accuracy and expected classification costs for either method. However, P* works better when the p-value is less than 0.2. Hence, for diabetes classification matrices, the P* method is recommended.
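    A small sketch of method (1), shifting the classification threshold P* according to a cost matrix instead of using the default 0.5; the costs, dataset, and classifier below are illustrative assumptions, not those of the paper.

```python
# Cost-sensitive thresholding: derive P* from the misclassification costs and
# apply it to predicted probabilities from any classifier exposing predict_proba.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

cost_fn = 5.0   # cost of a false negative (missed disease) -- illustrative value
cost_fp = 1.0   # cost of a false positive -- illustrative value
# Bayes-optimal threshold for a binary problem with costs only on the two error types:
p_star = cost_fp / (cost_fp + cost_fn)

X, y = load_breast_cancer(return_X_y=True)   # stand-in dataset; class 1 treated as "positive"
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)

proba_pos = clf.predict_proba(X_te)[:, 1]
y_pred = (proba_pos >= p_star).astype(int)   # lower threshold skews toward predicting positive
print(f"P* = {p_star:.2f}, positives predicted: {y_pred.sum()} of {len(y_pred)}")
```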

    Exploiting diversity for optimizing margin distribution in ensemble learning

    Margin distribution is acknowledged as an important factor for improving the generalization performance of classifiers. In this paper, we propose a novel ensemble learning algorithm named Double Rotation Margin Forest (DRMF), which aims to improve the margin distribution of the combined system over the training set. We utilise random rotation to produce diverse base classifiers, and optimize the margin distribution to exploit this diversity in producing an optimal ensemble. We demonstrate that diverse base classifiers are beneficial in deriving large-margin ensembles, and that our proposed technique therefore leads to good generalization performance. We examine our method on an extensive set of benchmark classification tasks. The experimental results confirm that DRMF outperforms classical ensemble algorithms such as Bagging, AdaBoostM1, and Rotation Forest. The success of DRMF is explained from the viewpoints of margin distribution and diversity.
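    A brief sketch of the ensemble margin statistic that such methods aim to improve, computed here for a plain bagged-tree ensemble standing in for the paper's randomly rotated base classifiers; scikit-learn and NumPy are assumed, and the dataset is an arbitrary stand-in.

```python
# Voting margin of each training point: (votes for the true class minus the
# most votes received by any other class) / number of base classifiers.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
ensemble = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0).fit(X, y)

votes = np.stack([est.predict(X) for est in ensemble.estimators_])  # (n_estimators, n_samples)
n_classes = len(np.unique(y))
margins = np.empty(len(y))
for i in range(len(y)):
    counts = np.bincount(votes[:, i].astype(int), minlength=n_classes)
    true_votes = counts[y[i]]
    other_votes = np.max(np.delete(counts, y[i]))
    margins[i] = (true_votes - other_votes) / len(ensemble.estimators_)

print(f"mean margin: {margins.mean():.3f}, fraction with margin > 0: {(margins > 0).mean():.3f}")
```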

    Composite Kernel Optimization in Semi-Supervised Metric Learning

    Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated considerable interest in the topic of metric learning, especially using kernel functions, which map data to feature spaces with enhanced class separability and implicitly define a new metric in the original feature space. The formulation of the metric learning problem depends on the supervisory information available for the task. In this paper, we focus on semi-supervised kernel-based distance metric learning, where the training data set is unlabelled with the exception of a small subset of pairs of points labelled as belonging to the same class (cluster) or different classes (clusters). The proposed method involves creating a pool of kernel functions. The corresponding kernel matrices are first clustered to remove redundancy in representation. A composite kernel constructed from the kernel clustering result is then expanded into an orthogonal set of basis functions. The mixing parameters of this expansion are then optimised using point similarity and dissimilarity information conveyed by the labels. The proposed method is evaluated on synthetic and real data sets. The results show the merit of using similarity and dissimilarity information jointly as compared to using just the similarity information, and the superiority of the proposed method over all the recently introduced metric learning approaches.
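    A simplified sketch of the composite-kernel idea, assuming scikit-learn: a nonnegative mixture of base RBF kernels is scored by how well it separates must-link from cannot-link pairs. The orthogonal basis expansion and the optimisation used in the paper are replaced here by a coarse grid search, and the pairwise labels are synthetic.

```python
# Build a pool of base kernels, then pick mixing weights that make
# same-class (must-link) pairs more similar than different-class (cannot-link) pairs.
import itertools
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics.pairwise import rbf_kernel

X, y = load_iris(return_X_y=True)
base_kernels = [rbf_kernel(X, gamma=g) for g in (0.01, 0.1, 1.0)]  # pool of kernel matrices

rng = np.random.default_rng(0)
pairs = rng.integers(0, len(X), size=(200, 2))
must_link = pairs[y[pairs[:, 0]] == y[pairs[:, 1]]]      # labelled "same class"
cannot_link = pairs[y[pairs[:, 0]] != y[pairs[:, 1]]]    # labelled "different class"

def score(weights):
    K = sum(w * Kb for w, Kb in zip(weights, base_kernels))
    sim = K[must_link[:, 0], must_link[:, 1]].mean()
    dis = K[cannot_link[:, 0], cannot_link[:, 1]].mean()
    return sim - dis  # larger = better separation of similar vs. dissimilar pairs

# Coarse search over simplex-constrained mixing weights (a stand-in for the
# paper's optimisation of the expansion coefficients).
candidates = [w for w in itertools.product(np.linspace(0, 1, 6), repeat=3)
              if abs(sum(w) - 1.0) < 1e-9]
best = max(candidates, key=score)
print("best mixing weights:", best, "score:", round(score(best), 4))
```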