Search CORE

3,317 research outputs found

Local feature weighting in nearest prototype classification

Author: Fernández Fernando
Isasi Pedro
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008
Field of study

The distance metric is the corner stone of nearest neighbor (NN)-based methods, and therefore, of nearest prototype (NP) algorithms. That is because they classify depending on the similarity of the data. When the data is characterized by a set of features which may contribute to the classification task in different levels, feature weighting or selection is required, sometimes in a local sense. However, local weighting is typically restricted to NN approaches. In this paper, we introduce local feature weighting (LFW) in NP classification. LFW provides each prototype its own weight vector, opposite to typical global weighting methods found in the NP literature, where all the prototypes share the same one. Providing each prototype its own weight vector has a novel effect in the borders of the Voronoi regions generated: They become nonlinear. We have integrated LFW with a previously developed evolutionary nearest prototype classifier (ENPC). The experiments performed both in artificial and real data sets demonstrate that the resulting algorithm that we call LFW in nearest prototype classification (LFW-NPC) avoids overfitting on training data in domains where the features may have different contribution to the classification task in different areas of the feature space. This generalization capability is also reflected in automatically obtaining an accurate and reduced set of prototypes.Publicad

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

A Survey on Metric Learning for Feature Vectors and Structured Data

Author: Bellet Aurélien
Habrard Amaury
Sebban Marc
Publication venue
Publication date: 01/01/2013
Field of study

The need for appropriate ways to measure the distance or similarity between data is ubiquitous in machine learning, pattern recognition and data mining, but handcrafting such good metrics for specific problems is generally difficult. This has led to the emergence of metric learning, which aims at automatically learning a metric from data and has attracted a lot of interest in machine learning and related fields for the past ten years. This survey paper proposes a systematic review of the metric learning literature, highlighting the pros and cons of each approach. We pay particular attention to Mahalanobis distance metric learning, a well-studied and successful framework, but additionally present a wide range of methods that have recently emerged as powerful alternatives, including nonlinear metric learning, similarity learning and local metric learning. Recent trends and extensions, such as semi-supervised metric learning, metric learning for histogram data and the derivation of generalization guarantees, are also covered. Finally, this survey addresses metric learning for structured data, in particular edit distance learning, and attempts to give an overview of the remaining challenges in metric learning for the years to come.Comment: Technical report, 59 pages. Changes in v2: fixed typos and improved presentation. Changes in v3: fixed typos. Changes in v4: fixed typos and new method

arXiv.org e-Print Archive

HAL-UJM

Scalable Feature Selection Using ReliefF Aided by Locality-Sensitive Hashing

Author: Alonso-Betanzos Amparo
Bahamonde Antonio
Eiras-Franco Carlos
Guijarro-Berdiñas Bertha
Publication venue: 'Wiley'
Publication date: 01/01/2021
Field of study

Financiado para publicación en acceso aberto: Universidade da Coruña/CISUG[Abstract] Feature selection algorithms, such as ReliefF, are very important for processing high-dimensionality data sets. However, widespread use of popular and effective such algorithms is limited by their computational cost. We describe an adaptation of the ReliefF algorithm that simplifies the costliest of its step by approximating the nearest neighbor graph using locality-sensitive hashing (LSH). The resulting ReliefF-LSH algorithm can process data sets that are too large for the original ReliefF, a capability further enhanced by distributed implementation in Apache Spark. Furthermore, ReliefF-LSH obtains better results and is more generally applicable than currently available alternatives to the original ReliefF, as it can handle regression and multiclass data sets. The fact that it does not require any additional hyperparameters with respect to ReliefF also avoids costly tuning. A set of experiments demonstrates the validity of this new approach and confirms its good scalability.This study has been supported in part by the Spanish Ministerio de Economía y Competitividad (projects PID2019-109238GB-C2 and TIN 2015-65069-C2-1-R and 2-R), partially funded by FEDER funds of the EU and by the Xunta de Galicia (projects ED431C 2018/34 and Centro Singular de Investigación de Galicia, accreditation 2016-2019). The authors wish to thank the Fundación Pública Galega Centro Tecnolóxico de Supercomputación de Galicia (CESGA) for the use of their computing resources. Funding for open access charge: Universidade da Coruña/CISUGXunta de Galicia; ED431C 2018/3

Repositorio da Universidade da Coruña

Learning Heterogeneous Similarity Measures for Hybrid-Recommendations in Meta-Mining

Author: Hilario Melanie
Kalousis Alexandros
Nguyen Phong
Wang Jun
Publication venue
Publication date: 04/10/2012
Field of study

The notion of meta-mining has appeared recently and extends the traditional meta-learning in two ways. First it does not learn meta-models that provide support only for the learning algorithm selection task but ones that support the whole data-mining process. In addition it abandons the so called black-box approach to algorithm description followed in meta-learning. Now in addition to the datasets, algorithms also have descriptors, workflows as well. For the latter two these descriptions are semantic, describing properties of the algorithms. With the availability of descriptors both for datasets and data mining workflows the traditional modelling techniques followed in meta-learning, typically based on classification and regression algorithms, are no longer appropriate. Instead we are faced with a problem the nature of which is much more similar to the problems that appear in recommendation systems. The most important meta-mining requirements are that suggestions should use only datasets and workflows descriptors and the cold-start problem, e.g. providing workflow suggestions for new datasets. In this paper we take a different view on the meta-mining modelling problem and treat it as a recommender problem. In order to account for the meta-mining specificities we derive a novel metric-based-learning recommender approach. Our method learns two homogeneous metrics, one in the dataset and one in the workflow space, and a heterogeneous one in the dataset-workflow space. All learned metrics reflect similarities established from the dataset-workflow preference matrix. We demonstrate our method on meta-mining over biological (microarray datasets) problems. The application of our method is not limited to the meta-mining problem, its formulations is general enough so that it can be applied on problems with similar requirements

arXiv.org e-Print Archive

Crossref

RERO DOC Digital Library

Recommended from our members

Robocrystallographer: Automated crystal structure text descriptions and analysis

Author: Ganose AM
Jain A
Publication venue: eScholarship, University of California
Publication date: 01/09/2019
Field of study

Our ability to describe crystal structure features is of crucial importance when attempting to understand structure-property relationships in the solid state. In this paper, the authors introduce robocrystallographer, an open-source toolkit for analyzing crystal structures. This package combines new and existing open-source analysis tools to provide structural information, including the local coordination and polyhedral type, polyhedral connectivity, octahedral tilt angles, component-dimensionality, and molecule-within-crystal and fuzzy prototype identification. Using this information, robocrystallographer can generate text-based descriptions of crystal structures that resemble descriptions written by human crystallographers. The authors use robocrystallographer to investigate the dimensionalities of all compounds in the Materials Project database and highlight its potential in machine learning studies

eScholarship - University of California

Efficient Feature Subset Selection Algorithm for High Dimensional Data

Author: Chormunge Smita
Jena Sudarson
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 01/08/2016
Field of study

Feature selection approach solves the dimensionality problem by removing irrelevant and redundant features. Existing Feature selection algorithms take more time to obtain feature subset for high dimensional data. This paper proposes a feature selection algorithm based on Information gain measures for high dimensional data termed as IFSA (Information gain based Feature Selection Algorithm) to produce optimal feature subset in efficient time and improve the computational performance of learning algorithms. IFSA algorithm works in two folds: First apply filter on dataset. Second produce the small feature subset by using information gain measure. Extensive experiments are carried out to compare proposed algorithm and other methods with respect to two different classifiers (Naive bayes and IBK) on microarray and text data sets. The results demonstrate that IFSA not only produces the most select feature subset in efficient time but also improves the classifier performance

IAES journal

Crossref

Institute of Advanced Engineering and Science

Optimized jk-nearest neighbor based online signature verification and evaluation of the main parameters

Author: Kovari Bence
Saleem Mohammad
Publication venue: 'AGHU University of Science and Technology Press'
Publication date: 23/11/2021
Field of study

In this paper, we propose an enhanced jk-nearest neighbor (jk-NN) classifier for online signature verification. After studying the algorithm's main parameters, we use four separate databases to present and evaluate each algorithm parameter. The results show that the proposed method can increase the verification accuracy by 0.73-10% compared to a traditional one class k-NN classifier. The algorithm has achieved reasonable accuracy for different databases, a 3.93% error rate when using the SVC2004 database, 2.6% for MCYT-100 database, 1.75% for the SigComp'11 database, and 6% for the SigComp'15 database.The proposed algorithm uses specifically chosen parameters and a procedure to pick the optimal value for K using only the signer's reference signatures, to build a practical verification system for real-life scenarios where only these signatures are available. By applying the proposed algorithm, the average error achieved was 8% for SVC2004, 3.26% for MCYT-100, 13% for SigComp'15, and 2.22% for SigComp'11

AGH (Akademia Górniczo-Hutnicza) University of Science and Technology: Journals

Computer Science Journal (AGH University of Science and Technology, Krakow)

Case-based Reasoning Method for Real-time Expert Diagnostics Systems

Author: Eremeev Alexander
Varshavskiy Pavel
Publication venue: Institute of Information Theories and Applications FOI ITHEA
Publication date: 01/01/2008
Field of study

The method of case-based reasoning for a solution of problems of real-time diagnostics and forecasting in intelligent decision support systems (IDSS) is considered. Special attention is drawn to case library structure for real-time IDSS (RT IDSS) and algorithm of k-nearest neighbors type. This work was supported by RFBR

Bulgarian Digital Mathematics Library at IMI-BAS