

Improving large-scale k-nearest neighbor text categorization with label autoencoders

Authors: Cao Shuyuan; Darriba Bilbao, Victor Manuel; Ribadas Pena, Francisco Jose
Publication venue: MDPI AG
Publication date: 01/08/2022

In this paper, we introduce a multi-label lazy learning approach for automatic semantic indexing of large document collections that use complex, structured label vocabularies with high inter-label correlation. The proposed method extends the traditional k-Nearest Neighbors (k-NN) algorithm with a large autoencoder, trained to map the large label space to a reduced-size latent space and to regenerate the predicted labels from that latent space. We evaluated our proposal on a large portion of the MEDLINE biomedical document collection, which uses the Medical Subject Headings (MeSH) thesaurus as its controlled vocabulary. In our experiments, we propose and evaluate several document representation approaches and different label autoencoder configurations.

Ministerio de Ciencia e Innovación | Ref. PID2020-113230RB-C2
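The abstract only outlines the method, so the following is a hedged, minimal sketch of one plausible reading of it and not the authors' implementation. It retrieves the k nearest training documents, encodes each neighbour's label set with a label autoencoder, averages those latent codes weighted by similarity, and decodes the average back into per-label scores. Everything specific here is an assumption: the names `LabelAutoencoder`, `predict` and `cosine`, the single sigmoid hidden layer, the cross-entropy SGD training loop, cosine similarity over raw term vectors, the weighted averaging in latent space, and the 0.5 decision threshold. It uses only the Python standard library so it runs without a deep learning framework.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def sigmoid(x: float) -> float:
    # Clamp the input so math.exp cannot overflow.
    x = max(-60.0, min(60.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def dense(weights: List[Vector], bias: Vector, x: Sequence[float]) -> Vector:
    # One fully connected layer followed by a sigmoid activation.
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(weights, bias)]


class LabelAutoencoder:
    """Hypothetical one-hidden-layer autoencoder over binary label vectors."""

    def __init__(self, n_labels: int, n_latent: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.enc_w = [[rng.uniform(-0.5, 0.5) for _ in range(n_labels)] for _ in range(n_latent)]
        self.enc_b = [0.0] * n_latent
        self.dec_w = [[rng.uniform(-0.5, 0.5) for _ in range(n_latent)] for _ in range(n_labels)]
        self.dec_b = [0.0] * n_labels

    def encode(self, y: Sequence[float]) -> Vector:
        return dense(self.enc_w, self.enc_b, y)

    def decode(self, z: Sequence[float]) -> Vector:
        return dense(self.dec_w, self.dec_b, z)

    def loss(self, label_sets: Sequence[Sequence[float]]) -> float:
        # Mean binary cross-entropy between label sets and their reconstructions.
        eps, total = 1e-9, 0.0
        for y in label_sets:
            p = self.decode(self.encode(y))
            total -= sum(t * math.log(q + eps) + (1 - t) * math.log(1 - q + eps) for t, q in zip(y, p))
        return total / len(label_sets)

    def fit(self, label_sets: Sequence[Sequence[float]], epochs: int = 2000, lr: float = 0.5) -> None:
        # Per-example stochastic gradient descent with backpropagation.
        for _ in range(epochs):
            for y in label_sets:
                z = self.encode(y)
                p = self.decode(z)
                # Sigmoid output + cross-entropy: gradient at the output pre-activation is p - y.
                d_out = [pi - yi for pi, yi in zip(p, y)]
                # Propagate to the latent pre-activations before updating the decoder weights.
                d_lat = [sum(d_out[i] * self.dec_w[i][j] for i in range(len(d_out))) * z[j] * (1 - z[j])
                         for j in range(len(z))]
                for i, g in enumerate(d_out):
                    for j, zj in enumerate(z):
                        self.dec_w[i][j] -= lr * g * zj
                    self.dec_b[i] -= lr * g
                for j, g in enumerate(d_lat):
                    for k, yk in enumerate(y):
                        self.enc_w[j][k] -= lr * g * yk
                    self.enc_b[j] -= lr * g


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def predict(query, doc_vectors, doc_labels, ae, k=3, threshold=0.5):
    # 1. Find the k training documents most similar to the query.
    scored = sorted(((cosine(query, d), y) for d, y in zip(doc_vectors, doc_labels)),
                    key=lambda t: t[0], reverse=True)[:k]
    weights = [max(s, 0.0) for s, _ in scored]
    if sum(weights) == 0.0:
        weights = [1.0] * len(scored)  # fall back to a uniform vote
    total = sum(weights)
    # 2. Encode each neighbour's label set and take the similarity-weighted average code.
    latent = [0.0] * len(ae.enc_b)
    for w, (_, y) in zip(weights, scored):
        latent = [acc + w * v for acc, v in zip(latent, ae.encode(y))]
    latent = [v / total for v in latent]
    # 3. Decode the averaged code into per-label scores and keep those above the threshold.
    probs = ae.decode(latent)
    return [i for i, p in enumerate(probs) if p >= threshold], probs


# Toy data (invented for illustration): 4-term document vectors, 6 labels with correlated pairs.
docs = [[3, 1, 0, 0], [2, 2, 0, 0], [0, 0, 2, 3], [0, 1, 3, 1], [1, 0, 0, 2]]
labels = [[1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 0],
          [0, 0, 0, 1, 1, 1], [1, 0, 0, 0, 0, 1]]
ae = LabelAutoencoder(n_labels=6, n_latent=3, seed=42)
initial_loss = ae.loss(labels)
ae.fit(labels, epochs=2000, lr=0.5)
final_loss = ae.loss(labels)
predicted, probs = predict([2, 1, 0, 0], docs, labels, ae, k=3)
print(f"loss {initial_loss:.3f} -> {final_loss:.3f}; predicted labels: {predicted}")
```

The design choice the sketch illustrates is that neighbours vote in the compressed latent space rather than label by label, so labels that co-occur in training (here, labels 0 and 1, or 3 and 4) are meant to be regenerated together. A MeSH-scale setup would need a sparse document index and a framework-trained autoencoder, which this toy version does not attempt.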

Sources: Investigo; Directory of Open Access Journals