k-nearest neighbors prediction and classification for spatial data

Authors: Ahmed Mohamed-Salem; Attouch Mohammed Kadi; Dabo-Niang Sophie; N'diaye Mamadou
Publication date: 01/06/2018

We propose a nonparametric predictor and a supervised classification method based on the regression function estimate of a spatial real-valued variable, using the k-nearest neighbors (k-NN) method. Under certain assumptions, we establish almost complete or almost sure convergence of the proposed estimates, which incorporate spatial proximity between observations. Numerical results on simulated and real fish data illustrate the behavior of the proposed predictor and classification method.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server
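The k-NN abstract above describes a predictor and a classifier that combine nearest neighbors with spatial proximity. As an illustration only, here is a minimal Python sketch of one way such a predictor could look. It assumes (hypothetically) that the k nearest neighbors are selected in covariate space and then weighted by a Gaussian kernel of their spatial distance to the target site, with a user-chosen `bandwidth`. The names `knn_spatial_predict`, `knn_spatial_classify`, and `spatial_weight` are hypothetical, and this is not the authors' estimator.

```python
import math
from collections import Counter


def spatial_weight(site, s0, bandwidth):
    # Assumed Gaussian kernel of spatial distance: nearby sites weigh more.
    return math.exp(-(math.dist(site, s0) / bandwidth) ** 2)


def nearest_indices(X, x0, k):
    # Indices of the k observations closest to x0 in covariate space.
    return sorted(range(len(X)), key=lambda i: math.dist(X[i], x0))[:k]


def knn_spatial_predict(X, Y, sites, x0, s0, k, bandwidth):
    """Hypothetical spatially weighted k-NN regression estimate at (x0, s0)."""
    neighbors = nearest_indices(X, x0, k)
    weights = [spatial_weight(sites[i], s0, bandwidth) for i in neighbors]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed: fall back to the plain k-NN average.
        return sum(Y[i] for i in neighbors) / len(neighbors)
    return sum(w * Y[i] for w, i in zip(weights, neighbors)) / total


def knn_spatial_classify(X, labels, sites, x0, s0, k, bandwidth):
    """Hypothetical classifier: spatially weighted vote among the k nearest neighbors."""
    votes = Counter()
    for i in nearest_indices(X, x0, k):
        votes[labels[i]] += spatial_weight(sites[i], s0, bandwidth)
    return votes.most_common(1)[0][0]
```

For example, with covariates `X = [(0.0,), (1.0,), (2.0,), (10.0,)]`, the observation at 10.0 is never among the k = 3 neighbors of `x0 = (1.0,)`, so an outlying response there does not affect the prediction. A very large `bandwidth` makes the spatial weights nearly uniform and recovers ordinary k-NN averaging.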

CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks

Authors: Blase Jennifer; Chu Xu; Li Peng; Rao Xi; Zhang Ce; Zhang Yue
Publication date: 01/01/2020

Data quality affects machine learning (ML) model performance, and data scientists spend a considerable amount of time on data cleaning before model training. However, to date, there has been no rigorous study of how exactly cleaning affects ML: the ML community usually focuses on developing ML algorithms that are robust to particular noise types with certain distributions, while the database (DB) community has mostly studied data cleaning alone, without considering how the data are consumed by downstream ML analytics. We propose CleanML, a study that systematically investigates the impact of data cleaning on ML classification tasks. The open-source and extensible CleanML study currently includes 14 real-world datasets with real errors, five common error types, seven different ML models, and multiple cleaning algorithms for each error type (including both algorithms commonly used in practice and state-of-the-art solutions from the academic literature). We control the randomness in ML experiments using statistical hypothesis testing, and we also control the false discovery rate using the Benjamini-Yekutieli (BY) procedure. We analyze the results systematically to derive many interesting and nontrivial observations, and we put forward multiple directions for future research.
Comment: published in ICDE 202

arXiv.org e-Print Archive

Repository for Publications and Research Data
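The CleanML abstract above mentions controlling the false discovery rate with the Benjamini-Yekutieli (BY) procedure. Below is a minimal Python sketch of the textbook BY step-up rule: sort the m p-values, compute the correction factor c(m) = 1 + 1/2 + ... + 1/m, find the largest rank k with p_(k) <= k q / (m c(m)), and reject the k smallest. It is not taken from the CleanML code base; the function name `benjamini_yekutieli` and its `q` parameter are hypothetical choices.

```python
def benjamini_yekutieli(pvalues, q=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Textbook Benjamini-Yekutieli step-up rule, which controls the false
    discovery rate at level q under arbitrary dependence between tests.
    """
    m = len(pvalues)
    if m == 0:
        return []
    # Harmonic correction factor c(m) = sum_{i=1}^{m} 1/i.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest 1-based rank k with p_(k) <= k * q / (m * c(m)).
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / (m * c_m):
            cutoff = rank
    # Reject every hypothesis whose p-value ranks at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected
```

For instance, `benjamini_yekutieli([0.001, 0.2, 0.01, 0.04], q=0.05)` rejects the first and third hypotheses. Because of the extra factor c(m), BY is more conservative than the Benjamini-Hochberg procedure; that is the price of remaining valid under arbitrary dependence among the tests.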

An Intelligent Approach Using Machine Learning Techniques to Predict Flow in People

Authors: Baca Ruiz Luis Gonzaga; Pegalajar Jiménez María Del Carmen
Publication venue: MDPI
Publication date: 04/04/2023

The goal of this study is to estimate the state of consciousness known as Flow, which is associated with an optimal experience and can indicate a person’s efficiency in both personal and professional settings. To predict Flow, we employ artificial intelligence techniques using a set of variables not directly connected to the Flow construct. We analyse a large amount of data from psychological tests that measure various personality traits, and data mining techniques support the conclusions drawn from the psychological study. We apply linear regression, regression trees, random forests, support vector machines, and artificial neural networks. The results show that the multilayer perceptron network is the best estimator, with an MSE of 0.007122 and an accuracy of 88.58%. Our approach offers a novel perspective on the relationship between personality and the state of consciousness known as Flow.

Repositorio Institucional Universidad de Granada

Trends in Nearest Feature Classification for Face Recognition: Achievements and Perspectives

Authors: Cé; Mauricio Orozco-Alzate
Publication venue: IntechOpen
Publication date: 01/01/2009

IntechOpen

Modèles à noyaux à structure locale (Kernel Models with Local Structure)

Author: Vincent Pascal
Publication date: 01/01/2003

Thesis digitized by the Direction des bibliothèques of the Université de Montréal.

Dépôt Institutionnel Numérique