k-nearest neighbors prediction and classification for spatial data
We propose a nonparametric predictor and a supervised classification method based on
the regression function estimate of a spatial real variable using the k-nearest
neighbors (k-NN) method. Under some assumptions, we establish almost complete
or sure convergence of the proposed estimates, which incorporate spatial
proximity between observations. Numerical results on simulated and real fish
data illustrate the behavior of the given predictor and classification method.
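As a rough illustration of the idea in this abstract, the sketch below shows plain k-NN regression and a classifier built on that regression estimate. It is a minimal example under stated assumptions, not the authors' estimator: it uses Euclidean distance only and omits the spatial-proximity weighting the abstract mentions. The function names `knn_predict` and `knn_classify` are hypothetical.

```python
import math


def knn_predict(train_points, train_values, query, k):
    """Plain k-NN regression: average the responses of the k nearest points.

    Assumption: Euclidean distance, no spatial weighting between sites.
    """
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Rank training points by distance to the query point.
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    nearest = order[:k]
    return sum(train_values[i] for i in nearest) / k


def knn_classify(train_points, train_labels, query, k):
    """Classify via the regression estimate of each class indicator."""
    classes = sorted(set(train_labels))
    scores = {
        c: knn_predict(
            train_points,
            [1.0 if label == c else 0.0 for label in train_labels],
            query,
            k,
        )
        for c in classes
    }
    # Pick the class with the largest estimated probability
    # (ties go to the first class in sorted order).
    return max(classes, key=lambda c: scores[c])
```

Here the classifier simply regresses each class indicator on the covariates and picks the class with the highest estimate, which is one common way to derive a classification rule from a regression function estimate.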
CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks
Data quality affects machine learning (ML) model performance, and data
scientists spend a considerable amount of time on data cleaning before model
training. However, to date, there does not exist a rigorous study on how
exactly cleaning affects ML -- the ML community usually focuses on developing ML
algorithms that are robust to some particular noise types of certain
distributions, while the database (DB) community has been mostly studying the
problem of data cleaning alone without considering how data is consumed by
downstream ML analytics. We propose a CleanML study that systematically
investigates the impact of data cleaning on ML classification tasks. The
open-source and extensible CleanML study currently includes 14 real-world
datasets with real errors, five common error types, seven different ML models,
and multiple cleaning algorithms for each error type (including both commonly
used algorithms in practice as well as state-of-the-art solutions in academic
literature). We control the randomness in ML experiments using statistical
hypothesis testing, and we also control false discovery rate in our experiments
using the Benjamini-Yekutieli (BY) procedure. We analyze the results in a
systematic way to derive many interesting and nontrivial observations. We also
put forward multiple research directions for researchers.
Comment: published in ICDE 202
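The Benjamini-Yekutieli (BY) procedure named in this abstract can be sketched as follows. This is a minimal illustration of the standard step-up rule, not the CleanML implementation; the function name `benjamini_yekutieli` and the default `alpha` are assumptions made for the example.

```python
def benjamini_yekutieli(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Standard BY step-up rule: with m tests and c(m) = sum_{j=1..m} 1/j,
    find the largest rank i such that p_(i) <= i * alpha / (m * c(m)),
    then reject the i hypotheses with the smallest p-values.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Harmonic correction that makes the rule valid under arbitrary dependence.
    c_m = sum(1.0 / j for j in range(1, m + 1))
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / (m * c_m):
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

Compared with the Benjamini-Hochberg rule, the extra factor c(m) makes the thresholds stricter, which is the price paid for controlling the false discovery rate without assuming independence between tests.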
An Intelligent Approach Using Machine Learning Techniques to Predict Flow in People
The goal of this study is to estimate the state of consciousness known as Flow, which
is associated with an optimal experience and can indicate a person’s efficiency in both personal
and professional settings. To predict Flow, we employ artificial intelligence techniques using a
set of variables not directly connected with its construct. We analyse a significant amount of data
from psychological tests that measure various personality traits. Data mining techniques support
conclusions drawn from the psychological study. We apply linear regression, regression tree, random
forest, support vector machine, and artificial neural networks. The results show that the multilayer
perceptron network is the best estimator, with an MSE of 0.007122 and an accuracy of 88.58%.
Our approach offers a novel perspective on the relationship between personality and the state of
consciousness known as Flow.
Kernel models with local structure (Modèles à noyaux à structure locale)
Thesis digitized by the Direction des bibliothèques of the Université de Montréal