5 research outputs found
Imputation of incomplete data and pattern classification using multitask learning
Almost all research on supervised learning assumes
that training data are completely observable,
but this is rarely the case because real-world databases are
seldom complete. The ability to handle missing data has become
a fundamental requirement for machine learning. Up to now,
proposed methods have treated the problem as two separate tasks,
the main task and the imputation task, and solved them separately (Single
Task Learning, STL). In this paper, a new effective method is
proposed to handle missing features in incomplete databases with
Multitask Learning (MTL). This approach uses the imputation
task as an extra task that is learned in parallel with the main task.
Thus, imputation is guided and oriented by the learning process,
i.e., imputed values are those that contribute to improving the
learning. In this paper we use the advantages of MTL to handle
missing data and analyze its robustness for handling different
missing variables in real and artificial data sets.
This work is partially funded by the Ministerio
de Educación y Ciencia through project TIC2002-03033.
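As a rough illustration of the idea that imputation is learned as an extra task alongside the main task, the following sketch computes a combined objective for one sample. It is not the authors' network or training procedure: the function name `multitask_loss`, the binary label, the squared-error imputation term and the `weight` parameter are all assumptions made for this example.

```python
import math


def multitask_loss(class_prob, label, reconstructed, features, observed, weight=0.5):
    """Combined loss for one sample (hypothetical formulation).

    Main task: binary cross-entropy between class_prob and label (0 or 1).
    Extra task: mean squared error between the reconstructed and the true
    feature values, counted only where the feature was actually observed.
    """
    eps = 1e-12  # avoids log(0)
    main = -(label * math.log(class_prob + eps)
             + (1 - label) * math.log(1 - class_prob + eps))
    # Missing features (observed flag False) contribute no imputation error.
    errors = [(r - x) ** 2
              for r, x, o in zip(reconstructed, features, observed) if o]
    extra = sum(errors) / len(errors) if errors else 0.0
    return main + weight * extra


# One sample with its second feature missing (None).
loss = multitask_loss(
    class_prob=0.5, label=1,
    reconstructed=[1.0, 2.0], features=[1.0, None], observed=[True, False],
)
print(round(loss, 4))
```

Because both terms would depend on the same shared parameters in a multitask model, minimising their sum lets the main task influence which imputed values are preferred, which is the behaviour the abstract describes.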
Spatial Classification With Limited Observations Based On Physics-Aware Structural Constraint
Spatial classification with limited feature observations has been a
challenging problem in machine learning. The problem exists in applications
where only a subset of sensors are deployed at certain spots or partial
responses are collected in field surveys. Existing research mostly focuses on
addressing incomplete or missing data, e.g., data cleaning and imputation,
classification models that allow for missing feature values or model missing
features as hidden variables in the EM algorithm. These methods, however,
assume that incomplete feature observations only happen on a small subset of
samples, and thus cannot solve problems where the vast majority of samples have
missing feature observations. To address this issue, we recently proposed a new
approach that incorporates a physics-aware structural constraint into the model
representation. Our approach assumes that a spatial contextual feature is
observed for all sample locations and establishes a spatial structural constraint
from the underlying spatial contextual feature map. We design efficient
algorithms for model parameter learning and class inference. This paper extends
our recent approach by allowing feature values of samples in each class to
follow a multi-modal distribution. We propose learning algorithms for the
extended model with multi-modal distribution. Evaluations on real-world
hydrological applications show that our approach significantly outperforms
baseline methods in classification accuracy, and the multi-modal extension is
more robust than our earlier single-modal version, especially when the feature
distribution in training samples is multi-modal. Computational experiments show
that the proposed solution is computationally efficient on large datasets.
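To make the notion of a multi-modal class-conditional feature distribution concrete, the sketch below models each class as a one-dimensional Gaussian mixture and picks the class with the highest prior-weighted density. It deliberately leaves out the spatial structural constraint, which the abstract does not specify in detail. The class names, feature values and function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """Density of a Gaussian mixture; components is a list of (weight, mean, std)."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)


def classify(x, class_models, priors):
    """Return the class whose prior-weighted mixture density at x is largest."""
    scores = {c: priors[c] * mixture_pdf(x, comps)
              for c, comps in class_models.items()}
    return max(scores, key=scores.get)


# Hypothetical example: the "flood" class has two modes, the "dry" class one.
class_models = {
    "flood": [(0.5, 0.2, 0.05), (0.5, 0.8, 0.05)],
    "dry": [(1.0, 0.5, 0.05)],
}
priors = {"flood": 0.5, "dry": 0.5}

print(classify(0.8, class_models, priors))
print(classify(0.5, class_models, priors))
```

A single Gaussian fitted to the two-mode class would place its mean near 0.5 and overlap heavily with the other class, which is one way to see why a single-modal model can be less robust on such data.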
Decomposition methods for machine learning with small, incomplete or noisy datasets
In many machine learning applications, measurements are sometimes incomplete or noisy, resulting in missing features. In other cases, and for different reasons, the datasets are originally small, and therefore more data samples are required to derive useful supervised or unsupervised classification methods. Correct handling of incomplete, noisy or small datasets in machine learning is a fundamental and classic challenge. In this article, we provide a unified review of recently proposed methods based on signal decomposition for missing feature imputation (data completion), classification of noisy samples and artificial generation of new data samples (data augmentation). We illustrate the application of these signal decomposition methods in diverse selected practical machine learning examples, including brain-computer interfaces, classification of epileptic intracranial electroencephalogram signals, face recognition/verification and water network data analysis. We show that a signal decomposition approach can provide valuable tools to improve machine learning performance with low-quality datasets.
Affiliation: Caiafa, César Federico. Provincia de Buenos Aires. Gobernación. Comisión de Investigaciones Científicas. Instituto Argentino de Radioastronomía. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto Argentino de Radioastronomía; Argentina
Affiliation: Sole Casals, Jordi. Center for Advanced Intelligence; Japan
Affiliation: Marti Puig, Pere. University of Catalonia; Spain
Affiliation: Sun, Zhe. RIKEN; Japan
Affiliation: Tanaka, Toshihisa. Tokyo University of Agriculture and Technology; Japan
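One simple instance of decomposition-based data completion is to fit a low-rank factorisation to the observed entries and read the missing entries off the fitted model. The sketch below uses a rank-1 model fitted by alternating least squares. It is an illustrative stand-in, not one of the specific methods reviewed in the article, and the function name `rank1_complete` is hypothetical.

```python
def rank1_complete(matrix, iterations=50):
    """Fill None entries using a rank-1 fit u[i] * v[j] to the observed entries.

    u and v are updated in turn by least squares over observed entries only.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(iterations):
        for i in range(n_rows):
            cols = [j for j in range(n_cols) if matrix[i][j] is not None]
            den = sum(v[j] ** 2 for j in cols)
            if den > 0:
                u[i] = sum(matrix[i][j] * v[j] for j in cols) / den
        for j in range(n_cols):
            rows = [i for i in range(n_rows) if matrix[i][j] is not None]
            den = sum(u[i] ** 2 for i in rows)
            if den > 0:
                v[j] = sum(matrix[i][j] * u[i] for i in rows) / den
    # Observed values are kept; only the gaps are filled from the model.
    return [[matrix[i][j] if matrix[i][j] is not None else u[i] * v[j]
             for j in range(n_cols)]
            for i in range(n_rows)]


# The observed entries are consistent with the outer product of [1, 2, 3] and [1, 2].
data = [[1.0, 2.0],
        [2.0, None],
        [3.0, 6.0]]
completed = rank1_complete(data)
print(completed[1][1])
```

Real data are rarely exactly rank 1, so practical methods use higher ranks, tensor models or learned dictionaries, usually with regularisation.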
Fault diagnosis of chemical processes with incomplete observations: A comparative study
An important problem to be addressed by diagnostic systems in industrial applications is the estimation of faults with incomplete observations. This work discusses different approaches for handling missing data and the performance of data-driven fault diagnosis schemes. An exploiting classifier and combined methods were assessed in the Tennessee-Eastman process, for which diverse incomplete observations were produced. The use of several indicators revealed the trade-off between the performances of the different schemes. Support vector machines (SVM) and C4.5, combined with k-nearest neighbours (kNN), produce the highest robustness and accuracy, respectively. Bayesian networks (BN) and centroid appear as inappropriate options in terms of accuracy, while Gaussian naive Bayes (GNB) is sensitive to imputation values. In addition, feature selection was explored for further performance enhancement, and the proposed contribution index showed promising results. Finally, an industrial case was studied to assess the informative level of incomplete data in terms of the redundancy ratio and to generalize the discussion. (C) 2015 Elsevier Ltd. All rights reserved.
Peer reviewed. Postprint (author's final draft).
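The combined schemes mentioned above pair a classifier with kNN. Assuming kNN is used as the imputation step, which the abstract does not state explicitly, a minimal version of that step might look like the sketch below. The function name `knn_impute` and the toy sensor readings are hypothetical.

```python
def knn_impute(rows, k=2):
    """Replace None entries with the mean over the k nearest complete rows.

    Distance is squared Euclidean, measured only on the features that are
    observed in the incomplete row.
    """
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("at least one complete row is required")
    imputed = []
    for row in rows:
        if None not in row:
            imputed.append(list(row))
            continue
        observed = [j for j, value in enumerate(row) if value is not None]
        neighbours = sorted(
            complete,
            key=lambda c: sum((c[j] - row[j]) ** 2 for j in observed),
        )[:k]
        imputed.append([
            value if value is not None
            else sum(n[j] for n in neighbours) / len(neighbours)
            for j, value in enumerate(row)
        ])
    return imputed


# Toy readings: the last sample is missing its second measurement.
readings = [[1.0, 10.0],
            [1.2, 12.0],
            [5.0, 50.0],
            [1.1, None]]
filled = knn_impute(readings, k=2)
print(filled[3])
```

The completed rows could then be passed to any classifier, such as an SVM or a decision tree. Since the abstract notes that some classifiers are sensitive to imputation values, it is worth evaluating the two steps together rather than separately.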
Decomposition Methods for Machine Learning with Small, Incomplete or Noisy Datasets
In many machine learning applications, measurements are sometimes incomplete or noisy, resulting in missing features. In other cases, and for different reasons, the datasets are originally small, and therefore more data samples are required to derive useful supervised or unsupervised classification methods. Correct handling of incomplete, noisy or small datasets in machine learning is a fundamental and classic challenge. In this article, we provide a unified review of recently proposed methods based on signal decomposition for missing feature imputation (data completion), classification of noisy samples and artificial generation of new data samples (data augmentation). We illustrate the application of these signal decomposition methods in diverse selected practical machine learning examples, including brain-computer interfaces, classification of epileptic intracranial electroencephalogram signals, face recognition/verification and water network data analysis. We show that a signal decomposition approach can provide valuable tools to improve machine learning performance with low-quality datasets.
Instituto Argentino de Radioastronomía