Variable selection in classification for multivariate functional data
When classification methods are applied to high-dimensional data, selecting a subset of the predictors may improve the predictive ability of the estimated model, in addition to reducing the model complexity. In Functional Data Analysis (FDA), i.e., when data are functions, selecting a subset of predictors corresponds to selecting a subset of individual time instants in the time interval over which the functional data are measured. In this paper, we address the problem of selecting the most informative time instants in multivariate functional data, a case much less studied than its univariate counterpart. Our proposal makes it simple to use higher-order information about the data, e.g. monotonicity or convexity, by means of the derivatives of the functional data. The problem is addressed with tools of Global Optimization in continuous variables: the time instants are selected to maximize the correlation between the class label and the Support Vector Machine score used for classification. The effectiveness of the proposal is shown on univariate and multivariate datasets.
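The selection idea in the abstract above can be illustrated with a heavily simplified sketch. It is not the paper's method: the Support Vector Machine score is replaced by the raw curve value at a single instant, the continuous Global Optimization step is replaced by a grid search, and the data are univariate. The toy curves and every name in the code (`make_curve`, `best_time_instant`, the bump at t = 0.7) are assumptions made for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def make_curve(label, rng):
    """Hypothetical toy curve: the two classes differ only by a bump near t = 0.7."""
    phase = rng.uniform(-0.2, 0.2)   # random phase shift per curve
    offset = rng.gauss(0, 0.05)      # random vertical offset per curve
    def curve(t):
        bump = label * math.exp(-((t - 0.7) ** 2) / 0.005)
        return math.sin(2 * math.pi * t + phase) + bump + offset
    return curve

def best_time_instant(curves, labels, grid):
    """Grid search (a stand-in for continuous global optimization):
    return the instant t maximizing |corr(label, x_i(t))|."""
    return max(grid, key=lambda t: abs(pearson([c(t) for c in curves], labels)))

rng = random.Random(0)
labels = [i % 2 for i in range(60)]
curves = [make_curve(y, rng) for y in labels]
grid = [k / 100 for k in range(101)]
t_star = best_time_instant(curves, labels, grid)
print("most informative instant on the grid:", t_star)
```

In this toy setting the selected instant falls near the bump, which is the only place where the classes differ. Using derivative information, as the abstract suggests, would amount to evaluating a derivative of each curve at `t` instead of the curve itself.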
Application of machine learning techniques for the characterization and classification of patients with obsessive-compulsive disorder
This bachelor's thesis builds on the increasingly common use of machine learning methods to classify and characterize psychiatric disorders. Specifically, the designed system aims to support the diagnosis of OCD (obsessive-compulsive disorder) through the analysis of magnetic resonance imaging (MRI) scans.
The main goal of the system is an algorithm able to diagnose OCD patients and, above all, to characterize the disease by automatically detecting the neuroanatomical regions related to the disorder. To that end, it uses a modular architecture built on two fundamental principles.
1. Analysis by functional and/or neuroanatomical areas. Each MRI scan is divided into approximately one hundred subsets of voxels, each associated with a functional area or neuroanatomical region of the brain. The goal is to apply a classifier that facilitates the selection of the voxel sets relevant to detecting the disease.
2. Characterization and fusion of functional areas. The system applies feature selection methods to the outputs of the first-stage classifiers in order to automatically select the areas relevant to diagnosing the pathology. The last step uses both linear and nonlinear classifiers to study how the areas relate to one another.
Once the algorithm has been developed and applied, the results are used both to compare the classification of patients with previous results obtained by traditional methods [1], [2] and to analyze the pattern of neuroanatomical areas responsible for the disorder.
Ingeniería de Sistemas Audiovisuale
Feature selection by integrating domain knowledge using decision-making methods in predictive modeling of time series with supervised machine learning
The aim of the research presented within this doctoral dissertation is
to develop a feature selection methodology through integrating
domain-specific knowledge by applying mathematical methods of
decision-making, to improve the feature selection process and the
precision of supervised machine learning methods for predictive
modeling of time series.
To integrate domain-specific knowledge, a multi-criteria
decision-making method is used, namely the analytic hierarchy process
(AHP), which has proven successful in numerous studies to date. This
approach was selected because it allows a set of factors to be
selected based on their relevance, even when the criteria conflict
with one another.
In predicting the movement of time series, the possibility of
integrating feature relevance into support vector machines to improve
their prediction accuracy was studied.
The proposed methodology was applied as a feature selection method
for the predictive modeling of the movement of financial time series.
Unlike existing approaches, where the feature selection method is
based on a quantitative analysis of the input values, the proposed
methodology carries out a qualitative evaluation of the attributes in
relation to the prediction domain and represents a means of
integrating a priori knowledge of the prediction domain.
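As a rough illustration of the analytic hierarchy process mentioned in the abstract above, the sketch below derives relevance weights from a pairwise comparison matrix by approximating its principal eigenvector with power iteration, and computes the consistency index CI = (lambda_max - n) / (n - 1). The three feature names and the comparison judgments are invented for the example. How the dissertation feeds such weights into a support vector machine is not stated in the abstract and is not shown here.

```python
def ahp_weights(matrix, iterations=100):
    """Approximate the principal eigenvector of a positive pairwise
    comparison matrix by power iteration.

    Returns (weights, lambda_max): weights sum to 1, and lambda_max is
    the estimated principal eigenvalue."""
    n = len(matrix)
    w = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        lam = sum(v)              # w sums to 1, so sum(A w) estimates lambda_max
        w = [x / lam for x in v]  # renormalize so the weights sum to 1
    return w, lam

# Hypothetical features and judgments: entry [i][j] says how much more
# relevant feature i is than feature j on a 1-9 scale; the matrix is
# reciprocal, i.e. [j][i] == 1 / [i][j].
features = ["trend", "volume", "volatility"]
pairwise = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]

weights, lam_max = ahp_weights(pairwise)
n = len(pairwise)
consistency_index = (lam_max - n) / (n - 1)

for name, weight in zip(features, weights):
    print(f"{name}: {weight:.3f}")
print(f"consistency index: {consistency_index:.4f}")
```

A consistency index close to zero indicates that the pairwise judgments are nearly transitive. In practice it is usually divided by a tabulated random index to obtain a consistency ratio, a step omitted from this sketch.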
Information-theoretic feature selection for functional data classification
Classifying functional or high-dimensional data requires selecting a reduced subset of features from the initial set, both to help fight the curse of dimensionality and to help interpret the problem and the model. The mutual information criterion may be used in that context, but it is difficult to estimate from a finite set of samples. Efficient estimators are not designed specifically for classification and therefore suffer from further drawbacks and difficulties. This paper presents an estimator of mutual information that is specifically designed for classification tasks, including multi-class ones. It is combined with a recently published stopping criterion in a traditional forward feature selection procedure. Experiments on both traditional benchmarks and an industrial functional classification problem show the added value of this estimator. (C) 2009 Elsevier B.V. All rights reserved.
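The forward selection procedure described above can be sketched as follows, under clearly different assumptions from the paper. The estimator is the simple plug-in (frequency count) estimate for discrete features, not the classification-specific estimator the paper proposes, and the stopping rule is a fixed gain threshold (`min_gain`, a hypothetical parameter), not the published stopping criterion. The toy data are constructed deterministically for the example.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats from paired discrete samples."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def forward_select(features, labels, min_gain=0.02):
    """Greedy forward selection. `features` is a list of columns.
    At each step, add the feature that most increases the mutual
    information between the jointly selected features and the label;
    stop when the best gain falls below `min_gain`."""
    n = len(labels)
    selected, current = [], 0.0
    remaining = set(range(len(features)))

    def joint_mi(j):
        cols = selected + [j]
        joint = [tuple(features[k][i] for k in cols) for i in range(n)]
        return mutual_information(joint, labels)

    while remaining:
        best = max(sorted(remaining), key=joint_mi)
        gain = joint_mi(best) - current
        if gain < min_gain:
            break
        selected.append(best)
        remaining.remove(best)
        current += gain
    return selected

# Deterministic toy data: f0 is the label with 10% of values flipped,
# f1 is independent of both the label and f0, f2 duplicates f0.
labels = [i % 2 for i in range(200)]
f1 = [(i // 2) % 2 for i in range(200)]
f0 = [y ^ int((i // 4) % 10 == 0) for i, y in enumerate(labels)]
f2 = list(f0)

selected = forward_select([f0, f1, f2], labels)
print("selected feature indices:", selected)
```

Here only the first feature is selected: the independent feature adds no information, and the duplicate adds none once the original has been chosen. This shows why the joint mutual information, not each feature's individual score, drives the greedy step. With many selected features the plug-in estimate becomes unreliable because the joint histogram gets sparse, which is the estimation difficulty the abstract refers to.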