A survey of cost-sensitive decision tree induction algorithms
The past decade has seen significant interest in the problem of inducing decision trees that take account of the costs of misclassification and the costs of acquiring the features used for decision making. This survey identifies over 50 algorithms, including approaches that are direct adaptations of accuracy-based methods, use genetic algorithms, use anytime methods, and utilize boosting and bagging. The survey brings together these different studies and novel approaches to cost-sensitive decision tree learning, and provides a useful taxonomy, a historical timeline of how the field has developed, and a reference point for future research in this field.
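To make the notion of misclassification cost concrete, the following is a minimal sketch of a minimum-expected-cost decision at a single leaf. It is not taken from any of the surveyed algorithms; the function name, the `cost[predicted][actual]` layout, and the example numbers are assumptions chosen for illustration.

```python
def min_expected_cost_label(class_probs, cost):
    """Return (label, expected_costs) for the cheapest prediction.

    class_probs[a] is the estimated probability of actual class a at a leaf.
    cost[p][a] is the (hypothetical) cost of predicting p when the truth is a.
    """
    expected = [
        sum(cost[pred][actual] * prob for actual, prob in enumerate(class_probs))
        for pred in range(len(cost))
    ]
    # Choose the prediction with the smallest expected cost, not the most likely class.
    label = min(range(len(expected)), key=expected.__getitem__)
    return label, expected


# Illustrative numbers: class 0 is more likely, but missing class 1 is ten times costlier.
label, expected = min_expected_cost_label([0.8, 0.2], [[0, 10], [1, 0]])
```

With these illustrative costs the cheapest prediction is class 1 even though class 0 is the more probable class, which is the basic effect that cost-sensitive induction has to account for.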
Toward Optimal Feature Selection in Naive Bayes for Text Categorization
Automated feature selection is important for text categorization to reduce
the feature size and to speed up the learning process of classifiers. In this
paper, we present a novel and efficient feature selection framework based on
the Information Theory, which aims to rank the features with their
discriminative capacity for classification. We first revisit two information
measures: Kullback-Leibler divergence and Jeffreys divergence for binary
hypothesis testing, and analyze their asymptotic properties relating to type I
and type II errors of a Bayesian classifier. We then introduce a new divergence
measure, called Jeffreys-Multi-Hypothesis (JMH) divergence, to measure
multi-distribution divergence for multi-class classification. Based on the
JMH-divergence, we develop two efficient feature selection methods, termed
maximum discrimination () and methods, for text categorization.
The promising results of extensive experiments demonstrate the effectiveness of
the proposed approaches.
Comment: This paper has been submitted to the IEEE Trans. Knowledge and Data
Engineering. 14 pages, 5 figures
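The two information measures revisited in the abstract above can be sketched for discrete distributions as follows. This is a minimal illustration only: the function names and the example probabilities are assumptions, the code requires q[i] > 0 wherever p[i] > 0, and the JMH divergence is not sketched because the abstract does not give its definition.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions.

    p and q are equal-length sequences of probabilities.
    Terms with p[i] == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jeffreys_divergence(p, q):
    """Jeffreys divergence: the symmetrised sum KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical example: probability that a term is present/absent in two classes.
p_class_a = [0.9, 0.1]
p_class_b = [0.5, 0.5]
score = jeffreys_divergence(p_class_a, p_class_b)
```

Unlike the KL divergence, the Jeffreys divergence is symmetric in its two arguments, so a term ranked by it receives the same score whichever class is treated as the reference.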
Restricting Supervised Learning: Feature Selection and Feature Space Partition
Many supervised learning problems are considered difficult to solve either because of redundant features or because of the structural complexity of the generative function. Redundant features increase the learning noise and therefore decrease the prediction performance. Additionally, a number of problems in applications such as bioinformatics or image processing, whose data are sampled in a high-dimensional space, suffer from the curse of dimensionality, and there are not enough observations to obtain good estimates. Therefore, it is necessary to reduce the number of features under consideration. Another issue in supervised learning is caused by the complexity of an unknown generative model. To obtain a low-variance predictor, linear or other simple functions are normally suggested, but they usually result in high bias. Hence, a possible solution is to partition the feature space into multiple non-overlapping regions such that each region is simple enough to be classified easily. In this dissertation, we propose several novel techniques for restricting supervised learning problems with respect to either feature selection or feature space partition. Among different feature selection methods, 1-norm regularization is advocated by many researchers because it incorporates feature selection as part of the learning process. We focus here on ranking problems because very little work has been done on ranking with an L1 penalty. We present a 1-norm support vector machine method that simultaneously finds a linear ranking function and performs feature subset selection in ranking problems. Additionally, because ranking is formulated as a classification task when pair-wise data are considered, the computational complexity increases from linear to quadratic in the sample size. We also propose a convex hull reduction method to reduce this impact.
The method was tested on one artificial data set and two real benchmark data sets, the concrete compressive strength data set and the Abalone data set. Theoretically, by tuning the trade-off parameter between the 1-norm penalty and the empirical error, any desired size of feature subset could be achieved, but computing the whole solution path in terms of the trade-off parameter is extremely difficult. Therefore, using 1-norm regularization alone may not end up with a feature subset of small size. We propose a recursive feature selection method based on 1-norm regularization which can handle the multi-class setting effectively and efficiently. The selection is performed iteratively. In each iteration, a linear multi-class classifier is trained using 1-norm regularization, which leads to sparse weight vectors, i.e., many feature weights are exactly zero. Those zero-weight features are eliminated in the next iteration. The selection process has a fast rate of convergence. We tested our method on an earthworm microarray data set, and the empirical results demonstrate that the selected features (genes) have very competitive discriminative power. Feature space partition separates a complex learning problem into multiple non-overlapping simple sub-problems. It is normally implemented in a hierarchical fashion. Unlike in a decision tree, a leaf node of this hierarchical structure does not represent a single decision, but represents a region (sub-problem) that is solvable with respect to linear functions or other simple functions. In our work, we incorporate domain knowledge in the feature space partition process. We consider domain information encoded by discrete or categorical attributes. A discrete or categorical attribute provides a natural partition of the problem domain, and hence divides the original problem into several non-overlapping sub-problems. In this sense, the domain information is useful if the partition simplifies the learning task.
However, it is not trivial to select the discrete or categorical attribute that maximally simplifies the learning task. A naive approach exhaustively searches all the possible restructured problems, which is computationally prohibitive when the number of discrete or categorical attributes is large. We describe a metric to rank attributes according to their potential to reduce the uncertainty of a classification task. It is quantified as a conditional entropy achieved using a set of optimal classifiers, each of which is built for a sub-problem defined by the attribute under consideration. To avoid high computational cost, we approximate the solution by the expected minimum conditional entropy with respect to random projections. This approach was tested on three artificial data sets, three cheminformatics data sets, and two leukemia gene expression data sets. Empirical results demonstrate that our method is capable of selecting a proper discrete or categorical attribute to simplify the problem, i.e., the performance of the classifier built for the restructured problem always beats that of the original problem. Restricting supervised learning is always about building simple learning functions using a limited number of features. The Top Selected Pair (TSP) method builds simple classifiers based on very few (for example, two) features with simple arithmetic calculation. However, the traditional TSP method only deals with static data. In this dissertation, we propose classification methods for time series data that depend on only a few pairs of features. Based on the different comparison strategies, we developed the following approaches: TSP based on average, TSP based on trend, and TSP based on trend and absolute difference amount. In addition, inspired by the idea of using two features, we propose a time series classification method based on few feature pairs using dynamic time warping and nearest neighbor.
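The pair-based classifier mentioned at the end of the abstract above can be illustrated with a small sketch for static data. This follows the pairwise-comparison rule as it is commonly described for two classes (pick the feature pair whose ordering differs most between the classes); it is an assumption that the dissertation's baseline works this way, and the function names and toy data are hypothetical. The time series variants are not sketched because the abstract does not specify them.

```python
from itertools import combinations


def fit_tsp(X, y):
    """Select one feature pair (i, j) for a two-class problem with labels 0 and 1.

    The score of a pair is the absolute difference, between the two classes,
    of the fraction of samples in which x[i] < x[j].
    Both classes must be present in y.
    Returns (i, j, label_if_less), where label_if_less is the class predicted
    when x[i] < x[j].
    """
    best = None
    for i, j in combinations(range(len(X[0])), 2):
        freq = {}
        for label in (0, 1):
            rows = [x for x, target in zip(X, y) if target == label]
            freq[label] = sum(x[i] < x[j] for x in rows) / len(rows)
        score = abs(freq[0] - freq[1])
        if best is None or score > best[0]:
            best = (score, i, j, 0 if freq[0] > freq[1] else 1)
    return best[1:]


def predict_tsp(model, x):
    """Classify one sample by a single comparison of two feature values."""
    i, j, label_if_less = model
    return label_if_less if x[i] < x[j] else 1 - label_if_less


# Toy data: in class 0 feature 0 is below feature 1, in class 1 it is above.
X = [[1, 5, 3], [2, 6, 1], [7, 3, 4], [8, 2, 9]]
y = [0, 0, 1, 1]
model = fit_tsp(X, y)
predictions = [predict_tsp(model, x) for x in X]
```

The resulting classifier needs only one comparison at prediction time, which is the sense in which such methods rely on "simple arithmetic calculation".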
Markov modelling on human activity recognition
Human Activity Recognition (HAR) is a research topic of considerable interest
to the machine learning community. Understanding the activities that a person
is performing and the context in which they perform them is of great importance
in multiple applications, including medical research, security and patient monitoring.
The improvement of smartphone and inertial sensor technologies has
led to the implementation of activity recognition systems based on these devices,
either by themselves or combining their information with other sensors. Since
humans perform their daily activities sequentially in a specific order, there exists
some temporal information in the physical activities that characterizes the different
human behaviour patterns. However, the most popular approach in HAR is to assume
that the data are conditionally independent, segmenting the data into different
windows and extracting the most relevant features from each segment.
In this thesis we employ the temporal information explicitly, where the raw data
provided by the wearable sensors is fed to the training models. Thus, we study
how to perform a Markov modelling implementation of a long-term monitoring
HAR system with wearable sensors, and we address the existing open problems
arising while processing and training the data, combining different sensors and
performing the long-term monitoring with battery powered devices.
Directly employing the signals from the sensors to perform the recognition can
lead to problems due to misplacement of the sensors on the body. We propose an
orientation correction algorithm based on quaternions to process the signals and
find a common reference frame for all of them, independently of the position of the
sensors or their orientation. Feeding the corrected signals to the classification
algorithm yields better activity recognition than similar approaches,
and the quaternion transformations allow for a faster implementation.
Hidden Markov Models (HMMs) are among the most popular models for time series
data, and their parameters are usually trained
using the Baum-Welch algorithm. However, this algorithm converges to
local maxima, and the multiple initializations needed to avoid them make it computationally expensive for large datasets. We propose employing the theory of
spectral learning to develop a discriminative HMM that avoids the problems of
the Baum-Welch algorithm, outperforming it in both complexity and computational
cost.
When we implement a HAR system with several sensors, we need to consider
how to perform the combination of the information provided by them. Data fusion
can be performed either at signal level or at classification level. When performed
at classification level, the usual approach is to combine the decisions of multiple
classifiers on the body to obtain the performed activities. However, in the simple
case with two classifiers, which can be a practical implementation of a HAR
system, the combination reduces to selecting the most discriminative sensor, and
no performance improvement is obtained against the single sensor implementation.
In this thesis, we propose to employ the soft-outputs of the classifiers in
the combination and we develop a method that considers the Markovian structure
of the ground truth to capture the dynamics of the activities. We will show
that this method improves the recognition of the activities with respect to other
combination methods and with respect to the signal fusion case.
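One way to picture combining soft outputs under a Markovian assumption is the filtering sketch below. It is not the method developed in the thesis, whose details the abstract does not give: it assumes the classifiers are conditionally independent given the activity, treats their soft outputs as if they were likelihoods, and uses a hypothetical transition matrix and prior.

```python
def fuse_with_markov_prior(soft_outputs, transition, prior):
    """Filter activity beliefs over time from several classifiers' soft outputs.

    soft_outputs[t][k][s]: score of classifier k for activity s at time t.
    transition[r][s]: assumed probability of moving from activity r to s.
    prior[s]: assumed initial probability of activity s.
    Returns one normalised belief vector per time step.
    At least one activity must keep a non-zero score at every step.
    """
    n_states = len(prior)
    belief = list(prior)
    filtered = []
    for t, step in enumerate(soft_outputs):
        if t > 0:
            # Prediction step: propagate the previous belief through the chain.
            belief = [
                sum(belief[r] * transition[r][s] for r in range(n_states))
                for s in range(n_states)
            ]
        # Update step: multiply in each classifier's soft output.
        for scores in step:
            belief = [b * score for b, score in zip(belief, scores)]
        total = sum(belief)
        belief = [b / total for b in belief]
        filtered.append(belief)
    return filtered


# Two activities, two classifiers, two time steps (all numbers are illustrative).
transition = [[0.9, 0.1], [0.1, 0.9]]
prior = [0.5, 0.5]
soft_outputs = [
    [[0.8, 0.2], [0.7, 0.3]],
    [[0.4, 0.6], [0.5, 0.5]],
]
beliefs = fuse_with_markov_prior(soft_outputs, transition, prior)
```

In this toy run the first classifier slightly prefers the second activity at the second time step, but the fused belief stays with the first activity because the assumed dynamics make a switch unlikely. A hard-decision vote would discard exactly the information that produces this behaviour.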
Finally, in long-term monitoring HAR systems with wearable sensors we need
to address the energy efficiency problem that is inherent to battery powered devices.
The most common approach to improve the energy efficiency of such devices
is to reduce the amount of data acquired by the wearable sensors. In that sense,
we introduce a general framework for the energy efficiency of a system with multiple
sensors under several energy restrictions. We propose a sensing strategy to
optimize the temporal data acquisition based on computing the uncertainty of
the activities given the data and adapt the acquisition actively. Furthermore, we
develop a sensor selection algorithm based on Bayesian Experimental Design to
obtain the best configuration of sensors that performs the activity recognition accurately, allowing for a further improvement in energy efficiency by limiting
the number of sensors employed in the acquisition.
Official Doctoral Program in Multimedia and Communications. President: Luis Ignacio Santamaría Caballero. Secretary: Pablo Martínez Olmos. Member: Alberto Suárez Gonzále
Information and Decision Theoretic Approaches to Problems in Active Diagnosis.
In applications such as active learning or disease/fault diagnosis, one often encounters the problem of identifying an unknown object while minimizing the number of "yes" or "no" questions (queries) posed about that object. This problem has been commonly referred to as object/entity identification or active diagnosis in the literature. In this thesis, we consider several extensions of this fundamental problem that are motivated by practical considerations in real-world, time-critical identification tasks such as emergency response.
First, we consider the problem where the objects are partitioned into groups, and the goal is to identify only the group to which the object belongs. We then consider the case where the cost of identifying an object grows exponentially in the number of queries. To address these problems we show that a standard algorithm for object identification, known as the splitting algorithm or generalized binary search (GBS), may be viewed as a generalization of Shannon-Fano coding. We then extend this result to the group-based and the exponential cost settings, leading to new, improved algorithms.
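The splitting algorithm named above can be sketched for the basic noise-free setting as follows. The sketch greedily asks the query whose "yes" set is closest to half of the remaining probability mass; the function name, the data layout, and the toy objects and queries are assumptions, all priors are assumed positive, and the group-based and exponential-cost extensions from the thesis are not covered.

```python
def generalized_binary_search(prior, responses, oracle):
    """Identify an unknown object with greedy, evenly splitting yes/no queries.

    prior: dict mapping object -> prior probability (assumed positive).
    responses: dict mapping query -> dict mapping object -> bool (true answer).
    oracle: callable taking a query and returning the answer for the unknown
        object; answers are assumed noise-free.
    Returns (most probable remaining object, list of queries asked).
    """
    candidates = dict(prior)
    asked = []
    while len(candidates) > 1:
        total = sum(candidates.values())
        best_query, best_gap = None, None
        for query, answers in responses.items():
            yes_mass = sum(p for obj, p in candidates.items() if answers[obj])
            if yes_mass == 0 or yes_mass == total:
                continue  # this query cannot separate the remaining candidates
            gap = abs(2 * yes_mass - total)  # 0 means a perfectly even split
            if best_gap is None or gap < best_gap:
                best_query, best_gap = query, gap
        if best_query is None:
            break  # remaining candidates are indistinguishable
        answer = oracle(best_query)
        asked.append(best_query)
        candidates = {
            obj: p for obj, p in candidates.items()
            if responses[best_query][obj] == answer
        }
    return max(candidates, key=candidates.get), asked


# Toy problem: four equally likely objects and three hypothetical queries.
prior = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
responses = {
    "q1": {"a": True, "b": True, "c": False, "d": False},
    "q2": {"a": True, "b": False, "c": True, "d": False},
    "q3": {"a": True, "b": False, "c": False, "d": False},
}
found, asked = generalized_binary_search(prior, responses, lambda q: responses[q]["c"])
```

In this toy problem the two evenly splitting queries are asked and the uneven one is never needed, so each of the four objects is identified in two queries, as a balanced binary code would require.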
We then study the problem of active diagnosis under persistent query noise. Previous work in this area either assumed that the noise is independent or that the underlying query noise distribution is completely known. We make no such assumptions, and introduce an algorithm that returns a ranked list of objects, such that the expected rank of the true object is optimized. Finally, we study the problem of active diagnosis where multiple objects are present, such as in disease/fault diagnosis. Current algorithms in this area have an exponential time complexity, making them slow and intractable. We address this issue by proposing an extension of our rank-based approach to the multiple object scenario, where we optimize the area under the ROC curve of the rank-based output. The AUC criterion allows us to make a simplifying assumption that significantly reduces the complexity of active diagnosis (from exponential to near quadratic), with little or no compromise on the performance quality. Further, we demonstrate the performance of the proposed algorithms through extensive experiments on both synthetic and real-world datasets.
Ph.D., Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/91606/1/gowtham_1.pd