91 research outputs found
Discriminative learning of Bayesian networks via factorized conditional log-likelihood
We propose an efficient and parameter-free scoring criterion, the factorized conditional log-likelihood (f̂CLL), for learning Bayesian network classifiers. The proposed score is an approximation of the conditional log-likelihood criterion. The approximation is devised in order to guarantee decomposability over the network structure, as well as efficient estimation of the optimal parameters, achieving the same time and space complexity as the traditional log-likelihood scoring criterion. The resulting criterion has an information-theoretic interpretation based on interaction information, which exhibits its discriminative nature. To evaluate the performance of the proposed criterion, we present an empirical comparison with state-of-the-art classifiers. Results on a large suite of benchmark data sets from the UCI repository show that f̂CLL-trained classifiers achieve at least as good accuracy as the best compared classifiers, using significantly less computational resources. Peer reviewed
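For orientation, the standard definitions behind this score can be written out as below. This is a sketch of the textbook log-likelihood and conditional log-likelihood only, not the f̂CLL approximation itself; the notation ($B$ for the network, $D$ for the data, $c_t$ and $\mathbf{x}_t$ for the class and attribute values of instance $t$, $N$ for the number of instances) is assumed here and does not come from the abstract.

```latex
\begin{align}
\mathrm{LL}(B \mid D)  &= \sum_{t=1}^{N} \log P_B(c_t, \mathbf{x}_t), \\
\mathrm{CLL}(B \mid D) &= \sum_{t=1}^{N} \log P_B(c_t \mid \mathbf{x}_t)
  = \mathrm{LL}(B \mid D) - \sum_{t=1}^{N} \log \sum_{c} P_B(c, \mathbf{x}_t).
\end{align}
```

The logarithm of a sum in the last term does not split into per-node contributions, which is why the exact conditional log-likelihood is not decomposable over the network structure and an approximation is of interest.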
Approaching Sentiment Analysis by Using Semi-supervised Learning of Multidimensional Classifiers
Sentiment Analysis is defined as the computational study of opinions, sentiments and emotions
expressed in text. Within this broad field, most of the work has been focused on either Sentiment
Polarity classification, where a text is classified as having positive or negative sentiment,
or Subjectivity classification, in which a text is classified as being subjective or objective. However,
in this paper, we consider instead a real-world problem in which the attitude of the author
is characterised by three different (but related) target variables: Subjectivity, Sentiment Polarity,
and Will to Influence, unlike the two previously stated problems, where there is only a single variable
to be predicted. For that reason, the (uni-dimensional) common approaches used in this area
yield suboptimal solutions to this problem. In order to bridge this gap, we propose, for the first
time, the use of the novel multi-dimensional classification paradigm in the Sentiment Analysis
domain. This methodology is able to join the different target variables in the same classification
task so as to take advantage of the potential statistical relations between them. In addition, and
in order to take advantage of the huge amount of unlabelled information available nowadays in
this context, we propose the extension of the multi-dimensional classification framework to the
semi-supervised domain. Experimental results for this problem show that our semi-supervised
multi-dimensional approach outperforms the most common Sentiment Analysis approaches. We conclude
that our approach improves the recognition rates for this problem and, by
extension, could be considered for future Sentiment Analysis problems.
Markov modelling on human activity recognition
Human Activity Recognition (HAR) is a research topic of considerable interest
to the machine learning community. Understanding the activities that a person
is performing and the context in which they perform them is of great importance
in multiple applications, including medical research, security and patient monitoring.
The improvement of smartphone and inertial sensor technologies has
led to the implementation of activity recognition systems based on these devices,
either by themselves or in combination with other sensors. Since
humans perform their daily activities sequentially in a specific order, there exists
some temporal information in the physical activities that characterizes the different
human behaviour patterns. However, the most popular approach in HAR is to assume
that the data are conditionally independent, segmenting the data into different
windows and extracting the most relevant features from each segment.
In this thesis we employ the temporal information explicitly, where the raw data
provided by the wearable sensors is fed to the training models. Thus, we study
how to perform a Markov modelling implementation of a long-term monitoring
HAR system with wearable sensors, and we address the existing open problems
arising while processing and training the data, combining different sensors and
performing the long-term monitoring with battery powered devices.
Directly employing the signals from the sensors to perform the recognition can
lead to problems due to misplacement of the sensors on the body. We propose an
orientation correction algorithm based on quaternions to process the signals and
find a common reference frame for all of them, independently of the position of the
sensors or their orientation. This algorithm yields better activity recognition
when its output is fed to the classification algorithm, compared with similar approaches,
and the quaternion transformations allow for a faster implementation.
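As a rough illustration of the kind of quaternion operation an orientation correction of this sort relies on, the sketch below rotates a sensor-frame vector into another frame. The function names and the example rotation are assumptions made for illustration; this is not the algorithm proposed in the thesis.

```python
import math


def quat_multiply(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def quat_from_axis_angle(axis, angle):
    """Unit quaternion for a rotation of `angle` radians about `axis`."""
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    scale = math.sin(angle / 2.0) / norm
    return (math.cos(angle / 2.0), ax * scale, ay * scale, az * scale)


def rotate(q, v):
    """Rotate the 3-vector v by the unit quaternion q via q * v * conj(q)."""
    q_conj = (q[0], -q[1], -q[2], -q[3])
    _, x, y, z = quat_multiply(quat_multiply(q, (0.0, *v)), q_conj)
    return (x, y, z)


# Hypothetical example: a sensor mounted 90 degrees off about the vertical axis.
correction = quat_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2.0)
aligned = rotate(correction, (1.0, 0.0, 0.0))
```

Applying one quaternion product per sample avoids building and multiplying full rotation matrices, which is one plausible reading of why a quaternion formulation can be faster.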
Hidden Markov Models (HMMs) are among the most popular models for time series
data, and the parameters of the model are usually trained
using the Baum-Welch algorithm. However, this algorithm converges to
local maxima, and the multiple initializations needed to avoid them make it computationally expensive for large datasets. We propose employing the theory of
spectral learning to develop a discriminative HMM that avoids the problems of
the Baum-Welch algorithm, outperforming it in both complexity and computational
cost.
When we implement a HAR system with several sensors, we need to consider
how to perform the combination of the information provided by them. Data fusion
can be performed either at signal level or at classification level. When performed
at classification level, the usual approach is to combine the decisions of multiple
classifiers on the body to obtain the performed activities. However, in the simple
case with two classifiers, which can be a practical implementation of a HAR
system, the combination reduces to selecting the most discriminative sensor, and
no performance improvement is obtained over the single-sensor implementation.
In this thesis, we propose to employ the soft outputs of the classifiers in
the combination and we develop a method that considers the Markovian structure
of the ground truth to capture the dynamics of the activities. We will show
that this method improves the recognition of the activities with respect to other
combination methods and with respect to the signal fusion case.
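One way to picture combining soft outputs under a Markovian model of the activities is a forward filter over the activity sequence. The sketch below is only an illustration under stated assumptions: the two classifiers' soft outputs are treated as scaled likelihoods that are conditionally independent given the activity, and the function name, transition matrix and example values are hypothetical rather than taken from the thesis.

```python
def fuse_soft_outputs(transition, initial, outputs_a, outputs_b):
    """Forward-filter two streams of soft outputs and decode one activity per frame.

    transition[i][j] is the assumed probability of moving from activity i to j,
    initial is the assumed distribution over activities for the first frame, and
    outputs_a / outputs_b hold one probability vector per frame per classifier.
    """
    n = len(initial)
    belief = None
    decoded = []
    for probs_a, probs_b in zip(outputs_a, outputs_b):
        if belief is None:
            # First frame: start from the initial distribution.
            predicted = list(initial)
        else:
            # Prediction step: push the previous belief through the Markov chain.
            predicted = [
                sum(belief[i] * transition[i][j] for i in range(n))
                for j in range(n)
            ]
        # Update step: weight the prediction by both classifiers' soft outputs.
        unnormalized = [predicted[j] * probs_a[j] * probs_b[j] for j in range(n)]
        total = sum(unnormalized)
        if total == 0.0:
            raise ValueError("soft outputs are incompatible with the model")
        belief = [u / total for u in unnormalized]
        decoded.append(max(range(n), key=belief.__getitem__))
    return decoded


# Hypothetical two-activity example with "sticky" transitions.
transition = [[0.9, 0.1], [0.1, 0.9]]
initial = [0.5, 0.5]
sensor_a = [[0.8, 0.2], [0.6, 0.4], [0.45, 0.55]]
sensor_b = [[0.7, 0.3], [0.6, 0.4], [0.5, 0.5]]
activities = fuse_soft_outputs(transition, initial, sensor_a, sensor_b)
```

In this example the last frame, taken on its own, slightly favours the second activity, but the filtered decision stays with the first because the transition model makes an abrupt switch unlikely.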
Finally, in long-term monitoring HAR systems with wearable sensors we need
to address the energy efficiency problem that is inherent to battery powered devices.
The most common approach to improve the energy efficiency of such devices
is to reduce the amount of data acquired by the wearable sensors. In that sense,
we introduce a general framework for the energy efficiency of a system with multiple
sensors under several energy restrictions. We propose a sensing strategy that
optimizes the temporal data acquisition by computing the uncertainty of
the activities given the data and actively adapting the acquisition. Furthermore, we
develop a sensor selection algorithm based on Bayesian Experimental Design to
obtain the best configuration of sensors that performs the activity recognition accurately, allowing for a further improvement in energy efficiency by limiting
the number of sensors employed in the acquisition.
Programa Oficial de Doctorado en Multimedia y Comunicaciones. President: Luis Ignacio Santamaría Caballero. Secretary: Pablo Martínez Olmos. Member: Alberto Suárez Gonzále
Stratified Staged Trees: Modelling, Software and Applications
The thesis is focused on Probabilistic Graphical Models (PGMs), which are a rich framework for encoding probability distributions over complex domains. In particular, joint multivariate distributions over large numbers of random variables that interact with each other can be investigated through PGMs and conditional independence statements can be succinctly represented with graphical representations. These representations sit at the intersection of statistics and computer science, relying on concepts mainly from probability theory, graph algorithms and machine learning. They are applied in a wide variety of fields, such as medical diagnosis, image understanding, speech recognition, natural language processing, and many more.
Over the years, theory and methodology have developed and been extended in a multitude of directions. In particular, this thesis studies different aspects of new classes of PGMs called Staged Trees and Chain Event Graphs (CEGs). In some sense, Staged Trees are a generalization of Bayesian Networks (BNs). Indeed, BNs provide a transparent graphical tool to define a complex process in terms of conditional independence structures. Despite their strengths in reducing the dimensionality of the joint probability distributions of the statistical model and in providing a transparent framework for causal inference, BNs are not optimal graphical models in all situations. The biggest problems with their usage occur mainly when the event space is not a simple product of the sample spaces of the random variables of interest, and when conditional independence statements hold only for certain values of the variables. The latter happens when there are context-specific conditional independence structures.
Some extensions to the BN framework have been proposed to handle these issues: context-specific BNs, Bayesian Multinets, or Similarity Networks (Geiger and Heckerman, 1996). These adopt a hypothesis variable to encode the context-specific statements over a particular set of random variables. For each value taken by the hypothesis variable, the graphical modeller has to construct a particular BN model called a local network. The collection of these local networks constitutes a Bayesian Multinet. Other proposals include Probabilistic Decision Graphs, among others. It has been shown that Chain Event Graph (CEG) models encompass all discrete BN models and their discrete variants described above as a special subclass, and that they are also richer than Probabilistic Decision Graphs, whose semantics is actually somewhat distinct.
Unlike most of their competitors, CEGs can capture all conditional independences, including context-specific ones, in a single graph, obtained by coalescing the vertices of an appropriately constructed probability tree, called a Staged Tree.
CEGs have been developed for categorical variables and have been used for cohort studies, causal analysis and case-control studies. The user's toolbox to efficiently and effectively perform uncertainty reasoning with CEGs further includes methods for inference and probability propagation, the exploration of equivalence classes and robustness studies.
The main contributions of this thesis to the literature on Staged Trees concern Stratified Staged Trees, with a keen eye on applications. A few observations on non-Stratified Staged Trees are made in the last part of the thesis. A core output of the thesis is an R software package which efficiently implements a host of functions for learning and estimating Staged Trees from data, relying on likelihood principles. Structural learning algorithms are also developed, based on the distance or divergence between pairs of categorical probability distributions and on the clustering of probability distributions into a fixed number of stages for each stratum of the tree. In addition, a new class of Directed Acyclic Graph is introduced, named Asymmetric-labeled DAG (ALDAG), which gives a BN representation of a given Staged Tree. The ALDAG is a minimal DAG such that the statistical model embedded in the Staged Tree is contained in the one associated with the ALDAG. This is possible thanks to the use of colored edges, where each color indicates a different type of conditional dependence: total, context-specific, partial or local.
Staged Trees are also adopted in this thesis as a statistical tool for classification purposes. Staged Tree Classifiers are introduced, whose predictive accuracy is comparable to that of state-of-the-art machine learning algorithms such as neural networks and random forests. Finally, algorithms to obtain an ordering of variables for the construction of the Staged Tree are designed.
Semantic multimedia analysis using knowledge and context
PhD
The difficulty of semantic multimedia analysis can be attributed to the
extended diversity in form and appearance exhibited by the majority of
semantic concepts and the difficulty of expressing them using a finite number
of patterns. In meeting this challenge there has been a scientific debate
on whether the problem should be addressed from the perspective of using
overwhelming amounts of training data to capture all possible instantiations
of a concept, or from the perspective of using explicit knowledge about
the concepts’ relations to infer their presence. In this thesis we address
three problems of pattern recognition and propose solutions that combine
the knowledge extracted implicitly from training data with the knowledge
provided explicitly in structured form. First, we propose a BN modeling
approach that defines a conceptual space where both domain-related evidence
and evidence derived from content analysis can be jointly considered
to support or disprove a hypothesis. The use of this space leads to significant
gains in performance compared to analysis methods that cannot
handle combined knowledge. Then, we present an unsupervised method
that exploits the collective nature of social media to automatically obtain
large amounts of annotated image regions. By proving that the quality of
the obtained samples can be almost as good as manually annotated images
when working with large datasets, we significantly contribute towards scalable
object detection. Finally, we introduce a method that treats images,
visual features and tags as the three observable variables of an aspect model
and extracts a set of latent topics that incorporates the semantics of both
the visual and the tag information space. By showing that the cross-modal dependencies
of tagged images can be exploited to increase the semantic capacity
of the resulting space, we advocate the use of all existing information facets
in the semantic analysis of social media.
Modelling Uncertainty in Black-box Classification Systems
Currently, thanks to the Big Data boom, the excellent results obtained by deep learning models and the strong digital transformation experienced over the last years, many companies have decided to incorporate machine learning models into their systems. Some companies have detected this opportunity and are making a portfolio of artificial intelligence services available to third parties in the form of application programming interfaces (APIs). Developers then include calls to these APIs to incorporate AI functionalities in their products. Although this option saves time and resources, in most cases these APIs are offered as black boxes, the details of which are unknown to their clients. The complexity of such products typically leads to a lack of control and knowledge of the internal components, which, in turn, can lead to uncontrolled risks. Therefore, it is necessary to develop methods capable of evaluating the performance of these black boxes when applied to a specific application. In this work, we present a robust uncertainty-based method for evaluating the performance of both probabilistic and categorical classification black-box models, in particular APIs, that enriches the predictions obtained with an uncertainty score. This uncertainty score enables the detection of inputs with very confident but erroneous predictions while protecting against out-of-distribution data points when deploying the model in a production setting. In the first part of the thesis, we develop a thorough review of the concept of uncertainty, focusing on the uncertainty of classification systems. We review the existing related literature, describing the different approaches for modelling this uncertainty, its application to different use cases and some of its desirable properties. Next, we introduce the proposed method for modelling uncertainty in black-box settings.
Moreover, in the last chapters of the thesis, we showcase the method applied to different domains, including NLP and computer vision problems. Finally, we include two real-life applications of the method: classification of overqualification in job descriptions and readability assessment of texts.
The thesis proposes a method for computing the uncertainty associated with the predictions of APIs or external libraries for classification systems.
Evaluation and Improvement of Machine Learning Algorithms in Drug Discovery
Drug discovery plays a critical role in today’s society for treating and preventing sickness and possibly deadly viruses. In early drug discovery development, the main challenge is to find candidate molecules to be used as drugs to treat a disease. This also means assessing key properties that are wanted in the interaction between molecules and proteins. It is a very difficult problem because the molecular space is so big and complex. Drug discovery development is estimated to take around 12–15 years on average, and the costs of developing a single drug amount to $2.8 billion in the US. Modern drug discovery and drug development often start with finding candidate drug molecules (‘compounds’) that can bind to a target, usually a protein in our body. Since there are billions of possible molecules to test, this becomes an endless search for compounds that show promising bioactivity. The search method is called high-throughput screening (HTS), or virtual HTS (VHTS) in a virtual environment. The traditional approach to HTS has been to test every compound one by one. More recent approaches have seen the use of robotics and of features extracted from the molecule, combining them with machine learning algorithms, in an effort to make the process more automated. Research has shown that this will still lead to human errors and bias. So, how can we use machine learning algorithms to make this approach more cost-efficient and more robust to human errors? This project tried to address these issues and led to two scientific papers as a result. The first paper explores how common evaluation metrics used for classification can actually be unsuited to the task, leading to severe consequences when put into a real application. The argument is based on basic principles of Decision Theory, which is recognized in the field of machine learning but has not been put into much use.
It makes a distinction between predicting the most probable class and predicting the most valuable class in terms of the “cost” or “gains” for the classes. In an algorithm for classifying a particular disease in a patient, the wrong classification could lead to a life or death situation. The principles also apply to drug discovery, where the cost of further developing and optimizing a "useless" drug could be huge. The goal of the classifier should therefore not be to guess the correct class but to choose the optimal class, and the metric must depend on the type of classification problem. Thus, we show that common metrics such as precision, balanced accuracy, F1-score, Area Under The Curve, Matthews Correlation Coefficient, and Fowlkes-Mallows index are affected by this problem, and propose an evaluation method grounded in the foundations of Decision Theory to provide a solution to this problem. The metric presented, called utility, takes into account gains and losses for each correct or incorrect classification of the confusion matrix. For this to work effectively, the output of the machine learning algorithm needs to be a set of sensible probabilities for each class. This brings us to the second paper. Machine learning algorithms usually output a set of real numbers for the classes they try to predict, which, possibly after some transformation (for example the ‘softmax’ function), are meant to represent probabilities for the classes. However, the problem is that these numbers cannot be reliably interpreted as actual probabilities, in the sense of degrees of belief. In the paper, we propose the implementation of a probability transducer to transform the output of the algorithm into sensible probabilities. These are then used in conjunction with the utilities to choose the class with the maximal expected utility.
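The difference between the most probable class and the class with maximal expected utility can be made concrete with a small sketch. The utility values, class meanings and function name below are invented for illustration and are not taken from the papers.

```python
def choose_by_expected_utility(probabilities, utilities):
    """Return (best_choice, expected_utilities).

    probabilities[k] is the probability that the true class is k, and
    utilities[c][k] is the assumed gain (or loss, if negative) of choosing
    class c when the true class is k.
    """
    expected = [
        sum(p * u for p, u in zip(probabilities, row))
        for row in utilities
    ]
    best = max(range(len(expected)), key=expected.__getitem__)
    return best, expected


# Hypothetical screening example: class 0 = "inactive", class 1 = "active".
probabilities = [0.7, 0.3]
utilities = [
    [1.0, -10.0],  # choose "inactive": small gain if right, large loss if a hit is missed
    [-1.0, 5.0],   # choose "active": small loss if wrong, large gain if right
]
choice, expected = choose_by_expected_utility(probabilities, utilities)
```

With these made-up numbers the most probable class is "inactive", yet the expected utilities are -2.3 for "inactive" and 0.8 for "active", so the utility-based rule chooses "active".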
The results show that the transducer gives better scores, in terms of the utilities, for all cases compared to the standard method used in machine learning.
Master's thesis in Software Development, in collaboration with HVL. PROG399. MAMN-PRO
Probabilistic models for mining imbalanced relational data
Most data mining and pattern recognition techniques are designed for learning from flat data files with the assumption of equal populations per class. However, most real-world data are stored as rich relational databases that generally have imbalanced class distributions. For such domains, a rich relational technique is required to accurately model the different objects and relationships in the domain, which cannot be easily represented as a set of simple attributes, and at the same time handle the imbalanced class problem. Motivated by the significance of mining imbalanced relational databases, which represent the majority of real-world data, learning techniques for mining imbalanced relational domains are investigated. In this thesis, the employment of probabilistic models in mining relational databases is explored, in particular Probabilistic Relational Models (PRMs), which were proposed as an extension of the attribute-based Bayesian Networks. The effectiveness of PRMs in mining real-world databases was explored by learning PRMs from a real-world university relational database. A visual data mining tool is also proposed to aid the interpretation of the learned PRM models. Despite the effectiveness of PRMs in relational learning, the performance of PRMs as predictive models is significantly hindered by the imbalanced class problem. This is because PRMs share the assumption, common to other learning techniques, of relatively balanced class distributions in the training data. Therefore, this thesis proposes a number of models that build on the effectiveness of PRMs in relational learning and extend it to mining imbalanced relational domains. The first model introduced in this thesis examines the problem of mining imbalanced relational domains for a single two-class attribute. The model is obtained by enriching PRM learning with the ensemble learning technique.
The premise behind this model is that an ensemble of models would attain better performance than a single model, as instances misclassified by one of the models can often be correctly classified by the others. Based on this approach, another model is introduced to address the problem of mining multiple imbalanced attributes, in which it is important to predict several attributes rather than a single one. In this model, the ensemble bagging sampling approach is exploited to attain a single model for mining several attributes. Finally, the thesis outlines the problem of imbalanced multi-class classification and introduces a generalized framework to handle this problem for both relational and non-relational domains.
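The general idea of pairing an ensemble with class-balanced resampling can be sketched as follows. This is a generic illustration of balanced bagging with majority voting, using hypothetical helper names; it is not the PRM-based models of the thesis, and the base learners themselves are left out.

```python
import random


def balanced_bootstrap(rows, labels, rng):
    """Draw a bootstrap sample with the same number of rows from every class.

    Each class contributes as many rows (sampled with replacement) as the
    smallest class contains, so the sample is balanced by construction.
    """
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    size = min(len(members) for members in by_class.values())
    sample = []
    for label, members in by_class.items():
        sample.extend((rng.choice(members), label) for _ in range(size))
    return sample


def majority_vote(predictions):
    """Combine the ensemble members' predictions for one instance."""
    counts = {}
    for prediction in predictions:
        counts[prediction] = counts.get(prediction, 0) + 1
    return max(counts, key=counts.get)


# Hypothetical imbalanced data: eight negatives and two positives.
rng = random.Random(0)
rows = list(range(10))
labels = ["neg"] * 8 + ["pos"] * 2
training_sets = [balanced_bootstrap(rows, labels, rng) for _ in range(5)]
```

Each training set would be handed to one base learner, and `majority_vote` would combine their predictions, so an error made by one member can be outvoted by the others.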