12 research outputs found
Maximum of entropy for belief intervals under Evidence Theory
The Dempster-Shafer Theory (DST) or Evidence Theory has been commonly used to
deal with uncertainty. It is based on the basic probability assignment concept (BPA). The upper entropy
on the credal set associated with a BPA is the only uncertainty measure in DST that verifies all the
necessary mathematical properties and behaviors. Nonetheless, its computation is notably complex. For this
reason, many alternatives to this measure have been recently proposed, but they do not satisfy most of the
mathematical requirements and present some undesirable behaviors. Belief intervals have been frequently
employed to quantify uncertainty in DST in recent years, and they can represent uncertainty-based information
better than a BPA. In this research, we develop a new uncertainty measure that consists of the
maximum of entropy on the credal set corresponding to belief intervals for singletons. It verifies all the
crucial mathematical requirements and presents good behavior, solving most of the shortcomings found in
uncertainty measures proposed recently. Moreover, its calculation is notably easier than the upper entropy
on the credal set associated with the BPA. Therefore, our proposed uncertainty measure is more suitable to
be used in practical applications.
Spanish Ministerio de Economía y Competitividad TIN2016-77902-C3-2-P; European Union (EU) TEC2015-69496-
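The optimization behind this kind of measure can be sketched numerically: maximizing Shannon entropy over distributions constrained by singleton belief intervals reduces to a water-filling problem, because the optimum clamps every probability to a common level within its interval. A minimal sketch under that observation (the interval values are illustrative, not taken from the paper, and the intervals are assumed to admit a distribution summing to 1):

```python
import math

def max_entropy_in_intervals(lower, upper, tol=1e-12):
    """Maximize Shannon entropy over distributions p with
    lower[i] <= p[i] <= upper[i] and sum(p) == 1.
    The optimum clamps every p[i] to a common level lam;
    lam is found here by bisection on the (monotone) total mass."""
    def total(lam):
        return sum(min(u, max(l, lam)) for l, u in zip(lower, upper))
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    p = [min(u, max(l, lam)) for l, u in zip(lower, upper)]
    return p, -sum(x * math.log2(x) for x in p if x > 0)

# Hypothetical belief intervals for three singletons
p, h = max_entropy_in_intervals([0.5, 0.0, 0.0], [0.9, 0.3, 0.3])
```

Here the first singleton's lower bound forces p = (0.5, 0.25, 0.25), illustrating why this computation is far cheaper than maximizing entropy over the credal set of a general BPA.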
Required mathematical properties and behaviors of uncertainty measures on belief intervals
The Dempster–Shafer theory of evidence (DST) has been widely used to handle uncertainty-based information. It is based on the concept of basic probability assignment (BPA). Belief intervals are easier to manage than a BPA for representing uncertainty-based information. For this reason, several recently proposed uncertainty measures for DST are based on belief intervals. In this study, we examine the crucial mathematical properties and behavioral requirements that must be verified by every uncertainty measure on belief intervals, building on the study previously carried out for uncertainty measures on BPAs. Furthermore, we analyze which of these properties are satisfied by each of the uncertainty measures on belief intervals proposed so far. This comparative analysis shows that, among these measures, the maximum of entropy on the belief intervals is the most suitable one for practical applications, since it is the only one that satisfies all the required mathematical properties and behaviors.
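As context for the measures compared above, the belief interval of a singleton {x} is [Bel({x}), Pl({x})], both computed directly from a BPA. A minimal sketch (the example BPA is hypothetical):

```python
def belief_interval(m, x):
    """Belief interval [Bel({x}), Pl({x})] for singleton x,
    given a BPA m mapping frozensets (focal sets) to masses.
    Bel({x}) sums the masses of focal sets contained in {x};
    Pl({x}) sums the masses of focal sets intersecting {x}."""
    bel = sum(v for A, v in m.items() if A <= frozenset({x}))
    pl = sum(v for A, v in m.items() if x in A)
    return bel, pl

# Hypothetical BPA on the frame {a, b, c}
m = {frozenset({'a'}): 0.4,
     frozenset({'a', 'b'}): 0.3,
     frozenset({'a', 'b', 'c'}): 0.3}
```

For instance, `belief_interval(m, 'a')` gives [0.4, 1.0], while 'c' only appears in the full frame and gets [0.0, 0.3].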
Upgrading the Fusion of Imprecise Classifiers
Imprecise classification is a relatively new task within Machine Learning. The difference from standard classification is that not only is one state of the variable under study determined: a set of states that do not have enough information against them, and therefore cannot be ruled out, is determined as well. For imprecise classification, a model called the Imprecise Credal Decision Tree (ICDT), which uses imprecise probabilities and the maximum of entropy as the information measure, has been presented. A difficult and interesting task is to show how to combine this type of imprecise classifier. A procedure based on the minimum level of dominance has been presented; although it represents a very strong method of combining, it has the drawback of an important risk of erroneous predictions. In this research, we use the second-best theory to argue that the aforementioned type of combination can be improved through a new procedure built by relaxing the constraints. The new procedure is compared with the original one in an experimental study on a large set of datasets, and shows improvement.
UGR-FEDER funds under Project A-TIC-344-UGR20; FEDER/Junta de Andalucía-Consejería de Transformación Económica, Industria, Conocimiento y Universidades under Project P20_0015
Using extreme prior probabilities on the Naive Credal Classifier
The Naive Credal Classifier (NCC) was the first method proposed for Imprecise Classification. It starts from the known Naive Bayes algorithm (NB), which assumes that the attributes are independent given the class variable. Despite this unrealistic assumption, NB and NCC have been successfully used in practical applications. In this work, we propose a new version of NCC, called Extreme Prior Naive Credal Classifier (EP-NCC). Unlike NCC, EP-NCC takes into consideration the lower and upper prior probabilities of the class variable in the estimation of the lower and upper conditional probabilities. We demonstrate that, with our proposed EP-NCC, the predictions are more informative than with NCC without increasing the risk of making erroneous predictions. An experimental analysis carried out in this work shows that EP-NCC significantly outperforms NCC and obtains statistically equivalent results to the algorithm proposed so far for Imprecise Classification based on decision trees, even though EP-NCC is computationally simpler. Therefore, EP-NCC is more suitable to be applied to large datasets for Imprecise Classification than the methods proposed so far in this field. This is an important issue in favor of our proposal due to the increasing amount of data in every area.
This work has been supported by UGR-FEDER funds under Project A-TIC-344-UGR20, by the “FEDER/Junta de Andalucía-Consejería de Transformación Económica, Industria, Conocimiento y Universidades” under Project P20_00159, and by research scholarship FPU17/02685
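The interval estimates that credal classifiers of this family build on are commonly obtained with the imprecise Dirichlet model (IDM): an event observed n times out of N gets probability bounds n/(N+s) and (n+s)/(N+s) for a hyperparameter s. A minimal sketch of that estimate (the counts and s = 1 are illustrative; this is not the authors' EP-NCC code):

```python
def idm_interval(count, total, s=1.0):
    """Lower and upper probability of an event observed
    `count` times out of `total`, under the imprecise
    Dirichlet model with hyperparameter s."""
    return count / (total + s), (count + s) / (total + s)

# Hypothetical counts: a class value seen 30 times in 99 instances
low, up = idm_interval(30, 99)
```

The width of the interval, s/(N+s), shrinks as more data arrive, which is the sense in which more data make imprecise predictions more informative.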
Decision Tree Ensemble Method for Analyzing Traffic Accidents of Novice Drivers in Urban Areas
Presently, there is a critical need to analyze traffic accidents in order to mitigate their terrible economic and human impact. Most accidents occur in urban areas. Furthermore, driving experience has an important effect on accident analysis, since inexperienced drivers are more likely to suffer fatal injuries. This work studies the injury severity produced by accidents that involve inexperienced drivers in urban areas. The analysis was based on data provided by the Spanish General Traffic Directorate. The information root node variation (IRNV) method (based on decision trees) was used to obtain a rule set that provides useful information about the most probable causes of fatalities in accidents involving inexperienced drivers in urban areas. This may prove useful knowledge in preventing this kind of accident and/or mitigating its consequences.
This work has been supported by the Spanish “Ministerio de Economía y Competitividad” and by “Fondo Europeo de Desarrollo Regional” (FEDER) under Project TEC2015-69496-R
Combining gene expression data and prior knowledge for inferring gene regulatory networks via Bayesian networks using structural restrictions
Ministerio de Economía y Competitividad and Fondo Europeo de Desarrollo Regional (FEDER), under projects TEC2015-69496-R and TIN2016-77902-C3-2-
Value‐based potentials: Exploiting quantitative information regularity patterns in probabilistic graphical models
This study was jointly supported by the Spanish Ministry of Education and Science under projects PID2019-106758GB-C31 and TIN2016-77902-C3-2-P, and the European Regional Development Fund (FEDER). Funding for open access charge from Universidad de Granada/CBUA.
When dealing with complex models (i.e., models with
many variables, a high degree of dependency between
variables, or many states per variable), the efficient representation
of quantitative information in probabilistic
graphical models (PGMs) is a challenging task. To address
this problem, this study introduces several new structures,
aptly named value‐based potentials (VBPs), which are
based exclusively on the values. VBPs leverage repeated
values to reduce memory requirements. In the present
paper, they are compared with some common structures,
like standard tables or unidimensional arrays, and probability
trees (PT). Like VBPs, PTs are designed to reduce
the memory space, but this is achieved only if value repetitions
correspond to context‐specific independence
patterns (i.e., repeated values are related to consecutive
indices or configurations). VBPs are devised to overcome
this limitation. The goal of this study is to analyze the
properties of VBPs. We provide a theoretical analysis of
VBPs and use them to encode the quantitative information
of a set of well‐known Bayesian networks, measuring
the access time to their content and the computational
time required to perform some inference tasks.
Spanish Government PID2019-106758GB-C31; TIN2016-77902-C3-2-P; European Commission
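The core idea of a value-based potential can be sketched as an inverted index from each distinct value to the set of flat table indices holding it; when values repeat, this can take far fewer entries than the full table, whether or not the repetitions line up with context-specific independence patterns. A minimal sketch of the idea (hypothetical structure, not the authors' implementation):

```python
from collections import defaultdict

class ValueIndexedPotential:
    """Store a potential as {distinct value: set of flat indices}.
    Memory grows with the number of distinct values and index
    entries, not with the layout of the full table."""
    def __init__(self, table):
        self.size = len(table)
        self.by_value = defaultdict(set)
        for i, v in enumerate(table):
            self.by_value[v].add(i)

    def get(self, i):
        # Recover the value stored at flat index i.
        for v, idxs in self.by_value.items():
            if i in idxs:
                return v
        raise IndexError(i)

# Six table entries but only two distinct values
pot = ValueIndexedPotential([0.2, 0.8, 0.2, 0.2, 0.8, 0.2])
```

Note the trade-off the abstract measures empirically: access time is paid for the memory saving, since `get` must search the value groups.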
Bagging of Credal Decision Trees for Imprecise Classification
The Credal Decision Trees (CDTs) have been adapted for Imprecise Classification (ICDT). However, no ensembles of imprecise classifiers have been proposed so far. The reason might be that it is not a trivial question to combine the predictions made by multiple imprecise classifiers. In fact, if the combination method used is not appropriate, the ensemble method could even worsen the performance of a single classifier. On the other hand, the Bagging scheme has been shown to provide satisfactory results in precise classification, especially when it is used with CDTs, which are known to be very weak and unstable classifiers. For these reasons, in this research, a new Bagging scheme with ICDTs is proposed, together with a new technique for combining predictions made by imprecise classifiers that tries to maximize the precision of the bagging classifier. If the procedure for such a combination is too conservative, it is easy to obtain little information and worsen the results of a single classifier. Our proposal considers only the states with the minimum level of non-dominance. An exhaustive experimentation carried out in this work has shown that the Bagging of ICDTs, with our proposed combination technique, performs clearly better than a single ICDT.
This work has been supported by the Spanish “Ministerio de Economía y Competitividad” and by “Fondo Europeo de Desarrollo Regional” (FEDER) under Project TEC2015-69496-R
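One plausible reading of the combination step above can be sketched as voting over the state sets predicted by the individual trees: each state receives one vote per classifier whose non-dominated set contains it, and the ensemble keeps the states with the most votes. A minimal sketch under that reading (the predicted sets are illustrative, not the authors' exact rule):

```python
from collections import Counter

def combine_imprecise(predictions):
    """Combine set-valued predictions: keep the states that
    appear in the largest number of predicted sets."""
    votes = Counter(s for pred in predictions for s in pred)
    best = max(votes.values())
    return {s for s, v in votes.items() if v == best}

# Hypothetical non-dominated sets from three ICDTs
result = combine_imprecise([{'a', 'b'}, {'a'}, {'a', 'c'}])
```

With these sets, only 'a' survives: it is the single state left undominated by every tree, which illustrates how the combination trades cautiousness for informativeness.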
New applications of models based on imprecise probabilities within data mining
When we have information about a finite set of possible alternatives provided
by an expert or dataset, a mathematical model is needed to represent
such information. In some cases, a unique probability distribution is not appropriate
for this purpose because the available information is not sufficient.
For this reason, several mathematical theories and models based on imprecise
probabilities have been developed in the literature. In this thesis work, we analyze
the relations between some imprecise probability theories and study the
properties of some models based on imprecise probabilities. When imprecise
probability theories and models arise, tools for quantifying the uncertainty-based
information in such theories and models, usually called uncertainty
measures, are needed. In this thesis work, we analyze the properties of some
existing uncertainty measures in theories based on imprecise probabilities and
propose uncertainty measures in imprecise probability theories and models
that present some advantages over the existing ones.
Situations in which it is necessary to represent the information provided
by a dataset about a finite set of possible alternatives arise in classification, an
essential task within Data Mining. This well-known task consists of predicting,
for a given instance described via a set of attributes, the value of a variable
under study, known as the class variable. In classification, it is often necessary to
quantify the uncertainty-based information about the class variable. For this
purpose, classical probability theory (PT) has been employed for many years.
In recent years, classification algorithms that represent the information about
the class variable via imprecise probability models have been developed. Via
experimental studies, it has been shown that classification methods based on
imprecise probabilities significantly outperform the ones that utilize PT when
data contain errors.
When classifying an instance, classifiers tend to predict a single value of the
class variable. Nonetheless, in some cases, there is not enough information
available for a classifier to point out a single class value. In these situations, it
is more logical that classifiers predict a set of class values instead of a single
value of the class variable. This is known as Imprecise Classification.
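Set-valued prediction of this kind can be illustrated with interval dominance: given a probability interval for each class value, a value is ruled out only when some other value's lower bound exceeds its upper bound, and all undominated values are returned. A minimal sketch (the intervals are illustrative):

```python
def undominated(intervals):
    """intervals: {class_value: (lower, upper)}.
    Return the class values not interval-dominated by any
    other value (c is dominated when some other value d
    satisfies lower(d) > upper(c))."""
    return {c for c, (_, up) in intervals.items()
            if not any(lo > up
                       for d, (lo, _) in intervals.items() if d != c)}

# Hypothetical probability intervals for three class values
pred = undominated({'a': (0.5, 0.7), 'b': (0.2, 0.6), 'c': (0.0, 0.3)})
```

Here 'c' is ruled out (0.5 > 0.3), but neither 'a' nor 'b' dominates the other, so the classifier returns the set {'a', 'b'} rather than committing to one value.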
Classification algorithms (including Imprecise Classification) often aim to
minimize the number of instances erroneously classified. This would be optimal
if all classification errors had the same importance. Nevertheless, in practical applications, different classification errors usually lead to different costs.
For this reason, classifiers that take the misclassification costs into account,
also known as cost-sensitive classifiers, have been developed in the literature.
Traditional classification (including Imprecise Classification) assumes that
each instance has a single value of the class variable. However, in some domains,
this task does not fit well because an instance may have multiple labels
simultaneously. In these domains, the Multi-Label Classification task (MLC)
is more suitable than traditional classification. MLC aims to predict the set of
labels associated with a given instance described via an attribute set. Most of
the MLC methods proposed so far represent the information provided by an
MLC dataset about the set of labels via classical PT.
In this thesis work, we develop new classification algorithms based on imprecise
probability models, including Imprecise Classification, cost-sensitive
Imprecise Classification, and MLC, that present some advantages and obtain
better experimental results than the state-of-the-art ones.
In this thesis we follow the line of research on imprecise probability theories and models and on uncertainty measures with imprecise probabilities. We also propose new classification methods based on imprecise probabilities that obtain better performance than the state-of-the-art ones.
Tesis Univ. Granada
A Variation of the Algorithm to Achieve the Maximum Entropy for Belief Functions
Evidence theory (ET), based on imprecise probabilities, is often more appropriate than the classical theory of probability (PT) to apply in situations with inaccurate or incomplete information. The quantification of the information that a piece of evidence involves is a key issue in ET. Shannon’s entropy is an excellent measure in PT for such purposes, being easy to calculate and fulfilling a wide set of properties that make it axiomatically the best one in PT. In ET, a similar role is played by the maximum of entropy (ME), which verifies a similar set of properties. The ME is the unique measure in ET that has such axiomatic behavior. The problem of the ME in ET is its computationally complex calculation, which makes its use problematic in some situations. There exists only one algorithm for computing the ME in ET, and its high computational cost has been the principal drawback found with this measure. In this work, a variation of the original algorithm is presented. It is shown that with this modification, a reduction in the necessary steps to attain the ME can be obtained because, in each step, the power set of possibilities is reduced with respect to the original algorithm, which is the key point of the complexity found. This solution can provide greater applicability of this measure.
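For context, the original procedure this work modifies can be sketched as the classical greedy algorithm for the maximum-entropy distribution of a belief function, as it is usually described: repeatedly pick the subset A of the still-unassigned elements maximizing the average belief gain (Bel(F ∪ A) − Bel(F)) / |A| (largest A on ties), and give each element of A that average as its probability. A brute-force rendering (the enumeration of all subsets at each step is the exponential cost the abstract refers to; the example BPA is illustrative):

```python
from itertools import combinations

def bel(m, S):
    """Belief of set S under BPA m (masses on frozensets)."""
    return sum(v for A, v in m.items() if A <= S)

def max_entropy_distribution(m, frame):
    """Greedy max-entropy algorithm for belief functions:
    at each step, over all nonempty subsets A of the
    unassigned elements, maximize the average belief gain
    (Bel(F union A) - Bel(F)) / |A|, preferring larger A
    on ties, and assign that average to each element of A."""
    p, F = {}, frozenset()
    remaining = set(frame)
    while remaining:
        best = None
        for k in range(1, len(remaining) + 1):
            for A in combinations(sorted(remaining), k):
                A = frozenset(A)
                gain = (bel(m, F | A) - bel(m, F)) / len(A)
                if (best is None or gain > best[0]
                        or (gain == best[0] and len(A) > len(best[1]))):
                    best = (gain, A)
        gain, A = best
        for x in A:
            p[x] = gain
        F |= A
        remaining -= A
    return p

# Hypothetical BPA: m({a}) = 0.2, m({a, b}) = 0.8
m = {frozenset({'a'}): 0.2, frozenset({'a', 'b'}): 0.8}
p = max_entropy_distribution(m, ['a', 'b'])
```

On this example the mass on {a, b} spreads evenly, yielding the uniform distribution (0.5, 0.5); reducing the power set scanned at each step, as the work proposes, attacks exactly the nested subset enumeration above.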