Using Sensitivity as a Method for Ranking the Test Cases Classified by Binary Decision Trees
Data mining projects that rely on decision trees to classify test cases usually use the probabilities provided by these trees to rank the classified cases. A better method is needed for ranking test cases that have already been classified by a binary decision tree, because these probabilities are not always accurate and reliable enough. One reason is that the probability estimates computed by existing decision tree algorithms are identical for all cases that end up in the same leaf of the tree. This is only one reason why the probability estimates given by decision tree algorithms cannot be used as an accurate means of deciding whether a test case has been correctly classified. Isabelle Alvarez has proposed a new method for ranking the test cases classified by a binary decision tree [Alvarez, 2004]. In this paper we present the results of a comparison of ranking methods based on the probability estimate, on the sensitivity of a particular case, or on both.
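As an illustration of the limitation described above (not of the sensitivity method from [Alvarez, 2004]), the following sketch trains a binary decision tree with scikit-learn and shows that every test case falling in the same leaf receives the same probability estimate, so a ranking based on probability alone is full of ties; the synthetic dataset and the tree depth are arbitrary choices made for illustration.

# Illustrative sketch only: shows the baseline probability-based ranking the
# abstract criticises, not the sensitivity-based method of [Alvarez, 2004].
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)

# Probability of the positive class for every test case.
proba = tree.predict_proba(X_test)[:, 1]

# Cases that end up in the same leaf share exactly the same estimate,
# so ties are unavoidable when ranking by the probability alone.
leaves = tree.apply(X_test)
ranking = np.argsort(-proba)  # baseline ranking by probability estimate
print("top-5 ranked cases:", ranking[:5])
print("distinct leaves:", len(np.unique(leaves)),
      "| distinct probability values:", len(np.unique(proba)))
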
Defining Interestingness for Association Rules
Interestingness of association rules has been a major research topic over the past decade. The reason is that the strength of association rules, i.e. their ability to discover ALL patterns given some thresholds on support and confidence, is also their weakness. Indeed, a typical association rule analysis on real data often results in hundreds or thousands of patterns, creating a data mining problem of the second order. In other words, it is not straightforward to determine which of those rules are interesting to the end user. This paper provides an overview of existing measures of interestingness and comments on their properties. In general, interestingness measures can be divided into objective and subjective measures. Objective measures tend to express interestingness by means of statistical or mathematical criteria, whereas subjective measures aim at capturing more practical criteria that should be taken into account, such as the unexpectedness or actionability of rules. This paper focuses only on objective measures of interestingness.
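As a minimal illustration of what an objective measure looks like, the sketch below computes three standard ones (support, confidence and lift) for a single rule A -> B; the toy transaction data and the rule itself are made up for illustration only.

# Minimal sketch of three common objective interestingness measures
# (support, confidence and lift) for a single rule A -> B.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
A, B = {"bread"}, {"milk"}  # hypothetical rule: bread -> milk

n = len(transactions)
supp_A  = sum(A <= t for t in transactions) / n        # P(A)
supp_B  = sum(B <= t for t in transactions) / n        # P(B)
supp_AB = sum((A | B) <= t for t in transactions) / n  # P(A and B)

support    = supp_AB
confidence = supp_AB / supp_A   # P(B | A)
lift       = confidence / supp_B  # how much A raises the likelihood of B

print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
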
A framework for internal fraud risk reduction at IT integrating business processes: the IFR² framework
Fraud is a million dollar business and it is increasing every year. Both internal and external fraud impose a substantial cost on economies worldwide. A review of the academic literature shows that the academic community addresses only external fraud and how to detect this type of fraud. To our knowledge, little or no effort has been put into investigating how to prevent and detect internal fraud, which we call 'internal fraud risk reduction'. Given the urgent need for research into internal fraud and its absence from the academic literature, research on reducing internal fraud risk is pivotal. Only once a framework in which to embed empirical research is in place can this topic be investigated further. In this paper we present the IFR² framework, deduced from both the academic literature and current business practices; at its core, the framework suggests using a data mining approach.
Classifier PGN: Classification with High Confidence Rules
ACM Computing Classification System (1998): H.2.8, H.3.3.
Associative classifiers use a set of class association rules, generated from a given training set, to classify new instances. Typically, these techniques set a minimal support to make a first selection of appropriate rules and subsequently discriminate between high and low quality rules by means of a quality measure such as confidence. As a result, the final set of class association rules has a support equal to or greater than a predefined threshold, but many of the rules have confidence levels below 100%. PGN is a novel associative classifier which turns the traditional approach around and uses a confidence level of 100% as the first selection criterion, prior to maximizing support. This article introduces PGN and empirically evaluates its strengths and limitations. The results are promising and show that PGN is competitive with other well-known classifiers.
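The rule-selection principle described above can be sketched as follows; this is not the actual PGN implementation, only the filtering idea (perfect confidence first, then maximal support), and the Rule structure and example rules below are hypothetical placeholders.

# Sketch of the PGN-style selection order: keep only 100%-confidence class
# association rules, then prefer those with the highest support.
from dataclasses import dataclass

@dataclass
class Rule:
    antecedent: frozenset
    label: str
    support: float      # fraction of training instances covered by the rule
    confidence: float   # fraction of covered instances carrying this label

def select_pgn_style(rules):
    # First selection criterion: perfect confidence only.
    exact = [r for r in rules if r.confidence == 1.0]
    # Then rank the surviving rules by support, largest first.
    return sorted(exact, key=lambda r: r.support, reverse=True)

rules = [
    Rule(frozenset({"a", "b"}), "pos", support=0.30, confidence=1.00),
    Rule(frozenset({"a"}),      "pos", support=0.55, confidence=0.92),
    Rule(frozenset({"c"}),      "neg", support=0.10, confidence=1.00),
]
for r in select_pgn_style(rules):
    print(set(r.antecedent), "->", r.label, "support =", r.support)
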
SEMANTIC AND ABSTRACTION CONTENT OF ART IMAGES
In this paper the semantic and abstraction content of art images is studied. Different techniques for searching art image repositories are analyzed and new ones are proposed. The content-based retrieval process integrates search over different components, linked in XML structures. Experiments on 200 paintings by six Israeli contemporary artists are conducted and analyzed.
Measuring Implicit Bias Using SHAP Feature Importance and Fuzzy Cognitive Maps
In this paper, we integrate the concepts of feature importance and implicit bias in the context of pattern classification. This is done by means of a three-step methodology that involves (i) building a classifier and tuning its hyperparameters, (ii) building a Fuzzy Cognitive Map model able to quantify implicit bias, and (iii) using the SHAP feature importance to activate the neural concepts when performing simulations. The results of a real case study concerning fairness research support our two-fold hypothesis. On the one hand, we illustrate the risks of using a feature importance method as an absolute tool to measure implicit bias. On the other hand, we conclude that the amount of bias towards protected features might differ depending on whether the features are numerically or categorically encoded.
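A sketch of how step (iii) could look in practice is given below: it derives a per-feature importance vector from SHAP values and rescales it so the scores could serve as concept activation values. The Fuzzy Cognitive Map itself is not reproduced; the dataset and model are placeholders, and the shap package is assumed to be installed.

# Illustrative sketch of step (iii) only: compute mean |SHAP| per feature and
# normalise it to [0, 1] as candidate activation values for the FCM concepts.
import numpy as np
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True)          # placeholder dataset
model = GradientBoostingClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)               # (n_samples, n_features) contributions
importance = np.abs(shap_values).mean(axis=0)        # mean |SHAP| per feature

activation = importance / importance.max()           # rescale to [0, 1]
print(np.round(activation, 3))
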
Online learning of windmill time series using Long Short-term Cognitive Networks
Forecasting windmill time series is often the basis of other processes such
as anomaly detection, health monitoring, or maintenance scheduling. The amount
of data generated on windmill farms makes online learning the most viable
strategy to follow. Such settings require retraining the model each time a new
batch of data is available. However, update the model with the new information
is often very expensive to perform using traditional Recurrent Neural Networks
(RNNs). In this paper, we use Long Short-term Cognitive Networks (LSTCNs) to
forecast windmill time series in online settings. These recently introduced
neural systems consist of chained Short-term Cognitive Network blocks, each
processing a temporal data chunk. The learning algorithm of these blocks is
based on a very fast, deterministic learning rule that makes LSTCNs suitable
for online learning tasks. The numerical simulations using a case study with
four windmills showed that our approach reported the lowest forecasting errors
with respect to a simple RNN, a Long Short-term Memory, a Gated Recurrent Unit,
and a Hidden Markov Model. What is perhaps more important is that the LSTCN
approach is significantly faster than these state-of-the-art models
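The online setting described above can be sketched as follows; since LSTCNs are not available in standard libraries, an incrementally updated SGDRegressor stands in for the forecasting model, and the synthetic "windmill" signal, window size and chunk size are illustrative assumptions.

# Minimal sketch of chunk-wise online learning: each incoming batch is first
# used for evaluation and then for an incremental model update (no retraining
# from scratch). The SGDRegressor is a stand-in for the LSTCN described above.
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 60, 3000)) + 0.1 * rng.standard_normal(3000)

window, chunk_size = 24, 500
model = SGDRegressor(learning_rate="constant", eta0=0.01)

def windows(signal):
    # Lagged windows as inputs, next value as the one-step-ahead target.
    X = np.lib.stride_tricks.sliding_window_view(signal, window)[:-1]
    y = signal[window:]
    return X, y

for start in range(0, len(series) - chunk_size, chunk_size):
    X, y = windows(series[start:start + chunk_size])
    if start:  # evaluate on the incoming chunk before learning from it
        mae = np.mean(np.abs(model.predict(X) - y))
        print(f"chunk at {start}: MAE = {mae:.3f}")
    model.partial_fit(X, y)  # incremental update with the new batch
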