On Cognitive Preferences and the Plausibility of Rule-based Models
It is conventional wisdom in machine learning and data mining that logical
models such as rule sets are more interpretable than other models, and that
among such rule-based models, simpler models are more interpretable than more
complex ones. In this position paper, we question this latter assumption by
focusing on one particular aspect of interpretability, namely the plausibility
of models. Roughly speaking, we equate the plausibility of a model with the
likelihood that a user accepts it as an explanation for a prediction. In
particular, we argue that, all other things being equal, longer explanations
may be more convincing than shorter ones, and that the predominant bias for
shorter models, which is typically necessary for learning powerful
discriminative models, may not be suitable when it comes to user acceptance of
the learned models. To that end, we first recapitulate evidence for and against
this postulate, and then report the results of an evaluation in a
crowd-sourcing study based on about 3,000 judgments. The results do not reveal
a strong preference for simple rules, whereas we can observe a weak preference
for longer rules in some domains. We then relate these results to well-known
cognitive biases such as the conjunction fallacy, the representativeness heuristic,
or the recognition heuristic, and investigate their relation to rule length and
plausibility.
Comment: V4: Another rewrite of section on interpretability to clarify focus
on plausibility and relation to interpretability, comprehensibility, and
justifiability
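The conjunction fallacy mentioned above rests on a simple property of conjunctive rules: adding a condition to a rule body can only shrink, never enlarge, the set of covered examples, so P(A and B) <= P(A). The minimal sketch below illustrates this with hypothetical toy records and attribute names (`smoker`, `bp`) that do not come from the paper.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Condition = Callable[[Record], bool]

def coverage(rule: List[Condition], data: List[Record]) -> int:
    """Count the records that satisfy every condition of a conjunctive rule."""
    return sum(all(cond(r) for cond in rule) for r in data)

# Hypothetical toy records, for illustration only.
data = [
    {"age": 34, "smoker": True, "bp": "high"},
    {"age": 61, "smoker": False, "bp": "high"},
    {"age": 58, "smoker": True, "bp": "normal"},
    {"age": 45, "smoker": True, "bp": "high"},
]

short_rule = [lambda r: r["smoker"]]
# The longer rule adds one more condition to the same body.
long_rule = short_rule + [lambda r: r["bp"] == "high"]

print(coverage(short_rule, data), coverage(long_rule, data))  # 3 2
```

A longer rule therefore describes a more specific, and never more probable, situation, even if users find it more convincing.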
Subgroup Discovery: Real-World Applications
Subgroup discovery is a data mining technique which extracts interesting rules with respect
to a target variable. An important characteristic of this task is the combination of predictive
and descriptive induction. In this paper, we give an overview of subgroup discovery.
In addition, we present several real-world applications solved through evolutionary algorithms that
illustrate the suitability and potential of this type of algorithm for developing subgroup discovery
algorithms.
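The abstract does not name a quality measure for the extracted rules. Weighted relative accuracy (WRAcc) is one widely used choice for scoring a subgroup rule Cond -> Target, defined as p(Cond) * (p(Target | Cond) - p(Target)), which balances how many examples a subgroup covers against how unusual its target distribution is. The sketch below assumes that measure and uses hypothetical boolean records.

```python
from typing import Callable, Dict, List

Record = Dict[str, bool]
Predicate = Callable[[Record], bool]

def wracc(data: List[Record], cond: Predicate, target: Predicate) -> float:
    """Weighted relative accuracy: p(cond) * (p(target | cond) - p(target))."""
    n = len(data)
    covered = [r for r in data if cond(r)]
    if not covered:
        return 0.0  # an empty subgroup carries no information
    p_cond = len(covered) / n
    p_target = sum(target(r) for r in data) / n
    p_target_given_cond = sum(target(r) for r in covered) / len(covered)
    return p_cond * (p_target_given_cond - p_target)

# Hypothetical records: 3 of 8 are smokers, and 2 of those 3 have the disease.
data = (
    [{"smoker": True, "disease": True}] * 2
    + [{"smoker": True, "disease": False}]
    + [{"smoker": False, "disease": True}]
    + [{"smoker": False, "disease": False}] * 4
)

def is_smoker(r: Record) -> bool:
    return r["smoker"]

def has_disease(r: Record) -> bool:
    return r["disease"]

print(round(wracc(data, is_smoker, has_disease), 4))  # 0.1094
```

In an evolutionary approach, a measure like this could serve as (part of) the fitness function that ranks candidate rules.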
SAFS: A Deep Feature Selection Approach for Precision Medicine
In this paper, we propose a new deep feature selection method. Our method uses
stacked auto-encoders to represent features at a higher level of abstraction.
We developed and applied a novel feature learning
approach to a specific precision medicine problem, which focuses on assessing
and prioritizing risk factors for hypertension (HTN) in a vulnerable
demographic subgroup (African-American). Our approach is to use deep learning
to identify significant risk factors affecting left ventricular mass indexed to
body surface area (LVMI) as an indicator of heart damage risk. The results show
that our feature learning and representation approach leads to better results
than the other approaches compared.
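The abstract does not say how SAFS turns the auto-encoder representation into a ranking of risk factors. One common, generic heuristic (not necessarily the SAFS procedure) scores each input feature by the summed absolute weight connecting it to the first hidden layer of a trained encoder. The sketch below assumes such a weight matrix is already available; the feature names and weights are hypothetical.

```python
from typing import Dict, List

def rank_features(weights: List[List[float]], names: List[str]) -> List[str]:
    """Rank input features by the summed absolute weight from each input
    to the first hidden layer. weights[i][j] links input i to hidden unit j."""
    scores: Dict[str, float] = {
        name: sum(abs(w) for w in row) for name, row in zip(names, weights)
    }
    return sorted(names, key=lambda name: scores[name], reverse=True)

# Hypothetical first-layer weights of a trained encoder (3 inputs, 3 hidden units).
names = ["feature_a", "feature_b", "feature_c"]
weights = [
    [0.1, -0.2, 0.05],  # summed |w| = 0.35
    [0.9, -0.7, 0.4],   # summed |w| = 2.0
    [-0.3, 0.3, 0.1],   # summed |w| = 0.7
]

print(rank_features(weights, names))  # ['feature_b', 'feature_c', 'feature_a']
```

With stacked auto-encoders, the same idea can be extended by propagating importance through several layers, at the cost of a more involved scoring rule.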
Subgroup Discovery through Evolutionary Fuzzy Systems applied to Bioinformatic problems
Subgroup discovery is a descriptive data mining technique using supervised learning. This
paper summarizes the main properties and elements of the subgroup discovery task.
In addition, we focus on the suitability and potential of the search performed by evolutionary
algorithms for developing subgroup discovery algorithms, and on the use
of fuzzy logic, a soft computing technique very close to human reasoning. The
hybridisation of both techniques is known as an evolutionary fuzzy system.
The most relevant applications of evolutionary fuzzy systems for subgroup discovery in the
bioinformatics domain are outlined in this work. Specifically, these algorithms are applied to a
problem based on the Influenza A virus and to the acute sore throat problem.
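To make the role of fuzzy logic concrete, the sketch below evaluates a single fuzzy rule "value is HIGH -> target class". It assumes triangular membership functions (a common choice in fuzzy rule systems, not one stated in the abstract). It also computes fuzzy coverage as the mean membership degree, and fuzzy confidence as the share of total membership that falls on target-class examples. The attribute (body temperature) and the examples are hypothetical.

```python
from typing import List, Tuple

def triangular(x: float, a: float, b: float, c: float) -> float:
    """Membership of x in a triangular fuzzy set that rises from a,
    peaks at b and falls back to zero at c (assumes a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzy_rule_quality(examples: List[Tuple[float, bool]],
                       a: float, b: float, c: float) -> Tuple[float, float]:
    """Return (fuzzy coverage, fuzzy confidence) of 'value is HIGH -> target'."""
    degrees = [(triangular(x, a, b, c), is_target) for x, is_target in examples]
    total = sum(d for d, _ in degrees)
    on_target = sum(d for d, t in degrees if t)
    coverage = total / len(examples)
    confidence = on_target / total if total > 0 else 0.0
    return coverage, confidence

# Hypothetical examples: (body temperature in degrees C, in target class?)
examples = [(38.0, True), (39.0, True), (37.0, False), (38.5, False), (40.0, True)]
cov, conf = fuzzy_rule_quality(examples, 37.5, 39.0, 40.5)
print(round(cov, 3), round(conf, 3))  # 0.467 0.714
```

Unlike a crisp threshold, an example at 38.0 degrees contributes partially (degree 1/3) rather than all-or-nothing, which is what keeps fuzzy rules close to linguistic descriptions such as "high fever".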
Interpretable multiclass classification by MDL-based rule lists
Interpretable classifiers have recently witnessed an increase in attention
from the data mining community because they are inherently easier to understand
and explain than their more complex counterparts. Examples of interpretable
classification models include decision trees, rule sets, and rule lists.
Learning such models often involves optimizing hyperparameters, which typically
requires substantial amounts of data and may result in relatively large models.
In this paper, we consider the problem of learning compact yet accurate
probabilistic rule lists for multiclass classification. Specifically, we
propose a novel formalization based on probabilistic rule lists and the minimum
description length (MDL) principle. This results in virtually parameter-free
model selection that naturally trades off model complexity against
goodness of fit, thereby effectively avoiding overfitting and the need for
hyperparameter tuning. Finally, we introduce the Classy algorithm, which
greedily finds rule lists according to the proposed criterion. We empirically
demonstrate that Classy selects small probabilistic rule lists that outperform
state-of-the-art classifiers when it comes to the combination of predictive
performance and interpretability. We show that Classy is insensitive to its
only parameter, i.e., the candidate set, and that compression on the training
set correlates with classification performance, validating our MDL-based
selection criterion.
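In a probabilistic rule list, the first rule whose condition fires supplies a class distribution, and a default rule covers everything else. Under MDL, the total cost is L(M) + L(D | M). The sketch below computes only the data term L(D | M), the bits needed to encode the labels given the rule list, and compares it with the default rule alone. It is a hypothetical toy: the rule, the class probabilities (set by hand here rather than estimated from data), and the data are all assumptions, and Classy's actual model encoding L(M) is not reproduced.

```python
import math
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
Distribution = Dict[str, float]
# An ordered rule list: each rule is a (condition, class distribution) pair.
Rule = Tuple[Callable[[Record], bool], Distribution]

def predict_proba(rules: List[Rule], default: Distribution, x: Record) -> Distribution:
    """Return the distribution of the first firing rule, else the default rule."""
    for condition, dist in rules:
        if condition(x):
            return dist
    return default

def data_code_length(rules: List[Rule], default: Distribution,
                     data: List[Tuple[Record, str]]) -> float:
    """L(D | M) in bits: -sum log2 P(y | x). Assumes all used probabilities are > 0."""
    return -sum(math.log2(predict_proba(rules, default, x)[y]) for x, y in data)

# Hypothetical toy rule list and labelled data.
rules: List[Rule] = [(lambda x: x["a"] > 5, {"pos": 0.75, "neg": 0.25})]
default: Distribution = {"pos": 0.5, "neg": 0.5}
data = [({"a": 7}, "pos"), ({"a": 9}, "pos"), ({"a": 6}, "neg"),
        ({"a": 2}, "pos"), ({"a": 1}, "neg")]

with_rule = data_code_length(rules, default, data)
default_only = data_code_length([], default, data)
print(round(with_rule, 3), round(default_only, 3))  # 4.83 5.0
```

The rule saves about 0.17 bits on the data, so under MDL it is worth keeping only if describing the rule itself costs less than that. This is how the criterion trades model complexity against goodness of fit without a tuned penalty parameter.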
Visually Mining Interesting Patterns in Multivariate Datasets
Mining multivariate datasets for patterns and knowledge is an important task that helps analysts understand and describe the data and predict unknown values. However, conventional computer-supported data mining approaches often prevent the user from getting involved in the mining process and from interacting with it during pattern discovery. Moreover, without a visual representation of the extracted knowledge, analysts may have difficulty understanding and explaining the patterns. Therefore, instead of directly applying automatic data mining techniques, it is necessary to develop techniques and visualization systems that allow users to perform knowledge discovery interactively, visually examine the patterns, adjust the parameters, and discover more interesting patterns based on their requirements. In this dissertation, I discuss several proposed visualization systems that assist analysts in mining patterns and discovering knowledge in multivariate datasets, including their design, implementation, and evaluation. Three different types of patterns are proposed and discussed: trends, clusters of subgroups, and local patterns. For trend discovery, the parameter space is visualized so that the user can visually examine it and find where good linear patterns exist. For cluster discovery, the user can interactively set a query range on a target attribute and retrieve all the sub-regions that satisfy the user's requirements. Sub-regions that satisfy the same query and are near each other are grouped and aggregated to form clusters. For local pattern discovery, the patterns of a local sub-region, defined by a focal point and its neighbors, are computationally extracted and visually represented. To help analysts discover interesting local neighbors, the extracted local patterns are integrated and shown visually.
Evaluations of the three visualization systems using formal user studies are also performed and discussed.
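The cluster-discovery step described in the abstract (select sub-regions whose target values fall in a query range, then group nearby ones) can be sketched as a connected-components search. This sketch assumes that sub-regions are cells of a regular 2-D grid summarized by their mean target value and that "near each other" means sharing an edge (4-adjacency). Both assumptions, and the grid values, are hypothetical and not taken from the dissertation.

```python
from collections import deque
from typing import List, Set, Tuple

Cell = Tuple[int, int]

def query_clusters(grid: List[List[float]], lo: float, hi: float) -> List[Set[Cell]]:
    """Select cells whose mean target value lies in [lo, hi], then group
    edge-adjacent selected cells into clusters (connected components)."""
    rows, cols = len(grid), len(grid[0])
    selected = {(r, c) for r in range(rows) for c in range(cols)
                if lo <= grid[r][c] <= hi}
    clusters: List[Set[Cell]] = []
    seen: Set[Cell] = set()
    for start in sorted(selected):
        if start in seen:
            continue
        # Breadth-first search over 4-adjacent selected cells.
        seen.add(start)
        component: Set[Cell] = set()
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            component.add((r, c))
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in selected and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(component)
    return clusters

# Hypothetical 3x3 grid of mean target values per sub-region.
grid = [
    [0.9, 0.8, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.85],
]
clusters = query_clusters(grid, 0.7, 1.0)
print([sorted(c) for c in clusters])  # [[(0, 0), (0, 1), (1, 0)], [(2, 2)]]
```

Each resulting cluster could then be aggregated (for example by its cell count or mean target value) before being shown to the analyst.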