CSNL: A cost-sensitive non-linear decision tree algorithm
This article presents a new decision tree learning algorithm called CSNL that induces Cost-Sensitive Non-Linear decision trees. The algorithm is based on the hypothesis that non-linear decision nodes provide a better basis than axis-parallel decision nodes, and it utilizes discriminant analysis to construct non-linear decision trees that take account of misclassification costs.
The performance of the algorithm is evaluated by applying it to seventeen datasets, and the results are compared with those obtained by two well-known cost-sensitive algorithms, ICET and MetaCost, which generate multiple trees to obtain some of the best results to date. The results show that CSNL performs at least as well as, if not better than, these algorithms on more than twelve of the datasets and is considerably faster. The use of bagging with CSNL further enhances its performance, showing the significant benefits of using non-linear decision nodes.
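As a rough illustration of a cost-sensitive non-linear decision node, the sketch below fits a quadratic discriminant at a single node and folds the misclassification costs into the class priors. The cost handling, function name, and default cost values are assumptions made for illustration only; they are not taken from the CSNL paper, which should be consulted for the exact construction.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

def cost_sensitive_split(X, y, cost_per_class=(1.0, 5.0)):
    """Fit a quadratic (non-linear) discriminant at a tree node and use it to
    partition the node's samples into two children.

    X: feature matrix; y: binary labels in {0, 1}.
    Costs are folded in by re-weighting the class priors, which is one common
    way of biasing a discriminant boundary against expensive errors.
    """
    counts = np.bincount(y, minlength=2).astype(float)
    weights = counts * np.asarray(cost_per_class)       # penalize errors on the costly class
    qda = QuadraticDiscriminantAnalysis(priors=weights / weights.sum()).fit(X, y)
    side = qda.predict(X)               # non-linear decision surface, not an axis-parallel test
    return side == 0, side == 1, qda    # boolean masks for the two children, plus the node model
```

A full tree would apply such a split recursively with a cost-based stopping rule, rather than the axis-parallel tests used by conventional decision tree learners.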
Clear Visual Separation of Temporal Event Sequences
Extracting and visualizing informative insights from temporal event sequences becomes increasingly difficult as data volume and variety increase. Besides dealing with high event-type cardinality and many distinct sequences, it can be difficult to tell whether it is appropriate to combine multiple events into one or to utilize additional information about event attributes. Existing approaches often make use of frequent sequential patterns extracted from the dataset; however, these patterns are limited in terms of interpretability and utility. In addition, it is difficult to assess the role of absolute and relative time when using pattern mining techniques.
In this paper, we present methods that address these challenges by automatically learning composite events, which enables better aggregation of multiple event sequences. By leveraging event sequence outcomes, we present appropriate linked visualizations that allow domain experts to identify critical flows, to assess validity, and to understand the role of time. Furthermore, we explore information gain and visual complexity metrics to identify the most relevant visual patterns. We compare composite event learning with two approaches for extracting event patterns using real-world company event data from an ongoing project with the Danish Business Authority.
Comment: In Proceedings of the 3rd IEEE Symposium on Visualization in Data Science (VDS), 201
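The information-gain idea mentioned above can be made concrete with a small sketch: score a candidate composite event (a set of event types treated as one) by how much knowing whether a sequence contains it reduces uncertainty about the sequence outcome. The function names and toy data below are invented for illustration and ignore the visual-complexity metric and the temporal aspects discussed in the paper.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(sequences, outcomes, composite):
    """Gain of the binary feature 'sequence contains an event from the composite
    group' with respect to the sequence outcome."""
    has = [any(e in composite for e in seq) for seq in sequences]
    cond = 0.0
    for flag in (True, False):
        subset = [o for o, h in zip(outcomes, has) if h == flag]
        if subset:
            cond += len(subset) / len(outcomes) * entropy(subset)
    return entropy(outcomes) - cond

# Toy data: each sequence is a list of event types with a binary outcome.
sequences = [["A", "B", "C"], ["A", "C"], ["B", "D"], ["D"]]
outcomes  = [1, 1, 0, 0]
print(information_gain(sequences, outcomes, composite={"A"}))   # 1.0: perfectly separates outcomes
```

A greedy learner could use such a score to decide which event types to merge into a composite event before aggregating sequences.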
Fuzzy rule-based system applied to risk estimation of cardiovascular patients
Cardiovascular decision support is an area of increasing research interest. Ongoing collaborations between clinicians and computer scientists are looking at the application of knowledge discovery in databases to the area of patient diagnosis, based on clinical records. A fuzzy rule-based system for risk estimation of cardiovascular patients is proposed. It uses a group of fuzzy rules as a knowledge representation of data pertaining to cardiovascular patients. Several algorithms for the discovery of an easily readable and understandable group of fuzzy rules are formalized and analysed. The accuracy of risk estimation and the interpretability of fuzzy rules are discussed. Our study shows, in comparison to other algorithms used in knowledge discovery, that classification with a group of fuzzy rules is a useful technique for risk estimation of cardiovascular patients. © 2013 Old City Publishing, Inc.
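To give a flavor of how a small group of fuzzy rules can produce a readable risk estimate, here is a minimal Sugeno-style sketch with triangular memberships and three hand-written rules. The membership shapes, rule base, variables, and thresholds are illustrative assumptions only, not the rules discovered by the algorithms in the paper.

```python
def tri(x, a, b, c):
    """Triangular membership function: 0 outside [a, c], peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def risk_estimate(age, systolic_bp, cholesterol):
    """Weighted-average evaluation of a tiny, hand-written fuzzy rule base."""
    rules = [
        # (firing strength, consequent risk in [0, 1])
        (min(tri(age, 50, 70, 90), tri(systolic_bp, 140, 170, 200)), 0.9),  # elderly AND high BP -> high risk
        (min(tri(age, 30, 45, 60), tri(cholesterol, 5.0, 6.5, 8.0)), 0.6),  # middle-aged AND raised chol. -> medium
        (tri(systolic_bp, 90, 115, 140), 0.2),                              # normal BP -> low risk
    ]
    num = sum(w * r for w, r in rules)
    den = sum(w for w, _ in rules)
    return num / den if den > 0 else 0.0

print(risk_estimate(age=72, systolic_bp=168, cholesterol=6.9))  # -> 0.9, driven by the first rule
```

The appeal of such a rule base is that each rule can be read and checked by a clinician, which is the interpretability property the paper emphasizes.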
On The Stability of Interpretable Models
Interpretable classification models are built with the purpose of providing a comprehensible description of the decision logic to an external oversight agent. When considered in isolation, a decision tree, a set of classification rules, or a linear model is widely recognized as human-interpretable. However, such models are generated as part of a larger analytical process. Bias in data collection and preparation, or in model construction, may severely affect the accountability of the design process. We conduct an experimental study of the stability of interpretable models with respect to feature selection, instance selection, and model selection. Our conclusions should raise the awareness of the scientific community about the need for a stability impact assessment of interpretable models.
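One simple way to quantify the kind of stability studied here is to compare the feature sets an interpretable model actually uses across bootstrap resamples, e.g. with pairwise Jaccard similarity. The sketch below does this for a shallow decision tree on a public dataset; it covers only instance-selection stability and is not the paper's full experimental protocol.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

def selected_features(X, y, seed):
    """Indices of features used by a shallow tree fit on one bootstrap resample."""
    Xb, yb = resample(X, y, random_state=seed)
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(Xb, yb)
    return frozenset(np.flatnonzero(tree.feature_importances_ > 0))

def jaccard(a, b):
    return len(a & b) / len(a | b) if (a | b) else 1.0

X, y = load_breast_cancer(return_X_y=True)
sets = [selected_features(X, y, seed) for seed in range(20)]
scores = [jaccard(sets[i], sets[j])
          for i in range(len(sets)) for j in range(i + 1, len(sets))]
print("mean pairwise Jaccard stability:", round(float(np.mean(scores)), 3))
```

A low mean similarity signals that the "interpretable" explanation an oversight agent receives depends heavily on which sample happened to be collected.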
Strict General Setting for Building Decision Procedures into Theorem Provers
The efficient and flexible incorporation of decision procedures into theorem provers is very important for their successful use. There are several approaches for combining and augmenting decision procedures; some of them support handling uninterpreted functions, congruence closure, lemma invocation, etc. In this paper we present a variant of one general setting for building decision procedures into theorem provers (the gs framework [18]). That setting is based on macro inference rules motivated by techniques used in different approaches, and it enables a simple description of different combination/augmentation schemes. In this paper, we further develop and extend this setting by imposing an ordering on the macro inference rules. That ordering leads to a "strict setting". It makes implementing and using variants of well-known or new schemes within this framework an easy task, even for a non-expert user. This setting also enables easy comparison of different combination/augmentation schemes and combination of their ideas.
State of Büchi Complementation
Complementation of Büchi automata has been studied for over five decades since the formalism was introduced in 1960. Known complementation constructions can be classified into Ramsey-based, determinization-based, rank-based, and slice-based approaches. Regarding the performance of these approaches, there have been several complexity analyses but very few experimental results. What is especially lacking is a comparative experiment on all four approaches to see how they perform in practice. In this paper, we review the four approaches, propose several optimization heuristics, and perform comparative experimentation on four representative constructions that are considered the most efficient in each approach. The experimental results show that (1) the determinization-based Safra-Piterman construction outperforms the other three in producing smaller complements and finishing more tasks in the allocated time, and (2) the proposed heuristics substantially improve the Safra-Piterman and the slice-based constructions.
Comment: 28 pages, 4 figures; a preliminary version of this paper appeared in the Proceedings of the 15th International Conference on Implementation and Application of Automata (CIAA).
Cross-validation and Peeling Strategies for Survival Bump Hunting using Recursive Peeling Methods
We introduce a framework to build a survival/risk bump hunting model with a censored time-to-event response. Our Survival Bump Hunting (SBH) method is based on a recursive peeling procedure that uses a specific survival peeling criterion derived from non/semi-parametric statistics such as the hazard ratio, the log-rank test, or the Nelson-Aalen estimator. To optimize the tuning parameter of the model and validate it, we introduce an objective function based on survival or prediction-error statistics, such as the log-rank test and the concordance error rate. We also describe two alternative cross-validation techniques adapted to the joint task of decision-rule making by recursive peeling and survival estimation. Numerical analyses show the importance of replicated cross-validation and the differences between criteria and techniques in both low- and high-dimensional settings. Although several non-parametric survival models exist, none addresses the problem of directly identifying local extrema. We show how SBH efficiently estimates extreme survival/risk subgroups, unlike other models. This provides insight into the behavior of commonly used models and suggests alternatives to be adopted in practice. Finally, our SBH framework was applied to a clinical dataset, in which we identified subsets of patients characterized by clinical and demographic covariates with a distinct extreme survival outcome, for which tailored medical interventions could be made. An R package `PRIMsrc` is available on CRAN and GitHub.
Comment: Keywords: Exploratory Survival/Risk Analysis, Survival/Risk Estimation & Prediction, Non-Parametric Method, Cross-Validation, Bump Hunting, Rule-Induction Method
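A minimal sketch of the recursive peeling idea, assuming numeric covariates and using a hand-rolled two-sample log-rank statistic as the peeling criterion: at each step, the box edge whose removal best separates in-box from out-of-box survival is peeled off, until the box support falls below a minimum. Parameter names and defaults are assumptions for illustration; the actual `PRIMsrc` implementation is in R and supports several peeling criteria plus the replicated cross-validation described above.

```python
import numpy as np

def logrank_stat(time, event, in_box):
    """Two-sample log-rank chi-square statistic comparing in-box vs. out-of-box samples."""
    num, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):            # distinct event times
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & in_box).sum()
        d = ((time == t) & (event == 1)).sum()
        d1 = ((time == t) & (event == 1) & in_box).sum()
        if n > 1:
            num += d1 - d * n1 / n                   # observed minus expected events in the box
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return num ** 2 / var if var > 0 else 0.0

def peel_box(X, time, event, alpha=0.05, min_support=0.1):
    """PRIM-style peeling: repeatedly trim the alpha-fraction edge of one covariate
    that maximizes the log-rank separation between in-box and out-of-box samples."""
    lower, upper = X.min(axis=0).astype(float), X.max(axis=0).astype(float)
    while True:
        in_box = np.all((X >= lower) & (X <= upper), axis=1)
        if in_box.mean() <= min_support:
            break
        best = None
        for j in range(X.shape[1]):
            xj = X[in_box, j]
            for side, cut in (("low", np.quantile(xj, alpha)),
                              ("high", np.quantile(xj, 1 - alpha))):
                lo, hi = lower.copy(), upper.copy()
                if side == "low":
                    lo[j] = cut
                else:
                    hi[j] = cut
                cand = np.all((X >= lo) & (X <= hi), axis=1)
                if cand.sum() >= in_box.sum() or cand.mean() <= min_support:
                    continue                         # each peel must shrink the box, keep support
                score = logrank_stat(time, event, cand)
                if best is None or score > best[0]:
                    best = (score, lo, hi)
        if best is None:
            break
        _, lower, upper = best
    return lower, upper                              # box bounds defining the extreme subgroup
```

The returned bounds describe a rectangular subgroup of covariate space with extreme survival relative to its complement, which is the kind of "bump" the SBH framework then validates by cross-validation.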