Conceptual Views on Tree Ensemble Classifiers
Random Forests and related tree-based methods are popular for supervised
learning from tabular data. Apart from being easy to parallelize, they also
achieve high classification performance. However, this performance, and in
particular the parallelizability, comes at the cost of explainability.
Statistical methods are often used to compensate for this disadvantage, yet
their capacity for local explanations, and even more so for global
explanations, is limited. In the present work we propose an algebraic method,
rooted in lattice theory, for the (global) explanation of tree ensembles. In
detail, we introduce two novel conceptual views on tree ensemble classifiers
and demonstrate their explanatory capabilities on Random Forests trained with
standard parameters.
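The abstract above concerns ensembles of trees whose individual predictions are combined into one classification. As a minimal illustration (not the paper's method), the following sketch shows the majority-vote combination step that Random Forest classifiers use; the function name and toy votes are hypothetical.

```python
from collections import Counter

def ensemble_predict(tree_predictions):
    """Combine the class labels predicted by the individual trees
    of an ensemble into one prediction by majority vote."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Hypothetical predictions of five trees for a single sample.
votes = ["A", "B", "A", "A", "B"]
print(ensemble_predict(votes))  # A
```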
Drawing Order Diagrams Through Two-Dimension Extension
Order diagrams are an important tool to visualize the complex structure of
ordered sets. Favorable drawings of order diagrams, i.e., drawings that are
easily readable for humans, are hard to come by, even for small ordered sets.
Many attempts have been made to transfer classical graph drawing approaches to
order diagrams. Although these methods produce satisfying results for some
ordered sets, they unfortunately perform poorly in general. In this work we
present the novel algorithm DimDraw to draw order diagrams. This algorithm is
based on a relation between the dimension of an ordered set and the
bipartiteness of a corresponding graph.
Comment: 16 pages, 12 figures
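The abstract relates the order dimension to the bipartiteness of a corresponding graph, whose construction is not given here. Independently of that construction, bipartiteness itself can be tested by 2-coloring with a breadth-first search; the sketch below shows this generic test on hypothetical adjacency lists, not DimDraw's actual graph.

```python
from collections import deque

def is_bipartite(adj):
    """2-color the graph via BFS; this succeeds exactly when the
    graph is bipartite (i.e., contains no odd cycle)."""
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False  # edge inside one color class
    return True

square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}   # even cycle
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}            # odd cycle
print(is_bipartite(square), is_bipartite(triangle))  # True False
```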
Topic space trajectories: A case study on machine learning literature
The annual number of publications at scientific venues, for example conferences and journals, is growing quickly. Hence, even for researchers it becomes harder and harder to keep track of research topics and their progress. In this task, researchers can be supported by automated publication analysis. Yet, many such methods result in uninterpretable, purely numerical representations. As an attempt to support human analysts, we present topic space trajectories, a structure that allows for the comprehensible tracking of research topics. We demonstrate how these trajectories can be interpreted based on eight different analysis approaches. To obtain comprehensible results, we employ non-negative matrix factorization as well as suitable visualization techniques. We show the applicability of our approach on a publication corpus spanning 50 years of machine learning research from 32 publication venues. In addition to a thorough introduction of our method, our focus is on an extensive analysis of the results we achieved. Our novel analysis method may be employed for paper classification, for the prediction of future research topics, and for the recommendation of fitting conferences and journals for submitting unpublished work. An advantage in these applications over previous methods lies in the good interpretability of the results obtained through our method.
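A topic space trajectory, as described above, places a venue's topic mixture for each year as a point in topic space; how far consecutive points lie apart indicates how fast the venue's focus shifts. The sketch below illustrates only this trajectory idea with hypothetical three-topic mixture vectors; it does not reproduce the paper's NMF-based pipeline.

```python
import math

def step_lengths(trajectory):
    """Euclidean distance between consecutive topic-mixture vectors,
    i.e., how far a venue moved in topic space from year to year."""
    return [math.dist(a, b) for a, b in zip(trajectory, trajectory[1:])]

# Hypothetical yearly topic mixtures of one venue over three topics.
venue = [
    (0.8, 0.1, 0.1),  # year 1: mostly topic 0
    (0.6, 0.3, 0.1),  # year 2: slight shift
    (0.2, 0.7, 0.1),  # year 3: clear shift towards topic 1
]
print(step_lengths(venue))
```

The second step is twice as long as the first, reflecting the sharper topical shift between years 2 and 3.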
FCA2VEC: Embedding Techniques for Formal Concept Analysis
Embedding large and high-dimensional data into low-dimensional vector spaces
is a necessary task to computationally cope with contemporary data sets.
Superseding latent semantic analysis, recent approaches such as word2vec and
node2vec are well-established tools in this realm. In the present paper we add
to this line of research by introducing fca2vec, a family of embedding
techniques for formal concept analysis (FCA). Our investigation contributes to
two distinct lines of research. First, we enable the application of FCA notions
to large data sets. In particular, we demonstrate how the cover relation of a
concept lattice can be retrieved from a computationally feasible embedding.
Second, we show an enhancement of the classical node2vec approach in low
dimensions. For both directions, the overall FCA constraint of explainable
results is preserved. We evaluate our novel procedures by computing fca2vec on
different data sets such as wiki44 (a dense part of the Wikidata knowledge
graph), the Mushroom data set, and a publication network derived from the FCA
community.
Comment: 25 pages
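For readers unfamiliar with FCA: its concept lattices arise from two derivation operators on a binary object-attribute context, whose composition yields the formal concepts that fca2vec aims to make computable at scale. The sketch below shows these standard operators on a hypothetical toy context (the object and attribute names are invented, loosely echoing the Mushroom data set mentioned above).

```python
# Hypothetical toy formal context: objects mapped to their attributes.
context = {
    "mushroom1": {"edible", "white"},
    "mushroom2": {"edible", "brown"},
    "mushroom3": {"poisonous", "white"},
}
attributes = set().union(*context.values())

def common_attributes(objects):
    """Derivation A': all attributes shared by every given object."""
    if not objects:
        return set(attributes)
    return set.intersection(*(context[o] for o in objects))

def common_objects(attrs):
    """Derivation B': all objects possessing every given attribute."""
    return {o for o, a in context.items() if attrs <= a}

# Applying both operators yields a formal concept (extent, intent).
intent = common_attributes({"mushroom1", "mushroom2"})
extent = common_objects(intent)
print(sorted(intent), sorted(extent))  # ['edible'] ['mushroom1', 'mushroom2']
```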
Selecting Features by their Resilience to the Curse of Dimensionality
Real-world datasets are often high-dimensional and affected by the curse of
dimensionality. This hinders their comprehensibility and interpretability. To
reduce this complexity, feature selection aims to identify features that are
crucial for learning from said data. While measures of relevance and pairwise
similarities are commonly used, the curse of dimensionality is rarely
incorporated into the process of selecting features. Here we step in with a
novel method that identifies the features that allow discriminating data
subsets of different sizes. By adapting recent work on computing intrinsic
dimensionalities, our method is able to select the features that can
discriminate data and thus weaken the curse of dimensionality. Our experiments
show that our method is competitive and commonly outperforms established
feature selection methods. Furthermore, we propose an approximation that allows
our method to scale to datasets consisting of millions of data points. Our
findings suggest that features that discriminate data and are connected to a
low intrinsic dimensionality are meaningful for learning procedures.
Comment: 16 pages, 1 figure, 2 tables
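The abstract does not specify which intrinsic-dimensionality estimator it adapts. As one well-known representative of such "recent work", the sketch below implements the TwoNN estimator of Facco et al., which infers the intrinsic dimension from the ratio of each point's second- to first-nearest-neighbour distance; it is an illustration of the general idea, not the paper's method.

```python
import math
import random

def two_nn_id(points):
    """TwoNN estimate of intrinsic dimension: N / sum(log(r2/r1)),
    where r1, r2 are each point's two nearest-neighbour distances."""
    log_ratios = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        log_ratios.append(math.log(r2 / r1))
    return len(points) / sum(log_ratios)

# Points scattered uniformly in a plane: the estimate should be near 2.
random.seed(0)
plane = [(random.random(), random.random()) for _ in range(200)]
print(round(two_nn_id(plane), 2))
```

Brute-force neighbour search makes this O(n²); scaling it to millions of points would require the kind of approximation the abstract proposes.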