431 research outputs found

Efficient Information Theoretic Clustering on Discrete Lattices

Authors: Bauckhage Christian; Kersting Kristian
Publication date: 26/10/2013

We consider the problem of clustering data that reside on discrete, low-dimensional lattices. Canonical examples of this setting are found in image segmentation and key point extraction. Our solution is based on a recent approach to information theoretic clustering in which clusters result from an iterative procedure that minimizes a divergence measure. We replace costly processing steps in the original algorithm with convolutions. These allow for highly efficient implementations and thus significantly reduce runtime. This paper therefore bridges a gap between machine learning and signal processing. Comment: This paper has been presented at the workshop LWA 201
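The role of convolutions on a lattice can be illustrated with a minimal sketch. It is not the paper's algorithm; it only shows, under assumptions, why sums of kernel values between lattice points can be obtained by convolving an occupancy grid with a sampled kernel. The Gaussian kernel, the function names, and the toy grid are all hypothetical choices made for this illustration.

```python
import math

def gaussian_kernel(radius, sigma):
    """Gaussian sampled on integer lattice offsets in [-radius, radius]^2."""
    return [[math.exp(-(dy * dy + dx * dx) / (2 * sigma * sigma))
             for dx in range(-radius, radius + 1)]
            for dy in range(-radius, radius + 1)]

def convolve2d(grid, kernel):
    """Zero-padded 'same' convolution of a 2D grid with an odd-sized kernel."""
    h, w = len(grid), len(grid[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if grid[y][x] == 0:
                continue  # empty cells contribute nothing
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        out[yy][xx] += grid[y][x] * kernel[dy + r][dx + r]
    return out

def direct_kernel_sum(points, shape, sigma):
    """Reference: evaluate the kernel sum at every cell by looping over points."""
    h, w = shape
    return [[sum(math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2 * sigma ** 2))
                 for py, px in points)
             for x in range(w)]
            for y in range(h)]

# Toy data: two points on a 5 x 5 lattice (hypothetical example).
shape = (5, 5)
points = [(1, 1), (3, 2)]
occupancy = [[0] * shape[1] for _ in range(shape[0])]
for py, px in points:
    occupancy[py][px] += 1

sigma = 1.0
# A radius of 4 covers every possible offset on a 5 x 5 grid, so nothing is truncated.
via_convolution = convolve2d(occupancy, gaussian_kernel(4, sigma))
via_direct = direct_kernel_sum(points, shape, sigma)
max_diff = max(abs(a - b)
               for row_a, row_b in zip(via_convolution, via_direct)
               for a, b in zip(row_a, row_b))
```

The pure-Python loops are for illustration only. In practice a separable or FFT-based convolution routine from a numerical library would typically replace `convolve2d`.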

arXiv.org e-Print Archive

CiteSeerX

Fraunhofer-ePrints

Bayesian Logic Programs

Authors: De Raedt Luc; Kersting Kristian
Publication date: 01/01/2000

Bayesian networks provide an elegant formalism for representing and reasoning about uncertainty using probability theory. They are a probabilistic extension of propositional logic and, hence, inherit some of the limitations of propositional logic, such as the difficulty of representing objects and relations. We introduce a generalization of Bayesian networks, called Bayesian logic programs, to overcome these limitations. In order to represent objects and relations, it combines Bayesian networks with definite clause logic by establishing a one-to-one mapping between ground atoms and random variables. We show that Bayesian logic programs combine the advantages of both definite clause logic and Bayesian networks. This includes the separation of quantitative and qualitative aspects of the model. Furthermore, Bayesian logic programs generalize both Bayesian networks and logic programs. So, many ideas developed Comment: 52 page

arXiv.org e-Print Archive

CiteSeerX

Publikationer från Linköpings universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

How is a data-driven approach better than random choice in label space division for multi-label classification?

Authors: Kajdanowicz Tomasz; Kersting Kristian; Szymański Piotr
Publication venue: MDPI AG
Publication date: 07/06/2016

We propose using five data-driven community detection approaches from social networks to partition the label space for the task of multi-label classification as an alternative to random partitioning into equal subsets as performed by RAkELd: modularity-maximizing fastgreedy and leading eigenvector, infomap, walktrap and label propagation algorithms. We construct a label co-occurrence graph (in both weighted and unweighted versions) based on training data and perform community detection to partition the label set. We include Binary Relevance and Label Powerset classification methods for comparison. We use Gini-index-based Decision Trees as the base classifier. We compare educated approaches to label space division against random baselines on 12 benchmark data sets over five evaluation measures. We show that in almost all cases seven educated guess approaches are more likely to outperform RAkELd than otherwise in all measures but Hamming Loss. We show that fastgreedy and walktrap community detection methods on weighted label co-occurrence graphs are 85-92% more likely to yield better F1 scores than random partitioning. Infomap on the unweighted label co-occurrence graphs is on average better than random partitioning 90% of the time in terms of Subset Accuracy and 89% of the time in terms of Jaccard similarity. Weighted fastgreedy is better on average than RAkELd when it comes to Hamming Loss.
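The graph construction step described in this abstract can be sketched as follows. This is a minimal illustration, assuming an edge weight equal to the number of training samples in which two labels appear together; the function name and the toy label sets are hypothetical, and the community detection step itself is not shown.

```python
from collections import Counter
from itertools import combinations

def label_cooccurrence(label_sets, weighted=True):
    """Build a label co-occurrence graph as a dict mapping edges to weights.

    label_sets: iterable of label collections, one per training sample.
    An edge (a, b) with a < b exists if both labels appear in some sample.
    Weighted: weight = number of samples containing both labels (assumption).
    Unweighted: every existing edge gets weight 1.
    """
    edges = Counter()
    for labels in label_sets:
        for a, b in combinations(sorted(set(labels)), 2):
            edges[(a, b)] += 1
    if weighted:
        return dict(edges)
    return {edge: 1 for edge in edges}

# Toy training labels (hypothetical).
train_labels = [
    {"beach", "sea"},
    {"beach", "sea"},
    {"city", "night"},
    {"beach", "night"},
]

weighted_graph = label_cooccurrence(train_labels)
unweighted_graph = label_cooccurrence(train_labels, weighted=False)
```

The resulting edge dictionary could then be handed to a graph library's community detection routine to obtain the label partition.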

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

Maximum Entropy Models of Shortest Path and Outbreak Distributions in Networks

Authors: Bauckhage Christian; Hadiji Fabian; Kersting Kristian
Publication date: 17/01/2015

Properties of networks are often characterized in terms of features such as node degree distributions, average path lengths, diameters, or clustering coefficients. Here, we study shortest path length distributions. On the one hand, average as well as maximum distances can be determined therefrom; on the other hand, they are closely related to the dynamics of network spreading processes. Because of the combinatorial nature of networks, we apply maximum entropy arguments to derive a general, physically plausible model. In particular, we establish the generalized Gamma distribution as a continuous characterization of shortest path length histograms of networks of arbitrary topology. Experimental evaluations corroborate our theoretical results.
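As a minimal sketch of the quantity studied here, the following computes a shortest path length histogram for a small unweighted, undirected graph by breadth-first search, and reads the average and maximum distance off the histogram. The adjacency-list representation, the function name, and the toy graph are assumptions made for this illustration; fitting a generalized Gamma distribution to the histogram is not shown.

```python
from collections import Counter, deque

def shortest_path_histogram(adjacency):
    """Count unordered node pairs by hop distance via BFS from every node.

    adjacency: dict mapping each node to a list of its neighbours
    (undirected, unweighted). Unreachable pairs are ignored.
    """
    hist = Counter()
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for d in dist.values():
            if d > 0:
                hist[d] += 1
    # Every unordered pair was counted once from each endpoint.
    return {d: count // 2 for d, count in sorted(hist.items())}

# Toy graph: a path 0 - 1 - 2 - 3 (hypothetical).
path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
hist = shortest_path_histogram(path_graph)

total_pairs = sum(hist.values())
average_distance = sum(d * c for d, c in hist.items()) / total_pairs
maximum_distance = max(hist)
```

For the four-node path, the histogram holds three pairs at distance 1, two at distance 2, and one at distance 3.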

arXiv.org e-Print Archive

Fraunhofer-ePrints

A Revised Publication Model for ECML PKDD

Authors: Blockeel Hendrik; Kersting Kristian; Nijssen Siegfried; Zelezny Filip
Publication date: 01/01/2012

ECML PKDD is the main European conference on machine learning and data mining. Since its foundation it implemented the publication model common in computer science: there was one conference deadline; conference submissions were reviewed by a program committee; papers were accepted with a low acceptance rate. Proceedings were published in several Springer Lecture Notes in Artificial Intelligence (LNAI) volumes, while selected papers were invited to special issues of the Machine Learning and Data Mining and Knowledge Discovery journals. In recent years, however, this model has come under stress. Problems include: reviews are of highly variable quality; the purpose of bringing the community together is lost; reviewing workloads are high; the information content of conferences and journals decreases; there is confusion among scientists in interdisciplinary contexts. In this paper, we present a new publication model, which will be adopted for the ECML PKDD 2013 conference and aims to solve some of the problems of the traditional model. The key feature of this model is the creation of a journal track, which is open to submissions all year long and allows for revision cycles. Comment: 13 page

arXiv.org e-Print Archive

CiteSeerX

DIAL UCLouvain