Efficient Information Theoretic Clustering on Discrete Lattices
We consider the problem of clustering data that reside on discrete, low
dimensional lattices. Canonical examples for this setting are found in image
segmentation and key point extraction. Our solution is based on a recent
approach to information theoretic clustering where clusters result from an
iterative procedure that minimizes a divergence measure. We replace costly
processing steps in the original algorithm by means of convolutions. These
allow for highly efficient implementations and thus significantly reduce
runtime. This paper therefore bridges a gap between machine learning and signal
processing. Comment: This paper has been presented at the workshop LWA 201
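As a rough illustration of why convolutions help on a lattice, the sketch below computes a kernel density estimate over a small 2D grid with a single convolution pass instead of summing kernels over all pairs of points. This is not the paper's algorithm: the Gaussian kernel, the zero padding, and the names `gaussian_kernel`, `convolve2d`, and `weights` are all assumptions made for the example.

```python
import math

def gaussian_kernel(radius, sigma):
    """Return a normalized (2*radius+1) x (2*radius+1) Gaussian kernel."""
    size = 2 * radius + 1
    kernel = [[math.exp(-((i - radius) ** 2 + (j - radius) ** 2) / (2 * sigma ** 2))
               for j in range(size)] for i in range(size)]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]

def convolve2d(image, kernel):
    """Same-size 2D convolution with zero padding.

    The kernel is assumed symmetric, so correlation and convolution coincide.
    """
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[dy + r][dx + r]
            out[y][x] = acc
    return out

# Hypothetical lattice data: unit mass at two lattice sites.
weights = [[0.0] * 7 for _ in range(7)]
weights[2][2] = 1.0
weights[4][5] = 1.0

# One convolution yields the smoothed density at every lattice site.
density = convolve2d(weights, gaussian_kernel(radius=2, sigma=1.0))
```

The cost depends on the lattice size and the kernel size, not on the number of data points, which is the kind of saving the abstract alludes to.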
Bayesian Logic Programs
Bayesian networks provide an elegant formalism for representing and reasoning
about uncertainty using probability theory. They are a probabilistic extension
of propositional logic and, hence, inherit some of the limitations of
propositional logic, such as the difficulties to represent objects and
relations. We introduce a generalization of Bayesian networks, called Bayesian
logic programs, to overcome these limitations. In order to represent objects
and relations it combines Bayesian networks with definite clause logic by
establishing a one-to-one mapping between ground atoms and random variables. We
show that Bayesian logic programs combine the advantages of both definite
clause logic and Bayesian networks. This includes the separation of
quantitative and qualitative aspects of the model. Furthermore, Bayesian logic
programs generalize both Bayesian networks as well as logic programs. So, many
ideas developed Comment: 52 page
How is a data-driven approach better than random choice in label space division for multi-label classification?
We propose using five data-driven community detection approaches from social
networks to partition the label space for the task of multi-label
classification as an alternative to random partitioning into equal subsets as
performed by RAkELd: modularity-maximizing fastgreedy and leading eigenvector,
infomap, walktrap and label propagation algorithms. We construct a label
co-occurrence graph (both weighted and unweighted versions) based on training
data and perform community detection to partition the label set. We include
Binary Relevance and Label Powerset classification methods for comparison. We
use Gini-index-based Decision Trees as the base classifier. We compare educated
approaches to label space divisions against random baselines on 12 benchmark
data sets over five evaluation measures. We show that in almost all cases seven
educated-guess approaches are more likely than not to outperform RAkELd
in all measures except Hamming Loss. We show that fastgreedy and walktrap
community detection methods on weighted label co-occurrence graphs are 85-92%
more likely to yield better F1 scores than random partitioning. Infomap on the
unweighted label co-occurrence graphs is on average better than random
partitioning 90% of the time in terms of Subset Accuracy and 89% of the time
in terms of Jaccard similarity. Weighted fastgreedy is better on average than
RAkELd when it comes to Hamming Loss.
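The graph construction step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `label_cooccurrence_graph` and the toy `train_labels` are hypothetical, and the community detection step itself (fastgreedy, walktrap, infomap, and so on) is left to a graph library.

```python
from collections import Counter
from itertools import combinations

def label_cooccurrence_graph(label_sets, weighted=True):
    """Build an undirected label co-occurrence graph from training label sets.

    Returns a dict mapping each sorted label pair to the number of training
    samples in which both labels appear (or to 1 in the unweighted version).
    """
    counts = Counter()
    for labels in label_sets:
        for a, b in combinations(sorted(set(labels)), 2):
            counts[(a, b)] += 1
    if weighted:
        return dict(counts)
    return {edge: 1 for edge in counts}

# Hypothetical training data: one label set per sample.
train_labels = [
    {"sports", "news"},
    {"sports", "news", "video"},
    {"politics", "news"},
    {"sports"},
]

edges = label_cooccurrence_graph(train_labels)
unweighted_edges = label_cooccurrence_graph(train_labels, weighted=False)
```

Each community found on this graph would then define one label subset, replacing the random equal-size subsets used by RAkELd.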
Maximum Entropy Models of Shortest Path and Outbreak Distributions in Networks
Properties of networks are often characterized in terms of features such as
node degree distributions, average path lengths, diameters, or clustering
coefficients. Here, we study shortest path length distributions. On the one
hand, average as well as maximum distances can be determined therefrom; on the
other hand, they are closely related to the dynamics of network spreading
processes. Because of the combinatorial nature of networks, we apply maximum
entropy arguments to derive a general, physically plausible model. In
particular, we establish the generalized Gamma distribution as a continuous
characterization of shortest path length histograms of networks of arbitrary
topology. Experimental evaluations corroborate our theoretical results.
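To make the two objects in this abstract concrete, the sketch below computes a shortest path length histogram by breadth-first search and evaluates a generalized Gamma density. The parameterization (scale `a`, shapes `d` and `p`, often attributed to Stacy) is an assumption and may differ from the one used in the paper. The function names and the toy graph are hypothetical, and no fitting procedure is shown.

```python
import math
from collections import Counter, deque

def shortest_path_histogram(adjacency):
    """Count shortest path lengths over all ordered pairs of distinct nodes.

    `adjacency` maps each node to its neighbors; edges are unweighted.
    Unreachable pairs are simply not counted.
    """
    hist = Counter()
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for node, d in dist.items():
            if node != source:
                hist[d] += 1
    return hist

def generalized_gamma_pdf(x, a, d, p):
    """Generalized Gamma density for x > 0 (assumed parameterization).

    f(x) = (p / a**d) * x**(d - 1) * exp(-(x / a)**p) / Gamma(d / p)
    """
    return (p / a ** d) * x ** (d - 1) * math.exp(-((x / a) ** p)) / math.gamma(d / p)

# Hypothetical example: a path graph 0-1-2-3-4.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
hist = shortest_path_histogram(graph)
```

With `p = 1` the density reduces to the ordinary Gamma distribution, and with `d = p` to the Weibull distribution, which is one reason the family is flexible enough to describe histograms of differing shape.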
A Revised Publication Model for ECML PKDD
ECML PKDD is the main European conference on machine learning and data
mining. Since its foundation it implemented the publication model common in
computer science: there was one conference deadline; conference submissions
were reviewed by a program committee; papers were accepted with a low
acceptance rate. Proceedings were published in several Springer Lecture Notes
in Artificial Intelligence (LNAI) volumes, while selected papers were invited
to special issues of the Machine Learning and Data Mining and Knowledge
Discovery journals. In recent years, however, this model has come under stress. Problems
include: reviews are of highly variable quality; the purpose of bringing the
community together is lost; reviewing workloads are high; the information
content of conferences and journals decreases; there is confusion among
scientists in interdisciplinary contexts. In this paper, we present a new
publication model, which will be adopted for the ECML PKDD 2013 conference, and
aims to solve some of the problems of the traditional model. The key feature of
this model is the creation of a journal track, which is open to submissions all
year long and allows for revision cycles. Comment: 13 page