A comment on "A fast L_p spike alignment metric" by A. J. Dubbs, B. A. Seiler and M. O. Magnasco [arXiv:0907.3137]
Measuring the transmitted information in metric-based clustering has become
something of a standard test for the performance of a spike train metric. In
this comment, the recently proposed L_p Victor-Purpura metric is used to
cluster spiking responses to zebra finch songs, recorded from field L of
anesthetized zebra finches. It is found that for these data the L_p metrics with
p>1 modestly outperform the standard, p=1, Victor-Purpura metric. It is argued
that this is because for larger values of p, the metric comes closer to
performing windowed coincidence detection.
Comment: 9 pages, 3 figures included as late
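For orientation, the standard (p=1) Victor-Purpura distance that the abstract uses as its baseline can be sketched as a small dynamic program. This is a minimal illustration and not the L_p variant studied in the comment: it assumes unit cost for inserting or deleting a spike and a cost of `q` per unit time for shifting one, and the function and parameter names are hypothetical.

```python
def victor_purpura_distance(a, b, q):
    """Standard (p = 1) Victor-Purpura distance between two spike trains.

    a, b: sorted lists of spike times; q: cost per unit time of shifting a spike.
    Assumed costs: 1 to insert or delete a spike, q * |dt| to shift one by dt.
    """
    n, m = len(a), len(b)
    # prev[j] holds the distance between the first i-1 spikes of a
    # and the first j spikes of b (row 0: insert j spikes).
    prev = [float(j) for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [float(i)] + [0.0] * m  # column 0: delete i spikes
        for j in range(1, m + 1):
            curr[j] = min(
                prev[j] + 1.0,                               # delete a[i-1]
                curr[j - 1] + 1.0,                           # insert b[j-1]
                prev[j - 1] + q * abs(a[i - 1] - b[j - 1]),  # shift a[i-1] onto b[j-1]
            )
        prev = curr
    return prev[m]
```

With `q = 0` the distance reduces to the difference in spike counts; for large `q` shifting is never cheaper than deleting and re-inserting, so only exactly coincident spikes are matched.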
Sequence alignment, mutual information, and dissimilarity measures for constructing phylogenies
Existing sequence alignment algorithms use heuristic scoring schemes which
cannot be used as objective distance metrics. Therefore one relies on measures
like the p- or log-det distances, or makes explicit, and often simplistic,
assumptions about sequence evolution. Information theory provides an
alternative, in the form of mutual information (MI) which is, in principle, an
objective and model-independent similarity measure. MI can be estimated by
concatenating and zipping sequences, yielding thereby the "normalized
compression distance". So far this has produced promising results, but with
uncontrolled errors. We describe a simple approach to get robust estimates of
MI from global pairwise alignments. Using standard alignment algorithms, this
gives for animal mitochondrial DNA estimates that are strikingly close to
estimates obtained from the alignment-free methods mentioned above. Our main
result uses algorithmic (Kolmogorov) information theory, but we show that
similar results can also be obtained from Shannon theory. Because it is not
additive, normalized compression distance is not an optimal metric
for phylogenetics, but we propose a simple modification that overcomes the
issue of additivity. We test several versions of our MI-based distance measures
on a large number of randomly chosen quartets and demonstrate that they all
perform better than traditional measures like the Kimura or log-det (resp.
paralinear) distances. Even a simplified version based on single-letter Shannon
entropies, which can be easily incorporated in existing software packages, gave
superior results throughout the entire animal kingdom. But we see the main
virtue of our approach as more general. For example, it can also help to
judge the relative merits of different alignment algorithms, by estimating the
significance of specific alignments.
Comment: 19 pages + 16 pages of supplementary materia
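The "concatenating and zipping" estimate mentioned in the abstract can be illustrated with the usual definition of the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed length. The sketch below is only an illustration of that definition: using `zlib` as the compressor is an assumption (the abstract does not name one), the function names are hypothetical, and the additivity modification proposed by the authors is not implemented.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte sequences."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Sequences that share much of their content compress well when concatenated, so their NCD is small; unrelated sequences give values near 1. With a real compressor the value can slightly exceed 1, which is one source of the uncontrolled errors the abstract refers to.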