The density connectivity information bottleneck
Clustering with the agglomerative Information Bottleneck (aIB) algorithm suffers from a sub-optimality problem: it cannot guarantee to preserve as much relevant information as possible. To handle this problem, we introduce a density connectivity chain, by which we consider not only the information between two data elements but also the information among the neighbors of a data element. Based on this idea, we propose DCIB, a Density Connectivity Information Bottleneck algorithm that applies the Information Bottleneck method to quantify the relevant information during the clustering procedure. As a hierarchical algorithm, DCIB produces a pruned clustering tree and yields clusterings of different sizes in a single execution. Experimental results on document clustering indicate that the DCIB algorithm preserves more relevant information and achieves higher precision than the aIB algorithm.
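The sub-optimality this abstract targets comes from aIB's greedy merge rule: at each step it merges the pair of clusters whose merge loses the least relevant information I(T;Y). A toy sketch of that rule (an illustration of plain aIB under my own naming, not the paper's DCIB implementation):

```python
import numpy as np

def mutual_info(p_ty):
    # I(T;Y) computed from a joint distribution over (cluster, relevance variable)
    p_t = p_ty.sum(axis=1, keepdims=True)
    p_y = p_ty.sum(axis=0, keepdims=True)
    mask = p_ty > 0
    return float((p_ty[mask] * np.log(p_ty[mask] / (p_t @ p_y)[mask])).sum())

def agglomerative_ib(p_xy, n_clusters):
    """Greedy aIB sketch: start with one cluster per element, then repeatedly
    merge the pair of clusters whose merge loses the least I(T;Y).
    Each cluster is represented by its summed joint row p(t, y)."""
    clusters = [p_xy[i:i + 1].copy() for i in range(p_xy.shape[0])]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                trial = [c for k, c in enumerate(clusters) if k not in (i, j)]
                trial.append(clusters[i] + clusters[j])
                loss = -mutual_info(np.vstack(trial))  # smaller loss = more I(T;Y) kept
                if best is None or loss < best[0]:
                    best = (loss, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return np.vstack(clusters)
```

Because each merge is locally optimal but never revisited, an early bad merge is locked in for good; that is the gap the density connectivity chain is meant to close.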
Combination of agglomerative and sequential clustering for speaker diarization
This paper investigates the use of sequential clustering for speaker diarization. Conventional diarization systems are based on parametric models and agglomerative clustering. In our previous work we proposed a non-parametric method based on the agglomerative Information Bottleneck for very fast diarization. Here we consider the combination of sequential and agglomerative clustering for avoiding local maxima of the objective function and for purification. Experiments are run on the RT06 eval data. Sequential clustering with oracle model selection reduces the speaker error w.r.t. agglomerative clustering. When model selection is based on the Normalized Mutual Information criterion, a relative improvement is obtained by combining agglomerative and sequential clustering.
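The escape-from-local-maxima idea can be sketched as one sequential sweep on top of an agglomerative initialization: each element is drawn out of its cluster and re-inserted wherever the objective I(T;Y) is highest. A minimal illustration (hypothetical helper names, not the diarization system's code):

```python
import numpy as np

def mi(p_ty):
    # I(T;Y) from a joint (cluster, relevance variable) distribution
    p_t = p_ty.sum(axis=1, keepdims=True)
    p_y = p_ty.sum(axis=0, keepdims=True)
    m = p_ty > 0
    return float((p_ty[m] * np.log(p_ty[m] / (p_t @ p_y)[m])).sum())

def cluster_joint(assign, p_xy, k):
    # p(t, y) induced by a hard assignment of elements to k clusters
    p_ty = np.zeros((k, p_xy.shape[1]))
    for x, t in enumerate(assign):
        p_ty[t] += p_xy[x]
    return p_ty

def sequential_pass(assign, p_xy, k):
    """One sweep of sequential clustering: move each element to the cluster
    that maximizes the global objective. The objective never decreases, so
    sweeps can only improve on the agglomerative starting point."""
    assign = np.asarray(assign).copy()
    for x in range(len(assign)):
        best_t, best_val = assign[x], -np.inf
        for t in range(k):
            assign[x] = t
            val = mi(cluster_joint(assign, p_xy, k))
            if val > best_val:
                best_t, best_val = t, val
        assign[x] = best_t
    return assign
```

Repeating such sweeps until no element moves gives the purification step: elements mis-merged by the greedy agglomeration get a chance to migrate to a better cluster.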
An information theoretic approach to the functional classification of neurons
A population of neurons typically exhibits a broad diversity of responses to
sensory inputs. The intuitive notion of functional classification is that cells
can be clustered so that most of the diversity is captured in the identity of
the clusters rather than by individuals within clusters. We show how this
intuition can be made precise using information theory, without any need to
introduce a metric on the space of stimuli or responses. Applied to the retinal
ganglion cells of the salamander, this approach recovers classical results, but
also provides clear evidence for subclasses beyond those identified previously.
Further, we find that each of the ganglion cells is functionally unique, and
that even within the same subclass only a few spikes are needed to reliably
distinguish between cells.
Comment: 13 pages, 4 figures. To appear in Advances in Neural Information Processing Systems (NIPS) 1
Machine learning of hierarchical clustering to segment 2D and 3D images
We aim to improve segmentation through the use of machine learning tools
during region agglomeration. We propose an active learning approach for
performing hierarchical agglomerative segmentation from superpixels. Our method
combines multiple features at all scales of the agglomerative process, works
for data with an arbitrary number of dimensions, and scales to very large
datasets. We advocate the use of variation of information to measure
segmentation accuracy, particularly in 3D electron microscopy (EM) images of
neural tissue, and using this metric demonstrate an improvement over competing
algorithms in EM and natural images.
Comment: 15 pages, 8 figures
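The variation of information advocated here as a segmentation-accuracy measure is VI(A, B) = H(A|B) + H(B|A) between the two labelings; it is zero exactly when the segmentations agree up to relabeling, and it is a metric on partitions. A minimal sketch for flat label arrays (a hypothetical helper, not the paper's evaluation code):

```python
import numpy as np

def variation_of_information(seg_a, seg_b):
    """VI(A, B) = H(A|B) + H(B|A) between two labelings of the same pixels."""
    a = np.asarray(seg_a).ravel()
    b = np.asarray(seg_b).ravel()
    n = a.size
    # contingency table of joint label frequencies
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    joint = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(joint, (ai, bi), 1.0 / n)
    pa = joint.sum(axis=1)
    pb = joint.sum(axis=0)
    nz = joint > 0
    h_ab = -(joint[nz] * np.log(joint[nz])).sum()      # H(A, B)
    h_a = -(pa[pa > 0] * np.log(pa[pa > 0])).sum()     # H(A)
    h_b = -(pb[pb > 0] * np.log(pb[pb > 0])).sum()     # H(B)
    return (h_ab - h_a) + (h_ab - h_b)                 # H(A|B) + H(B|A)
```

Unlike pixel accuracy, VI penalizes both over-segmentation (split errors inflate H(A|B)) and under-segmentation (merge errors inflate H(B|A)), which is what makes it attractive for evaluating region agglomeration.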
Privacy-Constrained Remote Source Coding
We consider the problem of revealing/sharing data in an efficient and secure
way via a compact representation. The representation should ensure reliable
reconstruction of the desired features/attributes while still preserving privacy
of the secret parts of the data. The problem is formulated as a remote lossy
source coding with a privacy constraint where the remote source consists of
public and secret parts. Inner and outer bounds for the optimal tradeoff region
of compression rate, distortion, and privacy leakage rate are given and shown
to coincide for some special cases. When specializing the distortion measure to
a logarithmic loss function, the resulting rate-distortion-leakage tradeoff for
the case of identical side information forms an optimization problem which
corresponds to the "secure" version of the so-called information bottleneck.
Comment: 10 pages, 1 figure, to be presented at ISIT 201
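The closing claim rests on a standard identity: under logarithmic loss, the optimal reconstruction distortion of the relevant variable $Y$ from the representation $T$ equals the conditional entropy $H(Y \mid T)$, so trading rate against distortion reproduces the information-bottleneck Lagrangian. The notation below is the generic IB formulation, not the paper's own bounds:

```latex
% Log-loss distortion achieved by the posterior reconstruction:
\mathbb{E}\!\left[d\big(Y,\hat{Y}(T)\big)\right] = H(Y \mid T) = H(Y) - I(T;Y),
% so minimizing the rate I(X;T) while constraining this distortion is
% equivalent (for some multiplier \beta > 0) to the IB trade-off
\min_{p(t \mid x)} \; I(X;T) - \beta\, I(T;Y).
```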