Information-theoretic analysis of multivariate single-cell signaling responses using SLEMI
Mathematical methods of information theory constitute essential tools to
describe how stimuli are encoded in activities of signaling effectors.
Exploring the information-theoretic perspective, however, remains conceptually,
experimentally and computationally challenging. Specifically, existing
computational tools enable efficient analysis of relatively simple systems,
usually with one input and output only. Moreover, their robust and readily
applicable implementations are missing. Here, we propose a novel algorithm to
analyze signaling data within the framework of information theory. Our approach
enables robust as well as statistically and computationally efficient analysis
of signaling systems with high-dimensional outputs and a large number of input
values. Analysis of NF-κB single-cell signaling responses to TNF-α
uniquely reveals that NF-κB signaling dynamics improve discrimination of
high concentrations of TNF-α with only a modest impact on discrimination of low
concentrations. Our readily applicable R package, SLEMI (Statistical Learning
based Estimation of Mutual Information), allows the approach to be used by
computational biologists with only elementary knowledge of information theory.
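The core idea, estimating the mutual information between a discrete stimulus and a high-dimensional response by learning the conditional input distribution with a classifier, can be sketched in a few lines. The snippet below is a minimal NumPy illustration of that idea, not the SLEMI package itself; the tiny gradient-descent logistic regression and all names are our own assumptions.

```python
import numpy as np

def fit_posterior(Y, x, lr=0.2, steps=500):
    """Tiny binary logistic regression via gradient descent; returns the
    learned posterior P(x = 1 | y) for each row of Y. (A stand-in for the
    regularized classifiers a real analysis would use.)"""
    Yb = np.hstack([Y, np.ones((len(Y), 1))])  # append a bias column
    w = np.zeros(Yb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Yb @ w))
        w -= lr * Yb.T @ (p - x) / len(x)
    return 1.0 / (1.0 + np.exp(-Yb @ w))

def mi_bits(x, Y):
    """Plug-in estimate of I(X; Y) in bits for a binary input x: the
    average log-ratio of the learned posterior to the empirical prior."""
    p1 = fit_posterior(Y, x)
    post = np.clip(np.where(x == 1, p1, 1.0 - p1), 1e-12, None)
    prior = np.where(x == 1, x.mean(), 1.0 - x.mean())
    return float(np.mean(np.log2(post / prior)))

# Toy "signaling" data: two stimulus levels, 2-D responses with shifted means.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, 400)
Y = rng.normal(0.0, 1.0, (400, 2)) + 3.0 * x[:, None]
mi = mi_bits(x, Y)  # should approach 1 bit as the responses separate
```

For well-separated responses the estimate approaches the 1 bit carried by a binary stimulus; overlapping responses pull it toward zero.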
Ensemble estimation of multivariate f-divergence
f-divergence estimation is an important problem in the fields of information
theory, machine learning, and statistics. While several divergence estimators
exist, relatively few of their convergence rates are known. We derive the MSE
convergence rate for a density plug-in estimator of f-divergence. Then by
applying the theory of optimally weighted ensemble estimation, we derive a
divergence estimator with a convergence rate of O(1/T) that is simple to
implement and performs well in high dimensions. We validate our theoretical
results with experiments.
Comment: 14 pages, 6 figures. A condensed version of this paper was accepted
to ISIT 2014. Version 2: moved the proofs of the theorems from the main body
to appendices at the end.
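The two ingredients can be made concrete with a hedged sketch: a simple plug-in divergence estimator (here a k-NN KL estimator in the Wang–Kulkarni–Verdú style, standing in for the paper's KDE plug-in estimators) and closed-form minimum-norm ensemble weights that cancel the leading bias terms. The basis functions l**(i/d) and all names are illustrative assumptions, not the paper's exact optimization.

```python
import numpy as np

def knn_kl(X, Y, k=3):
    """k-NN plug-in estimate of KL(P || Q) from samples X ~ P, Y ~ Q
    (brute-force distances; a stand-in for a KDE plug-in estimator)."""
    n, d = X.shape
    m = Y.shape[0]
    dxx = np.linalg.norm(X[:, None] - X[None], axis=-1)
    np.fill_diagonal(dxx, np.inf)
    rho = np.sort(dxx, axis=1)[:, k - 1]      # k-th NN distance within X
    dxy = np.linalg.norm(X[:, None] - Y[None], axis=-1)
    nu = np.sort(dxy, axis=1)[:, k - 1]       # k-th NN distance into Y
    return d * float(np.mean(np.log(nu / rho))) + np.log(m / (n - 1))

def ensemble_weights(ks, d):
    """Minimum-norm weights w with sum(w) = 1 and sum_l w_l * l**(i/d) = 0
    for i = 1..d-1, so leading bias terms of the base estimators cancel."""
    L = np.asarray(ks, float)
    A = np.vstack([np.ones_like(L)] + [L ** (i / d) for i in range(1, d)])
    b = np.zeros(A.shape[0]); b[0] = 1.0
    return A.T @ np.linalg.solve(A @ A.T, b)  # KKT solution of the QP

# Weighted ensemble of base estimators over several k values.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, (500, 2))
Y = rng.normal(0.0, 1.0, (500, 2)) + np.array([1.0, 0.0])
ks = [2, 3, 4, 5, 6]
w = ensemble_weights(ks, d=2)
est = sum(wi * knn_kl(X, Y, k) for wi, k in zip(w, ks))
```

Because the weights sum to one while zeroing the bias basis, the combination keeps the base estimators' mean while cancelling their slowest-decaying bias terms, which is what buys the parametric O(1/T) MSE rate.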
Finite-Sample Analysis of Fixed-k Nearest Neighbor Density Functional Estimators
We provide finite-sample analysis of a general framework for using k-nearest
neighbor statistics to estimate functionals of a nonparametric continuous
probability density, including entropies and divergences. Rather than plugging
a consistent density estimate (which requires k → ∞ as the sample size
n → ∞) into the functional of interest, the estimators we consider fix
k and perform a bias correction. This is more efficient computationally, and,
as we show in certain cases, statistically, leading to faster convergence
rates. Our framework unifies several previous estimators, and for most of them
ours are the first finite-sample guarantees.
Comment: 16 pages, 0 figures
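The canonical member of this family is the Kozachenko–Leonenko estimator: fix k, measure the distance ε_i from each point to its k-th nearest neighbor, and correct the bias with digamma terms. A minimal NumPy sketch (brute-force distances; ψ(n) − ψ(k) is computed via the harmonic-sum identity for integer arguments):

```python
import math
import numpy as np

def kl_entropy(X, k=3):
    """Fixed-k Kozachenko-Leonenko entropy estimate (nats):
    H_hat = psi(n) - psi(k) + log(c_d) + (d/n) * sum_i log(eps_i),
    where eps_i is the distance from X[i] to its k-th nearest neighbor
    and c_d is the volume of the d-dimensional unit ball."""
    n, d = X.shape
    dist = np.linalg.norm(X[:, None] - X[None], axis=-1)
    np.fill_diagonal(dist, np.inf)
    eps = np.sort(dist, axis=1)[:, k - 1]
    log_cd = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    psi_diff = sum(1.0 / j for j in range(k, n))  # psi(n) - psi(k), integer args
    return psi_diff + log_cd + d * float(np.mean(np.log(eps)))

# Sanity check: a standard 1-D Gaussian has H = 0.5 * log(2*pi*e) ~ 1.419 nats.
rng = np.random.default_rng(0)
est = kl_entropy(rng.normal(0.0, 1.0, (1000, 1)))
```

Note that k stays fixed as n grows: the ψ(n) − ψ(k) term is exactly the bias correction that lets the estimator converge without a consistent density estimate.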
Big Variates: Visualizing and identifying key variables in a multivariate world
Big Data involves not only a large number of events but also many variables. This
paper will concentrate on the challenge presented by the large number of
variables in a Big Dataset. It will start with a brief review of exploratory
data visualisation for large dimensional datasets and the use of parallel
coordinates. This motivates the use of information theoretic ideas to
understand multivariate data. Two key information-theoretic statistics
(Similarity Index and Class Distance Indicator) will be described which are
used to identify the key variables and then guide the user in a subsequent
machine learning analysis. Key to the approach is a novel algorithm to
histogram data which quantifies the information content of the data. The Class
Distance Indicator also sets a limit on the classification performance of
machine learning algorithms for the specific dataset.
Comment: 16 pages, 7 figures. Pre-print from a talk at ULITIMA 2018, Argonne
National Laboratory, 11-14 September 2018
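The exact definitions of the Similarity Index and Class Distance Indicator are not given here, so the sketch below uses a plain histogram estimate of the mutual information between each variable and the class label, the generic version of this style of information-theoretic variable ranking; function names and the bin count are illustrative assumptions.

```python
import numpy as np

def hist_mi(x, labels, bins=16):
    """Histogram (plug-in) estimate, in bits, of the mutual information
    between one variable and a class label. A generic stand-in for the
    paper's Similarity Index / Class Distance Indicator."""
    classes = np.unique(labels)
    edges = np.histogram_bin_edges(x, bins=bins)
    bx = np.clip(np.digitize(x, edges[1:-1]), 0, bins - 1)
    joint = np.zeros((bins, len(classes)))
    for c, lab in enumerate(classes):
        joint[:, c] = np.bincount(bx[labels == lab], minlength=bins)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)   # marginal over bins
    pc = p.sum(axis=0, keepdims=True)   # marginal over classes
    mask = p > 0
    return float((p[mask] * np.log2(p[mask] / (px @ pc)[mask])).sum())

# Rank two variables: one shifted by the class label, one pure noise.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 600)
informative = 2.0 * labels + rng.normal(0.0, 1.0, 600)
noise = rng.normal(0.0, 1.0, 600)
```

A variable carrying class information scores well above a noise variable, which is the property a ranking statistic needs before handing the shortlist to a downstream learner.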
Measuring the Discrepancy between Conditional Distributions: Methods, Properties and Applications
We propose a simple yet powerful test statistic to quantify the discrepancy
between two conditional distributions. The new statistic avoids explicit
estimation of the underlying distributions in high-dimensional space and
operates on the cone of symmetric positive semidefinite (SPS) matrices using the
Bregman matrix divergence. Moreover, it inherits the merits of the correntropy
function to explicitly incorporate high-order statistics in the data. We
present the properties of our new statistic and illustrate its connections to
prior art. We finally show the applications of our new statistic on three
different machine learning problems, namely the multi-task learning over
graphs, the concept drift detection, and the information-theoretic feature
selection, to demonstrate its utility and advantage. Code of our statistic is
available at https://bit.ly/BregmanCorrentropy.
Comment: manuscript accepted at IJCAI 20; added additional notes on
computational complexity and the auto-differentiable property; code is available
at https://github.com/SJYuCNEL/Bregman-Correntropy-Conditional-Divergenc
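Two building blocks of such a statistic can be sketched directly: a Gaussian-kernel (correntropy-style) Gram matrix, and the Bregman matrix divergence generated by the von Neumann entropy, evaluated on trace-normalized SPS matrices. This is a minimal illustration of those ingredients only, not the paper's full conditional-discrepancy test; the kernel width, eigenvalue floor, and toy comparison are our assumptions.

```python
import numpy as np

def gram(y, sigma=1.0):
    """Trace-normalized Gaussian-kernel Gram matrix of a 1-D sample."""
    d2 = (y[:, None] - y[None]) ** 2
    G = np.exp(-d2 / (2.0 * sigma ** 2))
    return G / np.trace(G)

def von_neumann_div(A, B, floor=1e-10):
    """Bregman matrix divergence generated by the von Neumann entropy:
    D(A || B) = tr(A log A - A log B - A + B), for SPS matrices A, B."""
    def logm(M):
        w, V = np.linalg.eigh(M)
        return V @ np.diag(np.log(np.clip(w, floor, None))) @ V.T
    return float(np.trace(A @ logm(A) - A @ logm(B) - A + B))

# Gram matrices of a sample, a slightly perturbed copy, and an unrelated sample.
rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, 80)
A = gram(y)
B_close = gram(y + 0.01 * rng.normal(0.0, 1.0, 80))
B_far = gram(rng.normal(0.0, 3.0, 80))
```

Working on Gram matrices sidesteps density estimation entirely: the divergence compares spectra of kernel matrices rather than estimated densities, which is what makes the statistic usable in high-dimensional spaces.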
K nearest neighbor equality: giving equal chance to all existing classes
The nearest neighbor classification method assigns an unclassified point to the class of the nearest case in a set of previously classified points. This rule is independent of the underlying joint distribution of the sample points and their classifications. An extension of this approach is the k-NN method, in which the classification of the unclassified point is made by a voting criterion among the k nearest points. The method we present here extends the k-NN idea: it searches in each class for the k points nearest to the unclassified point, and classifies it in the class which minimizes the mean distance between the unclassified point and those k nearest points. As all classes can take part in the final selection process, we have called the new approach k Nearest Neighbor Equality (k-NNE). The experimental results we obtained show the suitability of the k-NNE algorithm, and its effectiveness suggests that it could be added to the current list of distance-based classifiers.
This work has been supported by the Basque Country University and by the Basque Government under the research team grant program.
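The classification rule described above is easy to state in code. A minimal sketch with brute-force distances (function and variable names are ours):

```python
import numpy as np

def knne_predict(X_train, y_train, X_test, k=3):
    """k Nearest Neighbor Equality: for every class, take the k training
    points nearest to the query, then assign the class whose k nearest
    points have the smallest mean distance to the query."""
    classes = np.unique(y_train)
    preds = []
    for q in X_test:
        dists = np.linalg.norm(X_train - q, axis=1)
        mean_k = [np.sort(dists[y_train == c])[:k].mean() for c in classes]
        preds.append(classes[int(np.argmin(mean_k))])
    return np.array(preds)

# Two well-separated 2-D clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(4.0, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
queries = np.array([[0.2, -0.1], [3.8, 4.1]])
preds = knne_predict(X, y, queries)  # -> [0 1]
```

Unlike plain k-NN voting, every class contributes its own k nearest points, so a minority class is never outvoted before it is even compared.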
k-Nearest Neighbor Based Consistent Entropy Estimation for Hyperspherical Distributions
A consistent entropy estimator for hyperspherical data is proposed based on the k-nearest neighbor (knn) approach. The asymptotic unbiasedness and consistency of the estimator are proved. Moreover, cross-entropy and Kullback-Leibler (KL) divergence estimators are also discussed. Simulation studies are conducted to assess the performance of the estimators for models including uniform and von Mises-Fisher distributions. The proposed knn entropy estimator is compared with its moment-based counterpart via simulations. The results show that the two methods are comparable.
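As a one-dimensional illustration of the idea (not the paper's estimator for general hyperspheres), the fixed-k construction carries over to the unit circle by replacing Euclidean distance with geodesic (arc-length) distance; a ball of radius ε on the circle is an arc of length 2ε. For angles drawn uniformly on [0, 2π), the true entropy is log(2π) ≈ 1.838 nats.

```python
import math
import numpy as np

def circle_entropy_knn(theta, k=3):
    """k-NN entropy estimate (nats) for angles on the unit circle, using
    geodesic distance; a ball of radius eps is an arc of length 2*eps."""
    n = len(theta)
    diff = np.abs(theta[:, None] - theta[None])
    dist = np.minimum(diff, 2.0 * math.pi - diff)  # arc-length distance
    np.fill_diagonal(dist, np.inf)
    eps = np.sort(dist, axis=1)[:, k - 1]
    psi_diff = sum(1.0 / j for j in range(k, n))   # psi(n) - psi(k)
    return psi_diff + float(np.mean(np.log(2.0 * eps)))

rng = np.random.default_rng(0)
est = circle_entropy_knn(rng.uniform(0.0, 2.0 * math.pi, 1000))
```

Because the circle is locally a line and has no boundary, the standard fixed-k bias correction applies unchanged; only the distance and the ball volume are adapted to the manifold.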