865 research outputs found
Finite-Sample Analysis of Fixed-k Nearest Neighbor Density Functional Estimators
We provide finite-sample analysis of a general framework for using k-nearest
neighbor statistics to estimate functionals of a nonparametric continuous
probability density, including entropies and divergences. Rather than plugging
a consistent density estimate (which requires as the sample size
) into the functional of interest, the estimators we consider fix
k and perform a bias correction. This is more efficient computationally, and,
as we show in certain cases, statistically, leading to faster convergence
rates. Our framework unifies several previous estimators, for most of which
ours are the first finite sample guarantees.Comment: 16 pages, 0 figure
Ensemble estimation of multivariate f-divergence
f-divergence estimation is an important problem in the fields of information
theory, machine learning, and statistics. While several divergence estimators
exist, relatively few of their convergence rates are known. We derive the MSE
convergence rate for a density plug-in estimator of f-divergence. Then by
applying the theory of optimally weighted ensemble estimation, we derive a
divergence estimator with a convergence rate of O(1/T) that is simple to
implement and performs well in high dimensions. We validate our theoretical
results with experiments.Comment: 14 pages, 6 figures, a condensed version of this paper was accepted
to ISIT 2014, Version 2: Moved the proofs of the theorems from the main body
to appendices at the en
Real-valued feature selection for process approximation and prediction
The selection of features for classification, clustering and approximation is an important task in pattern recognition, data mining and soft computing. For real-valued features, this contribution shows how feature selection for a high number of features can be implemented using mutual in-formation. Especially, the common problem for mutual information computation of computing joint probabilities for many dimensions using only a few samples is treated by using the Rènyi mutual information of order two as computational base. For this, the Grassberger-Takens corre-lation integral is used which was developed for estimating probability densities in chaos theory. Additionally, an adaptive procedure for computing the hypercube size is introduced and for real world applications, the treatment of missing values is included. The computation procedure is accelerated by exploiting the ranking of the set of real feature values especially for the example of time series. As example, a small blackbox-glassbox example shows how the relevant features and their time lags are determined in the time series even if the input feature time series determine nonlinearly the output. A more realistic example from chemical industry shows that this enables a better ap-proximation of the input-output mapping than the best neural network approach developed for an international contest. By the computationally efficient implementation, mutual information becomes an attractive tool for feature selection even for a high number of real-valued features
Characterization of complex networks: A survey of measurements
Each complex network (or class of networks) presents specific topological
features which characterize its connectivity and highly influence the dynamics
of processes executed on the network. The analysis, discrimination, and
synthesis of complex networks therefore rely on the use of measurements capable
of expressing the most relevant topological features. This article presents a
survey of such measurements. It includes general considerations about complex
network characterization, a brief review of the principal models, and the
presentation of the main existing measurements. Important related issues
covered in this work comprise the representation of the evolution of complex
networks in terms of trajectories in several measurement spaces, the analysis
of the correlations between some of the most traditional measurements,
perturbation analysis, as well as the use of multivariate statistics for
feature selection and network classification. Depending on the network and the
analysis task one has in mind, a specific set of features may be chosen. It is
hoped that the present survey will help the proper application and
interpretation of measurements.Comment: A working manuscript with 78 pages, 32 figures. Suggestions of
measurements for inclusion are welcomed by the author
- …