
Real-valued feature selection for process approximation and prediction

By Frank Heister and Rüdiger Brause


The selection of features for classification, clustering and approximation is an important task in pattern recognition, data mining and soft computing. For real-valued features, this contribution shows how feature selection for a high number of features can be implemented using mutual information. In particular, the common problem in mutual information computation of estimating joint probabilities for many dimensions from only a few samples is treated by using the Rényi mutual information of order two as the computational base. For this, the Grassberger-Takens correlation integral is used, which was developed for estimating probability densities in chaos theory. Additionally, an adaptive procedure for computing the hypercube size is introduced and, for real-world applications, the treatment of missing values is included. The computation is accelerated by exploiting the ranking of the set of real feature values, especially for time series. As an illustration, a small blackbox-glassbox example shows how the relevant features and their time lags are determined in the time series, even when the input feature time series determine the output nonlinearly. A more realistic example from the chemical industry shows that this enables a better approximation of the input-output mapping than the best neural network approach developed for an international contest. With this computationally efficient implementation, mutual information becomes an attractive tool for feature selection even for a high number of real-valued features.
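The core estimate described in the abstract can be sketched as follows: the correlation integral (fraction of sample pairs within a hypercube of a given size, i.e. within a maximum-norm distance eps) approximates the sum of squared probabilities, so the order-two Rényi entropy is H2 ≈ -log C(eps), and the order-two Rényi mutual information follows as I2(X;Y) = H2(X) + H2(Y) - H2(X,Y). This is a minimal illustrative sketch, not the paper's implementation: the function names are assumptions, the hypercube size `eps` is fixed rather than adaptively chosen, and the paper's missing-value handling and ranking-based acceleration are not reproduced.

```python
import numpy as np

def renyi2_entropy(samples, eps):
    """Order-two Rényi entropy via the correlation integral.

    C(eps) is the fraction of sample pairs whose maximum-norm distance
    is below eps (i.e. that fall into a common hypercube of side 2*eps);
    it estimates sum_i p_i**2, so H2 ≈ -log C(eps).
    """
    x = np.atleast_2d(np.asarray(samples).T).T  # shape (N, d)
    n = x.shape[0]
    # pairwise Chebyshev (maximum-norm) distances between all samples
    dist = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=-1)
    iu = np.triu_indices(n, k=1)                # distinct pairs only
    c = np.mean(dist[iu] < eps)                 # correlation integral
    return -np.log(c)

def renyi2_mutual_information(x, y, eps):
    """I2(X;Y) = H2(X) + H2(Y) - H2(X,Y), each term estimated as above."""
    joint = np.column_stack([x, y])
    return (renyi2_entropy(x, eps) + renyi2_entropy(y, eps)
            - renyi2_entropy(joint, eps))
```

For independent inputs the three correlation integrals approximately factorize and the estimate is near zero; for strongly dependent inputs it is clearly positive, which is what makes it usable as a feature-ranking score.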

Topics: ddc:004
Year: 2009
OAI identifier:

