Simple stopping criteria for information theoretic feature selection
Feature selection aims to select the smallest feature subset that yields the
minimum generalization error. In the rich literature on feature selection,
information-theoretic approaches seek a subset of features such that the
mutual information between the selected features and the class labels is
maximized. Despite the simplicity of this objective, there still remain several
open problems in optimization. These include, for example, the automatic
determination of the optimal subset size (i.e., the number of features) or a
stopping criterion if a greedy search strategy is adopted. In this paper,
we suggest two stopping criteria by just monitoring the conditional mutual
information (CMI) among groups of variables. Using the recently developed
multivariate matrix-based Rényi's α-entropy functional, which can be
directly estimated from data samples, we show that the CMI among groups of
variables can be easily computed without any decomposition or approximation,
hence making our criteria easy to implement and to integrate seamlessly into
any existing information-theoretic feature selection method that adopts a
greedy search strategy.
Comment: Paper published in the journal Entropy.
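As a rough sketch of how such a CMI quantity can be estimated directly from samples, the snippet below follows the matrix-based Rényi α-entropy construction: each variable group gets a unit-trace kernel Gram matrix, joint entropies come from normalized Hadamard products, and the CMI is the usual entropy combination I(X;Y|Z) = S(X,Z) + S(Y,Z) - S(X,Y,Z) - S(Z). The RBF kernel, the bandwidth sigma, and all function names are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def gram(X, sigma=1.0):
    """Unit-trace RBF Gram matrix for one group of variables (n_samples x n_dims)."""
    sq = np.sum(X**2, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    K = np.exp(-d / (2.0 * sigma**2))
    return K / np.trace(K)

def renyi_entropy(A, alpha=1.01):
    """Matrix-based Renyi entropy S_alpha(A) = log2(sum_i lambda_i^alpha) / (1 - alpha)."""
    lam = np.clip(np.linalg.eigvalsh(A), 0.0, None)  # clip tiny negative eigenvalues
    return np.log2(np.sum(lam**alpha)) / (1.0 - alpha)

def joint(*mats):
    """Joint-entropy matrix: normalized Hadamard product of unit-trace Grams."""
    H = mats[0]
    for M in mats[1:]:
        H = H * M
    return H / np.trace(H)

def cmi(X, Y, Z, alpha=1.01, sigma=1.0):
    """I_alpha(X; Y | Z) = S(X,Z) + S(Y,Z) - S(X,Y,Z) - S(Z), all from Gram matrices."""
    A, B, C = gram(X, sigma), gram(Y, sigma), gram(Z, sigma)
    S = lambda M: renyi_entropy(M, alpha)
    return S(joint(A, C)) + S(joint(B, C)) - S(joint(A, B, C)) - S(C)
```

A stopping criterion of the kind the abstract describes can then monitor such a CMI value across greedy selection steps and halt when it falls below a threshold.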
The deep kernelized autoencoder
Autoencoders learn data representations (codes) in such a way that the input is reproduced at the output of the network. However, it is not always clear what kind of properties of the input data need to be captured by the codes. Kernel machines have experienced great success by operating via inner products in a theoretically well-defined reproducing kernel Hilbert space, hence capturing topological properties of the input data. In this paper, we enhance the autoencoder's ability to learn effective data representations by aligning inner products between codes with respect to a kernel matrix. By doing so, the proposed kernelized autoencoder allows learning similarity-preserving embeddings of input data, where the notion of similarity is explicitly controlled by the user and encoded in a positive semi-definite kernel matrix. Experiments evaluate both reconstruction and kernel alignment performance in classification tasks and in the visualization of high-dimensional data. Additionally, we show that our method is capable of emulating kernel principal component analysis on a denoising task, obtaining competitive results at a much lower computational cost.
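To make the alignment idea concrete, here is a minimal PyTorch sketch of a kernelized autoencoder objective: a standard reconstruction term plus a penalty on the Frobenius distance between the normalized Gram matrix of code inner products and a user-supplied positive semi-definite kernel matrix. The architecture, the normalization, and the weight lam are illustrative assumptions rather than the exact formulation from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KernelizedAE(nn.Module):
    """Autoencoder whose code inner products are aligned with a target kernel."""
    def __init__(self, dim_in, dim_code, dim_hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim_in, dim_hidden), nn.ReLU(),
                                     nn.Linear(dim_hidden, dim_code))
        self.decoder = nn.Sequential(nn.Linear(dim_code, dim_hidden), nn.ReLU(),
                                     nn.Linear(dim_hidden, dim_in))

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

def kae_loss(x, x_hat, z, K_target, lam=0.1):
    """Reconstruction error plus misalignment between the codes' Gram matrix
    and a user-chosen positive semi-definite kernel matrix K_target."""
    recon = F.mse_loss(x_hat, x)
    C = z @ z.T                        # inner products between codes in a batch
    C = C / torch.norm(C)              # Frobenius-normalize both matrices
    K = K_target / torch.norm(K_target)
    align = torch.sum((C - K) ** 2)    # squared Frobenius distance
    return recon + lam * align
```

Because K_target is chosen by the user (e.g., an RBF kernel on the inputs, or a label-derived kernel), the same loss can steer the codes toward whatever notion of similarity the task requires.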
Revisiting the Robustness of the Minimum Error Entropy Criterion: A Transfer Learning Case Study
Coping with distributional shifts is essential for transfer learning methods
to perform well in real-life tasks. However, most existing approaches in this
area either focus on an ideal scenario in which the data contain no noise or
employ a complicated training paradigm or model design to deal with
distributional shifts. In this paper, we revisit the robustness of the minimum
error entropy (MEE) criterion, a widely used objective in statistical signal
processing for dealing with non-Gaussian noise,
and investigate its feasibility and usefulness in real-life transfer learning
regression tasks, where distributional shifts are common. Specifically, we put
forward a new theoretical result showing the robustness of MEE against
covariate shift. We also show that by simply replacing the mean squared error
(MSE) loss with the MEE loss in basic transfer learning algorithms such as
fine-tuning and linear probing, we can achieve competitive performance with
respect to state-of-the-art transfer learning algorithms. We support our
claims with experiments on both synthetic data and five real-world time-series
datasets.
Comment: Manuscript accepted at ECAI-23. Code available at
https://github.com/lpsilvestrin/mee-finetun
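For readers who want to try the MSE-to-MEE swap, the sketch below implements the MEE objective as a drop-in loss: it minimizes an estimate of Rényi's quadratic entropy of the prediction errors, with the error density estimated by a Gaussian Parzen window. The bandwidth sigma and the log-stabilizing constant are illustrative assumptions; for the authors' actual code, see the repository linked above.

```python
import torch

def mee_loss(pred, target, sigma=1.0):
    """Minimum error entropy loss: Renyi's quadratic entropy of the prediction
    errors, H_2(e) = -log V(e), with the information potential
    V(e) = (1/N^2) * sum_ij exp(-(e_i - e_j)^2 / (2 sigma^2))
    estimated by a Gaussian Parzen window."""
    e = (pred - target).reshape(-1)
    diff = e[:, None] - e[None, :]                     # pairwise error differences
    V = torch.exp(-diff**2 / (2.0 * sigma**2)).mean()  # information potential
    return -torch.log(V + 1e-8)                        # eps for numerical stability

# Drop-in usage during fine-tuning or linear probing, replacing MSE:
#   loss = mee_loss(model(x), y)   # instead of F.mse_loss(model(x), y)
```

Minimizing this entropy concentrates the error distribution rather than its mean square, which is what gives MEE its robustness to outliers and non-Gaussian noise.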