PADDLE: Proximal Algorithm for Dual Dictionaries LEarning
Recently, considerable research efforts have been devoted to the design of
methods to learn from data overcomplete dictionaries for sparse coding.
However, learned dictionaries require the solution of an optimization problem
for coding new data. In order to overcome this drawback, we propose an
algorithm aimed at learning both a dictionary and its dual: a linear mapping
directly performing the coding. By leveraging on proximal methods, our
algorithm jointly minimizes the reconstruction error of the dictionary and the
coding error of its dual; the sparsity of the representation is induced by an
ℓ1-based penalty on its coefficients. The results obtained on synthetic
data and real images show that the algorithm is capable of recovering the
expected dictionaries. Furthermore, on a benchmark dataset, we show that the
image features obtained from the dual matrix yield state-of-the-art
classification performance while being far less computationally intensive
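The speed gap between iterative sparse coding and coding with a dual linear map can be sketched in a few lines of NumPy. This is illustrative only: the dual matrix below is fit by least squares after the fact, whereas PADDLE learns the dictionary and its dual jointly with proximal steps; all sizes and the threshold `tau` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: overcomplete dictionary D, sparse codes U, signals X = D U.
d, k, n = 10, 20, 200                   # signal dim, atoms, samples
D = rng.standard_normal((d, k))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
U = rng.standard_normal((k, n)) * (rng.random((k, n)) < 0.1)
X = D @ U

# Standard sparse coding of a new signal: ISTA, a proximal method, solves
# min_u 0.5*||x - D u||^2 + tau*||u||_1 iteratively.
def ista(D, x, tau=0.05, iters=200):
    L = np.linalg.norm(D, 2) ** 2       # Lipschitz constant of the gradient
    u = np.zeros(D.shape[1])
    for _ in range(iters):
        z = u - (D.T @ (D @ u - x)) / L                        # gradient step
        u = np.sign(z) * np.maximum(np.abs(z) - tau / L, 0.0)  # soft threshold
    return u

# Coding with a dual map in the spirit of PADDLE: one matrix C encodes new
# data by a single product. Here C is a least-squares stand-in for the
# jointly learned dual.
C = U @ np.linalg.pinv(X)

x_new = D @ (rng.standard_normal(k) * (rng.random(k) < 0.1))
u_ista = ista(D, x_new)                 # iterative solve
u_dual = C @ x_new                      # one matmul: much cheaper

print(np.linalg.norm(x_new - D @ u_dual))   # reconstruction error of the dual code
```

The dual code costs one matrix-vector product per signal instead of hundreds of gradient and thresholding steps, which is the computational advantage the abstract refers to.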
A Regularized Method for Selecting Nested Groups of Relevant Genes from Microarray Data
Gene expression analysis aims at identifying the genes able to accurately
predict biological parameters like, for example, disease subtyping or
progression. While accurate prediction can be achieved by means of many
different techniques, gene identification, due to gene correlation and the
limited number of available samples, is a much more elusive problem. Small
changes in the expression values often produce different gene lists, and
solutions which are both sparse and stable are difficult to obtain. We propose
a two-stage regularization method able to learn linear models characterized by
a high prediction performance. By varying a suitable parameter, these linear
models allow one to trade sparsity for the inclusion of correlated genes and to
produce gene lists that are almost perfectly nested. Experimental results on
synthetic and microarray data confirm the interesting properties of the
proposed method and its potential as a starting point for further biological
investigations
Comment: 17 pages, 8 PostScript figures
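The sparsity-versus-correlation trade-off described above can be illustrated with an elastic-net-style penalty, where an `l1_ratio` parameter plays the role of the tuning parameter. This is a hedged sketch on synthetic data, not the paper's actual two-stage method: feature 1 is built as a near-copy of feature 0 to mimic correlated genes.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)

# Synthetic "expression" data: only features 0 and 1 carry signal, and
# feature 1 is a noisy copy of feature 0 (two highly correlated genes).
n, p = 100, 30
X = rng.standard_normal((n, p))
X[:, 1] = X[:, 0] + 0.01 * rng.standard_normal(n)
y = X[:, 0] + 0.1 * rng.standard_normal(n)

def selected(l1_ratio):
    """Indices of variables with non-zero coefficients at this l1/l2 mix."""
    m = ElasticNet(alpha=0.1, l1_ratio=l1_ratio, max_iter=10000).fit(X, y)
    return set(np.flatnonzero(np.abs(m.coef_) > 1e-8))

sparse_list = selected(0.99)   # nearly pure l1: a short list
dense_list = selected(0.3)     # more l2: pulls in the correlated partner

print(sorted(sparse_list), sorted(dense_list))
```

As the penalty moves toward the ℓ2 end, correlated relevant variables enter together and the shorter list tends to be contained in the longer one, which is the nesting behavior the abstract highlights.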
Properties of Support Vector Machines
Support Vector Machines (SVMs) perform pattern recognition between two point classes by finding a decision surface determined by certain points of the training set, termed Support Vectors (SVs). This surface, which in some feature space of possibly infinite dimension can be regarded as a hyperplane, is obtained from the solution of a quadratic programming problem that depends on a regularization parameter. In this paper we study some mathematical properties of support vectors and show that the decision surface can be written as the sum of two orthogonal terms, the first depending only on the margin vectors (which are SVs lying on the margin), the second proportional to the regularization parameter. For almost all values of the parameter, this enables us to predict how the decision surface varies for small parameter changes. In the special but important case of a feature space of finite dimension m, we also show that there are at most m+1 margin vectors and observe that m+1 SVs are usually sufficient to fully determine the decision surface. For relatively small m this latter result leads to a substantial reduction in the number of SVs
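The distinction between margin vectors and bound support vectors can be inspected directly with a standard SVM solver. A small sketch (the dataset and the numerical tolerances are arbitrary choices; `dual_coef_` stores the products y_i * alpha_i, so its absolute value recovers the dual coefficients):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two overlapping Gaussian classes in the plane, linear kernel, finite C.
X = np.vstack([rng.standard_normal((60, 2)) + [2, 0],
               rng.standard_normal((60, 2)) - [2, 0]])
y = np.array([1] * 60 + [-1] * 60)

C = 1.0
clf = SVC(kernel="linear", C=C).fit(X, y)

alpha = np.abs(clf.dual_coef_[0])       # dual coefficients of the SVs
margin_mask = alpha < C - 1e-6          # strictly below the box bound
margin_vectors = clf.support_vectors_[margin_mask]

# Margin vectors lie exactly on the margin: |w . x + b| = 1 (up to solver tolerance).
print(np.abs(clf.decision_function(margin_vectors)))
```

Support vectors whose coefficient sits at the bound C are the misclassified or in-margin points; the margin vectors, with coefficients strictly inside (0, C), are the ones whose decision values equal ±1 and which the abstract shows determine the first term of the decision surface.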
A Note on Support Vector Machine Degeneracy
When training Support Vector Machines (SVMs) over non-separable data sets, one sets the threshold using any dual cost coefficient that is strictly between the bounds of 0 and C. We show that there exist SVM training problems with dual optimal solutions with all coefficients at bounds, but that all such problems are degenerate in the sense that the "optimal separating hyperplane" is given by w = 0, and the resulting (degenerate) SVM will classify all future points identically (to the class that supplies more training data). We also derive necessary and sufficient conditions on the input data for this to occur. Finally, we show that an SVM training problem can always be made degenerate by the addition of a single data point belonging to a certain unbounded polyhedron, which we characterize in terms of its extreme points and rays
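A minimal degenerate instance is easy to construct: place both classes on exactly the same points, with one class supplying more copies. No hyperplane can separate coincident points, the optimum is w = 0, and the trained SVM labels everything with the majority class. (A hedged sketch; the specific point configuration is an arbitrary choice.)

```python
import numpy as np
from sklearn.svm import SVC

# Ten points at two locations; both classes occupy both locations,
# but class +1 supplies more training data at each.
X = np.array([[0.0], [1.0]] * 5)
y = np.array([1] * 6 + [-1] * 4)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Every future point, however far away, gets the majority label.
preds = clf.predict(np.array([[-5.0], [0.5], [7.0]]))
print(preds)
```

This matches the abstract's characterization: the "separating hyperplane" degenerates to w = 0 and the classifier reduces to a constant majority vote.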
The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection
Machine Learning (ML) has emerged as a promising approach in healthcare,
outperforming traditional statistical techniques. However, to establish ML as a
reliable tool in clinical practice, adherence to best practices regarding data
handling, experimental design, and model evaluation is crucial. This work
summarizes and strictly observes such practices to ensure reproducible and
reliable ML. Specifically, we focus on Alzheimer's Disease (AD) detection,
which serves as a paradigmatic example of a challenging problem in healthcare. We
investigate the impact of different data augmentation techniques and model
complexity on the overall performance. We consider MRI data from the ADNI dataset
to address a classification problem employing a 3D Convolutional Neural Network
(CNN). The experiments are designed to compensate for data scarcity and initial
random parameters by utilizing cross-validation and multiple training trials.
Within this framework, we train 15 predictive models, considering three
different data augmentation strategies and five distinct 3D CNN architectures,
each varying in the number of convolutional layers. Specifically, the
augmentation strategies are based on affine transformations, such as zoom,
shift, and rotation, applied concurrently or separately. The combined effect of
data augmentation and model complexity leads to a variation in prediction
accuracy of up to 10%. When affine transformations are applied
separately, the models are more accurate, independently of the adopted
architecture. For all strategies, model accuracy follows a concave
trend as the number of convolutional layers increases, peaking at an
intermediate number of layers. The best model (8 CL, (B)) is the most stable
across cross-validation folds and training trials, reaching excellent
performance both on the testing set and on an external test set
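The "applied separately" augmentation strategy can be sketched for a 3D volume with SciPy: each affine transform (rotation, shift, zoom) produces its own augmented copy rather than being composed into one warp. This is an illustrative stand-in, not the paper's pipeline; the volume size, parameter ranges, and the `_fit` crop/pad helper are assumptions.

```python
import numpy as np
from scipy.ndimage import rotate, shift, zoom

rng = np.random.default_rng(0)
vol = rng.random((32, 32, 32))      # stand-in for a 3D MRI volume

def _fit(v, target):
    """Crop or zero-pad a volume back to a target shape (hypothetical helper)."""
    out = np.zeros(target, dtype=v.dtype)
    sl = tuple(slice(0, min(s, t)) for s, t in zip(v.shape, target))
    out[sl] = v[sl]
    return out

def augment_separately(v, rng):
    """One augmented copy per affine transform, applied separately."""
    angle = rng.uniform(-10, 10)                  # degrees
    offsets = rng.uniform(-4, 4, size=3)          # voxels
    factor = rng.uniform(0.9, 1.1)
    rotated = rotate(v, angle, axes=(0, 1), reshape=False, order=1)
    shifted = shift(v, offsets, order=1)
    zoomed = _fit(zoom(v, factor, order=1), v.shape)  # zoom changes the shape
    return [rotated, shifted, zoomed]

augmented = augment_separately(vol, rng)
print([a.shape for a in augmented])
```

Each call yields three volumes of the original shape, tripling the effective sample size per scan, which is how separate application enlarges the training set more than a single composed transform would.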
Regularized Kernel Algorithms for Support Estimation
In the framework of non-parametric support estimation, we study the statistical properties of a set estimator defined by means of Kernel Principal Component Analysis. Under a suitable assumption on the kernel, we prove that the algorithm is strongly consistent with respect to the Hausdorff distance. We also extend the above analysis to a larger class of set estimators defined in terms of a low-pass filter function. We finally provide numerical simulations on synthetic data to highlight the role of the hyper-parameters that affect the algorithm
Multi-Output Learning via Spectral Filtering
In this paper we study a class of regularized kernel methods for vector-valued learning which are based on filtering the spectrum of the kernel matrix. The considered methods include Tikhonov regularization as a special case, as well as interesting alternatives such as vector-valued extensions of L2 boosting. Computational properties are discussed for various examples of kernels for vector-valued functions and the benefits of iterative techniques are illustrated. Generalizing previous results for the scalar case, we show finite sample bounds for the excess risk of the obtained estimator and, in turn, these results allow us to prove consistency both for regression and multi-category classification. Finally, we present some promising results of the proposed algorithms on artificial and real data
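Two members of the spectral-filtering family are easy to compare on a scalar toy problem (a sketch under arbitrary choices of kernel, regularization level, and iteration count): Tikhonov regularization applies the filter 1/(sigma + n*lambda) to the kernel spectrum, while Landweber iteration, the kernel form of L2 boosting, applies (1 - (1 - eta*sigma)^t)/sigma, with the iteration count t acting as the regularization parameter.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1-D regression sample and an RBF kernel matrix.
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(50)
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.01)
n = len(x)

# Tikhonov: c = (K + n*lam I)^{-1} y, i.e. filter g(sigma) = 1/(sigma + n*lam).
lam = 1e-3
c_tik = np.linalg.solve(K + n * lam * np.eye(n), y)

# Landweber iteration (kernel L2 boosting): c <- c + eta*(y - K c),
# which realizes the filter g_t(sigma) = (1 - (1 - eta*sigma)^t) / sigma.
eta = 1.0 / np.linalg.norm(K, 2)
c_lw = np.zeros(n)
for _ in range(500):
    c_lw += eta * (y - K @ c_lw)

print(np.linalg.norm(K @ c_tik - y), np.linalg.norm(K @ c_lw - y))
```

Both filters damp the small-eigenvalue directions of K where noise concentrates; the iterative variant trades the matrix solve for cheap matrix-vector products, which is the computational benefit the abstract attributes to iterative techniques.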
Nonparametric Sparsity and Regularization
In this work we are interested in the problems of supervised learning and variable selection when the input-output dependence is described by a nonlinear function depending on a few variables. Our goal is to consider a sparse nonparametric model, hence avoiding linear or additive models. The key idea is to measure the importance of each variable in the model by making use of partial derivatives. Based on this intuition we propose and study a new regularizer and a corresponding least squares regularization scheme. Using concepts and results from the theory of reproducing kernel Hilbert spaces and proximal methods, we show that the proposed learning algorithm corresponds to a minimization problem which can be provably solved by an iterative procedure. The consistency properties of the obtained estimator are studied both in terms of prediction and selection performance. An extensive empirical analysis shows that the proposed method performs favorably with respect to the state-of-the-art
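The core intuition, measuring a variable's importance by the partial derivatives of the learned function, can be sketched with plain kernel ridge regression (a stand-in for the paper's derivative-based regularizer, which penalizes these quantities during training rather than reading them off afterwards; all sizes and parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Nonlinear regression where the output depends on variable 0 only, out of 5.
n, p, gamma, lam = 200, 5, 1.0, 1e-3
X = rng.uniform(-1, 1, (n, p))
y = np.sin(np.pi * X[:, 0])

sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-gamma * sq)
c = np.linalg.solve(K + n * lam * np.eye(n), y)   # kernel ridge coefficients

# Importance of variable a: empirical L2 norm of the partial derivative
# of the fitted function f(x) = sum_j c_j k(x, x_j) at the training points.
# For the RBF kernel, d k(x, xj) / d x_a = -2*gamma*(x_a - xj_a)*k(x, xj).
diff = X[:, None, :] - X[None, :, :]              # shape (n, n, p)
grads = np.einsum('j,ija,ij->ia', c, -2 * gamma * diff, K)
importance = np.sqrt((grads ** 2).mean(0))

print(importance)   # the entry for variable 0 dominates
```

Because the fitted function is nearly flat along the irrelevant coordinates, their derivative norms are small, and thresholding this importance vector performs variable selection without assuming a linear or additive model.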