31 research outputs found

    Closing the gap between theory and practice: educational Python notebooks

    Get PDF
    Jornada de Innovación Docente (Teaching Innovation Conference): results and strategies, held on 22 June 2016 at Universidad Carlos III de Madrid, presenting some of the teaching innovation projects of the 2015-2016 academic year

    New helping criteria for ensembles of cooperative decision makers (Nuevos criterios de ayuda para conjuntos de decisores cooperativos)

    Get PDF
    Although Neural Networks (NNs) are an effective tool in many applications, a single NN may be insufficient for solving (complex) tasks. To tackle this problem, we may combine a set of NNs to construct an NN ensemble capable of solving the original problem, providing an easier design and a more clearly interpretable resulting machine. These reasons have increased the interest in this research area during recent years. Among NN ensembles, boosting methods, and in particular AdaBoost, are attractive because of their simple conceptual principles and their good generalization performance. This Ph.D. Thesis starts from the Real AdaBoost (RA) algorithm, whose emphasis function can be decomposed into the product of two factors: the first depends on the quadratic error of each sample, and the second is a function of the proximity of the sample to the classification border. This decomposition makes it possible to generalize the structure of the RA emphasis function by introducing an adjustable mixing parameter λ to control the trade-off between both emphasis terms; the resulting algorithm is referred to as RA with weighted emphasis (RA-we). Experiments show that a significant improvement over classical RA performance can be achieved if the mixing parameter λ is adequately selected. However, finding the optimal λ is not always an easy task, and Cross-Validation selection does not fully exploit the potential of the mixed emphasis function. Following this research line, this Dissertation also explores two alternatives for selecting the mixing parameter. Rather than trying to find the best value of λ, the first proposal combines the outputs of a number of RA-we ensembles trained with different values of λ; in this way, we take advantage of the diversity introduced by the mixing coefficient to build committees of RA-we ensembles. The second approach considers a generalized version of the learner edge defined by the RA algorithm (a weighted correlation between the learner outputs and the true labels) as an indication of learner quality, and proposes to dynamically adjust the mixing parameter during ensemble growth: at each iteration, the value of λ that gives the learner the largest generalized edge is selected. The effectiveness of these two approaches is corroborated over several benchmark binary decision problems, showing the efficacy of the mixed emphasis approach, as well as the appropriateness of both schemes for selecting λ: (1) committees of RA-we ensembles, and (2) dynamic λ selection. Finally, we conclude that, in comparison to traditional RA algorithms, the algorithms described in this Thesis present interesting possibilities for building multi-net systems
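The emphasis decomposition above lends itself to a short numerical sketch. The exact emphasis expression is not given in this abstract, so the form below is an illustrative assumption: the error term is taken as the quadratic error (f − y)², the proximity term as −f² (small |f| meaning close to the boundary), and λ mixes the two:

```python
import numpy as np

def rawe_emphasis(f, y, lam):
    """Illustrative RA-we-style emphasis weights (assumed form, not the
    Thesis' exact equation): a lambda-weighted mix of a quadratic-error
    term and a boundary-proximity term, exponentiated and normalized.

    f   : real-valued ensemble outputs, roughly in [-1, 1]
    y   : labels in {-1, +1}
    lam : mixing parameter in [0, 1]
    """
    err = (f - y) ** 2          # emphasizes badly classified samples
    prox = -f ** 2              # emphasizes samples near the boundary
    w = np.exp(lam * err + (1.0 - lam) * prox)
    return w / w.sum()          # normalized emphasis distribution

f = np.array([0.9, -0.1, -0.8])   # sample 1 is both wrong and near the border
y = np.array([1.0, 1.0, -1.0])
w = rawe_emphasis(f, y, lam=0.5)
```

With λ = 0.5 the misclassified near-boundary sample receives the largest weight; λ = 1 recovers a pure-error emphasis and λ = 0 a pure-proximity one.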

    Comparison of feature representations in MRI-based MCI-to-AD conversion prediction

    Get PDF
    Alzheimer's disease (AD) is a progressive neurological disorder in which the death of brain cells causes memory loss and cognitive decline. The identification of at-risk subjects who do not yet show dementia symptoms but who will later convert to AD can be crucial for the effective treatment of AD. For this, Magnetic Resonance Imaging (MRI) is expected to play a crucial role. In recent years, several Machine Learning (ML) approaches to AD-conversion prediction have been proposed using different types of MRI features. However, few studies comparing these feature representations exist, and the existing ones do not allow definite conclusions to be drawn. We evaluated the performance of various types of MRI features for conversion prediction: voxel-based features extracted through voxel-based morphometry, hippocampus volumes, volumes of the entorhinal cortex, and a set of regional volumetric, surface-area, and cortical-thickness measures across the brain. Regional features consistently yielded the best performance across two classifiers (Support Vector Machines and Regularized Logistic Regression) and the two datasets studied. However, the performance difference with respect to the other features was not statistically significant. There was a consistent trend of age correction improving the classification performance, but the improvement reached statistical significance only rarely. Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). 
ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. J. Tohka's work was supported by the Academy of Finland and V. Gómez-Verdejo's work has been partly funded by the Spanish MINECO grant TEC2014-52289R, TEC2016-81900-REDT/AEI and TEC2017-83838-R
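The age-correction step mentioned above is not spelled out in the abstract; a common choice (assumed here) is to fit, on training data only, a per-feature linear model of the age effect and then classify the residuals:

```python
import numpy as np

def fit_age_model(X, age):
    """Fit feature ~ b0 + b1 * age for every feature at once (least squares)."""
    A = np.column_stack([np.ones_like(age), age])   # design matrix [1, age]
    B, *_ = np.linalg.lstsq(A, X, rcond=None)       # B has shape (2, n_features)
    return B

def remove_age_effect(X, age, B):
    """Subtract the predicted age effect, leaving age-corrected residuals."""
    A = np.column_stack([np.ones_like(age), age])
    return X - A @ B

# Toy data: two features that drift linearly with age, plus noise.
rng = np.random.default_rng(0)
age = rng.uniform(60.0, 90.0, size=100)
X = np.outer(age, [0.5, -0.2]) + rng.normal(size=(100, 2))
B = fit_age_model(X, age)
Xc = remove_age_effect(X, age, B)
```

The coefficients B would be estimated on the training split and reused on the test split, so no test-set information leaks into the correction.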

    Regularized bagged canonical component analysis for multiclass learning in brain imaging

    Get PDF
    Alzheimer’s Disease Neuroimaging Initiative (ADNI) is a Group/Institutional Author. Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A fundamental problem of supervised learning algorithms for brain imaging applications is that the number of features far exceeds the number of subjects. In this paper, we propose a combined feature selection and extraction approach for multiclass problems. This method starts with a bagging procedure which calculates the sign consistency of the multivariate analysis (MVA) projection matrix feature-wise to determine the relevance of each feature. This relevance measure provides a parsimonious matrix, which is combined with a hypothesis test to automatically determine the number of selected features. Then, a novel MVA regularized with the sign and magnitude consistency of the features is used to generate a reduced set of summary components providing a compact data description. We evaluated the proposed method on two multiclass brain imaging problems: 1) the classification of elderly subjects into four classes (cognitively normal, stable mild cognitive impairment (MCI), MCI converting to AD in 3 years, and Alzheimer’s disease) based on structural brain imaging data from the ADNI cohort; 2) the classification of children into three classes (typically developing, and two types of Attention Deficit/Hyperactivity Disorder (ADHD)) based on functional connectivity. Experimental results confirmed that each brain image (defined by 29,852 features in the ADNI database and 61,425 in the ADHD database) could be represented with only 30-45% of the original features. Furthermore, this information could be redefined into two or three summary components, providing not only a gain in interpretability but also classification-rate improvements when compared to state-of-the-art reference methods. C. Sevilla-Salcedo and V. Gomez-Verdejo's work has been partly funded by the Spanish MINECO grants TEC2014-52289-R and TEC2017-83838-R, as well as by KERMES, a NoE on kernel methods for structured data funded by the Spanish Ministry of Economy and Competitiveness, TEC2016-81900-REDT. Jussi Tohka's work is supported by the Academy of Finland (grant 316258)
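The bagged sign-consistency idea can be sketched in a few lines. This is a simplified stand-in, not the paper's method: an ordinary least-squares projection replaces the regularized MVA, and a fixed threshold replaces the hypothesis test:

```python
import numpy as np

def sign_consistency_selection(X, Y, n_boot=50, threshold=0.9, seed=0):
    """Refit a linear projection on bootstrap resamples and keep the
    features whose projection weight has a stable sign across resamples."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    signs = np.zeros((n_boot, d))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)                     # bootstrap resample
        W, *_ = np.linalg.lstsq(X[idx], Y[idx], rcond=None)  # stand-in for MVA
        signs[b] = np.sign(W[:, 0])
    consistency = np.abs(signs.mean(axis=0))   # 1.0 = same sign in every resample
    return consistency >= threshold, consistency

# Toy problem: only the first two of five features carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
Y = X[:, [0]] - X[:, [1]] + 0.1 * rng.normal(size=(200, 1))
selected, consistency = sign_consistency_selection(X, Y)
```

Informative features keep the same weight sign in (almost) every resample, while the weights of pure-noise features flip sign from resample to resample.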

    A novel framework for parsimonious multivariate analysis

    Get PDF
    This paper proposes a framework in which a multivariate analysis (MVA) method guides a selection of input variables that leads to a sparse feature extraction. This framework, called parsimonious MVA, is especially suited for high-dimensional data such as gene arrays, digital pictures, etc. The feature selection relies on the analysis of the consistency in the behaviour of the input variables across the elements of an ensemble of MVA projection matrices. The ensemble is constructed following a bootstrap procedure that builds on an efficient and generalized MVA formulation covering PCA, CCA and OPLS. Moreover, it allows the estimation of the relative relevance of each selected input variable. Experimental results show that the features extracted by the parsimonious MVA have excellent discrimination power, comparing favorably with state-of-the-art methods, and are potentially useful for building interpretable features. Besides, the parsimonious feature extractor is shown to be robust against parameter selection, as well as computationally efficient. This work has been partly funded by the Spanish MINECO grants TEC2014-52289-R and TEC2013-48439-C4-1-R. The authors want to thank the action editor and the reviewers for their valuable feedback

    Regularized multivariate analysis framework for interpretable high-dimensional variable selection

    Get PDF
    Multivariate Analysis (MVA) comprises a family of well-known methods for feature extraction which exploit correlations among the input variables representing the data. One important property enjoyed by most such methods is the uncorrelation of the extracted features. Recently, regularized versions of MVA methods have appeared in the literature, mainly with the goal of gaining interpretability of the solution. In these cases, the solutions can no longer be obtained in closed form, and more complex optimization methods that rely on the iteration of two steps are frequently used. This paper resorts to an alternative approach to solve this iterative problem efficiently. The main novelty of this approach lies in preserving several properties of the original methods, most notably the uncorrelation of the extracted features. Under this framework, we propose a novel method that takes advantage of the ℓ2,1 norm to perform variable selection during the feature extraction process. Experimental results over different problems corroborate the advantages of the proposed formulation in comparison to state-of-the-art formulations. This work has been partly supported by MINECO projects TEC2013-48439-C4-1-R, TEC2014-52289-R and TEC2016-75161-C2-2-R, and Comunidad de Madrid projects PRICAM P2013/ICE-2933 and S2013/ICE-2933
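The ℓ2,1 norm sums the ℓ2 norms of the rows of the projection matrix, so penalizing it zeroes out entire rows, i.e. deselects whole input variables. Its proximal operator, the standard building block of the two-step iterative solvers mentioned above, is a row-wise soft threshold; a minimal sketch:

```python
import numpy as np

def l21_norm(W):
    """l2,1 norm: sum of the l2 norms of the rows of W."""
    return np.linalg.norm(W, axis=1).sum()

def prox_l21(W, t):
    """Proximal operator of t * l2,1: rows with norm <= t are zeroed,
    the remaining rows are shrunk toward zero (group soft thresholding)."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return W * scale

W = np.array([[3.0, 4.0],    # row norm 5.0 -> kept, shrunk
              [0.3, 0.4]])   # row norm 0.5 -> removed
P = prox_l21(W, 1.0)
```

Rows of the projection matrix correspond to input variables, so the zeroed second row means that variable is discarded from every extracted feature at once.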

    Nonnegative OPLS for supervised design of filter banks: application to image and audio feature extraction

    Get PDF
    Audio and visual data analysis tasks usually have to deal with high-dimensional, nonnegative signals. However, most data analysis methods suffer from overfitting and numerical problems when data have more than a few dimensions, making a dimensionality-reduction preprocessing step necessary. Moreover, interpretability of how and why filters work is a desired property for audio and visual applications, especially when energy or spectral signals are involved. In these cases, due to the nature of these signals, nonnegativity of the filter weights is a desired property for better understanding their operation. Motivated by these two needs, we propose different methods to reduce the dimensionality of data while ensuring the nonnegativity and interpretability of the solution. In particular, we propose a generalized methodology to design filter banks in a supervised way for applications dealing with nonnegative data, and we explore different ways of optimizing the proposed objective function, which consists of a nonnegative version of the orthonormalized partial least-squares (OPLS) method. We analyze the discriminative power of the features obtained with the proposed methods for two different and widely studied applications: texture and music genre classification. Furthermore, we compare the filter banks obtained by our methods with other state-of-the-art methods specifically designed for feature extraction. This work was supported in part by the MINECO projects TEC2013-48439-C4-1-R, TEC2014-52289-R, TEC2016-75161-C2-1-R, TEC2016-75161-C2-2-R, TEC2016-81900-REDT/AEI, and PRICAM (S2013/ICE-2933)
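The nonnegativity constraint itself can be illustrated without the authors' OPLS machinery: fitting a single spectral filter with ordinary nonnegative least squares (scipy.optimize.nnls) yields all-nonnegative, hence interpretable, weights. The data and target here are invented for the sketch:

```python
import numpy as np
from scipy.optimize import nnls

# Toy spectral data: 40 frames of 8 nonnegative band energies.
rng = np.random.default_rng(0)
S = rng.uniform(0.0, 1.0, size=(40, 8))
# Target response generated by a sparse, nonnegative "true" filter.
w_true = np.array([0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.5, 0.0])
t = S @ w_true

# NNLS solves min ||S w - t||_2 subject to w >= 0.
w, residual = nnls(S, t)
```

Because every weight is nonnegative by construction, each filter coefficient can be read directly as the contribution of one spectral band, which is the interpretability argument the abstract makes.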

    Adaptive Sparse Gaussian Process

    Full text link
    Adaptive learning is necessary in non-stationary environments, where the learning machine needs to forget the past data distribution. Efficient algorithms require a compact model update whose computational burden does not grow with the incoming data, together with the lowest possible computational cost for online parameter updating. Existing solutions only partially cover these needs. Here, we propose the first adaptive sparse Gaussian Process (GP) able to address all these issues. We first reformulate a variational sparse GP algorithm to make it adaptive through a forgetting factor. Next, to keep model inference as simple as possible, we propose updating a single inducing point of the sparse GP model, together with the remaining model parameters, every time a new sample arrives. As a result, the algorithm achieves fast convergence of the inference process, which allows an efficient model update (with a single inference iteration) even in highly non-stationary environments. Experimental results demonstrate the capabilities of the proposed algorithm and its good performance in modeling the predictive posterior, in both mean and confidence-interval estimation, compared to state-of-the-art approaches
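The forgetting-factor idea can be demonstrated on a much simpler model than the variational sparse GP of the abstract: recursive least squares with forgetting factor λ down-weights old samples geometrically, so the estimate tracks a changing regime. A hedged sketch (linear model, not a GP):

```python
import numpy as np

class ForgettingRLS:
    """Recursive least squares with forgetting factor 0 < lam <= 1.
    lam = 1 never forgets; smaller lam forgets faster (memory ~ 1/(1-lam))."""
    def __init__(self, dim, lam=0.95, delta=1.0):
        self.w = np.zeros(dim)           # current parameter estimate
        self.P = np.eye(dim) / delta     # inverse-correlation estimate
        self.lam = lam

    def update(self, x, y):
        Px = self.P @ x
        k = Px / (self.lam + x @ Px)             # gain vector
        self.w = self.w + k * (y - x @ self.w)   # correct by prediction error
        self.P = (self.P - np.outer(k, Px)) / self.lam

# Non-stationary stream: the true slope flips halfway through.
rng = np.random.default_rng(0)
model = ForgettingRLS(dim=1, lam=0.9)
for i in range(400):
    slope = 2.0 if i < 200 else -1.0
    x = rng.normal(size=1)
    model.update(x, slope * x[0] + 0.01 * rng.normal())
```

After the regime change the estimate converges to the new slope, since samples older than roughly 1/(1 − λ) steps carry negligible weight; the paper applies the same principle inside the variational sparse GP update.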