211 research outputs found

    Functional Regression

    Functional data analysis (FDA) involves the analysis of data whose ideal units of observation are functions defined on some continuous domain, and the observed data consist of a sample of functions taken from some population, sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the development of this field, which has accelerated in the past 10 years to become one of the fastest growing areas of statistics, fueled by the growing number of applications yielding this type of data. One unique characteristic of FDA is the need to combine information both across and within functions, which Ramsay and Silverman called replication and regularization, respectively. This article focuses on functional regression, the area of FDA that has received the most attention in applications and methodological development. It begins with an introduction to basis functions, key building blocks for regularization in functional regression methods, followed by an overview of functional regression methods, split into three types: [1] functional predictor regression (scalar-on-function), [2] functional response regression (function-on-scalar), and [3] function-on-function regression. For each, the role of replication and regularization is discussed and the methodological development is described in a roughly chronological manner, at times deviating from the historical timeline to group together similar methods. The primary focus is on modeling and methodology, highlighting the modeling structures that have been developed and the various regularization approaches employed. The article ends with a brief discussion of potential areas of future development in this field.
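
    The role of basis functions in scalar-on-function regression can be illustrated with a minimal sketch. It is not taken from the article: the Fourier basis, the ridge penalty, the simulated curves, and the names `fourier_basis` and `fit_scalar_on_function` are all assumptions chosen for illustration. The coefficient function beta(t) is expanded in a small basis (regularization within functions), and its basis coefficients are estimated from many observed curves (replication across functions).

```python
import numpy as np

def fourier_basis(t, n_basis):
    """Evaluate a constant plus sine/cosine pairs on a grid t in [0, 1]."""
    cols = [np.ones_like(t)]
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sqrt(2) * np.sin(2 * np.pi * k * t))
        if len(cols) < n_basis:
            cols.append(np.sqrt(2) * np.cos(2 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)  # shape (len(t), n_basis)

def fit_scalar_on_function(X, y, t, n_basis=5, ridge=1e-6):
    """Fit y_i = alpha + integral of x_i(t) * beta(t) dt + noise.

    beta(t) is written as a combination of basis functions, so the
    problem reduces to a small penalized least-squares fit.
    """
    Phi = fourier_basis(t, n_basis)
    dt = t[1] - t[0]
    Z = X @ Phi * dt                        # Riemann approximation of the integrals
    D = np.column_stack([np.ones(len(y)), Z])
    P = ridge * np.eye(D.shape[1])
    P[0, 0] = 0.0                           # leave the intercept unpenalized
    coef = np.linalg.solve(D.T @ D + P, D.T @ y)
    return coef[0], Phi @ coef[1:]          # intercept, beta evaluated on the grid

# Simulated example (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
Phi = fourier_basis(t, 5)
X = rng.normal(size=(200, 5)) @ Phi.T       # 200 random curves on the grid
beta_true = np.sin(2 * np.pi * t)
y = 1.0 + (X @ beta_true) * (t[1] - t[0]) + 0.01 * rng.normal(size=200)

alpha_hat, beta_hat = fit_scalar_on_function(X, y, t)
max_err = float(np.max(np.abs(beta_hat - beta_true)))
```

    A larger `ridge` value or a smaller `n_basis` gives a smoother estimate of beta(t); the penalties actually reviewed in the article may differ from this plain ridge choice.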

    Discriminative variable selection for clustering with the sparse Fisher-EM algorithm

    Interest in variable selection for clustering has increased recently due to the growing need to cluster high-dimensional data. In particular, variable selection eases both the clustering and the interpretation of the results. Existing approaches have demonstrated the efficiency of variable selection for clustering but turn out to be either very time consuming or not sparse enough in high-dimensional spaces. This work proposes to select the discriminative variables by introducing sparsity in the loading matrix of the Fisher-EM algorithm. This clustering method has been recently proposed for the simultaneous visualization and clustering of high-dimensional data. It is based on a latent mixture model which fits the data into a low-dimensional discriminative subspace. Three different approaches are proposed in this work to introduce sparsity in the orientation matrix of the discriminative subspace through ℓ1-type penalizations. Experimental comparisons with existing approaches on simulated and real-world data sets demonstrate the value of the proposed methodology. An application to the segmentation of hyperspectral images of the planet Mars is also presented.

    Simultaneous model-based clustering and visualization in the Fisher discriminative subspace

    Clustering in high-dimensional spaces is nowadays a recurrent problem in many scientific domains but remains difficult in terms of both clustering accuracy and interpretation of the results. This paper presents a discriminative latent mixture (DLM) model which fits the data in a latent orthonormal discriminative subspace with an intrinsic dimension lower than the dimension of the original space. By constraining model parameters within and between groups, a family of 12 parsimonious DLM models is obtained, which can accommodate a variety of situations. An estimation algorithm, called the Fisher-EM algorithm, is also proposed for estimating both the mixture parameters and the discriminative subspace. Experiments on simulated and real datasets show that the proposed approach performs better than existing clustering methods while providing a useful representation of the clustered data. The method is also applied to the clustering of mass spectrometry data.

    Speech Recognition Using Augmented Conditional Random Fields

    Acoustic modeling based on hidden Markov models (HMMs) is employed by state-of-the-art stochastic speech recognition systems. Although HMMs are a natural choice to warp the time axis and model the temporal phenomena in the speech signal, their conditional independence properties limit their ability to model spectral phenomena well. In this paper, a new acoustic modeling paradigm based on augmented conditional random fields (ACRFs) is investigated and developed. This paradigm addresses some limitations of HMMs while maintaining many of the aspects which have made them successful. In particular, the acoustic modeling problem is reformulated in a data driven, sparse, augmented space to increase discrimination. Acoustic context modeling is explicitly integrated to handle the sequential phenomena of the speech signal. We present an efficient framework for estimating these models that ensures scalability and generality. In the TIMIT phone recognition task, a phone error rate of 23.0% was recorded on the full test set, a significant improvement over comparable HMM-based systems.

    Revisiting probabilistic neural networks: a comparative study with support vector machines and the microhabitat suitability for the Eastern Iberian chub (Squalius valentinus)

    Probabilistic Neural Networks (PNNs) and Support Vector Machines (SVMs) are flexible classification techniques suited to render trustworthy species distribution and habitat suitability models. Although several alternatives exist to improve PNNs' reliability and performance and/or to reduce computational costs, PNNs are currently not as well recognised as SVMs because the SVMs were compared with standard PNNs. To rule out this idea, the microhabitat suitability for the Eastern Iberian chub (Squalius valentinus Doadrio & Carmona, 2006) was modelled with SVMs and four types of PNNs (homoscedastic, heteroscedastic, cluster and enhanced PNNs), all of them optimised with differential evolution. The fitness function and several performance criteria (correctly classified instances, true skill statistic, specificity and sensitivity) were used to assess the performance of each habitat suitability model, and partial dependence plots were used to assess its reliability. Heteroscedastic and enhanced PNNs achieved the highest performance in every index but specificity. However, these two PNNs rendered ecologically unreliable partial dependence plots. Conversely, homoscedastic and cluster PNNs rendered ecologically reliable partial dependence plots. Thus, Eastern Iberian chub proved to be a eurytopic species, presenting the highest suitability in microhabitats with cover present, low flow velocity (approx. 0.3 m/s), intermediate depth (approx. 0.6 m) and fine gravel (64–256 mm). PNNs outperformed SVMs; thus, based on the results of the cluster PNN, which also showed high values of the performance criteria, we would advocate a combination of approaches (e.g., cluster & heteroscedastic or cluster & enhanced PNNs) to balance the trade-off between accuracy and reliability of habitat suitability models.
    The study has been partially funded by the national Research project IMPADAPT (CGL2013-48424-C2-1-R) with MINECO (Spanish Ministry of Economy) and Feder funds and by the Confederacion Hidrografica del Near (Spanish Ministry of Agriculture and Fisheries, Food and Environment). This study was also supported in part by the University Research Administration Center of the Tokyo University of Agriculture and Technology. Thanks to Maria Jose Felipe for reviewing the mathematical notation and to the two anonymous reviewers who helped to improve the manuscript.
    Muñoz Mas, R.; Fukuda, S.; Portolés, J.; Martinez-Capel, F. (2018). Revisiting probabilistic neural networks: a comparative study with support vector machines and the microhabitat suitability for the Eastern Iberian chub (Squalius valentinus). Ecological Informatics. 43:24-37. https://doi.org/10.1016/J.ECOINF.2017.10.008
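
    The performance criteria named in the abstract above follow standard definitions and can be computed from a presence/absence confusion matrix. The sketch below is an illustration only: the function name `performance_criteria` and the toy labels are assumptions, not the study's code or data. The true skill statistic is sensitivity plus specificity minus one.

```python
def performance_criteria(y_true, y_pred):
    """Compute standard criteria for binary presence (1) / absence (0) labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true presences
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true absences
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false presences
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false absences
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "cci": (tp + tn) / len(pairs),         # correctly classified instances
        "sensitivity": sensitivity,
        "specificity": specificity,
        "tss": sensitivity + specificity - 1,  # true skill statistic
    }

# Toy labels (hypothetical, not from the study).
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
scores = performance_criteria(observed, predicted)
```

    The function assumes both classes occur in `y_true`; otherwise sensitivity or specificity is undefined and the division fails.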

    ON THE CONSISTENCY AND ROBUSTNESS PROPERTIES OF LINEAR DISCRIMINANT ANALYSIS

    Strong consistency of linear discriminant analysis is established under wide assumptions on the class conditional densities. Robustness to the presence of a mild degree of class dispersion heterogeneity is also analyzed. Results obtained may help to explain analytically the frequent good behavior in applications of linear discrimination techniques.
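
    For readers unfamiliar with the classifier being analyzed, the following is a minimal sketch of the textbook plug-in linear discriminant rule with a pooled covariance estimate. It is not the paper's analysis: the names `fit_lda` and `predict_lda` and the simulated data are assumptions. The second class is given a mildly larger dispersion, loosely mirroring the heterogeneity setting the abstract mentions.

```python
import numpy as np

def fit_lda(X, y):
    """Estimate class means, priors and a pooled covariance matrix."""
    classes = np.unique(y)
    n, d = X.shape
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([(y == c).mean() for c in classes])
    pooled = np.zeros((d, d))
    for c, m in zip(classes, means):
        R = X[y == c] - m
        pooled += R.T @ R
    pooled /= n - len(classes)
    W = np.linalg.solve(pooled, means.T)  # column k holds Sigma^{-1} mu_k
    b = -0.5 * np.sum(means.T * W, axis=0) + np.log(priors)
    return classes, W, b

def predict_lda(model, X):
    """Assign each row of X to the class with the largest linear score."""
    classes, W, b = model
    return classes[np.argmax(X @ W + b, axis=1)]

# Simulated data (hypothetical): class 1 has a mildly larger dispersion.
rng = np.random.default_rng(0)
X0 = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
X1 = rng.normal(loc=4.0, scale=np.sqrt(1.5), size=(200, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

model = fit_lda(X, y)
accuracy = float(np.mean(predict_lda(model, X) == y))
```

    The accuracy here is measured on the training sample, so it only shows that the rule runs and separates the two simulated groups; it says nothing about the consistency results themselves.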

    PARAMETRIC LINK MODELS FOR KNOWLEDGE TRANSFER IN STATISTICAL LEARNING

    When a statistical model is designed for prediction, a major assumption is that the modeled phenomenon does not evolve between the training and the prediction stages. Thus, training and future data must be in the same feature space and must have the same distribution. Unfortunately, this assumption often turns out to be false in real-world applications. For instance, biological motivations could lead to classifying individuals from a given species when only individuals from another species are available for training. In regression, we would sometimes use a predictive model for data that do not have exactly the same distribution as the training data used for estimating the model. This chapter presents techniques for transferring a statistical model estimated from a source population to a target population. Three tasks of statistical learning are considered: probabilistic classification (parametric and semi-parametric), linear regression (including mixture of regressions) and model-based clustering (Gaussian and Student). In each situation, the knowledge transfer is carried out by introducing parametric links between both populations. The use of such transfer techniques would improve the performance of learning by avoiding much of the expensive data labeling effort.