
    Statistical properties of kernel principal component analysis


    Regularized Kernel Algorithms for Support Estimation

    In the framework of non-parametric support estimation, we study the statistical properties of a set estimator defined by means of Kernel Principal Component Analysis. Under a suitable assumption on the kernel, we prove that the algorithm is strongly consistent with respect to the Hausdorff distance. We also extend the above analysis to a larger class of set estimators defined in terms of a low-pass filter function. Finally, we provide numerical simulations on synthetic data to highlight the role of the hyperparameters that affect the algorithm.
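    A minimal sketch of this kind of estimator, assuming an RBF kernel and a toy support (the unit circle): a point is attributed to the estimated support when the squared feature-space distance between its image and the span of the leading kernel principal components falls below a threshold. The hyperparameter values (gamma, the number of components m, the threshold tau) are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # toy support: the unit circle

gamma, m = 2.0, 10  # illustrative hyperparameters, not the paper's

def rbf(A, B, gamma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

K = rbf(X, X, gamma)                   # (uncentered) Gram matrix
w, V = np.linalg.eigh(K)               # eigenvalues in ascending order
w, V = w[::-1][:m], V[:, ::-1][:, :m]  # keep the top-m eigenpairs
A = V / np.sqrt(w)                     # coefficients of an orthonormal feature-space basis

def subspace_dist2(x):
    """Squared distance of phi(x) to the span of the top-m components."""
    kx = rbf(x[None, :], X, gamma).ravel()  # k(x, x_i)
    proj = A.T @ kx                          # projections onto the components
    return 1.0 - (proj ** 2).sum()           # k(x, x) = 1 for the RBF kernel

print(subspace_dist2(np.array([1.0, 0.0])))  # on the support: small distance
print(subspace_dist2(np.array([0.0, 0.0])))  # off the support: larger distance
```

    Thresholding this distance at some tau yields the set estimate; how m, gamma and tau trade off is exactly what the paper's simulations investigate.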

    A novel methodology to create generative statistical models of interconnects

    This paper addresses the problem of constructing a generative statistical model for an interconnect starting from a limited set of S-parameter samples, which are obtained by simulating or measuring the interconnect for a few random realizations of its stochastic physical properties. These original samples are first converted into a pole-residue representation with common poles. The corresponding residues are modeled as a correlated stochastic process by means of principal component analysis and kernel density estimation. The obtained model allows generating new samples with statistics similar to those of the original data. A passivity check is performed over the generated samples to retain only passive data. The proposed approach is applied to a representative coupled microstrip line example.
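    A rough illustration of the modeling step only (the vector-fitting of S-parameters into common poles and residues, and the passivity check, are omitted): the sketch below fits a PCA to a stand-in matrix of residue vectors, models the scores with a Gaussian kernel density estimate, and resamples it to generate new realizations. All sizes and names are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
# stand-in for the residue matrix: one row per random realization
R = rng.normal(size=(50, 12)) @ rng.normal(size=(12, 12))

pca = PCA(n_components=4)        # illustrative number of components
scores = pca.fit_transform(R)    # decorrelated low-dimensional scores

kde = gaussian_kde(scores.T)     # joint density of the scores (shape: d x n)
new_scores = kde.resample(1000, seed=2).T
R_new = pca.inverse_transform(new_scores)  # synthetic residue samples

print(R_new.shape)               # (1000, 12): new realizations with similar statistics
```

    In the paper's flow, each generated residue sample would then be recombined with the common poles and kept only if it passes the passivity check.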

    Model Selection Techniques for Kernel-Based Regression Analysis Using Information Complexity Measure and Genetic Algorithms

    In statistical modeling, an overparameterized model leads to poor generalization on unseen data points. This issue calls for a model selection technique that appropriately chooses the form and parameters of the proposed model and the independent variables retained for the modeling. Model selection is particularly important for linear and nonlinear statistical models, which can easily be overfitted. Recently, support vector machines (SVMs), also known as kernel-based methods, have drawn much attention as the next generation of nonlinear modeling techniques. The model selection issues for SVMs include the selection of the kernel, the corresponding parameters, and the optimal subset of independent variables. In the current literature, k-fold cross-validation is the model selection method most widely used for SVMs by machine learning researchers. However, cross-validation is computationally intensive, since one has to fit the model k times. This dissertation introduces a model selection criterion based on the information complexity (ICOMP) measure for kernel-based regression analysis and its applications. ICOMP penalizes both the lack of fit and the complexity of the model in order to choose the optimal model with good generalization properties. ICOMP provides a single index for each model, requires no validation data, is computationally efficient, and has been successfully applied to various linear model selection problems. In this dissertation, we introduce ICOMP to nonlinear kernel-based modeling. Specifically, the dissertation proposes ICOMP and its various forms for kernel ridge regression, kernel partial least squares regression, kernel principal component analysis, kernel principal component regression, relevance vector regression, relevance vector logistic regression, and classification problems. The model selection tasks achieved by the proposed criterion include choosing the form of the kernel function, the parameters of the kernel function, the ridge parameter, the number of latent variables, the number of principal components, and the optimal subset of input variables, all in a simultaneous fashion, for intelligent data mining. The performance of the proposed model selection method is tested on simulation benchmark data sets as well as real data sets. The predictive performance of the proposed model selection criteria is comparable to, and sometimes better than, that of cross-validation, which is much more costly to compute. The dissertation also combines a genetic algorithm (GA) with ICOMP for variable subsetting, which significantly decreases the computational time compared to an exhaustive search of all possible subsets. The GA procedure is shown to be robust and to perform well in repeated simulation examples. This dissertation therefore provides researchers with an alternative, computationally efficient model selection approach for data analysis using kernel methods.
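    A minimal sketch of the selection idea, not the dissertation's exact criterion: score each candidate kernel ridge regression model by a lack-of-fit term (-2 Gaussian log-likelihood) plus twice Bozdogan's C1 complexity of an estimated parameter covariance, then pick the minimizer. The covariance used here is an assumption made for illustration, and a plain grid stands in for the genetic algorithm search.

```python
import numpy as np

def rbf(A, B, gamma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def icomp_krr(X, y, gamma, lam):
    """ICOMP-style score for kernel ridge regression (illustrative variant)."""
    n = len(y)
    K = rbf(X, X, gamma)
    G = K + lam * np.eye(n)
    alpha = np.linalg.solve(G, y)          # ridge-regularized dual coefficients
    resid = y - K @ alpha
    sigma2 = resid @ resid / n             # Gaussian noise variance estimate
    neg2loglik = n * np.log(2 * np.pi * sigma2) + n
    # assumed covariance of alpha-hat: sigma2 * (K + lam I)^(-2)
    Ginv = np.linalg.inv(G)
    Sigma = sigma2 * Ginv @ Ginv
    c1 = 0.5 * n * np.log(np.trace(Sigma) / n) - 0.5 * np.linalg.slogdet(Sigma)[1]
    return neg2loglik + 2 * c1             # lack of fit + complexity penalty

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=60)

grid = [(g, l) for g in (0.1, 1.0, 10.0) for l in (1e-3, 1e-1, 1.0)]
best = min(grid, key=lambda p: icomp_krr(X, y, *p))
print("selected (gamma, lambda):", best)
```

    Note the contrast with k-fold cross-validation: each candidate is fitted once and scored by a single index, with no validation folds.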

    Assessing extrema of empirical principal component functions

    The difficulties of estimating and representing the distributions of functional data mean that principal component methods play a substantially greater role in functional data analysis than in more conventional finite-dimensional settings. Local maxima and minima in principal component functions are of direct importance; they indicate places in the domain of a random function where influence on the function value tends to be relatively strong but of opposite sign. We explore statistical properties of the relationship between extrema of empirical principal component functions and their counterparts for the true principal component functions. It is shown that empirical principal component functions have relatively little trouble capturing conventional extrema, but can experience difficulty distinguishing a ``shoulder'' in a curve from a small bump. For example, when the true principal component function has a shoulder, the probability that the empirical principal component function instead has a bump is approximately equal to 1/2. We suggest and describe the performance of bootstrap methods for assessing the strength of extrema. It is shown that the subsample bootstrap is more effective than the standard bootstrap in this regard. A ``bootstrap likelihood'' is proposed for measuring extremum strength. Exploratory numerical methods are suggested. Comment: Published at http://dx.doi.org/10.1214/009053606000000371 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
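    A minimal sketch of the general idea (not the authors' bootstrap likelihood): estimate the first empirical principal component function from discretized curves, locate its interior extrema, and apply an m-out-of-n subsample bootstrap to see how stably the number of extrema is reproduced. The toy curves and all sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.linspace(0, 1, 101)
n = 200
# toy functional data: a random mixture of two smooth modes plus noise
curves = (rng.normal(size=(n, 1)) * np.sin(2 * np.pi * t)
          + rng.normal(size=(n, 1)) * np.cos(np.pi * t)
          + 0.05 * rng.normal(size=(n, t.size)))

def first_pc(Y):
    """First empirical principal component function of discretized curves."""
    Yc = Y - Y.mean(0)
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    v = Vt[0]
    return v * np.sign(v[np.argmax(np.abs(v))])  # fix sign for comparability

def interior_extrema(v):
    """Indices of sign changes of the first difference (local max/min)."""
    d = np.diff(v)
    return np.where(np.sign(d[:-1]) * np.sign(d[1:]) < 0)[0] + 1

ext_full = interior_extrema(first_pc(curves))
m = n // 2  # subsample size m < n, per the m-out-of-n idea
counts = [len(interior_extrema(first_pc(curves[rng.choice(n, m, replace=False)])))
          for _ in range(200)]
print("extrema in the full-sample PC function:", len(ext_full))
print("subsample distribution of extrema counts:", np.bincount(counts))
```

    A spread-out subsample distribution flags extrema that may be artifacts of sampling error, which is the kind of instability the shoulder-versus-bump result describes.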

    Recent advances in directional statistics

    Mainstream statistical methodology is generally applicable to data observed in Euclidean space. There are, however, numerous contexts of considerable scientific interest in which the natural supports for the data under consideration are Riemannian manifolds like the unit circle, torus, sphere and their extensions. Typically, such data can be represented using one or more directions, and directional statistics is the branch of statistics that deals with their analysis. In this paper we provide a review of the many recent developments in the field since the publication of Mardia and Jupp (1999), still the most comprehensive text on directional statistics. Many of those developments have been stimulated by interesting applications in fields as diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics, image analysis, text mining, environmetrics, and machine learning. We begin by considering developments for the exploratory analysis of directional data before progressing to distributional models, general approaches to inference, hypothesis testing, regression, nonparametric curve estimation, methods for dimension reduction, classification and clustering, and the modelling of time series, spatial and spatio-temporal data. An overview of currently available software for analysing directional data is also provided, and potential future developments are discussed. Comment: 61 pages.
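    A small self-contained example of the basic point above (the example itself is not from the review): because angles live on the unit circle rather than the real line, the ordinary arithmetic mean fails, and the circular mean is obtained by averaging the corresponding unit vectors.

```python
import numpy as np

angles = np.deg2rad([350.0, 20.0])     # two directions straddling 0 degrees
arithmetic = np.rad2deg(angles.mean()) # 185.0: ignores the circle's wrap-around
circular = np.rad2deg(np.arctan2(np.sin(angles).mean(), np.cos(angles).mean()))
print(arithmetic, circular)            # 185.0 vs 5.0, the sensible mean direction
```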