358 research outputs found

    Least squares problems involving generalized Kronecker products and application to bivariate polynomial regression

    A method for solving least squares problems (A ⊗ B_i)x = b whose coefficient matrices have generalized Kronecker product structure is presented. It is based on the exploitation of the block structure of the Moore-Penrose inverse and the reflexive minimum norm g-inverse of the coefficient matrix, and on the QR method for solving least squares problems. First, the general case where A is a rectangular matrix is considered, and then the special case where A is square is analyzed. This special case is applied to the problem of bivariate polynomial regression, in which the matrices involved are structured (Vandermonde or Bernstein-Vandermonde matrices). In this context, the advantage of using the Bernstein basis instead of the monomial basis is shown. Numerical experiments illustrating the good behavior of the proposed algorithm are included. Ministerio de Economía y Competitivida
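
    As a rough illustration of why Kronecker structure helps in least squares, the sketch below handles only the ordinary Kronecker product A ⊗ B, not the generalized product (A ⊗ B_i) treated in the paper, and it assumes that A and B have full column rank. The function name `kron_lstsq` is hypothetical. It relies on the identity (A ⊗ B) vec(X) = vec(A X Bᵀ), with vec taken row by row, so that only QR factorizations of the two small factors are needed.

```python
import numpy as np

def kron_lstsq(A, B, b):
    """Least squares solution of (A kron B) x = b without forming A kron B.

    Illustrative assumption: A (mA x nA) and B (mB x nB) have full column rank.
    Uses (A kron B) vec(X) = vec(A X B^T) with row-major vec (NumPy's default).
    """
    mA, nA = A.shape
    mB, nB = B.shape
    M = b.reshape(mA, mB)            # right-hand side arranged as a matrix
    Qa, Ra = np.linalg.qr(A)         # reduced QR of each small factor
    Qb, Rb = np.linalg.qr(B)
    C = Qa.T @ M @ Qb                # project onto the two column spaces
    Y = np.linalg.solve(Ra, C)       # Ra^{-1} C
    X = np.linalg.solve(Rb, Y.T).T   # Y Rb^{-T}
    return X.reshape(-1)             # back to a coefficient vector

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
B = rng.standard_normal((4, 2))
b = rng.standard_normal(A.shape[0] * B.shape[0])
x = kron_lstsq(A, B, b)
```

    The result can be compared against `np.linalg.lstsq(np.kron(A, B), b, rcond=None)[0]`, which forms the full 20 x 6 matrix explicitly; the structured version never builds it.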

    HE Plots for Repeated Measures Designs

    Hypothesis error (HE) plots, introduced in Friendly (2007), provide graphical methods to visualize hypothesis tests in multivariate linear models, by displaying hypothesis and error covariation as ellipsoids and providing visual representations of effect size and significance. These methods are implemented in the heplots package for R (Fox, Friendly, and Monette 2009a) and in SAS (Friendly 2006), and apply generally to designs with fixed-effect factors (MANOVA), quantitative regressors (multivariate multiple regression) and combined cases (MANCOVA). This paper describes the extension of these methods to repeated measures designs in which the multivariate responses represent the outcomes on one or more "within-subject" factors. This extension is illustrated using the heplots package for R. Examples describe one-sample profile analysis, designs with multiple between-S and within-S factors, and doubly-multivariate designs, with multivariate responses observed on multiple occasions.

    Smoothing mixed models for spatial and spatio-temporal data

    The development of many statistical methods and models has been linked to the study of specific applications within various fields of scientific research. The analysis of spatial and spatio-temporal data is currently of great interest in statistical modelling. Problems related to meteorology, environmental pollution, ecology, epidemiology or economics demand the use of statistical models for spatial and spatio-temporal data. In the first chapter of this thesis, we introduce the basic concepts of spatial statistics, the classification of spatial data according to their typology, and a review of the classical models in the literature and their main limitations. In this thesis, we propose modelling these data using non-parametric regression methods, also known as smoothing techniques. Our proposal is to approach the modelling from a unified perspective for the different types of spatial data, by means of so-called penalized splines (P-splines). These models have become very popular in recent years because: (i) they are low-rank smoothers, since they are constructed from regression bases (B-splines) of smaller dimension than the number of observations, so they are computationally more efficient than other spline-based smoothing methods; (ii) their formulation as mixed models allows the incorporation of more complex structures in terms of random effects, which can be estimated simultaneously with the smoothing. The second chapter is entirely dedicated to introducing the fundamentals of the P-spline methodology for Gaussian data and, in the context of generalized linear models, for non-Gaussian data. 
In multidimensional problems, the regression basis is defined as the tensor product of the marginal B-spline bases, which for data on regular multidimensional grids is the Kronecker product of the marginal B-spline matrices. In these situations, array methods allow the models to be fitted in a computationally efficient way. We also detail the representation as a mixed model and the estimation methods. Although this representation is not new in the spline literature, our reparameterization of the basis and penalty of the model allows the fit to be decomposed into a sum of marginal functions and interactions. Finally, in this chapter, we adapt the array algorithms to the mixed model formulation. In the third chapter, we extend the P-spline models to the smoothing of spatial data. The structure of spatial data requires the use of a new tensor product, the row-wise Kronecker product. In this case, the array methods are not applicable and the mixed model reparameterization is not immediate; however, we demonstrate how it can be obtained using some matrix algebra results. We illustrate the methodology for the types of spatial data and the examples introduced in chapter one. As an application of the methodology proposed in this chapter, we discuss the analysis of regional count data. Count data are usually assumed to follow a Poisson distribution; however, this assumption is sometimes incorrect when the data show unexplained heterogeneous variability (overdispersion). As a result of this chapter, in Lee and Durbán (2009), we analyzed the well-known Scottish lip cancer data. These data have been widely used in the literature on models for regional data, especially from the conditionally autoregressive (CAR) modelling approach. 
We propose a hybrid smooth model that incorporates different sources of spatial variability: (i) large-scale spatial variability, captured by the spline, and (ii) local small-scale variability defined by the neighborhood structure of the regions in the study, with a CAR structure. The advantage of our hybrid model is that both sources can be estimated simultaneously. The simulation studies carried out confirm that the hybrid model can capture the different sources of variability in the proposed scenarios. In the fourth chapter, we consider the multidimensional case by decomposing the model as a sum of smooth functions in terms of main or additive effects and interactions (these models are called Smooth-ANOVA models, by analogy with factorial designs and the analysis of variance). The construction of these models with B-spline bases suffers from identifiability problems, since the terms cannot be estimated uniquely. Our solution to avoid these problems is the reparameterization as mixed models developed in the previous chapters. This reparameterization allows us to identify which elements are repeated, so the solution to the problem reduces to eliminating the repeated terms. This provides a simple way to build a new basis and penalty for the identifiable model. The interesting feature of this simple procedure is that it is equivalent to imposing linear constraints on the regression coefficients of the original model. The simulation study presented in this chapter shows that the Smooth-ANOVA model performs in the same way as the most appropriate model for each of the scenarios considered. In some situations, it is of interest to include in the model only some main effects and interactions and to ignore the rest. These models are called reduced Smooth-ANOVA models. 
An example is the spatio-temporal case, where this decomposition represents the smooth in terms of the sum of a spatial surface, a smooth function for the temporal component, and a smooth term for the space-time interaction. For the spatio-temporal case, we construct the regression basis as the Kronecker product of the B-spline bases of the space and time dimensions. This allows us to use the array methods defined in the previous chapters. Following the procedure described in this chapter, we show how to construct the models and identify the linear restrictions on the regression coefficients of the original model. In Lee and Durbán (2010), we apply the reduced Smooth-ANOVA model to the spatio-temporal analysis of ozone levels in Europe during the period 1999-2005. Finally, in this chapter we propose a computationally efficient method for models with interactions. In some cases, the size of the B-spline basis for the interaction is very large, which makes parameter estimation computationally intensive. For Smooth-ANOVA models, it is possible to assume that most of the structure is captured by the main effects, and it is therefore preferable to reduce the complexity of the model by reducing the size of the bases for the interaction. However, this reduction cannot be arbitrary, since otherwise the models would not be nested. Our proposal is the construction of nested B-spline bases for the interactions. For the spatio-temporal case, this solution allows the temporal part to be modelled with a larger basis that captures the time structure of the data, and the space-time interaction with a much smaller nested basis. Finally, the fifth chapter summarizes the main contributions of this thesis and suggests possible future extensions of the models developed and new lines of research.
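
    The two tensor products mentioned in the abstract above can be contrasted with a minimal sketch. It is only an illustration under stated assumptions: small random matrices stand in for the marginal B-spline bases, and the function name `rowwise_kron` is hypothetical. The sketch shows that for gridded data the row-wise Kronecker product of the expanded marginal bases coincides with the ordinary Kronecker product of the marginal bases.

```python
import numpy as np

def rowwise_kron(B1, B2):
    """Row-wise Kronecker product: row i of the result is kron(B1[i], B2[i]).

    B1 is (n x c1) and B2 is (n x c2); the result is (n x c1*c2).
    """
    n = B1.shape[0]
    if B2.shape[0] != n:
        raise ValueError("both matrices need the same number of rows")
    return (B1[:, :, None] * B2[:, None, :]).reshape(n, -1)

# Stand-ins for marginal bases (illustrative random values, not real B-splines):
# 4 locations in the first dimension, 3 in the second.
rng = np.random.default_rng(0)
B1 = rng.standard_normal((4, 2))
B2 = rng.standard_normal((3, 2))

# Gridded data: every combination (i, j) is observed, ordered with j fastest.
B1_grid = np.repeat(B1, B2.shape[0], axis=0)   # each row of B1 repeated 3 times
B2_grid = np.tile(B2, (B1.shape[0], 1))        # B2 stacked 4 times
grid_basis = rowwise_kron(B1_grid, B2_grid)    # 12 x 4 basis

# On a full grid this is the same matrix as the Kronecker product.
same_as_kron = np.allclose(grid_basis, np.kron(B1, B2))
```

    For scattered locations the two marginal matrices simply have one row per observation and no such expansion exists, which is consistent with the abstract's remark that the array methods do not apply in that case.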

    Regression modelling with I-priors

    We introduce the I-prior methodology as a unifying framework for estimating a variety of regression models, including varying coefficient, multilevel, longitudinal models, and models with functional covariates and responses. It can also be used for multi-class classification, with low or high dimensional covariates. The I-prior is generally defined as a maximum entropy prior. For a regression function, the I-prior is Gaussian with covariance kernel proportional to the Fisher information on the regression function, which is estimated by its posterior distribution under the I-prior. The I-prior has the intuitively appealing property that the more information is available on a linear functional of the regression function, the larger the prior variance, and the smaller the influence of the prior mean on the posterior distribution. Advantages compared to competing methods, such as Gaussian process regression or Tikhonov regularization, are ease of estimation and model comparison. In particular, we develop an EM algorithm with a simple E and M step for estimating hyperparameters, facilitating estimation for complex models. We also propose a novel parsimonious model formulation, requiring a single scale parameter for each (possibly multidimensional) covariate and no further parameters for interaction effects. This simplifies estimation because fewer hyperparameters need to be estimated, and also simplifies comparison of models with the same covariates but different interaction effects; in this case, the model with the highest estimated likelihood can be selected. Using a number of widely analyzed real data sets, we show that the predictive performance of our methodology is competitive. An R package implementing the methodology is available (Jamil, 2019).

    (Global) Optimization: Historical notes and recent developments

    Recent developments in (Global) Optimization are surveyed in this paper. We collected and commented on a large number of recent references which, in our opinion, represent well the vitality, depth, and breadth of scope of current computational approaches and theoretical results on nonconvex optimization problems. Before presenting the recent developments, which are subdivided into two parts covering heuristic and exact approaches respectively, we briefly sketch the origin of the discipline and observe which of the initial attempts survived, which were not considered at all, and which few approaches have recently been rediscovered, mostly in connection with machine learning.

    Separability as a modeling paradigm in large probabilistic models

    Thesis (Ph. D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011, by William J. Richoux. Cataloged from PDF version of thesis. Includes bibliographical references (p. 185-191). Many interesting stochastic models can be formulated as finite-state vector Markov processes, with a state characterized by the values of a collection of random variables. In general, such models suffer from the curse of dimensionality: the size of the state space grows exponentially with the number of underlying random variables, thereby precluding conventional modeling and analysis. A potential cure to this curse is to work with models that allow the propagation of partial information, e.g. marginal distributions, expectations, higher moments, or cross-correlations, as derived from the joint distribution for the network state. This thesis develops and rigorously investigates the notion of separability, associated with structure in probabilistic models that permits exact propagation of partial information. We show that when partial information can be propagated exactly, it can be propagated linearly. The matrices for propagating such partial information share many valuable spectral relationships with the underlying transition matrix of the Markov chain. Separability can be understood from the perspective of subspace invariance in linear systems, though it relates to invariance in a non-standard way. We analyze the asymptotic generality, as the number of random variables becomes large, of some special cases of separability that permit the propagation of marginal distributions. Within this discussion of separability, we introduce the generalized influence model, which incorporates as special cases two prominent models permitting the propagation of marginal distributions: the influence model and Markov chains on permutations (the symmetric group). The thesis proposes a potentially tractable solution to learning informative model parameters, and illustrates many advantageous properties of the estimator under the assumption of separability. Lastly, we illustrate separability in the general setting without any notion of time-homogeneity, and discuss potential benefits for inference in special cases.

    Multivariate financial econometrics: with applications to volatility modelling, option pricing and asset allocation

    EThOS - Electronic Theses Online Service, GB, United Kingdo