
    A General Family of Penalties for Combining Differing Types of Penalties in Generalized Structured Models

    Penalized estimation has become an established tool for regularization and model selection in regression models. A variety of penalties with specific features are available, and effective algorithms for specific penalties have been proposed. However, little is available for fitting models that call for a combination of different penalties. When modeling rent data, which serves as an example, various types of predictors call for a combination of a Ridge, a grouped Lasso, and a Lasso-type penalty within one model. Algorithms that can deal with such problems are in demand. We propose to approximate penalties that are (semi-)norms of scalar linear transformations of the coefficient vector in generalized structured models. The penalty is so general that the Lasso, the fused Lasso, the Ridge, the smoothly clipped absolute deviation (SCAD) penalty, the elastic net, and many more penalties are embedded. The approximation makes it possible to combine all these penalties within one model. The computation is based on conventional penalized iteratively re-weighted least squares (PIRLS) algorithms and is hence easy to implement. Moreover, new penalties can be incorporated quickly. The approach is also extended to penalties with vector-based arguments, that is, to penalties with norms of linear transformations of the coefficient vector. Some illustrative examples and the model for the Munich rent data show promising results.
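The computational idea of replacing each non-quadratic penalty by a quadratic one, so that an ordinary PIRLS step can be reused, can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. It covers only a logistic GLM. Lasso-type terms λ|a_jᵀβ| are smoothed as λ√((a_jᵀβ)² + c) with a small c, which is one common choice assumed here. Ridge terms are already quadratic. The function names and the `(kind, A, lam)` block layout are hypothetical.

```python
import numpy as np


def l1_weights(xi, lam, c=1e-8):
    # lam * |xi| is smoothed to lam * sqrt(xi**2 + c); around the current xi0
    # it is majorized by const + 0.5 * lam / sqrt(xi0**2 + c) * xi**2.
    return lam / np.sqrt(xi ** 2 + c)


def penalty_matrix(beta, blocks):
    """Sum the quadratic penalty matrices of several penalty blocks.

    Each block is (kind, A, lam) and acts on xi = A @ beta:
      "ridge": 0.5 * lam * ||A beta||^2   (exactly quadratic)
      "l1":    lam * sum_j |a_j' beta|    (Lasso / fused Lasso, approximated)
    The total penalty is approximated by 0.5 * beta' P beta.
    """
    P = np.zeros((beta.size, beta.size))
    for kind, A, lam in blocks:
        if kind == "ridge":
            w = np.full(A.shape[0], float(lam))
        elif kind == "l1":
            w = l1_weights(A @ beta, lam)
        else:
            raise ValueError(f"unknown penalty kind: {kind}")
        P += A.T @ (w[:, None] * A)
    return P


def pirls_logistic(X, y, blocks, beta0, max_iter=200, tol=1e-8):
    """Penalized IRLS for logistic regression with combined penalties."""
    beta = np.asarray(beta0, dtype=float).copy()
    for _ in range(max_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))
        W = np.clip(mu * (1.0 - mu), 1e-10, None)  # working weights
        z = eta + (y - mu) / W                      # working response
        P = penalty_matrix(beta, blocks)            # re-approximated each step
        beta_new = np.linalg.solve(X.T @ (W[:, None] * X) + P, X.T @ (W * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
    eta_true = X @ np.array([0.0, 1.5, 0.0, -1.0, 0.0])
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta_true))).astype(float)
    I = np.eye(5)
    blocks = [("l1", I[[1, 2]], 5.0),     # Lasso on x1, x2
              ("ridge", I[[3, 4]], 1.0)]  # Ridge on x3, x4; intercept unpenalized
    beta_mle = pirls_logistic(X, y, [], np.zeros(5))
    print(pirls_logistic(X, y, blocks, beta_mle))
```

Each block only contributes a matrix Aᵀ diag(w) A, so combining penalties amounts to summing matrices. In this sketch, for example, an elastic net is an "l1" block and a "ridge" block on the same rows. Start the iteration from an unpenalized or Ridge fit. A coefficient that starts at exactly zero gets the weight λ/√c and is effectively held at zero.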

    Penalized regression for discrete structures

    Penalized regression models offer a way to integrate covariate selection into the estimation of a model. Penalized approaches are particularly suited to taking complex structures in a model's covariates into account. This thesis deals with various penalization approaches for discrete structures, where the term "discrete structure" refers to all kinds of categorical predictors, categorical effect modifiers, and group-specific effects in hierarchically structured data. What they have in common is that they can lead to a relatively large number of coefficients to be estimated. There is therefore particular interest in learning which categories of a predictor influence the response, and which categories have different or similar effects on the response. Categories with similar effects can be identified, for example, by fused Lasso penalties. However, some existing approaches are restricted to the linear model. This thesis transfers these approaches to the class of generalized linear regression models, which involves both computational and theoretical aspects. Specifically, a fused Lasso penalty for categorical effect modifiers in generalized linear regression models is proposed. It makes it possible to select predictors and to fuse categories of a predictor. Group-specific effects, which account for heterogeneity in hierarchically structured data, are a special case of such a categorical effect modifier. Here the penalized approach offers two main advantages: (i) in contrast to mixed models, which make stronger assumptions, the degree of heterogeneity can be reduced very easily; (ii) estimation is more efficient than in the unpenalized approach.
In orthonormal settings, fused Lasso penalties can have conceptual drawbacks. As an alternative, an L0 penalty for discrete structures in generalized linear regression models is discussed, where the so-called L0 "norm" denotes an indicator function for nonzero arguments. As a penalty, this function is as interesting as it is challenging. If an approximation of the L0 norm is used as a loss function, the conditional mode of a response is estimated in the limit.
Penalties are an established approach to stabilizing estimation and selecting predictors in regression models. Penalties are especially useful when discrete structures matter. In this thesis, the term "discrete structure" subsumes all kinds of categorical effects, categorical effect modifiers, and group-specific effects in hierarchical settings. Discrete structures can be challenging because they need to be coded and can result in a huge number of coefficients. Moreover, users want to know which levels of a discrete covariate need to be distinguished with respect to the response, and whether some levels have the same impact on the response. One wants to detect non-influential coefficients and to allow coefficients to share the same estimate. That requires carefully tailored penalization, as provided, for example, by different variants of the fused Lasso. However, the reach of many existing methods is limited because most of them assume a Gaussian response. This thesis makes some efforts to extend these approaches. The focus is on appropriate penalization strategies for discrete structures in generalized linear models (GLMs). Lasso-type penalties in GLMs require special estimation procedures. In a first step, an existing Fisher scoring algorithm that allows different types of penalties to be combined in one model is generalized. This algorithm provides the computational basis for the subsequent topics.
In a second step, varying coefficients with categorical effect modifiers are considered, and existing methodology for linear models is extended to GLMs. In hierarchical settings, fixed effects models (also called group-specific models), which are a special case of categorical effect modifiers, are a common choice to account for heterogeneity in the data. Applying the proposed penalization techniques for categorical effect modifiers to hierarchical settings offers some benefits: compared with mixed models, the approach can easily fuse second-level units; compared with unpenalized group-specific models, it gains efficiency. In a third step, fused Lasso-type penalties for discrete structures are considered in more detail. Especially in orthonormal settings, Lasso-type penalties for categorical effects have some drawbacks regarding the clustering of the coefficients. To overcome these problems, an L0 penalty for discrete structures is proposed. Again, computational issues are handled by a quadratic approximation. This approximation is useful not only for penalized regression with discrete structures but also when an approximation of the L0 norm is employed as a loss function, that is, for regression models that approximate the conditional mode of a response. For linear predictors, a close link to kernel methods makes it possible to show that the proposed estimator is consistent and asymptotically normal. Regression models with semiparametric predictors are also possible.
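Why an approximated L0 loss targets the conditional mode can be illustrated with the smooth stand-in ρ_c(r) = r²/(r² + c), which tends to the indicator 1{r ≠ 0} as c → 0. This is an assumption for illustration, and the thesis' exact approximation may differ. ρ_c is a concave function of r², so a local quadratic approximation gives an iteratively re-weighted least-squares scheme with weights c/(r² + c)². Observations close to the current fit therefore dominate, much like in a kernel estimate. The function names `l0_loss` and `mode_regression` are hypothetical.

```python
import numpy as np


def l0_loss(r, c):
    # Smooth stand-in for the indicator 1{r != 0}; tends to it as c -> 0.
    return r ** 2 / (r ** 2 + c)


def mode_regression(X, y, c, beta0, max_iter=500, tol=1e-10):
    """Minimize sum_i l0_loss(y_i - x_i' beta, c) by iteratively re-weighted LS.

    l0_loss is a concave function of r**2, so const + w_i * r_i**2 with
    w_i = c / (r_i**2 + c)**2 majorizes it at the current residuals, and each
    weighted least-squares step does not increase the total loss.
    """
    beta = np.asarray(beta0, dtype=float).copy()
    for _ in range(max_iter):
        r = y - X @ beta
        w = c / (r ** 2 + c) ** 2  # large weights for small residuals
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


if __name__ == "__main__":
    y = np.array([4.9, 5.0, 5.0, 5.0, 5.1, 9.0, 12.0, 20.0])
    X = np.ones((y.size, 1))  # intercept only: a mode estimator
    m = mode_regression(X, y, c=0.1, beta0=np.array([np.median(y)]))
    print(m, y.mean())
```

With an intercept-only design the estimate for these data settles near 5, where most observations lie, while the mean (8.25) is pulled up by the right tail. The result depends on c and on the starting value, so a robust start such as the median is a sensible default in this sketch.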

    Regularization and Model Selection with Categorial Predictors and Effect Modifiers in Generalized Linear Models

    We consider varying-coefficient models with categorical effect modifiers in the framework of generalized linear models. We distinguish between nominal and ordinal effect modifiers and propose adequate Lasso-type regularization techniques that allow for (1) selection of relevant covariates and (2) identification of coefficient functions that actually vary with the level of a potentially effect-modifying factor. We investigate the estimators' large-sample properties and show in simulation studies that the proposed approaches also perform very well in finite samples. Furthermore, the presented methods are compared with alternative procedures and applied to real-world medical data.
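Take one covariate whose coefficient varies over the K levels of an effect modifier. The nominal and ordinal cases then differ mainly in which coefficient differences are penalized. The minimal sketch below assumes one Lasso-type term per level for selection. It adds all pairwise differences for a nominal modifier and only adjacent differences for an ordinal one. The paper's exact terms and weights may differ, and the function names are hypothetical.

```python
import numpy as np
from itertools import combinations


def nominal_diff_matrix(K):
    # One row e_r - e_s per pair of levels r < s: all pairwise differences.
    D = np.zeros((K * (K - 1) // 2, K))
    for row, (r, s) in enumerate(combinations(range(K), 2)):
        D[row, r], D[row, s] = 1.0, -1.0
    return D


def ordinal_diff_matrix(K):
    # One row e_{k+1} - e_k per pair of neighbouring levels.
    return np.diff(np.eye(K), axis=0)


def modifier_penalty(beta, lam, kind):
    """lam * (sum_k |beta_k| + sum of |differences|) for one covariate whose
    coefficient beta_k varies over the K levels of a categorical modifier."""
    K = beta.size
    if kind == "nominal":
        D = nominal_diff_matrix(K)
    elif kind == "ordinal":
        D = ordinal_diff_matrix(K)
    else:
        raise ValueError(f"unknown modifier kind: {kind}")
    A = np.vstack([np.eye(K), D])  # selection rows, then fusion rows
    return lam * np.abs(A @ beta).sum()


if __name__ == "__main__":
    beta = np.array([0.5, 0.5, 0.5, 2.0])  # levels 1-3 share one effect
    print(ordinal_diff_matrix(4) @ beta)   # zeros mark fused neighbours
    print(modifier_penalty(beta, 1.0, "nominal"),
          modifier_penalty(beta, 1.0, "ordinal"))
```

Zero entries of `D @ beta` mark levels whose coefficients are fused. For K levels, the nominal version has K(K-1)/2 difference terms. The ordinal version has K-1.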

    Regularization and Model Selection with Categorial Predictors and Effect Modifiers in Generalized Linear Models

    Varying-coefficient models with categorical effect modifiers are considered within the framework of generalized linear models. We distinguish between nominal and ordinal effect modifiers and propose adequate Lasso-type regularization techniques that allow for (1) selection of relevant covariates and (2) identification of coefficient functions that actually vary with the level of a potentially effect-modifying factor. We investigate large-sample properties and show in simulation studies that the proposed approaches also perform very well in finite samples. In addition, the presented methods are compared with alternative procedures and applied to real-world medical data.

    Modeling Clustered Heterogeneity: Fixed Effects, Random Effects and Mixtures

    Although each statistical unit on which measurements are taken is unique, there is typically not enough information available to fully account for its uniqueness. Therefore, heterogeneity among units has to be limited by structural assumptions. One classical approach is to use random effects models, which assume that heterogeneity can be described by distributional assumptions. However, inference may depend on the assumed mixing distribution, and the random effects are assumed to be independent of the observed covariates. An alternative, considered here, is the fixed effects model, which lets each unit have its own parameter. Such models are quite flexible but suffer from the large number of parameters. The structural assumption made here is that there are clusters of units that share the same effects. It is shown how clusters can be identified by tailored regularized estimators. Moreover, the regularized estimates are shown to compete well with estimates for the random effects model, even if the latter is the data-generating model, and they dominate if clusters are present.
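The simplest Gaussian case shows how clustering fixed effects can work. Suppose unit i has mean ȳ_i from n_i observations, and minimize Σ_i n_i(ȳ_i − α_i)² + λ Σ_{i<j} |α_i − α_j|. Units pulled to the same value then form a cluster. This is an illustrative sketch under assumptions, not the paper's estimator. It smooths |d| as √(d² + c) and solves each step as a linear system (a local quadratic approximation that majorizes the smoothed objective). It then reads clusters off the sorted estimates with a tolerance. `fuse_unit_effects` and `clusters` are hypothetical names.

```python
import numpy as np
from itertools import combinations


def fuse_unit_effects(ybar, n, lam, c=1e-8, max_iter=500, tol=1e-10):
    """Approximately minimize
        sum_i n_i (ybar_i - alpha_i)^2 + lam * sum_{i<j} |alpha_i - alpha_j|,
    with |d| smoothed to sqrt(d^2 + c) and majorized by a quadratic each step."""
    K = ybar.size
    pairs = list(combinations(range(K), 2))
    A = np.zeros((len(pairs), K))
    for row, (i, j) in enumerate(pairs):
        A[row, i], A[row, j] = 1.0, -1.0  # row gives alpha_i - alpha_j
    N = np.diag(n)
    alpha = ybar.astype(float).copy()  # start at the unpenalized fixed effects
    for _ in range(max_iter):
        u = lam / (2.0 * np.sqrt((A @ alpha) ** 2 + c))
        alpha_new = np.linalg.solve(N + A.T @ (u[:, None] * A), N @ ybar)
        if np.max(np.abs(alpha_new - alpha)) < tol:
            return alpha_new
        alpha = alpha_new
    return alpha


def clusters(alpha, tol=1e-3):
    # Sort the estimates and start a new cluster wherever the gap exceeds tol.
    order = np.argsort(alpha)
    labels = np.zeros(alpha.size, dtype=int)
    for prev, cur in zip(order[:-1], order[1:]):
        labels[cur] = labels[prev] + int(alpha[cur] - alpha[prev] > tol)
    return labels


if __name__ == "__main__":
    ybar = np.array([1.0, 1.1, 0.9, 5.0, 5.1, 4.9])  # unit means
    n = np.full(ybar.size, 10.0)                      # observations per unit
    alpha = fuse_unit_effects(ybar, n, lam=3.0)
    print(np.round(alpha, 4), clusters(alpha))
```

With these six unit means, n_i = 10 and λ = 3, the estimates collapse to two values, about 1.45 and 4.55. Each cluster mean is shrunk by 3λ/(2n) = 0.45 toward the other cluster. With λ = 0 the unpenalized fixed effects are returned unchanged.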