32 research outputs found

    On the validity of minimin and minimax methods for support vector regression with interval data

    Get PDF
    Paper delivered at the 9th International Symposium on Imprecise Probability: Theories and Applications, Pescara, Italy, 2015. Abstract: In recent years, generalizations of support vector methods for analyzing interval-valued data have been suggested in both the regression and classification contexts. Standard Support Vector methods for precise data formalize these statistical problems as optimization problems that can be based on various loss functions. In the case of Support Vector Regression (SVR), on which we focus here, the function that best describes the relationship between a response and some explanatory variables is derived as the solution of the minimization problem associated with the expectation of some function of the residual, called the risk functional. The key idea of SVR is that, even when considering an infinite-dimensional space of arbitrary regression functions, given a finite-dimensional data set the function minimizing the risk can be represented as a finite weighted sum of kernel functions. This allows the SVR estimate to be determined in practice by solving a much simpler optimization problem, even in the case of nonlinear regression. When only interval-valued observations of the variables of interest are available, it has been suggested to minimize the minimal or maximal risk values that are compatible with the imprecise data, yielding precise SVR estimates on the basis of interval data. In this paper, we show that the optimal function can be represented as a finite weighted sum of kernel functions also in the case of an interval-valued response. Thus, the minimin and minimax SVR estimates can be obtained by minimizing the corresponding simplified expressions of the empirical lower and upper risks, respectively.
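    As a rough sketch in our own notation (not taken from the paper): with training data (x_i, y_i), a loss function L, a kernel k, and a regularization parameter \lambda, the standard regularized SVR problem and its representer-theorem solution read
        \min_f \ \frac{1}{n} \sum_{i=1}^{n} L\big(y_i - f(x_i)\big) + \lambda \, \|f\|_{\mathcal{H}}^2 , \qquad \hat f(x) = \sum_{i=1}^{n} \alpha_i \, k(x_i, x) .
    For an interval-valued response [\underline{y}_i, \overline{y}_i], the minimin and minimax estimates minimize, respectively, the empirical lower and upper risks
        \underline{R}(f) = \frac{1}{n} \sum_{i=1}^{n} \min_{y \in [\underline{y}_i, \overline{y}_i]} L\big(y - f(x_i)\big) , \qquad \overline{R}(f) = \frac{1}{n} \sum_{i=1}^{n} \max_{y \in [\underline{y}_i, \overline{y}_i]} L\big(y - f(x_i)\big) .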

    Restricted Likelihood Ratio Testing in Linear Mixed Models with General Error Covariance Structure

    Get PDF
    We consider the problem of testing for zero variance components in linear mixed models with correlated or heteroscedastic errors. In the case of independent and identically distributed errors, a valid test exists that is based on the exact finite-sample distribution of the restricted likelihood ratio test statistic under the null hypothesis. We propose to make use of a transformation to derive the (approximate) test distribution for the restricted likelihood ratio test statistic in the case of a general error covariance structure. The proposed test proves its value in simulations and is finally applied to an interesting question in the field of well-being economics.
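    A minimal sketch of the transformation idea, with the model notation and the concrete whitening step assumed by us rather than quoted from the paper: consider the linear mixed model
        y = X\beta + Zb + \varepsilon , \qquad b \sim N(0, \sigma_b^2 \Sigma) , \qquad \varepsilon \sim N(0, \sigma^2 \Lambda) ,
    with null hypothesis H_0: \sigma_b^2 = 0. If the correlation/heteroscedasticity structure \Lambda is known (or consistently estimated), pre-multiplying the model by \Lambda^{-1/2} gives
        \Lambda^{-1/2} y = (\Lambda^{-1/2} X)\beta + (\Lambda^{-1/2} Z) b + \tilde\varepsilon , \qquad \tilde\varepsilon \sim N(0, \sigma^2 I_n) ,
    so that the exact finite-sample null distribution of the restricted likelihood ratio statistic available for i.i.d. errors can be applied to the transformed model, exactly for known \Lambda and approximately when \Lambda is estimated.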

    On the implementation of LIR: the case of simple linear regression with interval data

    Get PDF
    This paper considers the problem of simple linear regression with interval-censored data. That is, n pairs of intervals are observed instead of the n pairs of precise values for the two variables (dependent and independent). Each of these intervals is closed but possibly unbounded, and contains the corresponding (unobserved) value of the dependent or independent variable. The goal of the regression is to describe the relationship between (the precise values of) these two variables by means of a linear function. Likelihood-based Imprecise Regression (LIR) is a recently introduced, very general approach to regression for imprecisely observed quantities. The result of a LIR analysis is in general set-valued: it consists of all regression functions that cannot be excluded on the basis of likelihood inference. These regression functions are said to be undominated. Since the interval data can be unbounded, a robust regression method is necessary. Hence, we consider the robust LIR method based on the minimization of the residuals' quantiles. For this method, we prove that the set of all intercept-slope pairs corresponding to the undominated regression functions is the union of finitely many polygons. We give an exact algorithm for determining this set (i.e., for determining the set-valued result of the robust LIR analysis) and show that it has worst-case time complexity O(n^3 log n). We have implemented this exact algorithm as part of the R package linLIR.
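    To convey the flavour of the dominance criterion, here is a deliberately simplified, hypothetical Python sketch: a grid search over intercept-slope pairs with bounded intervals and a single order-statistic index k in place of the paper's exact quantile indices. The function names are ours and are not the linLIR API, and this is not the exact polygon-based algorithm of the paper.
        import numpy as np

        def interval_residuals(a, b, x_lo, x_hi, y_lo, y_hi):
            # range of the linear predictor a + b*x over the observed x-interval
            f_lo = np.minimum(a + b * x_lo, a + b * x_hi)
            f_hi = np.maximum(a + b * x_lo, a + b * x_hi)
            # smallest and largest possible |y - (a + b*x)| over the observation rectangle
            r_lo = np.maximum(0.0, np.maximum(f_lo - y_hi, y_lo - f_hi))
            r_hi = np.maximum(np.abs(y_hi - f_lo), np.abs(y_lo - f_hi))
            return r_lo, r_hi

        def undominated_grid(x_lo, x_hi, y_lo, y_hi, grid_a, grid_b, k):
            # keep every (a, b) whose k-th smallest lower residual bound does not
            # exceed the best (minimal) k-th smallest upper residual bound on the grid
            q_lo, q_hi = {}, {}
            for a in grid_a:
                for b in grid_b:
                    r_lo, r_hi = interval_residuals(a, b, x_lo, x_hi, y_lo, y_hi)
                    q_lo[(a, b)] = np.sort(r_lo)[k]
                    q_hi[(a, b)] = np.sort(r_hi)[k]
            threshold = min(q_hi.values())
            return [ab for ab, v in q_lo.items() if v <= threshold]
    With k chosen around n/2 this mimics a median-based special case; the exact LIR criterion uses two different order statistics derived from the likelihood cutoff, and the exact algorithm avoids the grid entirely by exploiting the polygon structure of the solution set.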

    Likelihood-based Imprecise Regression

    Get PDF
    We introduce a new approach to regression with imprecisely observed data, combining likelihood inference with ideas from imprecise probability theory, and thereby taking different kinds of uncertainty into account. The approach is very general and applicable to various kinds of imprecise data, not only to intervals. In the present paper, we propose a regression method based on this approach, where no parametric distributional assumption is needed and interval estimates of quantiles of the error distribution are used to identify plausible descriptions of the relationship of interest. The proposed regression method is therefore very robust. We apply our robust regression method to an interesting question in the social sciences. The analysis, based on survey data, yields a relatively imprecise result, reflecting the large amount of uncertainty inherent in the analyzed data set.

    Robust regression with imprecise data

    Get PDF
    We consider the problem of regression analysis with imprecise data. By imprecise data we mean imprecise observations of precise quantities in the form of sets of values. In this paper, we explore a recently introduced likelihood-based approach to regression with such data. The approach is very general, since it covers all kinds of imprecise data (i.e., not only intervals) and is not restricted to linear regression. Its result consists of a set of functions, reflecting the entire uncertainty of the regression problem. Here we study in particular a robust special case of likelihood-based imprecise regression, which can be interpreted as a generalization of the method of least median of squares. Moreover, we apply it to data from a social survey and compare it with other approaches to regression with imprecise data. It turns out that the likelihood-based approach is the most generally applicable one and the only one that accounts for multiple sources of uncertainty at the same time.
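    For orientation, a rough sketch in our own notation of the connection to least median of squares (LMS): with precise data, LMS solves
        \hat f_{LMS} = \arg\min_{f} \ \operatorname{med}_{i=1,\dots,n} \big(y_i - f(x_i)\big)^2 ,
    whereas with set-valued observations each residual is only known to lie in an interval, so the robust method compares attainable lower and upper residual quantiles across candidate functions and retains every function that cannot be ruled out, instead of returning a single minimizer.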

    Regression analysis with imprecise data

    Get PDF
    Statistical methods usually require that the analyzed data are correct and precise observations of the variables of interest. In practice, however, often only incomplete or uncertain information about the quantities of interest is available. The question studied in the present thesis is how a regression analysis can reasonably be performed when the variables are only imprecisely observed. First, different approaches to analyzing imprecisely observed variables that have been proposed in the statistics literature are discussed. Then, a new likelihood-based methodology for regression analysis with imprecise data, called Likelihood-based Imprecise Regression, is introduced. The corresponding methodological framework is very broad and, in contrast to most alternative approaches to analyzing imprecise data, permits accounting for coarsening errors. The methodology suggests considering as the result of a regression analysis the entire set of all regression functions that cannot be excluded in the light of the data, which can be interpreted as a confidence set. In the subsequent chapter, a very general regression method is derived from the likelihood-based methodology. This regression method does not impose restrictive assumptions about the form of the imprecise observations, about the underlying probability distribution, or about the shape of the relationship between the variables. Moreover, an exact algorithm is developed for the special case of simple linear regression with interval data, and selected statistical properties of this regression method are studied. The proposed regression method turns out to be robust in terms of a high breakdown point and to provide very reliable insights in the sense of a set-valued result with a high coverage probability. In addition, an alternative approach proposed in the literature, based on Support Vector Regression, is studied in detail and generalized by embedding it into the framework of the formerly introduced likelihood-based methodology. In the end, the discussed regression methods are applied to two practical questions.

    Well-Being over the Life Span: Semiparametric Evidence from British and German Longitudinal Data

    Get PDF
    This paper applies semiparametric regression models using penalized splines to investigate the profile of well-being over the life span. Splines have the advantage that they do not require a priori assumptions about the form of the curve. Using data from the British Household Panel Survey (BHPS) and the German Socio-Economic Panel Study (SOEP), the analysis shows a very similar age-specific pattern of life satisfaction for Britain and Germany that can be characterized by three age stages. In the first stage, life satisfaction declines until approximately the fifth decade of life. In the second stage, well-being clearly increases up to a second turning point (a maximum), after which it decreases in the third stage. Several reasons for this three-phase pattern are discussed. We show that polynomial functions of neither the third nor the fourth degree describe the relationship adequately: polynomials locate the minimum and the maximum imprecisely. In addition, our analysis addresses the indistinguishability of age, period, and cohort effects: we propose estimating age-period models that control for cohort effects by including substantive variables, such as the life expectancy of the birth cohort, and further observed socioeconomic characteristics in the regression.
    Keywords: subjective well-being, life satisfaction, semiparametric regression, penalized splines, age-period model, age-cohort model
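    As a generic sketch of the penalized spline criterion underlying such semiparametric models (notation ours, not the paper's): life satisfaction y_i is modelled as a smooth function of age plus further covariates x_i, and the smooth is estimated by penalized least squares,
        \min_{\beta, u} \ \sum_{i=1}^{n} \Big( y_i - x_i^{\top}\beta - \sum_{k=1}^{K} u_k B_k(\mathrm{age}_i) \Big)^2 + \lambda \, u^{\top} P \, u ,
    where the B_k are spline basis functions, P is a ridge- or difference-type penalty matrix, and the smoothing parameter \lambda is chosen from the data; a very large \lambda shrinks the fit towards a low-order polynomial in age, while moderate \lambda allows the flexible three-stage profile described above. Written as a linear mixed model with the u_k as random effects, this representation also links the smoothing problem to the variance-component testing discussed earlier in this list.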
