    Topics on high dimensional statistical inference and ANOVA for longitudinal data

    The first part of this thesis proposes new tests for high dimensional data. Chapter 2 proposes a high dimensional simultaneous test for regression coefficients in linear models. This test assesses the significance of a large number of covariates simultaneously under the so-called "large p, small n" situations, where the conventional F-test is no longer applicable. We derive the asymptotic distribution of the proposed test statistic under the high dimensional null hypothesis and various scenarios of the alternatives, which allows power evaluations. We further extend the result to linear models with factorial designs. We also evaluate the power of the F-test under very mild dimensionality. Chapter 3 considers a test for high dimensional means under sparsity and dependency. We propose a threshold test statistic, which is designed to detect sparse and faint signals. The asymptotic distribution is obtained for non-normal and dependent data under the "large p, small n" setting, where the data dimension can grow exponentially fast as the sample size grows. A maximum test, which maximizes the standardized threshold test statistic over a range of thresholds, is also proposed. It is shown that the maximum test can attain the optimal detection boundary, in the sense that asymptotically, all tests would be powerless below the boundary. The second part of this thesis is on analysis of variance (ANOVA) tests for treatment effects in longitudinal data with missing values. The treatment effects are modelled semiparametrically via a partially linear regression, which is flexible in quantifying the time effects of treatments. The empirical likelihood is employed to formulate model-robust nonparametric ANOVA tests for treatment effects with respect to covariates, the nonparametric time-effect functions and interactions between covariates and time. The proposed tests can be readily modified for a variety of data and model combinations that encompass parametric, semiparametric and nonparametric regression models; cross-sectional and longitudinal data; and data with or without missing values.

    Empirical likelihood for median regression model with designed censoring variables

    We propose a new and simple estimating equation for the parameters in median regression models with designed censoring variables, and then apply the empirical log likelihood ratio statistic to construct a confidence region for the parameters. The empirical log likelihood ratio statistic is shown to have a standard chi-square distribution, which makes this method easy to implement. At the same time, another empirical log likelihood ratio statistic is proposed based on an existing estimating equation, and its limiting distribution is shown to be a sum of weighted chi-square distributions. Using simulation studies, we compare the performance of the empirical likelihood confidence region based on the new estimating equation with that of the region based on the existing estimating equation and with that of a normal approximation method.

    Keywords: Empirical likelihood; Designed censoring; Fixed censoring; Median regression model; Confidence region