
    Proposing Robust LAD-Atan Penalty of Regression Model Estimation for High Dimensional Data

    The issue of penalized regression models has received considerable attention for variable selection, which plays an essential role in dealing with high-dimensional data. The arctangent-based Atan penalty has recently been used as an efficient method for both estimation and variable selection. However, the Atan penalty is very sensitive to outliers in the response variable, skewed error distributions, and heavy-tailed error distributions, while least absolute deviation (LAD) is a good way to obtain robustness in regression estimation. The specific objective of this research is to propose a robust Atan estimator that combines these two ideas. Simulation experiments and a real-data application show that the proposed LAD-Atan estimator has superior performance compared with the other estimators.
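A minimal sketch of the combined objective the abstract describes: the LAD loss Σ|yᵢ − xᵢᵀβ| plus the Atan penalty λ(γ + 2/π) Σ arctan(|βⱼ|/γ). The data, tuning values, and the use of a general-purpose optimizer are illustrative assumptions; the paper's own fitting algorithm is not given here.

```python
import numpy as np
from scipy.optimize import minimize

def lad_atan_objective(beta, X, y, lam=0.5, gamma=0.05):
    """LAD loss plus the Atan penalty (lam and gamma are assumed values)."""
    lad = np.sum(np.abs(y - X @ beta))
    atan_pen = lam * (gamma + 2.0 / np.pi) * np.sum(np.arctan(np.abs(beta) / gamma))
    return lad + atan_pen

rng = np.random.default_rng(0)
n, p = 60, 5
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, -1.5, 0.0, 0.0, 1.0])
# Heavy-tailed (Student-t) errors mimic the robustness setting in the abstract.
y = X @ beta_true + rng.standard_t(df=2, size=n)

res = minimize(lad_atan_objective, x0=np.zeros(p), args=(X, y), method="Powell")
beta_hat = res.x
```

The nonsmooth objective is handled here by a derivative-free method purely for illustration; any robust penalized-regression solver could be substituted.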

    Variable selection and estimation in high-dimensional models


    L1-Regularized Least Squares for Support Recovery of High Dimensional Single Index Models with Gaussian Designs

    It is known that for a certain class of single index models (SIMs) Y = f(X^T β0, ε), support recovery is impossible when X ~ N(0, I_{p×p}) and a model-complexity-adjusted sample size is below a critical threshold. Recently, optimal algorithms based on Sliced Inverse Regression (SIR) were suggested. These algorithms work provably under the assumption that the design X comes from an i.i.d. Gaussian distribution. In the present paper we analyze algorithms based on covariance screening and least squares with L1 penalization (i.e. LASSO) and demonstrate that they can also enjoy optimal (up to a scalar) rescaled sample size in terms of support recovery, albeit under slightly different assumptions on f and ε compared to the SIR-based algorithms. Furthermore, we show more generally that LASSO succeeds in recovering the signed support of β0 if X ~ N(0, Σ) and the covariance Σ satisfies the irrepresentable condition. Our work extends existing results on the support recovery of LASSO for the linear model to a more general class of SIMs.
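A toy illustration of the phenomenon described above (not the paper's algorithm; the monotone link, dimensions, sparsity pattern, and regularization level are all assumptions): a plain linear LASSO is fit to data generated from a nonlinear single index model, and the signed support of β0 is read off the estimated coefficients.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 400, 20
beta0 = np.zeros(p)
beta0[[0, 3, 7]] = [3.0, -2.0, 2.5]     # true signed support of beta0

X = rng.standard_normal((n, p))         # X ~ N(0, I_{p x p}): the identity-design case
index = X @ beta0
# Monotone nonlinear link f plus noise eps -- a single index model, not a linear model.
y = index + 0.5 * np.sin(index) + 0.3 * rng.standard_normal(n)

fit = Lasso(alpha=0.1).fit(X, y)
support_hat = np.flatnonzero(np.abs(fit.coef_) > 0.05)
```

Despite the misspecified (linear) working model, the LASSO coefficients concentrate on the true index directions, which is the support-recovery behavior the abstract analyzes.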

    Sparse group sufficient dimension reduction and covariance cumulative slicing estimation

    This dissertation contains two main parts. In Part One, for regression problems with grouped covariates, we adapt the idea of the sparse group lasso (Friedman et al., 2010) to the framework of sufficient dimension reduction. We propose a method called sparse group sufficient dimension reduction (sgSDR) to conduct group and within-group variable selection simultaneously without assuming a specific model structure on the regression function. Simulation studies show that our method is comparable to the sparse group lasso under the regular linear model setting, and outperforms the sparse group lasso with higher true positive rates and substantially lower false positive rates when the regression function is nonlinear and/or the error distributions are non-Gaussian. One immediate application of our method is to gene pathway data analysis, where genes naturally fall into groups (pathways). An analysis of a glioblastoma microarray data set is included to illustrate our method. In Part Two, for many-valued or continuous Y, the standard practice of replacing the response Y by a discrete version of Y usually results in a loss of power because intra-slice information is ignored. Most existing slicing methods rely heavily on the selection of the number of slices h. Zhu et al. (2010) proposed a method called cumulative slicing estimation (CUME) which avoids the otherwise subjective selection of h. In this dissertation, we revisit CUME from a different perspective to gain more insight, and then refine its performance by incorporating the intra-slice covariances. The resulting new method, which we call covariance cumulative slicing estimation (COCUM), is comparable to CUME when the predictors are normally distributed, and outperforms CUME when the predictors are non-Gaussian, especially in the presence of outliers. The asymptotic results for COCUM are also established. --Abstract, page iv
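The sparse group lasso penalty adopted in Part One combines a coordinate-wise L1 term with a group-wise L2 term, λ1‖β‖1 + λ2 Σ_g ‖β_g‖2, which is what produces selection both across and within groups. A small sketch of its proximal operator, which applies coordinate soft-thresholding and then group-wise shrinkage (the parameter values and group layout are illustrative assumptions):

```python
import numpy as np

def soft_threshold(v, t):
    """Coordinate-wise soft thresholding: the prox of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_group_lasso_prox(beta, groups, lam1, lam2, step=1.0):
    """Prox of step*(lam1*||b||_1 + lam2*sum_g ||b_g||_2), one group at a time."""
    out = np.zeros_like(beta)
    for g in groups:                       # groups: list of index arrays
        v = soft_threshold(beta[g], step * lam1)
        norm = np.linalg.norm(v)
        if norm > 0:                       # group survives only if its norm is large enough
            out[g] = max(0.0, 1.0 - step * lam2 / norm) * v
    return out

beta = np.array([3.0, -0.2, 0.1, 2.0, -1.5, 0.05])
groups = [np.array([0, 1, 2]), np.array([3, 4, 5])]
prox = sparse_group_lasso_prox(beta, groups, lam1=0.3, lam2=0.5)
```

Small coordinates are zeroed within surviving groups, and a whole group is zeroed when its thresholded norm falls below step*lam2, giving the two-level sparsity the abstract exploits.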

    Gaussian Variational Estimation for Multidimensional Item Response Theory

    Multidimensional Item Response Theory (MIRT) is widely used in the assessment and evaluation of educational and psychological tests. It models individual response patterns by specifying a functional relationship between individuals' multiple latent traits and their responses to test items. One major challenge in parameter estimation for MIRT is that the likelihood involves intractable multidimensional integrals due to the latent variable structure. Various methods have been proposed that either involve direct numerical approximation of the integrals or Monte Carlo simulation. However, these methods have limitations: they are computationally demanding in high dimensions and rely on sampling from a posterior distribution. In the second chapter of the thesis, we propose a new Gaussian Variational EM (GVEM) algorithm which adopts variational inference to approximate the intractable marginal likelihood by a computationally feasible lower bound. The optimal choice of variational lower bound allows us to derive closed-form updates in the EM procedure, which makes the algorithm efficient and easily scalable to high dimensions. We illustrate that the proposed algorithm can also be applied to assess the dimensionality of the latent traits in an exploratory analysis. Simulation studies and a real data analysis are presented to demonstrate the computational efficiency and estimation precision of the GVEM algorithm in comparison to the popular alternative, the Metropolis-Hastings Robbins-Monro algorithm. In addition, theoretical guarantees are derived to establish the consistency of the estimator from the proposed GVEM algorithm. One of the key elements in MIRT is the relationship between the items and the latent traits, the so-called test structure. The correct specification of this relationship is crucial for accurate assessment of individuals. Hence, it is of interest to study how to accurately estimate the test structure from data.
In the third chapter, we propose to apply GVEM to solve a latent variable selection problem for MIRT and empirically estimate the test structure. The main idea is to impose an L1-type penalty on the variational lower bound of the likelihood to recover a simple test structure through iterative procedures. Simulation studies show that the proposed method accurately estimates the test structure and is computationally efficient. A real data analysis of the large-scale assessment test called the National Education Longitudinal Study of 1988 is presented. In the last chapter, we discuss some interesting extensions of our proposed method. The first extension is to develop the estimation method via GVEM procedures for the Multidimensional 4-Parameter Logistic model, which is known to be more challenging than the previously discussed MIRT models. The second extension is to study Differential Item Functioning (DIF) analysis in MIRT. In brief, DIF occurs when groups (such as those defined by gender, ethnicity, or education) have different probabilities of response for a given test item even though their members have the same latent abilities. Our goal is to identify test items that exhibit DIF. We formulate DIF analysis in MIRT as a regularization problem and solve it via our proposed GVEM approach. Simulation studies are presented to show the performance of our proposed method on these topics.
PHD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/162939/1/aprilcho_1.pd
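For concreteness, a minimal sketch of the multidimensional two-parameter logistic item response function that MIRT models of this kind build on: P(Y_j = 1 | θ) = σ(a_jᵀθ − b_j), where the pattern of nonzero loadings in a_j is the test structure the L1 penalty is meant to recover. The trait dimension and parameter values below are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def mirt_2pl_prob(theta, a, b):
    """Multidimensional 2PL: P(Y_j = 1 | theta) = sigmoid(a_j . theta - b_j)."""
    return 1.0 / (1.0 + np.exp(-(theta @ a - b)))

theta = np.array([0.5, -0.2])   # two latent traits for one individual
a = np.array([1.2, 0.8])        # discrimination loadings; their sparsity = test structure
b = 0.3                         # difficulty intercept
p_correct = mirt_2pl_prob(theta, a, b)
```

The intractable marginal likelihood mentioned in the abstract arises from integrating products of such probabilities over the distribution of θ, which is what the GVEM lower bound approximates.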