246 research outputs found

    H-relative error estimation approach for multiplicative regression model with random effect

    Relative error approaches are of greater interest than absolute error approaches, such as least squares and least absolute deviation, when scale invariance of the output variable is required, for example when analyzing stock and survival data. An h-relative error estimation method via the h-likelihood is developed to avoid heavy and intractable integration for a multiplicative regression model with random effect. Statistical properties of the parameters and random effect in the model are studied. To estimate the parameters, we propose an h-relative error computation procedure. Numerical studies, including simulation and real examples, show that the proposed method performs well. Comment: 16 page

    Evaluating the diagnostic powers of variables and their linear combinations when the gold standard is continuous

    The receiver operating characteristic (ROC) curve is a very useful tool for analyzing the diagnostic/classification power of instruments/classification schemes as long as a binary-scale gold standard is available. When the gold standard is continuous and there is no confirmative threshold, the ROC curve becomes less useful. Hence, several extensions have been proposed for evaluating the diagnostic potential of variables of interest. However, due to the computational difficulties of these nonparametric extensions, they are not easy to use for finding the optimal combination of variables to improve the individual diagnostic power. Therefore, we propose a new measure, which extends the AUC index, for identifying variables with good potential to be used in a diagnostic scheme. In addition, we propose a threshold gradient descent based algorithm for finding the best linear combination of variables that maximizes this new measure, which is applicable even when the number of variables is huge. The estimate of the proposed index and its asymptotic property are studied. The performance of the proposed method is illustrated using both synthesized and real data sets.
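For orientation, the classical binary-label AUC that the abstract above takes as its starting point can be computed as a Mann-Whitney statistic. The sketch below shows only that standard index, not the new measure for a continuous gold standard, which the abstract does not define; the function name `auc_mann_whitney` is hypothetical.

```python
import numpy as np

def auc_mann_whitney(scores, labels):
    """Standard binary AUC: the fraction of (positive, negative) pairs in
    which the positive case has the higher score, counting ties as 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]    # scores of cases labelled positive
    neg = scores[~labels]   # scores of cases labelled negative
    diff = pos[:, None] - neg[None, :]   # all pairwise differences
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))
```

Comparing all pairs costs O(n_pos × n_neg) memory, which is acceptable for a small illustration; a rank-based formula would scale better on large samples.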

    Extended T-process Regression Models

    The Gaussian process regression (GPR) model has been widely used to fit data when the regression function is unknown, and its nice properties have been well established. In this article, we introduce an extended t-process regression (eTPR) model, which gives a robust best linear unbiased predictor (BLUP). Owing to its succinct construction, it inherits many attractive properties from the GPR model, such as closed forms of the marginal and predictive distributions, which give an explicit form for robust BLUP procedures, and the ability to cope easily with large dimensional covariates through an efficient implementation that slightly modifies existing BLUP procedures. Properties of the robust BLUP are studied. Simulation studies and real data applications show that the eTPR model gives a robust fit in the presence of outliers in both input and output spaces and performs well in prediction, compared with the GPR and locally weighted scatterplot smoothing (LOESS) methods.

    Distributed sequential method for analyzing massive data

    To analyse a very large data set containing lengthy variables, we adopt a sequential estimation idea and propose a parallel divide-and-conquer method. We conduct several conventional sequential estimation procedures separately, and properly integrate their results while maintaining the desired statistical properties. Additionally, using a criterion from statistical experimental design, we adopt an adaptive sample selection, together with an adaptive shrinkage estimation method, to simultaneously accelerate the estimation procedure and identify the effective variables. We confirm the cogency of our methods through theoretical justifications and numerical results derived from synthesized data sets. We then apply the proposed method to three real data sets, including those pertaining to appliance energy use and particulate matter concentration.

    A robust estimation for the extended t-process regression model

    Robust estimation and variable selection procedures are developed for the extended t-process regression model with functional data. Statistical properties such as consistency of estimators and predictions are obtained. Numerical studies show that the proposed method performs well. Comment: 20 page

    Modeling Function-Valued Processes with Nonseparable and/or Nonstationary Covariance Structure

    We discuss a general Bayesian framework for modeling multidimensional function-valued processes by using a Gaussian process or a heavy-tailed process as a prior, enabling us to handle nonseparable and/or nonstationary covariance structure. The nonstationarity is introduced by a convolution-based approach through a varying anisotropy matrix, whose parameters vary along the input space and are estimated via a local empirical Bayesian method. For the varying matrix, we propose to use a spherical parametrization, leading to unconstrained and interpretable parameters. The unconstrained nature allows the parameters to be modeled as a nonparametric function of time, spatial location or other covariates. The interpretation of the parameters is based on closed-form expressions, providing valuable insights into nonseparable covariance structures. Furthermore, to extract important information in data with complex covariance structure, the Bayesian framework can decompose the function-valued processes using the eigenvalues and eigensurfaces calculated from the estimated covariance structure. The results are demonstrated by simulation studies and by an application to wind intensity data. Supplementary materials for this article are available online. Comment: Added subsection 2.2.1: Local Interpretation of the Varying Anisotropy Matrix; Replaced simulation studies; Replaced application by two new ones; Corrected typo

    Least Product Relative Error Estimation

    A least product relative error criterion is proposed for multiplicative regression models. It is invariant under scale transformation of the outcome and covariates. In addition, the objective function is smooth and convex, resulting in a simple and uniquely defined estimator of the regression parameter. It is shown that the estimator is asymptotically normal and that the simple plug-in variance estimation is valid. Simulation results confirm that the proposed method performs well. An application to body fat calculation is presented to illustrate the new method.
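The abstract above does not state the criterion itself. A minimal sketch follows, assuming the usual form of the least product relative error loss for the model y = exp(x'β)·ε: the product of the two relative errors, |y − m|/y · |y − m|/m with m = exp(x'β), which simplifies to y/m + m/y − 2. The function names, the Newton solver and the log-linear starting value are illustrative choices, not the authors' procedure.

```python
import numpy as np

def lpre_loss(beta, X, y):
    """Sum over observations of y/m + m/y - 2, with m = exp(X @ beta)."""
    eta = X @ beta
    return float(np.sum(y * np.exp(-eta) + np.exp(eta) / y - 2.0))

def lpre_fit(X, y, n_iter=50, tol=1e-10):
    """Minimise the smooth, convex loss with plain Newton steps (assumed solver)."""
    # Start from least squares on log(y), which is usually close to the optimum.
    beta = np.linalg.lstsq(X, np.log(y), rcond=None)[0]
    for _ in range(n_iter):
        eta = X @ beta
        a = y * np.exp(-eta)                   # y / m
        b = np.exp(eta) / y                    # m / y
        grad = X.T @ (b - a)                   # gradient of the loss
        hess = X.T @ (X * (a + b)[:, None])    # Hessian, positive definite for full-rank X
        step = np.linalg.solve(hess, grad)
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

Under this form, rescaling the outcome by a constant c > 0 only shifts the intercept by log(c) and leaves the slopes unchanged, which illustrates the scale invariance the abstract refers to. The sketch requires strictly positive y and an intercept column in X.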

    Sequential estimation for GEE with adaptive variables and subject selection

    Modeling correlated or highly stratified multiple-response data has become a common data analysis task due to modern data monitoring facilities and methods. Generalized estimating equations (GEE) is one of the popular statistical methods for analyzing this kind of data. In this paper, we present a sequential estimation procedure for obtaining GEE-based estimates. In addition to the conventional random sampling, the proposed method features adaptive subject recruiting and variable selection. Moreover, we equip our method with an adaptive shrinkage property so that it can decide the effective variables during the estimation procedure and build a confidence set with a pre-specified precision for the corresponding parameters. In addition to studying the statistical properties of the proposed procedure, we assess our method using both simulated data and real data sets.

    Nearly Semiparametric Efficient Estimation of Quantile Regression

    As a competitive alternative to least squares regression, quantile regression is popular in analyzing heterogeneous data. For a quantile regression model specified for a single quantile level τ, the major difficulties of semiparametric efficient estimation are the unavailability of a parametric efficient score and the conditional density estimation. In this paper, with the help of the least favorable submodel technique, we first derive the semiparametric efficient scores for linear quantile regression models that are assumed for a single quantile level, multiple quantile levels and all the quantile levels in (0,1), respectively. Our main discovery is a one-step (nearly) semiparametric efficient estimation for the regression coefficients of the quantile regression models assumed for multiple quantile levels, which has several advantages: it can be regarded as an optimal way to pool information across multiple/other quantiles for efficiency gain; it is computationally feasible and easy to implement, as the initial estimator is easily available; and, due to the nature of the quantile regression models under investigation, the conditional density estimation is straightforward by plugging in an initial estimator. The resulting estimator is proved to achieve the corresponding semiparametric efficiency lower bound under regularity conditions. Numerical studies, including simulations and an example of birth weight of children, confirm that the proposed estimator leads to higher efficiency compared with the Koenker-Bassett quantile regression estimator for all quantiles of interest. Comment: 33 page
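As background for the Koenker-Bassett estimator mentioned above, the sketch below shows the standard check (pinball) loss that quantile regression minimises at a single level τ, in the simplest intercept-only case where the minimiser is a sample quantile. It does not implement the one-step efficient estimator of the abstract; the function names are hypothetical.

```python
import numpy as np

def check_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - 1{u < 0}), applied elementwise."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def best_constant(y, tau):
    """Intercept-only quantile fit: search the observed values for the
    constant c that minimises the total check loss of y - c."""
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - c, tau).sum() for c in y]
    return y[int(np.argmin(losses))]
```

Restricting the search to the observed values is enough here because the total check loss is piecewise linear and convex in c, so a minimiser always exists at a data point.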

    Active learning for binary classification with variable selection

    Modern computing and communication technologies can make data collection procedures very efficient. However, our ability to analyze large data sets and/or to extract information from them is hard-pressed to keep up with our capacity for data collection. Some of these huge data sets are not collected for any particular research purpose. For a classification problem, this means that the essential label information may not be readily obtainable in the data set at hand, and an extra labeling procedure is required so that we have enough label information for constructing a classification model. When a data set is huge, labeling each subject in it costs a lot in both capital and time. Thus, it is important to decide which subjects should be labeled first in order to efficiently reduce the training cost/time. Active learning is a promising approach in this situation, because with active learning ideas we can select the unlabeled subjects sequentially without knowing their label information. In addition, there is no confirmed information about the essential variables for constructing an efficient classification rule. Thus, how to merge a variable selection scheme with an active learning procedure is of interest. In this paper, we propose a procedure for building binary classification models when the complete label information is not available at the beginning of the training stage. We study a model-based active learning procedure with sequential variable selection schemes, and discuss the results of the proposed procedure from both theoretical and numerical aspects. Comment: 16 pages, 1 figur