    Fixed Effect Estimation of Large T Panel Data Models

    This article reviews recent advances in fixed effect estimation of panel data models for long panels, where the number of time periods is relatively large. We focus on semiparametric models with unobserved individual and time effects, where the distribution of the outcome variable conditional on covariates and unobserved effects is specified parametrically, while the distribution of the unobserved effects is left unrestricted. Compared to existing reviews on long panels (Arellano and Hahn 2007; a section in Arellano and Bonhomme 2011), we discuss models with both individual and time effects, split-panel jackknife bias corrections, unbalanced panels, distribution and quantile effects, and other extensions. Understanding and correcting the incidental parameter bias caused by the estimation of many fixed effects is our main focus, and the unifying theme is that the order of this bias is given by the simple formula p/n for all models discussed, with p the number of estimated parameters and n the total sample size.
    Comment: 40 pages, 1 table
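
    As a concrete illustration of the split-panel jackknife bias corrections mentioned above, here is a minimal Python sketch. The names `estimate`, `y`, and `x` are hypothetical placeholders; the key point is that an estimator whose first-order bias is B/T has bias roughly 2B/T on each half-panel, so combining full- and half-panel estimates removes the leading bias term.

```python
import numpy as np

def split_panel_jackknife(estimate, y, x):
    """Split-panel jackknife bias correction (sketch).

    `estimate` is a hypothetical user-supplied fixed-effect estimator: it
    takes outcome and covariate arrays of shape (N, T, ...) and returns a
    parameter vector whose first-order bias is of order 1/T. A balanced
    panel with an even number of periods T is assumed.
    """
    T = y.shape[1]
    half = T // 2
    theta_full = estimate(y, x)                    # bias ~ B/T
    theta_a = estimate(y[:, :half], x[:, :half])   # bias ~ B/(T/2) = 2B/T
    theta_b = estimate(y[:, half:], x[:, half:])   # bias ~ 2B/T
    # The combination 2*theta_full - average(half-panel estimates) has bias
    # 2B/T - 2B/T = 0 to first order.
    return 2 * theta_full - 0.5 * (theta_a + theta_b)
```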

    Bayesian Approach of Joint Models of Longitudinal Outcomes and Informative Time

    Longitudinal studies are common in many research areas where the scientific interest lies in the pattern of change in a response variable over time. Numerous methods have been proposed for longitudinal data analysis, and most traditional methods assume that the independent variables are the same across all subjects. It is also commonly assumed that the time intervals at which outcomes are collected are predetermined and carry no information about the measured variables. In practice, however, researchers often face irregular time intervals and informative time, which violate these assumptions; applying traditional statistical methods in this situation yields biased results. Joint models of longitudinal outcomes and informative time address these violations by specifying a joint probability distribution that incorporates the relationship between outcomes and time. The joint models considered here assume that outcomes follow a normal distribution and that informative time follows an exponential distribution. Several studies have estimated the joint model's parameters by maximum likelihood; this study instead presents a Bayesian approach to parameter estimation. The Bayesian approach permits the inclusion of prior knowledge about the observed data through the prior distributions of the unknown parameters. In this dissertation, the prior distributions follow three scenarios:
    (1) noninformative priors for all unknown parameters, set to be vague but proper: Normal(0, 1e6);
    (2) informative priors for all unknown parameters, set to be normal for unrestricted parameters and inverse gamma (IG) for positive parameters such as the variance σ²;
    (3) a combination of the two scenarios, with noninformative priors for some unknown parameters and informative priors for the others.
    The model parameters were estimated via a Markov chain Monte Carlo (MCMC) method using the Metropolis-Hastings algorithm: construct the likelihood function, specify the prior information, and sample from the resulting posterior distribution. Simulated observations were generated from the posterior distribution by the MCMC technique. The primary purpose of this study was thus to obtain Bayesian estimates of the unknown parameters in the joint model, under the assumptions of a normal distribution for the outcome process and an exponential distribution for informative time. The properties and merits of the proposed procedure were illustrated in a simulation study implemented in R and OpenBUGS.
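
    To make the estimation procedure concrete, here is a minimal Python sketch of a random-walk Metropolis-Hastings sampler for a stripped-down version of the model: normal outcomes with a vague Normal(0, 1e6) prior on the mean (scenario 1) and exponentially distributed informative times with a flat prior on the positive rate. The variance is treated as known, and the shared parameters linking outcomes and time in the full joint model are omitted, so this illustrates the MCMC machinery rather than the dissertation's R/OpenBUGS implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data under the model's assumptions: outcomes y ~ Normal(mu, sigma2),
# informative times t ~ Exponential(rate lam). True mu = 2.0, lam = 1.5.
y = rng.normal(2.0, 1.0, size=200)
t = rng.exponential(1.0 / 1.5, size=200)

def log_posterior(mu, lam, sigma2=1.0):
    if lam <= 0:
        return -np.inf                      # exponential rate must be positive
    log_prior = -0.5 * mu**2 / 1e6          # vague Normal(0, 1e6) prior on mu
    log_lik_y = -0.5 * np.sum((y - mu) ** 2) / sigma2
    log_lik_t = t.size * np.log(lam) - lam * np.sum(t)
    return log_prior + log_lik_y + log_lik_t

# Random-walk Metropolis-Hastings over (mu, lam).
mu, lam = 0.0, 1.0
lp = log_posterior(mu, lam)
samples = []
for _ in range(20_000):
    mu_new = mu + rng.normal(scale=0.1)     # symmetric proposals, so the
    lam_new = lam + rng.normal(scale=0.1)   # acceptance ratio is the posterior ratio
    lp_new = log_posterior(mu_new, lam_new)
    if np.log(rng.uniform()) < lp_new - lp:
        mu, lam, lp = mu_new, lam_new, lp_new
    samples.append((mu, lam))

post = np.array(samples[5_000:])            # discard burn-in
print("posterior means (mu, lam):", post.mean(axis=0))
```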

    Random Forests for Big Data

    Big Data is one of the major challenges of statistical science and has numerous consequences from algorithmic and theoretical viewpoints. Big Data always involves massive datasets, but it often also includes online (streaming) data and data heterogeneity. Recently, some statistical methods have been adapted to process Big Data, such as linear regression models, clustering methods, and bootstrapping schemes. Based on decision trees combined with aggregation and bootstrap ideas, random forests were introduced by Breiman in 2001. They are a powerful nonparametric statistical method that handles regression problems as well as two-class and multi-class classification problems in a single, versatile framework. Focusing on classification problems, this paper proposes a selective review of available proposals for scaling random forests to Big Data problems. These proposals rely on parallel environments or on online adaptations of random forests. We also describe how related quantities -- such as the out-of-bag error and variable importance -- are addressed in these methods. We then formulate various remarks about random forests in the Big Data context. Finally, we experiment with five variants on two massive datasets (15 and 120 million observations), one simulated and one built from real-world data. One variant relies on subsampling, while three others are parallel implementations of random forests that involve either adaptations of the bootstrap to Big Data or "divide-and-conquer" approaches. The fifth variant relies on online learning of random forests. These numerical experiments highlight the relative performance of the different variants, as well as some of their limitations.
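
    As a rough sketch of the "divide-and-conquer" variant discussed above, the Python example below grows small random forests on disjoint chunks of a dataset and merges their trees into one ensemble. The use of scikit-learn, the chunking scheme, and the merge via the internal estimators_ attribute are all assumptions for illustration; the paper's own experiments rely on different implementations.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for a massive dataset, split into chunks that each fit in memory.
X, y = make_classification(n_samples=100_000, n_features=20, random_state=0)
chunks = np.array_split(np.arange(len(y)), 4)

# Divide-and-conquer: grow a small forest on each chunk (in practice, in parallel).
forests = []
for i, idx in enumerate(chunks):
    rf = RandomForestClassifier(n_estimators=25, n_jobs=-1, random_state=i)
    rf.fit(X[idx], y[idx])
    forests.append(rf)

# Merge the trees into a single ensemble. This pokes at a scikit-learn internal
# (estimators_) and assumes every chunk contains all classes, so the fitted
# sub-forests agree on classes_; treat it as an illustration, not a supported API.
combined = forests[0]
for rf in forests[1:]:
    combined.estimators_ += rf.estimators_
combined.n_estimators = len(combined.estimators_)

print("training accuracy of the merged forest:", combined.score(X, y))
```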