
    About Time

    Survival analysis is a method used to study event occurrence. Missing periods in discrete-time survival analyses are problematic, since whether an event occurs determines whether the subject continues to be followed up. Seven strategies that can be used when missingness occurs (case deletion, deletion upon missing, single imputation, multiple imputation, remembrance, the Non-Event Strategy and the Event Strategy) are evaluated using four criteria: effect size bias, standard error bias, power and coverage rate of confidence intervals. Single imputation, multiple imputation and the Non-Event Strategy show good results. Single imputation performs slightly better, yet the Non-Event Strategy is easier to implement.

    The Classical Linear Regression Model with one Incomplete Binary Variable

    We present three different methods based on conditional mean imputation for the case where binary explanatory variables are incomplete. Besides single imputation and multiple imputation, the so-called pi imputation is presented as a new procedure. Seven procedures are compared in a simulation experiment in which missing data are confined to one independent binary variable: complete case analysis, zero order regression, categorical zero order regression, pi imputation, single imputation, multiple imputation, and modified first order regression. After a brief theoretical description of the simulation experiment, MSE-ratio, variance and bias are used to illustrate differences within and between the approaches.

    A reinforcement learning-based approach for imputing missing data

    Missing data is a major problem in real-world datasets and hinders the performance of data analytics. Conventional data imputation schemes such as univariate single imputation replace missing values in each column with the same approximated value. These univariate single imputation techniques underestimate the variance of the imputed values. Multivariate imputation, on the other hand, explores the relationships between different columns of data to impute the missing values. Reinforcement Learning (RL) is a machine learning paradigm in which an agent learns by taking actions and receiving rewards in response, in order to achieve its goal. In this work, we propose an RL-based approach that imputes missing data by learning an imputation policy through action-reward-based experience. Our approach imputes missing values in a column by working only on the same column (similar to univariate single imputation), but fills the missing entries with different values, thus preserving the variance in the imputed values. We report superior performance of our approach, compared with other imputation techniques, on a number of datasets.
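
The variance claim in the abstract above can be illustrated with a minimal sketch. It is not the RL method from the paper; it only shows univariate mean imputation on synthetic data. The function names `mean_impute` and `demo`, the Gaussian data, and the 30% missing rate are all assumptions made for illustration.

```python
import random
import statistics


def mean_impute(values):
    """Replace every missing entry (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def demo(seed=0, n=1000, missing_rate=0.3):
    """Return (variance of observed values, variance after mean imputation).

    The data are synthetic (hypothetical): Gaussian draws with entries
    removed completely at random.
    """
    rng = random.Random(seed)
    complete = [rng.gauss(50.0, 10.0) for _ in range(n)]
    with_gaps = [None if rng.random() < missing_rate else v for v in complete]
    observed = [v for v in with_gaps if v is not None]
    imputed = mean_impute(with_gaps)
    return statistics.variance(observed), statistics.variance(imputed)


if __name__ == "__main__":
    observed_var, imputed_var = demo()
    print(f"variance of observed values: {observed_var:.2f}")
    print(f"variance after mean imputation: {imputed_var:.2f}")
```

Because every imputed entry sits exactly at the observed mean, the sum of squared deviations is unchanged while the sample size grows, so the sample variance after imputation is smaller than that of the observed values alone.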

    Consistency of Hedonic Price Indexes with Unobserved Characteristics

    Hedonic regressions are prone to omitted variable bias. The estimation of price relatives for new and disappearing goods using hedonic imputation methods involves taking ratios of hedonic models. This may lead to a situation where the omitted variable biases in the individual hedonic regressions offset each other. This study finds that the single imputation hedonic method estimates inconsistent price relatives, while the double imputation method may produce consistent price relatives depending on the behavior of unobserved characteristics in the comparison periods. The study outlines a methodology to test whether double imputation price relatives are consistent. The results of this study have implications with regard to the construction of quality adjusted indexes.
    Keywords: hedonic imputation method; omitted variable bias; model selection; quality adjusted price indexes; new and disappearing goods

    Multiple imputation of right-censored wages in the German IAB Employment Sample considering heteroscedasticity

    "In many large data sets of economic interest, some variables, as wages, are top-coded or right-censored. In order to analyze wages with the German IAB employment sample we first have to solve the problem of censored wages at the upper limit of the social security system. We treat this problem as a missing data problem and derive new multiple imputation approaches to impute the censored wages by draws of a random variable from a truncated distribution based on Markov chain Monte Carlo techniques. In general, the variation of income is smaller in lower wage categories than in higher categories and the assumption of homoscedasticity in an imputation model is highly questionable. Therefore, we suggest a new multiple imputation method which does not presume homoscedasticity of the residuals. Finally, in a simulation study, different imputation approaches are compared under different situations and the necessity as well as the validity of the new approach is confirmed." (Author's abstract, IAB-Doku)
    Keywords: wage level, data, data preparation method, applied statistics, mathematical statistics, estimation, Markov chains, Monte Carlo method, IAB Employment Sample, imputation methods, West Germany, Federal Republic of Germany
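
The idea of replacing right-censored wages with draws from a truncated distribution can be sketched as follows. This is a simplified illustration, not the authors' MCMC procedure: it assumes a normal wage model with a known, fixed `mean` and `sd` (a homoscedastic simplification that the abstract itself calls questionable), and the names `draw_above_limit` and `impute_censored` are hypothetical.

```python
import random
from statistics import NormalDist


def draw_above_limit(mean, sd, limit, rng):
    """Draw one value from Normal(mean, sd) truncated to [limit, +inf).

    Uses inverse-CDF sampling: pick a uniform point in the part of the
    CDF that lies above the censoring limit, then map it back.
    """
    dist = NormalDist(mean, sd)
    lower = dist.cdf(limit)
    u = lower + rng.random() * (1.0 - lower)
    # Keep u strictly inside (0, 1), as NormalDist.inv_cdf requires.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    # Guard against tiny numerical undershoot below the limit.
    return max(limit, dist.inv_cdf(u))


def impute_censored(wages, limit, mean, sd, seed=0):
    """Replace wages recorded at or above the limit with truncated draws.

    mean and sd are assumed to be given here; in a full multiple
    imputation they would be re-estimated and the whole step repeated
    to produce several completed data sets.
    """
    rng = random.Random(seed)
    return [
        draw_above_limit(mean, sd, limit, rng) if w >= limit else w
        for w in wages
    ]


if __name__ == "__main__":
    # Hypothetical values: wages top-coded at 50.0.
    print(impute_censored([10.0, 35.0, 50.0, 50.0], limit=50.0, mean=40.0, sd=10.0))
```

Running the function with several different seeds yields several completed data sets, which is the basic shape of a multiple imputation; allowing `sd` to vary by observation would be one way to relax the homoscedasticity assumption.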