
    MIDAS: A SAS Macro for Multiple Imputation Using Distance-Aided Selection of Donors

    In this paper we describe MIDAS, a SAS macro for multiple imputation using distance-aided selection of donors, which implements an iterative predictive mean matching hot-deck for imputing missing data. This is a flexible multiple imputation approach that can handle data in a variety of formats: continuous, ordinal, and scaled. Because the imputation models are implicit, it is not necessary to specify a parametric distribution for each variable to be imputed. MIDAS also allows users to assess the sensitivity of their inferences to different assumptions concerning the missing data mechanism. An example using MIDAS to impute missing data is presented, and MIDAS is compared to existing missing data software.
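As a rough illustration of the predictive mean matching idea mentioned in this abstract, the sketch below imputes one variable from a single predictor in plain Python. It is not the MIDAS macro and does not reproduce its distance-aided donor selection or its iterative scheme: the function names `fit_line` and `pmm_impute`, the parameter `k`, and the choice of a uniform draw among the `k` nearest donors are all assumptions made for this example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pmm_impute(xs, ys, k=3, rng=None):
    """Return a copy of ys in which None entries are filled by a simplified
    predictive mean matching step (hypothetical helper, not the MIDAS API)."""
    rng = rng or random.Random()
    # Fit the regression on complete cases only.
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    intercept, slope = fit_line([x for x, _ in observed],
                                [y for _, y in observed])
    # Each donor carries its predicted mean and its actually observed value.
    donors = [(intercept + slope * x, y) for x, y in observed]
    completed = []
    for x, y in zip(xs, ys):
        if y is not None:
            completed.append(y)
            continue
        target = intercept + slope * x
        # Keep the k donors whose predicted means are closest to the target.
        nearest = sorted(donors, key=lambda d: abs(d[0] - target))[:k]
        # Impute an observed donor value, never the model prediction itself.
        completed.append(rng.choice(nearest)[1])
    return completed


xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, None, 8.2, None, 12.0]
# Several completed datasets, one per seed, as in multiple imputation.
imputations = [pmm_impute(xs, ys, k=2, rng=random.Random(seed))
               for seed in range(5)]
```

Because every imputed value is copied from an observed donor, the filled-in data stay within the range and scale of the real observations, which is why no parametric distribution has to be specified for the imputed variable. A full multiple imputation procedure would also propagate uncertainty in the regression parameters, which this sketch omits.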

    Techniques for clustering gene expression data

    Many clustering techniques have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the choice of a suitable method (or methods) for a given experimental dataset is not straightforward. Common approaches do not translate well and fail to take account of the data profile. This review paper surveys state-of-the-art applications that recognise these limitations and implement procedures to overcome them. It provides a framework for the evaluation of clustering in gene expression analyses. The nature of microarray data is discussed briefly. Selected examples are presented for the clustering methods considered.

    Empirical econometric evaluation of alternative methods of dealing with missing values in investment climate surveys

    Investment climate surveys are valuable instruments that improve our understanding of the economic, social, political, and institutional factors determining economic growth, particularly in emerging and transition economies. At the same time, they have to overcome some difficult issues related to the quality of the information provided: measurement errors, outlier observations, and missing data are frequently found in these datasets. This paper discusses the applicability of recent procedures to deal with missing observations in investment climate surveys. In particular, it presents a simple replacement mechanism (for application in models with a large number of explanatory variables), which in turn is a proxy of two methods: multiple imputations and an export-import algorithm. The performance of this method in the context of total factor productivity estimation in extended production functions is evaluated using investment climate surveys from four countries: India, South Africa, Tanzania, and Turkey. It is shown that the method is very robust and performs reasonably well even under different assumptions on the nature of the mechanism generating missing data.
    Topics: E-Business, Statistical & Mathematical Sciences, Economic Theory & Research, Information Security & Privacy, Information and Records Management

    Decision Tree and Random Forest Methodology for Clustered and Longitudinal Binary Outcomes

    Clustered binary outcomes are frequently encountered in medical research (e.g., longitudinal studies). Generalized linear mixed models (GLMMs), typically employed for clustered endpoints, face challenges in some scenarios (e.g., high-dimensional data). In the first dissertation aim, we develop an alternative, data-driven method called Binary Mixed Model (BiMM) tree, which combines a decision tree and a GLMM. We propose a procedure akin to the expectation-maximization algorithm, which iterates between developing a classification and regression tree using all predictors and developing a GLMM that includes indicator variables for terminal nodes from the tree as predictors, along with a random effect for the clustering variable. Since prediction accuracy may be increased through ensemble methods, we extend the BiMM tree methodology to the random forest setting in the second dissertation aim. BiMM forest combines random forest and GLMM within a unified framework, using an algorithmic procedure that iterates between developing a random forest and using the predicted probabilities of observations from the random forest within a GLMM that contains a random effect for the clustering variable. Simulation studies show that the BiMM tree and BiMM forest methodology offers similar or superior prediction accuracy compared to standard classification and regression tree, random forest, and GLMM for clustered binary outcomes. In the third dissertation aim, the new BiMM methods are used to develop prediction models within the acute liver failure setting using the first seven days of hospital data. Acute liver failure is a rare and devastating condition characterized by rapid onset of severe liver damage. The majority of prediction models developed for acute liver failure patients use admission data only, even though many clinical and laboratory variables are collected daily. The novel BiMM tree and forest methodology developed in this dissertation can be used in diverse research settings to provide highly accurate and efficient prediction models for clustered and longitudinal binary outcomes.