5,115 research outputs found

    Empirical econometric evaluation of alternative methods of dealing with missing values in investment climate surveys

    Investment climate surveys are valuable instruments that improve our understanding of the economic, social, political, and institutional factors determining economic growth, particularly in emerging and transition economies. At the same time, they have to overcome difficult issues related to the quality of the information provided: measurement errors, outlier observations, and missing data are frequently found in these datasets. This paper discusses the applicability of recent procedures for dealing with missing observations in investment climate surveys. In particular, it presents a simple replacement mechanism -- for application in models with a large number of explanatory variables -- which in turn is a proxy of two methods: multiple imputations and an export-import algorithm. The performance of this method in the context of total factor productivity estimation in extended production functions is evaluated using investment climate surveys from four countries: India, South Africa, Tanzania, and Turkey. It is shown that the method is very robust and performs reasonably well even under different assumptions on the nature of the mechanism generating missing data.
    Keywords: E-Business, Statistical & Mathematical Sciences, Economic Theory & Research, Information Security & Privacy, Information and Records Management

    Techniques for clustering gene expression data

    Many clustering techniques have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the choice of suitable method(s) for a given experimental dataset is not straightforward. Common approaches do not translate well and fail to take account of the data profile. This review paper surveys state-of-the-art applications that recognise these limitations and implement procedures to overcome them. It provides a framework for the evaluation of clustering in gene expression analyses. The nature of microarray data is discussed briefly. Selected examples are presented for the clustering methods considered.

    Sensitivity of codispersion to noise and error in ecological and environmental data

    Codispersion analysis is a new statistical method developed to assess spatial covariation between two spatial processes that may not be isotropic or stationary. Its application to anisotropic ecological datasets has provided new insights into mechanisms underlying observed patterns of species distributions and the relationship between individual species and underlying environmental gradients. However, the performance of the codispersion coefficient when there is noise or measurement error ("contamination") in the data has been addressed only theoretically. Here, we use Monte Carlo simulations and real datasets to investigate the sensitivity of codispersion to four types of contamination commonly seen in many real-world environmental and ecological studies. Three of these involved examining codispersion of a spatial dataset with a contaminated version of itself. The fourth examined differences in codispersion between plants and soil conditions, where the estimates of soil characteristics were based on complete or thinned datasets. In all cases, we found that estimates of codispersion were robust when contamination, such as data thinning, was relatively low (<15%), but were sensitive to larger percentages of contamination. We also present a useful method for imputing missing spatial data and discuss several aspects of the codispersion coefficient when applied to noisy data to gain more insight about the performance of codispersion in practice.
    Comment: 20 pages, 14 figures
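The abstract above does not state the coefficient itself. As a rough illustration only, the sketch below computes the commonly used sample form of the codispersion coefficient for a single spatial lag, assuming both variables are observed on the same complete regular grid. The function name `codispersion` and the lag convention `(row_shift, col_shift)` are assumptions made for this example, not taken from the paper.

```python
import numpy as np

def codispersion(x, y, lag):
    """Sample codispersion of two gridded variables for one lag (hypothetical helper).

    Assumes x and y are 2-D arrays on the same regular grid with no missing
    cells. lag = (row_shift, col_shift) gives the direction and distance h.
    Returns sum(dx * dy) / sqrt(sum(dx**2) * sum(dy**2)), where dx and dy are
    the increments X(s + h) - X(s) and Y(s + h) - Y(s).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 2:
        raise ValueError("x and y must be 2-D arrays of the same shape")
    dr, dc = lag
    rows, cols = x.shape
    # Keep only the sites s for which s + h is still inside the grid.
    r0, r1 = max(0, -dr), min(rows, rows - dr)
    c0, c1 = max(0, -dc), min(cols, cols - dc)
    if r0 >= r1 or c0 >= c1:
        raise ValueError("lag is larger than the grid")
    dx = x[r0 + dr:r1 + dr, c0 + dc:c1 + dc] - x[r0:r1, c0:c1]
    dy = y[r0 + dr:r1 + dr, c0 + dc:c1 + dc] - y[r0:r1, c0:c1]
    denom = np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))
    if denom == 0.0:
        return float("nan")  # one variable has no variation at this lag
    return float(np.sum(dx * dy) / denom)
```

Like a correlation, the value lies between -1 and 1. Comparing a grid with a noisy copy of itself at several lags is one simple way to mimic the contamination experiments the abstract describes, although the paper's own simulation design is not reproduced here.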

    Decision Tree and Random Forest Methodology for Clustered and Longitudinal Binary Outcomes

    Clustered binary outcomes are frequently encountered in medical research (e.g. longitudinal studies). Generalized linear mixed models (GLMMs), typically employed for clustered endpoints, face challenges in some scenarios (e.g. high dimensional data). In the first dissertation aim, we develop an alternative, data-driven method called Binary Mixed Model (BiMM) tree, which combines decision tree and GLMM. We propose a procedure akin to the expectation maximization algorithm, which iterates between developing a classification and regression tree using all predictors and developing a GLMM which includes indicator variables for terminal nodes from the tree as predictors along with a random effect for the clustering variable. Since prediction accuracy may be increased through ensemble methods, we extend BiMM tree methodology within the random forest setting in the second dissertation aim. BiMM forest combines random forest and GLMM within a unified framework using an algorithmic procedure which iterates between developing a random forest and using the predicted probabilities of observations from the random forest within a GLMM that contains a random effect for the clustering variable. Simulation studies show that BiMM tree and BiMM forest methodology offer similar or superior prediction accuracy compared to standard classification and regression tree, random forest and GLMM for clustered binary outcomes. The new BiMM methods are used to develop prediction models within the acute liver failure setting using the first seven days of hospital data for the third dissertation aim. Acute liver failure is a rare and devastating condition characterized by rapid onset of severe liver damage. The majority of prediction models developed for acute liver failure patients use admission data only, even though many clinical and laboratory variables are collected daily. The novel BiMM tree and forest methodology developed in this dissertation can be used in diverse research settings to provide highly accurate and efficient prediction models for clustered and longitudinal binary outcomes.

    Advances in sub national measurement of the Human Development Index: The case of Mexico

    This paper surveys the main informational, conceptual and theoretical adjustments made to the HDI in the Mexican Human Development Reports and presents a way in which the calculation of the HDI could be carried down to the individual level. First, informational changes include redistributing government oil revenues from oil producing regions to the rest of the country in order to obtain a better picture of available resources, and imputing per capita average household income to all municipalities by combining census and income surveys. Also, state information is used to set counterfactuals about the first effects of internal migration on development, and municipal data is applied to decompose inequality indices to identify the sources and regions contributing to overall human development inequality. Second, conceptual adjustments consider introducing two additional dimensions to the HDI: being free from local crime and the absence of violence against women. Third, a key theoretical contribution from the Mexican National Reports to the HDI literature is the proposal of an inequality sensitive development index based on the concept of generalized means. Finally, the proposed disaggregation of the HDI at the household and individual level allows analyzing development levels for population subgroups by age, ethnic condition, sex, and income or HDI deciles across time.
    Keywords: Human Development Index, individual HDI, household HDI, inequality, migration, local crime, absence of violence against women, generalized means
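To make the "generalized means" idea concrete, here is a minimal sketch of a power mean of order p. The function name `generalized_mean`, the parameter `p`, and the sample values are illustrative assumptions; the abstract does not say which order or aggregation steps the Mexican reports use.

```python
import math

def generalized_mean(values, p):
    """Power (generalized) mean of order p for strictly positive values.

    p = 1 is the arithmetic mean, p = 0 the geometric mean (taken as the
    limit), and p = -1 the harmonic mean. For unequal values, a lower p
    gives a lower mean, which is what makes the index inequality sensitive.
    """
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    if any(v <= 0 for v in values):
        raise ValueError("values must be strictly positive")
    n = len(values)
    if p == 0:
        # Limit of the power mean as p -> 0.
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)

# Hypothetical achievement levels with the same arithmetic mean (0.5).
equal = [0.5, 0.5]
unequal = [0.2, 0.8]
for order in (1, 0, -1):
    print(order, generalized_mean(equal, order), generalized_mean(unequal, order))
```

With equal values every order returns the same number, while with unequal values the mean falls as the order decreases, so a distribution with the same average but more inequality scores lower.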