5,115 research outputs found

Empirical econometric evaluation of alternative methods of dealing with missing values in investment climate surveys

Author: Escribano Alvaro
Guasch J. Luis
Pena Jorge

Investment climate surveys are valuable instruments that improve our understanding of the economic, social, political, and institutional factors determining economic growth, particularly in emerging and transition economies. At the same time, they have to overcome difficult issues related to the quality of the information provided: measurement errors, outlier observations, and missing data are frequently found in these datasets. This paper discusses the applicability of recent procedures to deal with missing observations in investment climate surveys. In particular, it presents a simple replacement mechanism, for application in models with a large number of explanatory variables, which in turn is a proxy of two methods: multiple imputations and an export-import algorithm. The performance of this method in the context of total factor productivity estimation in extended production functions is evaluated using investment climate surveys from four countries: India, South Africa, Tanzania, and Turkey. It is shown that the method is very robust and performs reasonably well even under different assumptions on the nature of the mechanism generating missing data.

Keywords: E-Business, Statistical & Mathematical Sciences, Economic Theory & Research, Information Security & Privacy, Information and Records Management

Research Papers in Economics

Techniques for clustering gene expression data

Author: Crane Martin
Doolan Padraig
Kerr Gráinne
Ruskin Heather J.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2007

Many clustering techniques have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the choice of suitable method(s) for a given experimental dataset is not straightforward. Common approaches do not translate well and fail to take account of the data profile. This review paper surveys state-of-the-art applications that recognise these limitations and implement procedures to overcome them. It provides a framework for the evaluation of clustering in gene expression analyses. The nature of microarray data is discussed briefly. Selected examples are presented for the clustering methods considered.

CiteSeerX

Irish Universities

DCU Online Research Access Service

Sensitivity of codispersion to noise and error in ecological and environmental data

Author: Acosta Jonathan
Buckley Hannah L
Case Bradley S
Ellison Aaron M
Vallejos Ronny
Publication venue: 'MDPI AG'
Publication date: 23/01/2018

Codispersion analysis is a new statistical method developed to assess spatial covariation between two spatial processes that may not be isotropic or stationary. Its application to anisotropic ecological datasets has provided new insights into mechanisms underlying observed patterns of species distributions and the relationship between individual species and underlying environmental gradients. However, the performance of the codispersion coefficient when there is noise or measurement error ("contamination") in the data has been addressed only theoretically. Here, we use Monte Carlo simulations and real datasets to investigate the sensitivity of codispersion to four types of contamination commonly seen in many real-world environmental and ecological studies. Three of these involved examining codispersion of a spatial dataset with a contaminated version of itself. The fourth examined differences in codispersion between plants and soil conditions, where the estimates of soil characteristics were based on complete or thinned datasets. In all cases, we found that estimates of codispersion were robust when contamination, such as data thinning, was relatively low (<15%), but were sensitive to larger percentages of contamination. We also present a useful method for imputing missing spatial data and discuss several aspects of the codispersion coefficient when applied to noisy data to gain more insight about the performance of codispersion in practice.

Comment: 20 pages, 14 figures

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Understanding, Predicting and Mitigating Web Survey Breakoffs

Author: Chen Zeming
Publication date: 01/08/2023

The University of Manchester - Institutional Repository

Decision Tree and Random Forest Methodology for Clustered and Longitudinal Binary Outcomes

Author: Speiser Jaime Lynn
Publication venue: 'Baishideng Publishing Group Inc.'
Publication date: 01/01/2017

Clustered binary outcomes are frequently encountered in medical research (e.g. longitudinal studies). Generalized linear mixed models (GLMMs) typically employed for clustered endpoints have challenges for some scenarios (e.g. high dimensional data). In the first dissertation aim, we develop an alternative, data-driven method called Binary Mixed Model (BiMM) tree, which combines decision tree and GLMM. We propose a procedure akin to the expectation maximization algorithm, which iterates between developing a classification and regression tree using all predictors and developing a GLMM which includes indicator variables for terminal nodes from the tree as predictors along with a random effect for the clustering variable. Since prediction accuracy may be increased through ensemble methods, we extend BiMM tree methodology within the random forest setting in the second dissertation aim. BiMM forest combines random forest and GLMM within a unified framework using an algorithmic procedure which iterates between developing a random forest and using the predicted probabilities of observations from the random forest within a GLMM that contains a random effect for the clustering variable. Simulation studies show that BiMM tree and BiMM forest methodology offer similar or superior prediction accuracy compared to standard classification and regression tree, random forest and GLMM for clustered binary outcomes. The new BiMM methods are used to develop prediction models within the acute liver failure setting using the first seven days of hospital data for the third dissertation aim. Acute liver failure is a rare and devastating condition characterized by rapid onset of severe liver damage. The majority of prediction models developed for acute liver failure patients use admission data only, even though many clinical and laboratory variables are collected daily. 
The novel BiMM tree and forest methodology developed in this dissertation can be used in diverse research settings to provide highly accurate and efficient prediction models for clustered and longitudinal binary outcomes.

MEDICA@MUSC (Medical University of South Carolina)

Advances in sub national measurement of the Human Development Index: The case of Mexico

Author: Hector Moreno
Rodolfo de la Torre

This paper surveys the main informational, conceptual and theoretical adjustments made to the HDI in the Mexican Human Development Reports and presents a way in which the calculation of the HDI could be carried out at the individual level. First, informational changes include redistributing government oil revenues from oil producing regions to the rest of the country in order to obtain a better picture of available resources, and imputing per capita average household income to all municipalities by combining census and income surveys. Also, state information is used to set counterfactuals about the first effects of internal migration on development, and municipal data is applied to decompose inequality indices to identify the sources and regions contributing to overall human development inequality. Second, conceptual adjustments consider introducing two additional dimensions to the HDI: being free from local crime and the absence of violence against women. Third, a key theoretical contribution from the Mexican National Reports to the HDI literature is the proposal of an inequality sensitive development index based on the concept of generalized means. Finally, the proposed disaggregation of the HDI at the household and individual level allows analyzing development levels for population subgroups by age, ethnic condition, sex, and income or HDI deciles across time.

Keywords: Human Development Index, individual HDI, household HDI, inequality, migration, local crime, absence of violence against women, generalized means
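The abstract mentions an inequality sensitive index based on generalized means but does not give its formula. As a rough illustration only, the sketch below uses the standard power mean; the function name, the exponent values and the dimension scores are all hypothetical and are not taken from the Mexican reports. It shows that, for a fixed arithmetic average, an exponent below 1 yields a lower aggregate when the scores are unequal.

```python
import math


def generalized_mean(values, p):
    """Power mean of positive values with exponent p.

    p = 1 is the arithmetic mean, p = 0 the geometric mean,
    p = -1 the harmonic mean. Illustrative helper, not the paper's index.
    """
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty sequence of positive numbers")
    n = len(values)
    if p == 0:
        # Limit of the power mean as p approaches 0.
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


# Hypothetical dimension scores with the same arithmetic average.
equal_scores = [0.6, 0.6, 0.6]
unequal_scores = [0.9, 0.6, 0.3]

for p in (1, 0, -1):
    print(p, generalized_mean(equal_scores, p), generalized_mean(unequal_scores, p))
```

With equal scores every exponent returns the same value, while with unequal scores the aggregate falls as the exponent decreases, which is the sense in which such an index penalizes inequality.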

Research Papers in Economics