318 research outputs found

    Average treatment effect estimation via random recursive partitioning

    A new matching method is proposed for the estimation of the average treatment effect of social policy interventions (e.g., training programs or health care measures). Given an outcome variable, a treatment and a set of pre-treatment covariates, the method is based on the examination of random recursive partitions of the space of covariates using regression trees. A regression tree is grown on either the treated or the untreated individuals only, using as response variable a random permutation of the indexes 1, ..., n (n being the number of units involved), while the indexes for the other group are predicted using this tree. The procedure is replicated in order to rule out the effect of specific permutations. The average treatment effect is estimated in each tree by matching treated and untreated units in the same terminal nodes. The final estimator of the average treatment effect is obtained by averaging over all the trees grown. The method does not require any specific model assumption apart from the tree's complexity, which, however, does not affect the estimator. We show that this method is both an instrument to check whether two samples can be matched (by any method) and, when this is feasible, a way to obtain reliable estimates of the average treatment effect. We further propose a graphical tool to inspect the quality of the match. The method has been applied to the National Supported Work Demonstration data, previously analyzed by Lalonde (1986) and others.

    Invariant and Metric Free Proximities for Data Matching: An R Package

    Data matching is a typical statistical problem in non-experimental and/or observational studies or, more generally, in cross-sectional studies in which one or more data sets are to be compared. Several methods are available in the literature, most of which are based on a particular metric or on statistical models, either parametric or nonparametric. In this paper we present two methods to calculate a proximity which has the property of being invariant under monotonic transformations. These methods require at most the notion of ordering. Open-source software in the form of an R package is also presented.
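
    The invariance property mentioned in this abstract can be illustrated with a small sketch. This is not either of the two methods presented in the paper; it is a hypothetical rank-based proximity (the names `ranks` and `rank_proximity` are invented here) that uses only the ordering of the observations, so any strictly increasing transformation of the data leaves it unchanged.

```python
import math


def ranks(values):
    """Rank of each value (1 = smallest). Assumes no ties, for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def rank_proximity(values):
    """Hypothetical proximity in [0, 1] built from rank differences only.

    Illustrative assumption, not the proximity defined in the paper.
    """
    n = len(values)
    r = ranks(values)
    return [[1 - abs(r[i] - r[j]) / (n - 1) for j in range(n)] for i in range(n)]


income = [12.0, 55.0, 18.5, 300.0]
log_income = [math.log(x) for x in income]  # strictly increasing transformation

prox_raw = rank_proximity(income)
prox_log = rank_proximity(log_income)

# Only the ordering matters, so both proximity matrices coincide.
print(prox_raw == prox_log)
```

    A metric-based distance such as the absolute difference would give different results on the raw and on the log scale, which is the dependence an ordering-only proximity avoids.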

    cem: Software for Coarsened Exact Matching

    This program is designed to improve causal inference via a method of matching that is widely applicable in observational data and easy to understand and use (if you understand how to draw a histogram, you will understand this method). The program implements the coarsened exact matching (CEM) algorithm, described below. CEM may be used alone or in combination with any existing matching method. This algorithm, and its statistical properties, are described in Iacus, King, and Porro (2008).

    Measuring Social Well Being in The Big Data Era: Asking or Listening?

    The literature on well-being measurement seems to suggest that "asking" for a self-evaluation is the only way to estimate a complete and reliable measure of well-being. At the same time, "not asking" is the only way to avoid biased evaluations due to self-reporting. Here we propose a method for estimating the welfare perception of a community simply by "listening" to the conversations on Social Network Sites. The Social Well Being Index (SWBI) and its components are obtained through an innovative technique of supervised sentiment analysis called iSA, which scales to any language and to big data. As its main methodological advantages, this approach can estimate several aspects of social well-being directly from self-declared perceptions, instead of approximating them through objective (but partial) quantitative variables like GDP; moreover, self-perceptions of welfare are spontaneous and not obtained as answers to explicit questions, which have been shown to bias the result. As an application we evaluate the SWBI in Italy over the period 2012-2015 through the analysis of more than 143 million tweets. Comment: 40 pages, 2 figures. arXiv admin note: text overlap with arXiv:1512.0156

    Social networks, happiness and health: from sentiment analysis to a multidimensional indicator of subjective well-being

    This paper applies a novel technique of opinion analysis over social media data with the aim of proposing a new indicator of perceived and subjective well-being. This new index, namely SWBI, examines several dimensions of individual and social life. The indicator has been compared to some other existing indexes of well-being and health conditions in Italy: the BES (Benessere Equo Sostenibile), the incidence rate of influenza and the abundance of PM10 in urban environments. SWBI is a daily measure available at province level. BES data, currently available only for 2013 and 2014, are annual and available at regional level. Flu data are weekly and distributed as regional data, and PM10 data are collected daily for different cities. Because the time scale and space granularity of the different indexes vary, we apply a novel statistical technique to discover nowcasting features and classical latent analysis to study the relationships among them. A preliminary analysis suggests that the environmental and health conditions anticipate several dimensions of the perception of well-being as measured by SWBI. Moreover, the set of indicators included in the BES represents a latent dimension of well-being which shares similarities with the latent dimension represented by SWBI. Comment: 26 pages, 5 figur

    CEM: Coarsened Exact Matching in Stata

    We introduce a Stata implementation of coarsened exact matching, a new method for improving the estimation of causal effects by reducing imbalance in covariates between treated and control groups. Coarsened exact matching is faster, is easier to use and understand, requires fewer assumptions, is more easily automated, and possesses more attractive statistical properties for many applications than do existing matching methods. In coarsened exact matching, users temporarily coarsen their data, exact match on these coarsened data, and then run their analysis on the uncoarsened, matched data. Coarsened exact matching bounds the degree of model dependence and causal effect estimation error by ex ante user choice, is monotonic imbalance bounding (so that reducing the maximum imbalance on one variable has no effect on others), does not require a separate procedure to restrict data to common support, meets the congruence principle, is approximately invariant to measurement error, balances all nonlinearities and interactions in sample (i.e., not merely in expectation), and works with multiply imputed datasets. Other matching methods inherit many of the coarsened exact matching method’s properties when applied to further match data preprocessed by coarsened exact matching.
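
    The three steps named in this abstract (temporarily coarsen, exact match on the coarsened data, analyze the uncoarsened matched data) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Stata or R implementation: the function names, the toy data and the cutpoints are all hypothetical, and the stratum weights that a full CEM analysis uses are omitted.

```python
from collections import defaultdict


def coarsen(value, cutpoints):
    """Return the index of the bin that value falls into (hypothetical helper)."""
    return sum(value >= c for c in cutpoints)


def cem_match(units, cutpoints):
    """Keep only units whose coarsened stratum holds both treated and controls.

    units: list of dicts with a boolean 'treated' flag plus covariates.
    cutpoints: dict mapping covariate name -> sorted list of user-chosen cutpoints.
    """
    strata = defaultdict(list)
    for unit in units:
        # Step 1: temporarily coarsen each covariate into a bin index.
        key = tuple(coarsen(unit[name], cuts)
                    for name, cuts in sorted(cutpoints.items()))
        strata[key].append(unit)

    matched = []
    for members in strata.values():
        # Step 2: exact match on the coarsened values.
        if {m["treated"] for m in members} == {True, False}:
            # Step 3: the original, uncoarsened records are what is returned.
            matched.extend(members)
    return matched


# Toy data, invented for illustration only.
units = [
    {"id": 1, "treated": True, "age": 23, "educ": 12},
    {"id": 2, "treated": False, "age": 27, "educ": 11},
    {"id": 3, "treated": False, "age": 45, "educ": 16},
    {"id": 4, "treated": True, "age": 61, "educ": 9},
]
cutpoints = {"age": [30, 50], "educ": [10, 13]}

matched = cem_match(units, cutpoints)
matched_ids = sorted(u["id"] for u in matched)
print(matched_ids)  # [1, 2]
```

    Units 3 and 4 are dropped because their strata contain only one group, which is how the sketch restricts the data to common support without a separate procedure. Coarser cutpoints would retain more units at the cost of more within-stratum imbalance.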

    cem: Coarsened Exact Matching in Stata

    This paper introduces a Stata implementation of Coarsened Exact Matching (CEM), a new method for improving the estimation of causal effects by reducing imbalance in covariates between treated and control groups. CEM is faster, is easier to use and understand, requires fewer assumptions, is more easily automated, and possesses more attractive statistical properties for many applications than existing matching methods. In CEM, users temporarily coarsen their data, exact match on these coarsened data, then run their analysis on the uncoarsened, matched data. CEM bounds the degree of model dependence and causal effect estimation error by ex ante user choice, is monotonic imbalance bounding (so that reducing the maximum imbalance on one variable has no effect on others), does not require a separate procedure to restrict data to common support, meets the congruence principle, is approximately invariant to measurement error, balances all nonlinearities and interactions in-sample (i.e., not merely in expectation), and works with multiply imputed data sets. Other matching methods inherit many of CEM's properties when applied to further match data preprocessed by CEM. The library cem implements the CEM algorithm in Stata.

    A proposal to deal with sampling bias in social network big data

    Selection bias is the bias introduced by the non-random selection of data; it raises the question of whether the sample obtained is representative of the target population. There are different types of selection bias, but when one handles web surveys or data from social networks such as Twitter or Facebook, one mostly needs to focus on sampling and self-selection bias. In this work we propose to use official statistics to anchor the estimates and to remove the sampling bias and unreliability that result from the use of social network big data, following a weighting method combined with a small area estimation (SAE) approach. Iacus, SM.; Porro, G.; Salini, S.; Siletti, E. (2018). A proposal to deal with sampling bias in social network big data. In 2nd International Conference on Advanced Research Methods and Analytics (CARMA 2018). Editorial Universitat Politècnica de València. 29-37. https://doi.org/10.4995/CARMA2018.2018.8302