
    Revisiting Guerry's data: Introducing spatial constraints in multivariate analysis

    Standard multivariate analysis methods aim to identify and summarize the main structures in large data sets that describe a number of observations by several variables. In many cases, spatial information is also available for each observation, so that a map can be associated with the multivariate data set. Two main objectives are relevant in the analysis of spatial multivariate data: summarizing covariation structures and identifying spatial patterns. In practice, achieving both goals simultaneously is a statistical challenge, and a range of methods has been developed that offer trade-offs between these two objectives. In an applied context, this methodological question has been and remains a major issue in community ecology, where species assemblages (i.e., covariation between species abundances) are often driven by spatial processes (and thus exhibit spatial patterns). In this paper we review a variety of methods developed in community ecology to investigate multivariate spatial patterns. We present different ways of incorporating spatial constraints in multivariate analysis and illustrate them using the famous data set on moral statistics in France published by André-Michel Guerry in 1833. We discuss and compare the properties of these approaches from both a practical and a theoretical viewpoint. Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org); DOI: http://dx.doi.org/10.1214/10-AOAS356
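
    The two competing objectives can be made concrete in a few lines. The sketch below is not a method from the paper: it runs an ordinary PCA and then checks the leading score for spatial structure with Moran's I under a user-supplied spatial weight matrix. The data and weights are hypothetical stand-ins for the 85 French departments, six moral variables, and a map-contiguity matrix.

```python
# A hedged sketch: PCA for the covariation objective, Moran's I as a
# diagnostic for the spatial-pattern objective. All inputs are synthetic.
import numpy as np

def morans_i(z, W):
    """Moran's I of vector z under a spatial weight matrix W."""
    z = z - z.mean()
    return (len(z) / W.sum()) * (z @ W @ z) / (z @ z)

rng = np.random.default_rng(0)
X = rng.normal(size=(85, 6))                  # hypothetical departments x variables
W = (rng.random((85, 85)) > 0.9).astype(float)
W = np.triu(W, 1); W = W + W.T                # symmetric toy "contiguity" graph
W = W / np.maximum(W.sum(axis=1, keepdims=True), 1.0)   # row-standardize

Xc = X - X.mean(axis=0)                       # centre, then PCA via SVD
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Xc @ Vt[0]                              # leading principal component score
print("Moran's I of PC1:", morans_i(pc1, W))  # ~0 here; spatial data give > 0
```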

    A Taxonomy of Big Data for Optimal Predictive Machine Learning and Data Mining

    Big data comes in various ways, types, shapes, forms and sizes. Indeed, almost all areas of science, technology, medicine, public health, economics, business, linguistics and social science are bombarded by ever-increasing flows of data begging to be analyzed efficiently and effectively. In this paper, we propose a rough idea of a possible taxonomy of big data, along with some of the most commonly used tools for handling each particular category of bigness. The dimensionality p of the input space and the sample size n are usually the main ingredients in the characterization of data bigness. The specific statistical machine learning technique used to handle a particular big data set will depend on which category it falls into within the bigness taxonomy. Large p, small n data sets, for instance, require a different set of tools from the large n, small p variety. Among other tools, we discuss Preprocessing, Standardization, Imputation, Projection, Regularization, Penalization, Compression, Reduction, Selection, Kernelization, Hybridization, Parallelization, Aggregation, Randomization, Replication, and Sequentialization. Indeed, it is important to emphasize right away that the so-called no free lunch theorem applies here, in the sense that there is no universally superior method that outperforms all other methods on all categories of bigness. It is also important to stress that simplicity, in the sense of Ockham's razor non-plurality principle of parsimony, tends to reign supreme when it comes to massive data. We conclude with a comparison of the predictive performance of some of the most commonly used methods on a few data sets. Comment: 18 pages, 2 figures, 3 tables
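
    As a concrete illustration of the taxonomy's central split, the hedged sketch below contrasts a penalized (lasso) fit for a large p, small n problem with a stochastic-gradient fit for a large n, small p problem. All data are synthetic stand-ins, and the estimator choices are illustrative examples of Penalization and Sequentialization, not the paper's benchmarks.

```python
# Synthetic stand-ins illustrating the p-vs-n axis of the taxonomy.
import numpy as np
from sklearn.linear_model import Lasso, SGDRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Large p, small n (n=50, p=1000): unregularized least squares is hopeless,
# so penalization/selection does the work.
Xp = rng.normal(size=(50, 1000))
yp = Xp[:, 0] - 2.0 * Xp[:, 1] + rng.normal(scale=0.1, size=50)
lasso = Lasso(alpha=0.05).fit(StandardScaler().fit_transform(Xp), yp)
print("lasso keeps", int(np.sum(lasso.coef_ != 0)), "of 1000 features")

# Large n, small p (n=100000, p=10): the issue is scale, so a sequential
# (stochastic-gradient) fit that streams over rows is the natural tool.
Xn = rng.normal(size=(100_000, 10))
yn = Xn @ rng.normal(size=10) + rng.normal(scale=0.1, size=100_000)
sgd = SGDRegressor(max_iter=5, tol=None).fit(Xn, yn)
print("SGD in-sample R^2:", round(sgd.score(Xn, yn), 3))
```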

    Evaluating Nonexperimental Estimators for Multiple Treatments: Evidence from Experimental Data

    This paper assesses the effectiveness of unconfoundedness-based estimators of mean effects for multiple or multivalued treatments in eliminating biases arising from nonrandom treatment assignment. We evaluate these multiple treatment estimators by simultaneously equalizing average outcomes among several control groups from a randomized experiment. We study linear regression estimators as well as partial mean and weighting estimators based on the generalized propensity score (GPS). We also study the use of the GPS in assessing the comparability of individuals among the different treatment groups, and propose a strategy to determine the overlap or common support region that is less stringent than those previously used in the literature. Our results show that in the multiple treatment setting there may be treatment groups for which it is extremely difficult to find valid comparison groups, and that the GPS plays a significant role in identifying those groups. In such situations, the estimators we consider perform poorly. However, their performance improves considerably once attention is restricted to those treatment groups with adequate overlap quality, with difference-in-differences estimators performing the best. Our results suggest that unconfoundedness-based estimators are a valuable econometric tool for evaluating multiple treatments, as long as the overlap quality is satisfactory. Keywords: multiple treatments, nonexperimental estimators, generalized propensity score
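
    A minimal sketch of the two ingredients the abstract highlights, under assumptions of ours rather than the authors' procedure: the GPS is estimated with a multinomial logit, and a crude common-support region keeps only observations whose estimated probabilities, for every treatment level, fall in the range shared by all treatment groups. The trimming rule shown is one common convention, not the paper's less stringent criterion.

```python
# Hedged sketch of GPS estimation plus a crude common-support rule.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p, k = 3000, 4, 3                          # units, covariates, treatment levels
X = rng.normal(size=(n, p))
logits = X @ rng.normal(size=(p, k))          # treatment depends on X (nonrandom)
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
T = np.array([rng.choice(k, p=pr) for pr in probs])

# GPS: estimated P(T = t | X) for every unit and treatment level.
gps = LogisticRegression(max_iter=1000).fit(X, T).predict_proba(X)

# Keep units whose GPS, for each t, lies in the range common to all k groups.
keep = np.ones(n, dtype=bool)
for t in range(k):
    lo = max(gps[T == g, t].min() for g in range(k))
    hi = min(gps[T == g, t].max() for g in range(k))
    keep &= (gps[:, t] >= lo) & (gps[:, t] <= hi)
print(f"overlap region keeps {keep.sum()} of {n} units")
```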

    Peer Effects in the Workplace: Evidence from Random Groupings in Professional Golf Tournaments

    This paper uses the random assignment of playing partners in professional golf tournaments to test for peer effects in the workplace. We find no evidence that the ability of playing partners affects the performance of professional golfers, contrary to recent evidence on peer effects in the workplace from laboratory experiments, grocery scanners, and soft-fruit pickers. In our preferred specification, we can rule out peer effects larger than 0.045 strokes for a one-stroke increase in playing partners' ability, and the point estimates are small and actually negative. We offer several explanations for our contrasting findings: that workers seek to avoid responding to social incentives when financial incentives are strong; that there is heterogeneity in how susceptible individuals are to social effects, and that those who are able to avoid them are more likely to advance to elite professional labor markets; and that workers learn with professional experience not to be affected by social forces. We view our results as complementary to the existing studies of peer effects in the workplace and as a first step toward explaining how these social effects vary across labor markets, across individuals, and with changes in the form of incentives faced. In addition to the empirical results on peer effects in the workplace, we also point out that many typical peer-effects regressions are biased because individuals cannot be their own peers, and we suggest a simple correction.
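
    The closing methodological point can be seen in a short simulation. The hedged sketch below uses synthetic data, not the golf sample: with purely random groups and i.i.d. ability, a within-tournament regression of own ability on groupmates' mean ability is negative by construction, simply because a player is excluded from his own peer group. The paper's proposed correction is not reproduced here.

```python
# Synthetic demonstration of the exclusion problem; no true peer effect exists.
import numpy as np

rng = np.random.default_rng(0)
T, N, m = 2000, 30, 3                 # tournaments, field size, group size
xs, ys = [], []
for _ in range(T):
    a = rng.normal(size=N)                        # i.i.d. ability
    groups = rng.permutation(N).reshape(-1, m)    # random partition into threesomes
    ag = a[groups]
    peers = (ag.sum(axis=1, keepdims=True) - ag) / (m - 1)  # leave-one-out mean
    xs.append(peers.ravel() - peers.mean())       # within-tournament demeaning
    ys.append(ag.ravel() - ag.mean())             #   = tournament fixed effects

x, y = np.concatenate(xs), np.concatenate(ys)
print("slope:", np.cov(x, y)[0, 1] / np.var(x))   # about -(m-1)/(N-1), not 0
```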

    Temporal-varying failures of nodes in networks

    We consider networks in which random walkers are removed because of the failure of specific nodes. We interpret the rate of loss as a measure of the importance of nodes, a notion we denote as failure-centrality. We show that the degree of a node is not sufficient to determine this measure and that, to a first approximation, the shortest loops through the node have to be taken into account. We propose approximations of the failure-centrality which are valid for temporal-varying failures, and we dwell on the possibility of externally changing the relative importance of nodes in a given network by exploiting the interference between the loops of a node and the cycles of the temporal pattern of failures. In the limit of long failure cycles, we show analytically that the escape rate at a node is larger than the one estimated from a stochastic failure with the same failure probability. We test our general formalism in two real-world networks (air transportation and e-mail users) and show how communities lead to deviations from predictions for failures in hubs. Comment: 7 pages, 3 figures
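
    A toy version of the loss-rate idea, under our own simplifications rather than the paper's formalism: walkers diffuse on a small random graph, a single permanently failed node removes any walker mass that lands on it, and the average mass removed per step scores that node. Temporal failure patterns and the loop-interference effects discussed above are not modeled.

```python
# Toy failure-centrality: average walker mass removed per step by a single
# permanently failed node on a small random graph. All parameters hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n = 40
A = (rng.random((n, n)) < 0.1).astype(float)
A = np.triu(A, 1); A = A + A.T                     # undirected, no self-loops
ring = np.eye(n, k=1) + np.eye(n, k=-1)
ring[0, -1] = ring[-1, 0] = 1.0
A = np.clip(A + ring, 0.0, 1.0)                    # ring keeps the graph connected
P = A / A.sum(axis=1, keepdims=True)               # random-walk transition matrix

def failure_centrality(node, steps=2000):
    """Average fraction of walker mass absorbed per step at a failed node."""
    w, lost = np.full(n, 1.0 / n), 0.0
    for _ in range(steps):
        lost += w[node]
        w[node] = 0.0                              # failed node removes its walkers
        w = P.T @ w                                # remaining mass keeps diffusing
    return lost / steps

scores = np.array([failure_centrality(v) for v in range(n)])
best = scores.argmax()
print("most failure-central node:", best, "degree:", int(A[best].sum()))
```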

    Evaluating the methodology of social experiments

    Keywords: Welfare; Econometric models

    Indirect effects of an aid program: how do liquidity injections affect non-eligibles' consumption?

    Aid programs in developing countries are likely to affect both the treated and the non-treated households living in the targeted areas. Studies that focus on the treatment effect on the treated may fail to capture important spillover effects. We exploit the unique design of an aid program's experimental trial to identify its indirect effect on consumption for non-eligible households living in treated areas. We find that this effect is positive, and that it occurs through changes in the insurance and credit markets: non-eligible households receive more transfers, and borrow more when hit by a negative idiosyncratic shock, because of the program's liquidity injection; thus they can reduce their precautionary savings. We also test for general equilibrium effects in the local labor and goods markets; we find no significant changes in labor income and prices, while there is a reduction in earnings from sales of agricultural products, which are now consumed rather than sold. We show that this class of aid programs has important positive externalities; thus their overall effect is larger than the effect on the treated. Our results confirm that a key identifying assumption - that the treatment has no effect on the non-treated - is likely to be violated in similar policy designs.
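
    The identification idea lends itself to a short sketch. Below, localities are randomized into program and control, and the indirect effect is the mean-consumption gap between non-eligible households in treated versus control localities. All numbers, including the direct and spillover effect sizes, are invented for illustration and are not the paper's estimates.

```python
# Invented numbers throughout; the design, not the estimates, is the point.
import numpy as np

rng = np.random.default_rng(0)
n_loc, hh = 200, 30                       # localities, households per locality
treated = rng.random(n_loc) < 0.5         # locality-level randomization
rows = []
for loc in range(n_loc):
    shock = rng.normal(0.0, 2.0)          # common locality shock
    for _ in range(hh):
        eligible = rng.random() < 0.6
        c = rng.normal(100.0, 10.0) + shock
        c += 8.0 * (treated[loc] and eligible)        # direct effect (assumed)
        c += 3.0 * (treated[loc] and not eligible)    # spillover (assumed)
        rows.append((float(treated[loc]), float(eligible), c))

d = np.array(rows)
ne = d[d[:, 1] == 0]                      # non-eligible households only
ite = ne[ne[:, 0] == 1, 2].mean() - ne[ne[:, 0] == 0, 2].mean()
print(f"indirect effect on non-eligibles: {ite:.2f} (true value 3.0)")
```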

    On the one-dimensional cubic nonlinear Schrödinger equation below L^2

    In this paper, we review several recent results concerning well-posedness of the one-dimensional, cubic nonlinear Schrödinger equation (NLS) on the real line R and on the circle T for solutions below the L^2-threshold. We point out common results for NLS on R and the so-called "Wick ordered NLS" (WNLS) on T, suggesting that WNLS may be an appropriate model for the study of solutions below L^2(T). In particular, in contrast with a recent result of Molinet, who proved that the solution map for the periodic cubic NLS equation is not weakly continuous from L^2(T) to the space of distributions, we show that this is not the case for WNLS. Comment: 14 pages; additional reference
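
    For orientation, the two equations under discussion can be written out. The normalization below (Wick ordering as subtraction of twice the spatial mean of |u|^2, with T = R/2πZ) is one common convention and is our assumption, not necessarily the paper's. For L^2 solutions the subtracted term is conserved in time, so WNLS is a gauge transform of NLS; below L^2 the two flows genuinely differ.

```latex
% Cubic NLS on R or T (the sign distinguishes focusing from defocusing):
\[
  i\,\partial_t u + \partial_x^2 u \pm |u|^2 u = 0 .
\]
% Wick-ordered NLS on T, under the stated normalization: the nonlinearity is
% renormalized by subtracting twice the spatial mean of |u|^2.
\[
  i\,\partial_t u + \partial_x^2 u
    \pm \Big( |u|^2 - \frac{1}{\pi} \int_{\mathbb{T}} |u(y,t)|^2 \, dy \Big) u = 0 .
\]
```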