
    Statistical Mechanics of High-Dimensional Inference

    To model modern large-scale datasets, we need efficient algorithms to infer a set of $P$ unknown model parameters from $N$ noisy measurements. What are the fundamental limits on the accuracy of parameter inference, given finite signal-to-noise ratios, limited measurements, prior information, and computational tractability requirements? How can we combine prior information with measurements to achieve these limits? Classical statistics gives incisive answers to these questions as the measurement density $\alpha = \frac{N}{P} \rightarrow \infty$. However, these classical results are not relevant to modern high-dimensional inference problems, which instead occur at finite $\alpha$. We formulate and analyze high-dimensional inference as a problem in the statistical physics of quenched disorder. Our analysis uncovers fundamental limits on the accuracy of inference in high dimensions, and reveals that widely cherished inference algorithms like maximum likelihood (ML) and maximum a posteriori (MAP) inference cannot achieve these limits. We further find optimal, computationally tractable algorithms that can achieve these limits. Intriguingly, in high dimensions, these optimal algorithms become computationally simpler than MAP and ML, while still outperforming them. For example, such optimal algorithms can lead to as much as a 20% reduction in the amount of data needed to achieve the same performance relative to MAP. Moreover, our analysis reveals simple relations between optimal high-dimensional inference and low-dimensional scalar Bayesian inference, insights into the nature of generalization and predictive power in high dimensions, information-theoretic limits on compressed sensing, phase transitions in quadratic inference, and connections to central mathematical objects in convex optimization theory and random matrix theory.

    Comment: See http://ganguli-gang.stanford.edu/pdf/HighDimInf.Supp.pdf for supplementary material.
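    For reference, the sketch below spells out one standard setting in which the quantities in this abstract ($N$, $P$, $\alpha$, ML, MAP) are defined. The specific linear measurement model and Gaussian noise are illustrative assumptions, not details taken from the paper.

```latex
% Minimal sketch of a high-dimensional inference setup (linear model and Gaussian
% noise are assumptions for illustration): N noisy measurements y of P unknown
% parameters s_0, at measurement density alpha = N/P.
\begin{align*}
  \mathbf{y} &= \mathbf{X}\,\mathbf{s}_0 + \boldsymbol{\varepsilon},
  \qquad \mathbf{X} \in \mathbb{R}^{N \times P},\quad
  \boldsymbol{\varepsilon} \sim \mathcal{N}(0, \sigma^2 \mathbf{I}_N),
  \qquad \alpha = \tfrac{N}{P}, \\[4pt]
  \hat{\mathbf{s}}_{\mathrm{ML}} &= \arg\max_{\mathbf{s}} \; \log p(\mathbf{y} \mid \mathbf{s}),
  \qquad
  \hat{\mathbf{s}}_{\mathrm{MAP}} = \arg\max_{\mathbf{s}} \;
    \bigl[\log p(\mathbf{y} \mid \mathbf{s}) + \log p(\mathbf{s})\bigr].
\end{align*}
% Classical asymptotics send alpha to infinity at fixed P; the regime analyzed in the
% paper keeps alpha finite as N and P grow together.
```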

    Mostly Harmless Simulations? Using Monte Carlo Studies for Estimator Selection

    We consider two recent suggestions for how to perform an empirically motivated Monte Carlo study to help select a treatment effect estimator under unconfoundedness. We show theoretically that neither is likely to be informative except under restrictive conditions that are unlikely to be satisfied in many contexts. To test empirical relevance, we also apply the approaches to a real-world setting where estimator performance is known. Both approaches are worse than random at selecting estimators that minimise absolute bias. They are better when selecting estimators that minimise mean squared error. However, using a simple bootstrap is at least as good and often better. For now, researchers would be best advised to use a range of estimators and compare estimates for robustness.
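    To make the "simple bootstrap" comparison concrete, the sketch below resamples a dataset, applies two candidate treatment-effect estimators to each resample, and ranks them by their bootstrap spread around the full-sample estimate. The data-generating process, the two estimators, and the use of bootstrap spread as an RMSE proxy are all illustrative assumptions, not the authors' procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observational dataset: covariate x, confounded treatment d, outcome y.
n = 500
x = rng.normal(size=n)
d = (rng.uniform(size=n) < 1 / (1 + np.exp(-x))).astype(float)  # treatment depends on x
y = 1.0 * d + 2.0 * x + rng.normal(size=n)                      # true effect = 1.0

def diff_in_means(x, d, y):
    """Naive estimator: ignores confounding by x."""
    return y[d == 1].mean() - y[d == 0].mean()

def regression_adjustment(x, d, y):
    """Coefficient on d from a linear regression of y on (1, d, x)."""
    X = np.column_stack([np.ones_like(x), d, x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

estimators = {"diff_in_means": diff_in_means,
              "regression_adjustment": regression_adjustment}

# Simple nonparametric bootstrap: spread of each estimator across resamples.
B = 200
full_sample = {name: est(x, d, y) for name, est in estimators.items()}
boot = {name: [] for name in estimators}
for _ in range(B):
    idx = rng.integers(0, n, size=n)
    for name, est in estimators.items():
        boot[name].append(est(x[idx], d[idx], y[idx]))

for name in estimators:
    draws = np.array(boot[name])
    rmse_proxy = np.sqrt(np.mean((draws - full_sample[name]) ** 2))
    print(f"{name}: estimate {full_sample[name]:.3f}, bootstrap RMSE proxy {rmse_proxy:.3f}")
```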

    Distance from a fishing community explains fish abundance in a no-take zone with weak compliance

    There are numerous examples of no-take marine reserves effectively conserving fish stocks within their boundaries. However, no-take reserves can be rendered ineffective and turned into ‘paper parks’ through poor compliance and weak enforcement of reserve regulations. Long-term monitoring is thus essential to assess the effectiveness of marine reserves in meeting conservation and management objectives. This study documents the present state of the 15-year-old no-take zone (NTZ) of South El Ghargana within the Nabq Managed Resource Protected Area, South Sinai, Egyptian Red Sea. Previous studies credited willing compliance by the local fishing community for the increased abundances of targeted fish within the designated NTZ boundaries compared to adjacent fished (‘take’) zones. We compared benthic habitat and fish abundance within the NTZ and the adjacent take sites open to fishing, but found no significant effect of the reserve. Instead, the strongest evidence was for a simple negative relationship between fishing pressure and distance from the closest fishing village. The abundance of targeted piscivorous fish increased significantly with increasing distance from the village, while herbivorous fish showed the opposite trend. This gradient was supported by a corresponding negative correlation between the amount of discarded fishing gear observed on the reef and distance from the village. Discarded fishing gear within the NTZ suggested decreased compliance with the no-take regulations. Our findings indicate that, due to non-compliance, the no-take reserve is no longer functioning effectively despite its apparent initial successes; instead, a gradient of fishing pressure exists with distance from the nearest fishing community.

    The dynamic effects of tax audits

    Understanding causes of and solutions to non-compliance is important for a tax authority. In this paper we study how and why audits affect reported tax in the years after audit – the dynamic effect – for individual income taxpayers. We exploit data from a random audit program covering more than 53,000 income tax self-assessment returns in the UK, combined with data on the population of tax filers between 1999 and 2012. We first document that there is substantial non-compliance in this population: one in three filers underreports the tax owed. Third party information on an income source does not predict whether a taxpayer is non-compliant on that income source, though it does predict the extent of underreporting. Using the random nature of the audits, we provide evidence of dynamic effects. Audits raise reported tax liabilities for at least five years after audit, implying an additional yield of 1.5 times the direct revenue raised from the audit. The magnitude of the impact falls over time, and this decline is faster for less autocorrelated income sources. Taking an event study approach, we further show that the change in reporting behaviour comes only from those found to have made errors in their tax report. Finally, using an extension of the Allingham-Sandmo (1972) model, we show that these results are best explained by audits providing the tax authority with information, which then constrains taxpayers’ ability to misreport.
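    As background for the final sentence, the block below states one common textbook form of the baseline Allingham-Sandmo (1972) evasion model; it is not the authors' extension, and the notation is assumed for illustration.

```latex
% Baseline Allingham-Sandmo (1972) setup (a common textbook statement, not the paper's
% extension): true income W, declared income X, tax rate \theta on declared income,
% audit probability p, penalty rate \pi on undeclared income W - X.
\begin{align*}
  \max_{0 \le X \le W} \; \mathbb{E}[U]
    = (1 - p)\, U\bigl(W - \theta X\bigr)
    + p\, U\bigl(W - \theta X - \pi (W - X)\bigr).
\end{align*}
% The paper's preferred mechanism is that an audit gives the tax authority information
% that constrains later misreporting; that extension is not reproduced here.
```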
